Short Notes on Pandas (Part 2/2) | Snehit Vaddi
Welcome to the second part of our Basics of Pandas series. In the first part, we were introduced to Pandas and its basic concepts: creating Series and DataFrames in different ways, indexing, indexers, ufuncs, and handling missing data. In this blog, we are going to learn about slightly more advanced concepts that are also among the most used in Pandas. So, without further ado, let’s begin!
Btw, if you are looking for a Personal or Academic project, ping me. I have a bunch to guide you. Ping me @ v.snehith999@gmail.com 📚
Concatenation:
Let's start by importing the pandas and NumPy libraries!
Concatenating Pandas Series and DataFrame objects is similar to concatenating NumPy arrays, which is done with the np.concatenate function. Let’s have a quick look at NumPy array concatenation.
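Here is a minimal sketch of NumPy concatenation. The array names and values are made up for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# 1-D arrays are simply joined end to end
joined = np.concatenate([x, y])
print(joined)

grid = np.array([[1, 2], [3, 4]])

# For 2-D arrays, the default axis=0 stacks the rows of the inputs,
# while axis=1 places the inputs side by side
stacked = np.concatenate([grid, grid])
side_by_side = np.concatenate([grid, grid], axis=1)
print(stacked.shape, side_by_side.shape)
```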
Please note that np.concatenate joins the arrays along the first axis (axis=0) by default when no axis argument is specified.
Pandas provides a function, pd.concat(), which acts like np.concatenate() but offers more options. The function pd.concat() works for both Series and DataFrame objects.
Concatenation of Pandas Series and DataFrame:
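As a small illustration, two Series can be concatenated like this. The values and index labels are invented for the example.

```python
import pandas as pd

ser1 = pd.Series(["A", "B", "C"], index=[1, 2, 3])
ser2 = pd.Series(["D", "E", "F"], index=[4, 5, 6])

# pd.concat takes a list of objects and joins them end to end,
# keeping each object's original index labels
combined = pd.concat([ser1, ser2])
print(combined)
```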
Concatenation of higher-dimensional objects, such as DataFrames:
Pandas also provides the flexibility to concatenate along a specific axis. By default, the concatenation takes place row-wise (axis=0), so the DataFrames are stacked on top of each other.
Setting axis=1 instead concatenates column-wise: the DataFrames are placed side by side and aligned on their index. Row-wise concatenation corresponds to axis=0, which is the default.
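The two axis settings can be compared with a quick sketch. The DataFrames below are toy data made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"C": ["C0", "C1"], "D": ["D0", "D1"]})
df3 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Default (axis=0): rows of df3 are appended below the rows of df1
rows = pd.concat([df1, df3])

# axis=1: columns of df2 are placed next to the columns of df1
cols = pd.concat([df1, df2], axis=1)

print(rows)
print(cols)
```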
Sometimes the original index does not matter and one may prefer a fresh, continuous integer index. This can be done by passing ignore_index=True to the concat() function.
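A short sketch of the difference, using two made-up DataFrames that share the same index labels:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"]}, index=[0, 1])

# Without ignore_index the original labels are kept, so 0 and 1 repeat
kept = pd.concat([df1, df2])

# With ignore_index=True the result gets a new index 0, 1, 2, 3
renumbered = pd.concat([df1, df2], ignore_index=True)

print(kept)
print(renumbered)
```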
Another option is to use the keys parameter to specify a label for each data source. This creates a hierarchical index (a MultiIndex), where the source label becomes an outer level above the original index.
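As an illustration, here is a sketch with two invented sources labeled "x" and "y":

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"]}, index=[0, 1])

# Each key becomes the outer index level for the rows of its source
labeled = pd.concat([df1, df2], keys=["x", "y"])
print(labeled)

# The outer level can then be used to pull out one source again
from_y = labeled.loc["y"]
print(from_y)
```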
Concatenation with joins:
In general, data from different sources can have different column names. The function pd.concat() offers several options to deal with such data.
The join parameter of pd.concat() controls how columns that are not shared by all the inputs are handled. By default, the result is the union of the input columns (join='outer'), with the missing entries filled with NaN. We can switch to the intersection of the columns using join='inner'.
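The two join modes can be sketched as follows. The DataFrames are made up so that only columns B and C are shared.

```python
import pandas as pd

df5 = pd.DataFrame({"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]})
df6 = pd.DataFrame({"B": ["B3", "B4"], "C": ["C3", "C4"], "D": ["D3", "D4"]})

# Outer join (default): keep every column, fill the gaps with NaN
outer = pd.concat([df5, df6])

# Inner join: keep only the columns present in both inputs
inner = pd.concat([df5, df6], join="inner")

print(outer)
print(inner)
```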
Pandas merge() function:
Joining and merging are essential database-style operations for combining data. The pd.merge() function implements several types of joins: one-to-one, many-to-one, and many-to-many.
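The three kinds of joins can be sketched with a few small tables. All names and values below are invented for illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    "employee": ["Bob", "Jake", "Lisa", "Sue"],
    "group": ["Accounting", "Engineering", "Engineering", "HR"],
})
hire_dates = pd.DataFrame({
    "employee": ["Lisa", "Bob", "Jake", "Sue"],
    "hire_date": [2004, 2008, 2012, 2014],
})
supervisors = pd.DataFrame({
    "group": ["Accounting", "Engineering", "HR"],
    "supervisor": ["Carly", "Guido", "Steve"],
})
skills = pd.DataFrame({
    "group": ["Accounting", "Accounting", "Engineering", "Engineering", "HR", "HR"],
    "skill": ["math", "spreadsheets", "coding", "linux", "spreadsheets", "organization"],
})

# One-to-one: each employee appears once in both tables
one_to_one = pd.merge(employees, hire_dates)

# Many-to-one: several employees can share one group and its supervisor
many_to_one = pd.merge(one_to_one, supervisors)

# Many-to-many: the key "group" repeats in both tables
many_to_many = pd.merge(employees, skills)

print(one_to_one)
print(many_to_one)
print(many_to_many)
```

When no key is given, pd.merge() joins on the column names the two DataFrames have in common.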
Merging with specific keys:
Often, DataFrame column names will not match. In such a case, the function pd.merge() is really helpful and provides a variety of options.
Using the on keyword:
When merging two DataFrames, we can explicitly specify the name of the key column using the on keyword. The following example will make it easier to understand:
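A minimal sketch with two made-up tables that share an "employee" column:

```python
import pandas as pd

employees = pd.DataFrame({
    "employee": ["Bob", "Jake", "Lisa", "Sue"],
    "group": ["Accounting", "Engineering", "Engineering", "HR"],
})
hire_dates = pd.DataFrame({
    "employee": ["Lisa", "Bob", "Jake", "Sue"],
    "hire_date": [2004, 2008, 2012, 2014],
})

# Merge on the shared "employee" column, named explicitly
merged = pd.merge(employees, hire_dates, on="employee")
print(merged)
```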
Note: While merging DataFrames using the on keyword, one should make sure that both the left and right DataFrames contain a column with the specified name.
Let’s check what happens if we specify a column that is not common in DataFrames:
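Here is a sketch of that failure case, again with made-up tables. The "group" column exists only in the left DataFrame, so the merge raises a KeyError, which we catch here just to show it.

```python
import pandas as pd

employees = pd.DataFrame({
    "employee": ["Bob", "Jake", "Lisa", "Sue"],
    "group": ["Accounting", "Engineering", "Engineering", "HR"],
})
hire_dates = pd.DataFrame({
    "employee": ["Lisa", "Bob", "Jake", "Sue"],
    "hire_date": [2004, 2008, 2012, 2014],
})

error = None
try:
    # "group" is missing from hire_dates, so it cannot be used as the key
    pd.merge(employees, hire_dates, on="group")
except KeyError as exc:
    error = exc
    print("KeyError:", exc)
```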
The left_on and right_on keywords:
In some cases, one may wish to merge two datasets whose key columns have different names but hold the same kind of content. Here one can use the left_on and right_on keywords to specify the two column names.
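As a sketch, suppose the key is called "employee" in one table and "name" in the other. Both tables are invented for the example.

```python
import pandas as pd

employees = pd.DataFrame({
    "employee": ["Bob", "Jake", "Lisa", "Sue"],
    "group": ["Accounting", "Engineering", "Engineering", "HR"],
})
salaries = pd.DataFrame({
    "name": ["Bob", "Jake", "Lisa", "Sue"],
    "salary": [70000, 80000, 120000, 90000],
})

# Match employees["employee"] against salaries["name"]
merged = pd.merge(employees, salaries, left_on="employee", right_on="name")

# Both key columns are kept in the result, so drop the redundant one
tidy = merged.drop(columns="name")
print(tidy)
```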
The left_index and right_index keywords:
Sometimes, rather than merging on a column, one may like to merge on an index. In this case, the index can be used as the merge key by setting the left_index and/or right_index flags to True. For example, in such a case the data may look like this:
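A minimal sketch, assuming two made-up tables that are both indexed by employee name:

```python
import pandas as pd

groups = pd.DataFrame(
    {"group": ["Accounting", "Engineering", "Engineering", "HR"]},
    index=pd.Index(["Bob", "Jake", "Lisa", "Sue"], name="employee"),
)
hires = pd.DataFrame(
    {"hire_date": [2004, 2008, 2012, 2014]},
    index=pd.Index(["Lisa", "Bob", "Jake", "Sue"], name="employee"),
)

# Use the index of both DataFrames as the merge key
by_index = pd.merge(groups, hires, left_index=True, right_index=True)
print(by_index)
```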
Simple Aggregation in Pandas:
Pandas Series and DataFrames include many common aggregates like:
- count() # Total count of items
- first() # First item in each group (available on GroupBy objects)
- mean(), median() # Mean and Median
- min(), max() # Minimum and Maximum
- std(), var() # Standard deviation and variance
- prod() # Product of all items
- sum() # Sum of all items
Aggregation on Series:
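A few of these aggregates applied to a small made-up Series:

```python
import pandas as pd

ser = pd.Series([2, 4, 6, 8, 10])

# Each aggregate reduces the whole Series to a single value
total = ser.sum()
average = ser.mean()
smallest, largest = ser.min(), ser.max()
product = ser.prod()
variance = ser.var()  # sample variance (ddof=1) by default

print(total, average, smallest, largest, product, variance)
```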
Aggregation on DataFrames:
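On a DataFrame, aggregates work per column by default, and the axis argument switches to per row. Here is a sketch with toy data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

# Default: one result per column
col_means = df.mean()

# axis="columns": aggregate across the columns, one result per row
row_sums = df.sum(axis="columns")

# describe() computes several common aggregates at once
summary = df.describe()

print(col_means)
print(row_sums)
print(summary)
```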
Pandas GroupBy function:
A groupby operation involves some combination of the following steps on the data object:
- Split
- Apply
- Combine
- The Split step involves splitting data into sets depending on the value of the specified key.
- The Apply step involves operations like Aggregation, Transformation, and Filtration.
- The Combine step merges the results of these operations into an output array.
Instead of doing all the above steps manually, Pandas provides a function called groupby that can perform several operations like sum, mean, count, min, or other aggregates in a single step! Doesn’t that sound amazing?
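To make the three steps concrete, here is a rough sketch that does them by hand and then with groupby. The DataFrame is made up, and the manual version is only one of many possible ways to write it.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data": [1, 2, 3, 4, 5, 6],
})

# Split: one sub-DataFrame per distinct key value
pieces = {k: df[df["key"] == k] for k in df["key"].unique()}

# Apply: aggregate each piece separately
sums = {k: piece["data"].sum() for k, piece in pieces.items()}

# Combine: collect the per-group results into one Series
manual = pd.Series(sums, name="data")

# The same result in a single step with groupby
grouped = df.groupby("key")["data"].sum()

print(manual)
print(grouped)
```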
Let’s import a Pandas DataFrame from the Seaborn datasets to see an example of the groupby() function:
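The sketch below uses a small hand-made DataFrame as a stand-in so it runs without downloading anything. The column names and values are invented, and a DataFrame loaded with sns.load_dataset(...) would behave the same way.

```python
import pandas as pd

# Stand-in data (made up); not an actual Seaborn dataset
df = pd.DataFrame({
    "method": ["Transit", "Radial Velocity", "Transit", "Imaging", "Radial Velocity"],
    "distance": [100.0, 20.0, 300.0, 50.0, 40.0],
})

# groupby alone only returns a GroupBy object; nothing is computed yet
grouped = df.groupby("method")
print(grouped)

# The computation happens once an aggregate is called
medians = grouped["distance"].median()
print(medians)
```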
As we can see, no computation is done until we call some aggregate on the object. We got computed values only after specifying the aggregate function.
Also, it is interesting to note that the groupby function gives one the flexibility to provide either a single aggregate or a list of aggregate functions.
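For instance, the agg() method accepts either one aggregate or a list of them. A quick sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data": [1, 2, 3, 4, 5, 6],
})

# A single aggregate gives one value per group
single = df.groupby("key")["data"].agg("sum")

# A list of aggregates gives one column per aggregate
several = df.groupby("key")["data"].agg(["min", "max", "mean"])

print(single)
print(several)
```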
Transformations:
A transformation applied to a group or a column returns an object that has the same index and size as the data being grouped. Thus, the function passed to transform should return a result that is the same size as the group chunk it receives.
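A common use is centering each value by its group mean. Here is a minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data": [1, 2, 3, 4, 5, 6],
})

# Subtract each group's mean from its members; the output keeps
# one value per original row, in the original row order
centered = df.groupby("key")["data"].transform(lambda x: x - x.mean())

# A named aggregate is broadcast back to every row of its group
group_means = df.groupby("key")["data"].transform("mean")

print(centered)
print(group_means)
```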
Conclusion:
So here we are at the end of the two-part Pandas Quick Notes series!
We have learned various concepts: concatenation of Series and DataFrames, joins, the merge function and its parameters, aggregation, and the GroupBy function. Get in touch and let me know what topics you think we should discuss next.
References:📗
- Complete Jupyter Notebook: https://jovian.ml/v-snehith999/basics-of-pandas-part-2
- Pandas official documentation: https://pandas.pydata.org/
- Python Data Science Handbook: https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html
- Python Pandas: https://www.tutorialspoint.com/python_pandas/python_pandas_groupby.htm
- Pandas Tutorial: https://www.geeksforgeeks.org/pandas-tutorial/
- Pandas — powerful Python data analysis toolkit: https://pandas.pydata.org/docs/pandas.pdf
Author:🤠
- Snehit Vaddi
I am a Machine Learning enthusiast. I teach machines how to see, listen, and learn.
Linkedin: https://www.linkedin.com/in/snehit-vaddi/
Github: https://github.com/snehitvaddi