Aggregations and GroupBy in Python
In Python, especially in data science, the GroupBy operation in the Pandas library is one of the most powerful tools for data summarization and analysis. It allows you to split, apply, and combine data efficiently. Aggregation refers to computing summary statistics such as sum, mean, count, min, max, etc., on groups of data.
✔ 1. What is GroupBy?
The GroupBy operation involves three steps:
- Split → Divide the dataset into groups based on one or more keys.
- Apply → Perform a function (like mean, sum, count) on each group.
- Combine → Merge the results back into a single output.
This is useful for analyzing large datasets category-wise.
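To make the three steps concrete, here is a minimal sketch that performs split, apply, and combine by hand and then compares the result with a single groupby call. The DataFrame df and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Department": ["HR", "IT", "HR", "IT"],
    "Salary": [40000, 60000, 50000, 70000],
})

# Split: one sub-DataFrame per distinct Department value.
groups = {key: sub for key, sub in df.groupby("Department")}

# Apply: compute a statistic on each group independently.
means = {key: sub["Salary"].mean() for key, sub in groups.items()}

# Combine: collect the per-group results into a single Series.
manual = pd.Series(means, name="Salary")

# groupby performs the same three steps in one expression.
direct = df.groupby("Department")["Salary"].mean()

print(manual)
print(direct)
```

Both Series hold the same per-department averages; the one-line form is what you would normally write.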
✔ 2. Syntax of GroupBy in Pandas
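The basic form is df.groupby(key) followed by an aggregation on the grouped object. You can also group by multiple columns by passing a list of column names.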
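The sketch below shows grouping by one column and by several columns. The column names Department, City, and Salary, and all values, are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative.
df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT"],
    "City": ["Pune", "Delhi", "Pune", "Pune"],
    "Salary": [40000, 50000, 60000, 70000],
})

# Single key: one group per department.
by_dept = df.groupby("Department")["Salary"].sum()

# Multiple keys: one group per (Department, City) pair present in the data.
by_dept_city = df.groupby(["Department", "City"])["Salary"].sum()

print(by_dept)
print(by_dept_city)
```

Grouping by a list of columns produces a result indexed by a MultiIndex, with one level per grouping column.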
✔ 3. Aggregation Functions
Common aggregation functions:
- sum() → Total of values
- mean() → Average of values
- count() → Number of entries
- min() / max() → Minimum or maximum values
- median() → Median
- std() → Standard deviation
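As a rough illustration, the functions listed above can each be called on a grouped column. The data below is hypothetical, and the summary table is just one convenient way to view the results side by side.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "IT"],
    "Salary": [40000, 50000, 60000, 70000, 80000],
})

# Group once, then reuse the grouped column for every statistic.
salary = df.groupby("Department")["Salary"]

summary = pd.DataFrame({
    "sum": salary.sum(),
    "mean": salary.mean(),
    "count": salary.count(),    # counts non-null entries only
    "min": salary.min(),
    "max": salary.max(),
    "median": salary.median(),
    "std": salary.std(),        # sample standard deviation (ddof=1) by default
})

print(summary)
```

Note that count() skips missing values, whereas size() reports the number of rows in each group including missing ones.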
✔ 4. Basic Aggregation Example
A basic aggregation computes one statistic per group, for example the average salary per department.
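The average-salary-per-department calculation could look like the following sketch. The DataFrame df and its rows are made up for the example.

```python
import pandas as pd

# Hypothetical employee data.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Department": ["HR", "IT", "HR", "IT"],
    "Salary": [40000, 60000, 50000, 70000],
})

# Group rows by department, select the Salary column, and average it.
avg_salary = df.groupby("Department")["Salary"].mean()

print(avg_salary)
```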
✔ 5. Multiple Aggregations
Several aggregations can be computed in a single call to agg(), with different functions applied to different columns.
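A minimal sketch of applying different aggregations to different columns, assuming a hypothetical Age column alongside Salary. Two common spellings are shown: a dictionary passed to agg(), and named aggregation.

```python
import pandas as pd

# Hypothetical data; the Age column is an assumption for this example.
df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT"],
    "Salary": [40000, 50000, 60000, 70000],
    "Age": [30, 40, 25, 35],
})

# Dict form: column name -> function name or list of function names.
stats = df.groupby("Department").agg({"Salary": ["mean", "max"], "Age": "min"})

# Named aggregation: output_name=(column, function) gives flat column names.
named = df.groupby("Department").agg(
    avg_salary=("Salary", "mean"),
    max_salary=("Salary", "max"),
    min_age=("Age", "min"),
)

print(stats)
print(named)
```

Named aggregation is often easier to work with afterwards because the result has plain, explicitly chosen column names instead of a two-level column index.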
✔ 6. Using GroupBy with Custom Functions
A custom function can be applied to each group, for example to return the highest-paid employee from each department.
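One possible way to find the highest-paid employee per department with a custom function is sketched below. The helper name top_earner and the sample data are hypothetical; this is not necessarily how the original example was written.

```python
import pandas as pd

# Hypothetical employee data.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Department": ["HR", "IT", "HR", "IT"],
    "Salary": [40000, 60000, 50000, 70000],
})

def top_earner(group):
    """Return the row with the highest Salary in one department's sub-DataFrame."""
    # idxmax gives the index label of the first maximum, so ties keep the first row.
    return group.loc[group["Salary"].idxmax()]

# Select the columns the function needs, then apply it to each group.
top = df.groupby("Department")[["Name", "Salary"]].apply(top_earner)

print(top)
```

Selecting the columns explicitly before apply() keeps the function's input predictable, since the handling of the grouping column inside apply() has changed between pandas versions.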
✔ 7. GroupBy with Transform
transform() returns a result with the same length and index as the original dataset, so each row receives its group's value.
This makes it suitable for adding a new column, such as each employee's department-wise average salary.
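A short sketch of that pattern, with a hypothetical column name DeptAvgSalary for the new column:

```python
import pandas as pd

# Hypothetical employee data.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Department": ["HR", "IT", "HR"],
    "Salary": [40000, 60000, 50000],
})

# transform broadcasts each group's mean back to every row of that group,
# so the result aligns with df and can be assigned directly as a column.
df["DeptAvgSalary"] = df.groupby("Department")["Salary"].transform("mean")

print(df)
```

Unlike an aggregation, which returns one row per group, the transformed result has one row per original row.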
✔ 8. Filtering Groups
You can keep or drop entire groups based on a condition with filter().
For example, you can keep only departments with more than 5 employees.
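The following sketch keeps only departments with more than 5 employees. The data is invented so that one department passes the condition and one does not.

```python
import pandas as pd

# Hypothetical data: IT has 6 employees, HR has 2.
df = pd.DataFrame({
    "Department": ["IT"] * 6 + ["HR"] * 2,
    "Salary": [60000, 62000, 64000, 66000, 68000, 70000, 40000, 50000],
})

# The function receives each group and must return True (keep) or False (drop).
large = df.groupby("Department").filter(lambda g: len(g) > 5)

print(large)
```

filter() returns the original rows of the groups that pass, not one summary row per group.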
✔ 9. Real Example
Consider a dataset:
| Name | Department | Salary |
|---|---|---|
| A | HR | 40000 |
| B | IT | 60000 |
| C | HR | 50000 |
Grouping by department:
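A sketch of this grouping, assuming the table above is loaded into a DataFrame named df (the variable name is an assumption) and that salaries are summed, which is consistent with the totals in the output table:

```python
import pandas as pd

# The three rows from the table above.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Department": ["HR", "IT", "HR"],
    "Salary": [40000, 60000, 50000],
})

# Total salary per department; reset_index turns Department back into a column.
result = df.groupby("Department")["Salary"].sum().reset_index()

print(result)
```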
Output:
| Department | Salary |
|---|---|
| HR | 90000 |
| IT | 60000 |
Conclusion
The GroupBy and Aggregation operations in Python are essential for any type of data analysis. They help in summarizing large datasets, computing category-wise statistics, filtering meaningful groups, and preparing data for further analysis or machine learning. These operations provide fast, flexible, and powerful ways to understand the structure and distribution of data.