Aggregations and GroupBy in Python

📘 Python for Data Science 📅 Nov 14, 2025


In data science work with Python, the GroupBy operation in the Pandas library is one of the most widely used tools for summarizing and analyzing data. It follows a split-apply-combine pattern: split the data into groups, apply a function to each group, and combine the results. Aggregation means computing a summary statistic such as sum, mean, count, min, or max for each group.

✔ 1. What is GroupBy?

The GroupBy operation involves three steps:

Split → Divide the dataset into groups based on one or more keys.
Apply → Perform a function (like mean, sum, count) on each group.
Combine → Merge the results back into a single output.
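
The three steps above can be sketched by hand and compared with the one-line groupby call. This is a minimal illustration, assuming a small made-up DataFrame; the column names and values are hypothetical and the manual loop is only there to make each step visible.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "department": ["HR", "IT", "HR", "IT", "Sales"],
    "salary": [40000, 60000, 50000, 70000, 45000],
})

# Split: one sub-DataFrame per distinct department value.
groups = {key: part for key, part in df.groupby("department")}

# Apply: compute the mean salary inside each group.
means = {key: part["salary"].mean() for key, part in groups.items()}

# Combine: assemble the per-group results into a single Series.
manual = pd.Series(means, name="salary").sort_index()

# The usual one-liner performs all three steps at once.
builtin = df.groupby("department")["salary"].mean()
```

Both `manual` and `builtin` hold the same per-department averages; in practice you would write only the last line.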

This makes it easy to compute statistics per category, for example one figure per department, even on large datasets.

✔ 2. Syntax of GroupBy in Pandas


df.groupby('column_name')

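Calling groupby() on its own returns a GroupBy object; nothing is computed until you apply an aggregation or another method to it. You can also group by multiple columns: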


df.groupby(['col1', 'col2'])
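
As a rough illustration of grouping by two keys, the sketch below uses a hypothetical DataFrame with `department` and `city` columns (both names and all values are assumptions, not from a real dataset). It shows that the result is indexed by both keys and how to flatten it back into ordinary columns.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "department": ["HR", "HR", "IT", "IT", "IT"],
    "city": ["Pune", "Delhi", "Pune", "Pune", "Delhi"],
    "salary": [40000, 50000, 60000, 70000, 65000],
})

# One group per (department, city) combination that occurs in the data.
by_dept_city = df.groupby(["department", "city"])["salary"].mean()

# The result carries a two-level index; look up one group with a tuple.
it_pune_avg = by_dept_city.loc[("IT", "Pune")]

# reset_index() turns the index levels back into regular columns.
flat = by_dept_city.reset_index()
```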

✔ 3. Aggregation Functions

Common aggregation functions:

sum() → Total of the values
mean() → Average of the values
count() → Number of non-null entries (use size() to count all rows)
min() / max() → Minimum or maximum value
median() → Median
std() → Sample standard deviation
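
The functions listed above can be tried side by side on one grouped column. The sketch below assumes a small hypothetical dataset with one missing salary, included on purpose to show how count() (non-null values only) differs from size() (all rows).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; one salary is missing on purpose.
df = pd.DataFrame({
    "department": ["HR", "HR", "HR", "IT", "IT"],
    "salary": [40000.0, 50000.0, np.nan, 60000.0, 70000.0],
})

grouped = df.groupby("department")["salary"]

# Collect each statistic as one column of a summary table.
summary = pd.DataFrame({
    "sum": grouped.sum(),
    "mean": grouped.mean(),
    "count": grouped.count(),    # non-null values only
    "size": grouped.size(),      # all rows, including missing values
    "min": grouped.min(),
    "max": grouped.max(),
    "median": grouped.median(),
    "std": grouped.std(),        # sample standard deviation (ddof=1)
})
```

For HR, `count` is 2 while `size` is 3, because the missing salary is skipped by count() and by the other numeric aggregations.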

✔ 4. Basic Aggregation Example


df.groupby('department')['salary'].mean()

This calculates the average salary for each department.
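
To make the call above runnable, here is a minimal sketch on a hypothetical employee table (the names and salaries are invented). It also shows two common follow-ups: sorting the result and getting a DataFrame instead of a Series with `as_index=False`.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "department": ["HR", "IT", "HR", "IT", "Sales"],
    "salary": [40000, 60000, 50000, 70000, 45000],
})

# Series indexed by department, one average per group.
avg_salary = df.groupby("department")["salary"].mean()

# Highest average first.
ranked = avg_salary.sort_values(ascending=False)

# as_index=False keeps 'department' as a regular column.
avg_frame = df.groupby("department", as_index=False)["salary"].mean()
```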

✔ 5. Multiple Aggregations


df.groupby('department').agg({
    'salary': ['mean', 'max', 'min'],
    'age': 'median'
})

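This applies different aggregations to different columns: three statistics for salary and the median for age. Because salary gets several functions, the result has two-level column labels such as (salary, mean).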
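
The sketch below runs the dictionary form on hypothetical data and then shows named aggregation, an alternative agg() syntax that produces flat, self-describing column names. The output column names such as `salary_mean` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "department": ["HR", "HR", "IT", "IT", "IT"],
    "salary": [40000, 50000, 60000, 70000, 65000],
    "age": [30, 40, 25, 35, 45],
})

# Dictionary form: columns come back as (column, function) pairs.
nested = df.groupby("department").agg({
    "salary": ["mean", "max", "min"],
    "age": "median",
})
it_mean = nested.loc["IT", ("salary", "mean")]

# Named aggregation: output_name=(input_column, function).
named = df.groupby("department").agg(
    salary_mean=("salary", "mean"),
    salary_max=("salary", "max"),
    salary_min=("salary", "min"),
    age_median=("age", "median"),
)
```

Named aggregation is often easier to work with afterwards, since `named["salary_max"]` needs no tuple lookup.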

✔ 6. Using GroupBy with Custom Functions


df.groupby('department').apply(lambda x: x.nlargest(1, 'salary'))

This returns the highest-paid employee from each department.
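
A runnable variant of this idea is sketched below on hypothetical data. Selecting the columns explicitly before apply() is a choice made here so the example does not depend on how a given pandas version treats the grouping column inside apply(). The idxmax() version and the custom lambda passed to agg() are additional illustrations, not part of the original example.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "department": ["HR", "IT", "HR", "IT", "Sales"],
    "salary": [40000, 60000, 50000, 70000, 45000],
})

# Custom function per group: keep the row with the largest salary.
top_paid = (
    df.groupby("department")[["name", "salary"]]
      .apply(lambda g: g.nlargest(1, "salary"))
)

# Same rows without apply(): idxmax() gives the row label of each maximum.
top_paid_rows = df.loc[df.groupby("department")["salary"].idxmax()]

# Custom aggregation: salary spread (max minus min) per department.
salary_range = df.groupby("department")["salary"].agg(lambda s: s.max() - s.min())
```

On larger data the idxmax() form is usually the faster of the two, because apply() calls the Python function once per group.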

✔ 7. GroupBy with Transform

transform() returns a result with the same length and index as the original data, so it can be assigned directly as a new column.


df['dept_avg_salary'] = df.groupby('department')['salary'].transform('mean')

This adds a new column in which each row holds the average salary of that employee's department.
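
A small sketch on hypothetical data shows why the aligned result is convenient: once the group average sits next to each row, comparing an employee with their department is a plain column subtraction. The `diff_from_avg` column name is an assumption made for this example.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "department": ["HR", "IT", "HR", "IT", "Sales"],
    "salary": [40000, 60000, 50000, 70000, 45000],
})

# One value per original row, repeated within each department.
df["dept_avg_salary"] = df.groupby("department")["salary"].transform("mean")

# How far each salary is from its department average.
df["diff_from_avg"] = df["salary"] - df["dept_avg_salary"]
```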

✔ 8. Filtering Groups

You can keep or drop entire groups based on a condition:


df.groupby('department').filter(lambda x: len(x) > 5)

This keeps only the rows belonging to departments with more than 5 employees.
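
The sketch below tries the filter on hypothetical data with one large and one small department. It also shows an equivalent boolean mask built with transform("size"), given here as an alternative that avoids calling a Python function per group.

```python
import pandas as pd

# Hypothetical sample data: IT has 6 employees, HR has 2.
df = pd.DataFrame({
    "department": ["IT"] * 6 + ["HR"] * 2,
    "salary": [60000, 62000, 64000, 66000, 68000, 70000, 40000, 50000],
})

# Keep every row of any department that has more than 5 rows.
large = df.groupby("department").filter(lambda g: len(g) > 5)

# Equivalent mask: group size broadcast to each row, then compared.
group_size = df.groupby("department")["salary"].transform("size")
large_via_mask = df[group_size > 5]
```

Note that filter() returns the original rows of the surviving groups, not one summary row per group.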

✔ 9. Real Example

Consider a dataset:

Name	Department	Salary
A	HR	40000
B	IT	60000
C	HR	50000

Grouping by department and summing the salaries:


df.groupby('Department')['Salary'].sum()

Output:

Department	Salary
HR	90000
IT	60000
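
To reproduce this result, the three-row table above can be built as a DataFrame and grouped. The sketch uses exactly the values from the table; only the variable names are choices made here.

```python
import pandas as pd

# The dataset from the table above.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Department": ["HR", "IT", "HR"],
    "Salary": [40000, 60000, 50000],
})

# Total salary per department.
totals = df.groupby("Department")["Salary"].sum()

print(totals.to_dict())  # {'HR': 90000, 'IT': 60000}
```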

Conclusion

GroupBy and aggregation are core operations for data analysis with Pandas. They let you summarize large datasets, compute statistics per category, keep or drop whole groups, and prepare features for further analysis or machine learning. Together, agg(), apply(), transform(), and filter() cover most of the ways you will need to examine the structure and distribution of grouped data.
