Aggregations and GroupBy in Python

📘 Python for Data Science 📅 Nov 14, 2025

In Python data science work, the groupby operation in the pandas library is one of the main tools for summarizing and analyzing data. It lets you split data into groups, apply a computation to each group, and combine the results. Aggregation refers to computing a summary statistic such as the sum, mean, count, min, or max for each group.


1. What is GroupBy?

The GroupBy operation involves three steps:

  1. Split → Divide the dataset into groups based on one or more keys.

  2. Apply → Run a function (like mean, sum, count) on each group independently.

  3. Combine → Collect the per-group results into a single Series or DataFrame.

This makes it easy to compute statistics per category, for example the average salary per department, even on large datasets.
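To make the three steps concrete, here is a minimal sketch that performs split, apply, and combine by hand and then checks the result against the one-line groupby call. The `df` DataFrame and its values are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical example data, invented for illustration
df = pd.DataFrame({
    "department": ["HR", "IT", "HR", "IT", "Sales"],
    "salary": [40000, 60000, 50000, 70000, 45000],
})

# Split: one sub-DataFrame per department
groups = {dept: sub for dept, sub in df.groupby("department")}

# Apply: compute the mean salary of each piece
means = {dept: sub["salary"].mean() for dept, sub in groups.items()}

# Combine: gather the per-group results into one Series
manual = pd.Series(means)
manual.index.name = "department"

# The built-in version performs all three steps in one call
builtin = df.groupby("department")["salary"].mean()

print(manual.equals(builtin))  # True
```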


2. Syntax of GroupBy in pandas

df.groupby('column_name')

To group by more than one column, pass a list of column names:

df.groupby(['col1', 'col2'])
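Calling groupby() by itself does not aggregate anything; it returns a GroupBy object that waits for an aggregation or other operation. The sketch below, using a small hypothetical DataFrame, shows that object and what grouping by two columns produces: a result indexed by a two-level MultiIndex.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "department": ["HR", "IT", "HR", "IT"],
    "level": ["junior", "senior", "senior", "senior"],
    "salary": [40000, 60000, 50000, 70000],
})

# groupby() alone returns a DataFrameGroupBy object, not a result table
grouped = df.groupby("department")
print(type(grouped).__name__)  # DataFrameGroupBy
print(grouped.ngroups)         # 2

# Grouping by two columns yields one group per (department, level) pair,
# and the aggregated Series gets a two-level MultiIndex
by_two = df.groupby(["department", "level"])["salary"].sum()
print(by_two)
```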

3. Aggregation Functions

Common aggregation functions:

  • sum() → Total of values

  • mean() → Average of values

  • count() → Number of non-missing values

  • min() / max() → Minimum or maximum value

  • median() → Median value

  • std() → Standard deviation (the sample standard deviation by default, ddof=1)
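A short sketch of these functions on a hypothetical salary column that contains one missing value. It also shows size(), which is not in the list above: count() skips missing values, while size() counts every row in the group.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing salary to contrast count() and size()
df = pd.DataFrame({
    "department": ["HR", "HR", "HR", "IT", "IT"],
    "salary": [40000, 50000, np.nan, 60000, 70000],
})

g = df.groupby("department")["salary"]

print(g.sum())    # HR 90000.0, IT 130000.0 (NaN is skipped)
print(g.mean())   # HR 45000.0, IT 65000.0
print(g.count())  # HR 2, IT 2 (non-missing values only)
print(g.size())   # HR 3, IT 2 (all rows, including the NaN)

# Several statistics at once, one column per function
print(g.agg(["min", "max", "median", "std"]))
```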


4. Basic Aggregation Example

df.groupby('department')['salary'].mean()

This computes the average salary for each department and returns a Series indexed by department name.
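A self-contained version of that call, with hypothetical employee data standing in for a real dataset. The as_index=False variant at the end is an optional alternative that keeps department as an ordinary column instead of the index.

```python
import pandas as pd

# Hypothetical employee data; column names match the snippet above
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "department": ["HR", "IT", "HR", "IT"],
    "salary": [40000, 60000, 50000, 80000],
})

avg = df.groupby("department")["salary"].mean()
print(avg)
# department
# HR    45000.0
# IT    70000.0
# Name: salary, dtype: float64

# as_index=False returns a regular DataFrame with department as a column
avg_df = df.groupby("department", as_index=False)["salary"].mean()
print(avg_df)
```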


5. Multiple Aggregations

df.groupby('department').agg({ 'salary': ['mean', 'max', 'min'], 'age': 'median' })

This applies different aggregations to different columns: mean, max, and min to salary, and median to age.
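In the dictionary form above, the result's columns carry two levels, such as ('salary', 'mean'). pandas (version 0.25 and later) also supports named aggregation, where each keyword argument becomes a flat output column. A sketch on hypothetical data; the output names avg_salary, max_salary, and median_age are illustrative choices.

```python
import pandas as pd

# Hypothetical data with the same column names as the snippet above
df = pd.DataFrame({
    "department": ["HR", "IT", "HR", "IT"],
    "salary": [40000, 60000, 50000, 80000],
    "age": [25, 30, 35, 40],
})

# Dictionary form: columns become (column, function) pairs
stats = df.groupby("department").agg({"salary": ["mean", "max", "min"], "age": "median"})
print(stats.columns.tolist())

# Named aggregation: keyword = (source column, function)
named = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    max_salary=("salary", "max"),
    median_age=("age", "median"),
)
print(named)
```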


6. Using GroupBy with Custom Functions

df.groupby('department').apply(lambda x: x.nlargest(1, 'salary'))

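This returns the highest-paid employee in each department. If two employees tie, nlargest() keeps the first one it encounters. The result is indexed by department plus the original row index.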
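For this particular task, a common alternative to apply() is idxmax(), which finds the index label of the maximum salary in each group and then selects those rows. It avoids calling a Python function once per group, and recent pandas versions may warn when apply() operates on the grouping column, as the call above does. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical employee data
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "department": ["HR", "IT", "HR", "IT"],
    "salary": [40000, 60000, 50000, 80000],
})

# idxmax() gives, for each department, the index label of its highest salary
top_idx = df.groupby("department")["salary"].idxmax()

# loc[] pulls those rows out of the original DataFrame
top_paid = df.loc[top_idx]
print(top_paid)
```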


7. GroupBy with Transform

transform() returns a result aligned with the original rows (same index and length), instead of one row per group.

df['dept_avg_salary'] = df.groupby('department')['salary'].transform('mean')

This adds a new column holding the average salary of each employee's department, repeated on every row of that department.
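Because transform() output lines up row for row with the original DataFrame, it can be used directly in column arithmetic. This sketch, on hypothetical data, computes how far each salary sits from its department's average; the diff_from_avg column name is an illustrative choice.

```python
import pandas as pd

# Hypothetical employee data
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "department": ["HR", "IT", "HR", "IT"],
    "salary": [40000, 60000, 50000, 80000],
})

# transform('mean') broadcasts each group's mean back to every row of that group
df["dept_avg_salary"] = df.groupby("department")["salary"].transform("mean")

# The result is row-aligned, so it combines directly with other columns
df["diff_from_avg"] = df["salary"] - df["dept_avg_salary"]
print(df)
```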


8. Filtering Groups

filter() keeps or drops entire groups based on a condition and returns the surviving rows unchanged:

df.groupby('department').filter(lambda x: len(x) > 5)

This keeps only departments with more than 5 employees.
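A sketch of filter() on hypothetical data. The threshold is lowered to 1 so that the tiny example has something to drop, and the second call shows that the condition can use any group statistic, not only the group size.

```python
import pandas as pd

# Hypothetical employee data
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "department": ["HR", "IT", "HR", "HR", "Sales"],
    "salary": [40000, 60000, 50000, 55000, 45000],
})

# Keep departments with more than 1 employee (threshold lowered for the small data)
multi = df.groupby("department").filter(lambda g: len(g) > 1)
print(multi)  # only the three HR rows remain, with their original index

# Condition based on a group statistic: mean salary above 50000
well_paid = df.groupby("department").filter(lambda g: g["salary"].mean() > 50000)
print(well_paid)
```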


9. Real Example

Consider a dataset:

Name    Department    Salary
A       HR            40000
B       IT            60000
C       HR            50000

Grouping by department:

df.groupby('Department')['Salary'].sum()

Output:

Department
HR    90000
IT    60000
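The worked example can be reproduced as follows, using exactly the three rows from the table above. reset_index() at the end is an optional extra step that turns the resulting Series into a two-column DataFrame.

```python
import pandas as pd

# The three-row dataset from the example above
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Department": ["HR", "IT", "HR"],
    "Salary": [40000, 60000, 50000],
})

# Total salary per department: HR = 40000 + 50000, IT = 60000
totals = df.groupby("Department")["Salary"].sum()
print(totals)

# reset_index() turns the Series into a DataFrame with Department and Salary columns
totals_df = totals.reset_index()
print(totals_df)
```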

Conclusion

GroupBy and aggregation in pandas are essential for almost any kind of data analysis. They summarize large datasets, compute per-category statistics, filter out groups that do not meet a condition, and prepare data for further analysis or machine learning. Built-in aggregations such as sum() and mean() run in optimized compiled code, so they are usually much faster than looping over groups manually.

