Advanced Pandas Operations

πŸ“˜ Python for Data Science πŸ‘ 39 views πŸ“… Nov 14, 2025
⏱ Estimated reading time: 3 min

Β This covers multi-level indexing, window functions, advanced groupby, merges, reshaping, time-series, performance optimization, and more.


1. MultiIndex (Hierarchical Indexing)

Create MultiIndex

arrays = [ ['A', 'A', 'B', 'B'], [1, 2, 1, 2] ] idx = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Subgroup')) df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=idx)

Select data using MultiIndex

df.loc['A'] df.loc[('A', 1)]

Reset index

df.reset_index(inplace=True)

2. Advanced GroupBy Techniques

Multiple aggregations

df.groupby('city').agg({'salary': ['mean', 'max'], 'age': 'median'})

GroupBy + apply (custom function)

df.groupby('city').apply(lambda x: x.nlargest(2, 'salary'))

Transform (broadcast results back)

df['age_mean'] = df.groupby('city')['age'].transform('mean')

Filter groups

df.groupby('city').filter(lambda x: len(x) > 3)

3. Window Functions (Rolling, Expanding, EWMA)

Rolling window

df['roll_mean'] = df['sales'].rolling(window=3).mean()

Expanding window

df['exp_sum'] = df['sales'].expanding().sum()

Exponential weighted moving average

df['ewm'] = df['sales'].ewm(alpha=0.3).mean()

4. Merge, Join, Concat (Advanced)

Merge with multiple keys

pd.merge(df1, df2, on=['id', 'date'], how='inner')

Join using index

df1.join(df2, lsuffix='_L', rsuffix='_R')

Concatenate vertically/horizontally

pd.concat([df1, df2], axis=0) pd.concat([df1, df2], axis=1)

Merge with indicator

pd.merge(df1, df2, on='id', how='outer', indicator=True)

5. Pivot Table & Crosstab

Pivot Table

df.pivot_table(values='sales', index='city', columns='year', aggfunc='sum')

Crosstab

pd.crosstab(df['gender'], df['city'])

6. Melting & Reshaping Data

Melt (wide β†’ long)

pd.melt(df, id_vars=['id'], value_vars=['math','science'])

Pivot (long β†’ wide)

df.pivot(index='id', columns='subject', values='marks')

7. Categorical Data (Fast & Memory-Efficient)

df['city'] = df['city'].astype('category')

Get categories

df['city'].cat.categories

Rename categories

df['city'].cat.rename_categories({'delhi':'Delhi'}, inplace=True)

8. Advanced String Operations

df['name'].str.contains('Ram') df['email'].str.split('@').str[1] df['phone'].str.replace('-', '', regex=False) df['address'].str.extract(r'(\d{6})')

9. Time-Series Operations

Convert to datetime

df['date'] = pd.to_datetime(df['date'])

Set index for time-series

df.set_index('date', inplace=True)

Resampling

df.resample('M').sum() # monthly

Shifting time

df['prev_sales'] = df['sales'].shift(1)

Rolling window on time

df['7d_avg'] = df['sales'].rolling('7D').mean()

10. Query API (SQL-like Filtering)

df.query('age > 30 and city == "delhi"')

11. Eval (Fast Arithmetic Expressions)

df.eval('total = quantity * price', inplace=True)

12. Performance Optimization Techniques

Use efficient dtypes

df['age'] = df['age'].astype('int16')

Use vectorized operations (avoid loops)

df['double'] = df['value'] * 2

Use .loc instead of chained indexing

Avoid apply()β€”use vectorized operations

Use df.itertuples() (faster than iterrows)


Summary Table

FeatureWhat It Solves
MultiIndexMulti-level data
groupby-applyCustom group logic
Window functionsRolling statistics
Merge/JoinCombining datasets
Pivot/MeltReshaping data
Categorical dtypeReduces memory
ResampleTime-series
Query/EvalSQL-like speed
OptimizationHigh performance

πŸ”’ Some advanced sections are available for Registered Members
Register Now

Share this Post


← Back to Tutorials

Popular Competitive Exam Quizzes