Data Cleaning with Pandas

Python for Data Science | Nov 14, 2025

Data cleaning is the process of fixing or removing incorrect, incomplete, or duplicate data before analysis or machine learning.


1. Import Pandas

import pandas as pd

2. Load Data

df = pd.read_csv("data.csv")

Common Data Cleaning Tasks


3. Check Data Overview

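df.head()      # first five rows
df.info()      # column dtypes and non-null counts
df.describe()  # summary statistics for numeric columns
df.shape       # (rows, columns)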

4. Handling Missing Values (NaN)

✔ Check missing values

df.isnull().sum()

✔ Remove rows with missing values

df.dropna(inplace=True)

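✔ Fill missing values

# Assign the result back: an in-place fill on a selected column may not update df in newer pandas versions
df['age'] = df['age'].fillna(df['age'].mean())       # numerical: use the mean
df['city'] = df['city'].fillna(df['city'].mode()[0])  # categorical: use the most frequent value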

✔ Replace missing values with a custom value

df.fillna("Unknown", inplace=True)  # applies to every column, including numeric ones
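
The fill steps above can be tried on a small table. This is a minimal sketch; the sample DataFrame is made up for illustration, and only the `age` and `city` column names come from the tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names mirror the ones used above.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "city": ["Delhi", "Mumbai", None, "Delhi"],
})

# Count missing values per column before deciding how to fill them.
missing_before = df.isnull().sum()

# Numerical column: fill with the mean of the observed values.
df["age"] = df["age"].fillna(df["age"].mean())

# Categorical column: fill with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode()[0])

missing_after = df.isnull().sum()
print(missing_before, missing_after, sep="\n")
```

Filling per column keeps numeric columns numeric, which a blanket `df.fillna("Unknown")` would not.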

5. Handling Duplicates

✔ Find duplicates

df.duplicated().sum()

✔ Remove duplicates

df.drop_duplicates(inplace=True)
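
Duplicates are often defined by a key column rather than by the whole row. The sketch below, on made-up data, uses the `subset` and `keep` parameters of `duplicated` and `drop_duplicates`; treating `name` as the key is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Asha", "Asha", "Ravi", "Ravi"],
    "city": ["Delhi", "Delhi", "Pune", "Mumbai"],
})

# Rows that repeat an earlier row in every column.
full_row_dupes = df.duplicated().sum()

# Rows that repeat an earlier row in the "name" column only.
name_dupes = df.duplicated(subset=["name"]).sum()

# Keep the last record for each name and renumber the index.
deduped = df.drop_duplicates(subset=["name"], keep="last").reset_index(drop=True)
print(full_row_dupes, name_dupes)
print(deduped)
```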

6. Fixing Incorrect Data

✔ Replace wrong values

# Assign the result back instead of replacing in place on a selected column
df['gender'] = df['gender'].replace({'M': 'Male', 'F': 'Female'})

✔ Correct text cases

df['city'] = df['city'].str.title()

✔ Remove extra spaces

df['name'] = df['name'].str.strip()
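
These fixes work best when combined, because `replace` only matches exact values: " m" is not the same as "M". A minimal sketch, assuming a made-up `gender` column with mixed spellings:

```python
import pandas as pd

# Hypothetical sample data with stray spaces and mixed case.
df = pd.DataFrame({"gender": ["M", "F", " m", "Female"]})

# Normalize whitespace and case first so the mapping keys match exactly.
normalized = df["gender"].str.strip().str.upper()

# Map every known spelling to one canonical label.
df["gender"] = normalized.replace({"M": "Male", "F": "Female", "FEMALE": "Female"})
print(df["gender"].tolist())
```

Values that are not in the mapping are left unchanged, so unexpected spellings stay visible for review.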

7. Handling Outliers

✔ Using IQR

Q1 = df['price'].quantile(0.25)
Q3 = df['price'].quantile(0.75)
IQR = Q3 - Q1
# Keep only rows within 1.5 * IQR of the quartiles
df = df[(df['price'] >= Q1 - 1.5*IQR) & (df['price'] <= Q3 + 1.5*IQR)]

✔ Capping outliers

# Limit values to the 5th-95th percentile range instead of dropping rows
df['price'] = df['price'].clip(
    lower=df['price'].quantile(0.05),
    upper=df['price'].quantile(0.95),
)
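
To see the difference between removing and capping, the sketch below applies both to a made-up `price` column with one extreme value. Note that it caps at the IQR fences rather than at the 5th and 95th percentiles used above; that is a variation chosen so both approaches share the same bounds.

```python
import pandas as pd

# Hypothetical sample data with one obvious outlier.
df = pd.DataFrame({"price": [10, 12, 11, 13, 12, 11, 500]})

q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: drop rows outside the IQR fences (bounds are inclusive).
filtered = df[df["price"].between(lower, upper)]

# Option 2: keep every row but cap values at the fences.
capped = df["price"].clip(lower=lower, upper=upper)

print(len(filtered), capped.max())
```

Filtering shrinks the dataset, while capping preserves the row count, which matters when other columns in those rows are still useful.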

8. Converting Data Types

✔ Check data types

df.dtypes

✔ Convert column type

df['age'] = df['age'].astype(int)        # raises an error if the column still contains NaN
df['amount'] = df['amount'].astype(float)
df['date'] = pd.to_datetime(df['date'])
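
Real columns often contain values that cannot be converted. One possible approach, sketched on made-up data, is to coerce bad values to missing with `errors="coerce"` and use the nullable `Int64` dtype so that integers and missing values can coexist.

```python
import pandas as pd

# Hypothetical sample data with unparseable entries.
df = pd.DataFrame({
    "age": ["25", "thirty", None],
    "date": ["2025-01-15", "not a date", "2025-03-01"],
})

# Unparseable numbers become NaN, then the column is stored as nullable integers.
df["age"] = pd.to_numeric(df["age"], errors="coerce").astype("Int64")

# Unparseable dates become NaT instead of raising an error.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

print(df.dtypes)
print(df)
```

Coercion hides bad input, so it is worth counting the new missing values afterwards with `df.isnull().sum()`.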

9. Standardizing Text

df['product'] = df['product'].str.lower()
df['phone'] = df['phone'].str.replace('-', '', regex=False)  # remove hyphens
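
Phone numbers usually contain more than hyphens. As a sketch on made-up values, a regular expression can strip every non-digit character in one pass; whether to keep a country code is left as an open choice here.

```python
import pandas as pd

# Hypothetical sample data in three different formats.
df = pd.DataFrame({
    "phone": ["98765-43210", "(987) 654 3210", "+91 98765 43210"],
})

# \D matches any character that is not a digit.
df["phone"] = df["phone"].str.replace(r"\D", "", regex=True)
print(df["phone"].tolist())
```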

10. Renaming Columns

df.rename(columns={'oldName':'newName'}, inplace=True)
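
When many column names need the same fix, renaming them one by one gets tedious. A minimal sketch, assuming made-up column names, that normalizes all headers to lowercase with underscores using the string methods on `df.columns`:

```python
import pandas as pd

# Hypothetical column names with stray spaces and mixed case.
df = pd.DataFrame({" First Name": ["Asha"], "Total Amount ": [10.5]})

# Trim, lowercase, and replace inner spaces with underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
print(df.columns.tolist())
```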

11. Handling Inconsistent Categories

Example: "Delhi", "delhi ", "DELHI"

df['city'] = df['city'].str.strip().str.lower()
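
Counting distinct values before and after is a quick way to confirm that the variants have collapsed. A small sketch using the example spellings above plus one made-up extra city:

```python
import pandas as pd

# The three spellings from the example, plus one other city.
df = pd.DataFrame({"city": ["Delhi", "delhi ", "DELHI", "Mumbai"]})

before = df["city"].nunique()

df["city"] = df["city"].str.strip().str.lower()

after = df["city"].nunique()
print(before, after)
print(df["city"].value_counts())
```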

12. Dropping Unwanted Columns

df.drop(['temp_column', 'unnecessary'], axis=1, inplace=True)

13. Replace Null-like strings ("N/A", "-", "none")

# Matching is exact and case-sensitive, so list every spelling that appears in the data
df.replace(['N/A', 'NA', '-', 'None', 'none'], pd.NA, inplace=True)
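
Because the replacement above matches exact strings, variants such as " NA " or "none" can slip through. One possible way around this, sketched on made-up data, is to compare a trimmed, lowercased copy of the column against a set of null-like tokens; the token set here is an assumption and should be adapted to the data.

```python
import pandas as pd

# Hypothetical sample data with several null-like spellings.
df = pd.DataFrame({"city": ["Delhi", "N/A", "-", "none", " NA "]})

# Assumed list of tokens that should count as missing.
null_like = {"n/a", "na", "-", "none", ""}

# Compare on a normalized copy so case and padding do not matter.
normalized = df["city"].str.strip().str.lower()

# mask() sets the value to missing wherever the condition is True.
df["city"] = df["city"].mask(normalized.isin(null_like))
print(df["city"].isna().sum())
```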

Final Data Cleaning Workflow Example

df = pd.read_csv("data.csv")
# Missing values (assign back rather than filling in place on a selected column)
df['age'] = df['age'].fillna(df['age'].mean())
# Remove duplicates
df.drop_duplicates(inplace=True)
# Fix text
df['name'] = df['name'].str.strip().str.title()
# Correct data types
df['date'] = pd.to_datetime(df['date'])
# Fix categories
df['city'] = df['city'].str.strip().str.lower()
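
The same workflow can be wrapped in a function and tried without a file on disk. This is a sketch, not a definitive pipeline: the CSV text below is made up to stand in for `data.csv`, and the `clean` function name is hypothetical. The steps run in the same order as the workflow above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv.
raw = io.StringIO(
    "name,age,city,date\n"
    "  asha rao ,25,Delhi,2025-01-15\n"
    "  asha rao ,25,Delhi,2025-01-15\n"
    "ravi kumar,,delhi ,2025-02-01\n"
    "meera shah,35,DELHI,2025-03-10\n"
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy and return the result."""
    df = df.copy()
    df["age"] = df["age"].fillna(df["age"].mean())    # missing values
    df = df.drop_duplicates()                         # duplicates
    df["name"] = df["name"].str.strip().str.title()   # text
    df["date"] = pd.to_datetime(df["date"])           # data types
    df["city"] = df["city"].str.strip().str.lower()   # categories
    return df.reset_index(drop=True)


cleaned = clean(pd.read_csv(raw))
print(cleaned)
```

Working on a copy and returning a new DataFrame leaves the raw data untouched, which makes it easier to rerun the steps while adjusting them.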
