Python for Data Analysis

📘 Python 📅 Nov 05, 2025

Python is widely used for data analysis, thanks to its rich ecosystem of libraries for data manipulation, visualization, and statistical analysis.


1. Key Libraries for Data Analysis

| Library | Purpose | Example |
|---|---|---|
| NumPy | Numerical computing, arrays, matrices | `np.array([1, 2, 3])` |
| Pandas | Data manipulation, DataFrames, CSV/Excel I/O | `pd.read_csv('data.csv')` |
| Matplotlib | Data visualization, plots, charts | `plt.plot(x, y)` |
| Seaborn | Statistical visualization, heatmaps | `sns.heatmap(df.corr(numeric_only=True))` |
| SciPy | Scientific computing, statistics | `scipy.stats.ttest_ind(a, b)` |
| Statsmodels | Statistical modeling, regression analysis | `sm.OLS(y, X).fit()` |
| OpenPyXL / xlrd | Excel file handling (openpyxl for .xlsx, xlrd for legacy .xls only) | `pd.read_excel('file.xlsx')` |
| SQLAlchemy | Database connectivity and queries | `engine = create_engine(db_url)` |
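As a quick illustration of the NumPy row above, the sketch below shows vectorized arithmetic, whole-array aggregation, boolean masking, and axis-wise operations on a small matrix. The revenue values and the matrix are made up for demonstration.

```python
import numpy as np

# Hypothetical monthly revenue figures (illustrative values only)
revenue = np.array([120.0, 135.5, 98.0, 150.25, 142.0])

# Vectorized arithmetic applies to every element at once, no Python loop needed
revenue_k = revenue / 1000

# Aggregate functions operate over the whole array
mean_revenue = revenue.mean()
max_revenue = revenue.max()

# A boolean mask selects the elements that satisfy a condition
above_mean = revenue[revenue > mean_revenue]

# 2-D arrays: axis=0 aggregates down columns, axis=1 across rows
matrix = np.array([[1, 2], [3, 4]])
col_sums = matrix.sum(axis=0)
row_means = matrix.mean(axis=1)

print(revenue_k)
print(mean_revenue, max_revenue)
print(above_mean)
print(col_sums, row_means)
```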

2. Loading and Inspecting Data

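```python
import pandas as pd

# Load a CSV file into a DataFrame
df = pd.read_csv('data.csv')

# Inspect the first five rows
print(df.head())

# Check columns, non-null counts, and data types
# (info() prints its report itself and returns None, so it is not wrapped in print())
df.info()

# Basic summary statistics for numeric columns
print(df.describe())
```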
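Since `data.csv` is not provided, here is a self-contained sketch that reads a small, hypothetical CSV from a string through `io.StringIO`, which `pd.read_csv` accepts like an open file. It also counts missing values per column with `isna().sum()`, a common check before cleaning. Note that one missing entry turns the integer `Salary` column into `float64`.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'data.csv' (illustrative only)
csv_text = """Name,Department,Age,Salary
Alice,Sales,34,52000
Bob,Engineering,28,61000
Carol,Engineering,45,
Dan,Sales,39,48000
"""

# read_csv accepts any file-like object, so the example runs without a file on disk
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(df.dtypes)         # inferred type of each column
print(df.isna().sum())   # number of missing values per column
print(df.describe())     # count excludes NaN, so Salary shows count 3
```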

3. Data Cleaning and Preparation

```python
# Handling missing values: pick ONE strategy (after dropna() nothing is left to fill)
df = df.dropna()        # remove rows with any missing value
# df = df.fillna(0)     # ...or fill missing values with 0 instead

# Renaming columns
df = df.rename(columns={'OldName': 'NewName'})

# Filtering rows
filtered = df[df['Age'] > 30]

# Creating a new column
df['Salary_in_k'] = df['Salary'] / 1000
```
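`dropna()` and `fillna()` are alternative strategies for the same gaps, so it helps to see their results side by side. The sketch below uses a hypothetical DataFrame; filling `Age` with the column median and `Salary` with 0 is an arbitrary choice for illustration. Passing a dict to `fillna` applies a different fill value per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (illustrative only)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
    'Age': [34, np.nan, 45, 39],
    'Salary': [52000, 61000, np.nan, 48000],
})

# Strategy 1: drop every row that has at least one missing value
dropped = df.dropna()

# Strategy 2: keep all rows and fill gaps column by column
# (median for Age, 0 for Salary, chosen only for illustration)
filled = df.fillna({'Age': df['Age'].median(), 'Salary': 0})

print(len(dropped))   # rows that survived dropna
print(filled)
```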

4. Data Aggregation and Grouping

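```python
# Group by a column and compute the mean salary per department
grouped = df.groupby('Department')['Salary'].mean()
print(grouped)

# Pivot table: mean salary by department (rows) and gender (columns)
pivot = df.pivot_table(values='Salary', index='Department',
                       columns='Gender', aggfunc='mean')
print(pivot)
```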
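To compute several statistics per group in one pass, `agg()` accepts a list of function names. Here is a sketch on hypothetical employee records, together with the same pivot table:

```python
import pandas as pd

# Hypothetical employee records (illustrative only)
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'Engineering', 'Engineering', 'Engineering'],
    'Gender': ['F', 'M', 'F', 'M', 'M'],
    'Salary': [50000, 46000, 70000, 64000, 60000],
})

# One row per department, one column per statistic
summary = df.groupby('Department')['Salary'].agg(['mean', 'min', 'max', 'count'])
print(summary)

# Mean salary for each Department/Gender combination
pivot = df.pivot_table(values='Salary', index='Department',
                       columns='Gender', aggfunc='mean')
print(pivot)
```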

5. Data Visualization

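```python
import matplotlib.pyplot as plt
import seaborn as sns

# Line plot
plt.plot(df['Year'], df['Revenue'])
plt.show()

# Histogram
plt.hist(df['Age'], bins=10)
plt.show()

# Boxplot
sns.boxplot(x='Department', y='Salary', data=df)
plt.show()

# Heatmap of correlations between numeric columns
# (since pandas 2.0, corr() raises on text columns unless numeric_only=True)
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()
```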
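`plt.show()` needs a display. In a script or on a server, one option is Matplotlib's non-interactive Agg backend combined with the object-oriented `fig, ax` API, saving the figure to a file or buffer instead. The sketch below uses hypothetical yearly revenue values; the two-panel layout and titles are illustrative choices.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly revenue (illustrative only)
df = pd.DataFrame({'Year': [2021, 2022, 2023, 2024],
                   'Revenue': [1.2, 1.5, 1.4, 1.9]})

# One Figure holding two Axes side by side
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(df['Year'], df['Revenue'], marker='o')
ax_line.set_xlabel('Year')
ax_line.set_ylabel('Revenue')
ax_line.set_title('Revenue by year')

ax_hist.hist(df['Revenue'], bins=3)
ax_hist.set_title('Revenue distribution')

fig.tight_layout()

# Save to an in-memory buffer; a filename such as 'revenue.png' works the same way
buf = io.BytesIO()
fig.savefig(buf, format='png')
plt.close(fig)  # free the figure's memory once it is saved
```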

6. Exporting Data

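```python
# Export to CSV without writing the row index as a column
df.to_csv('cleaned_data.csv', index=False)

# Export to Excel (needs an engine such as openpyxl installed)
df.to_excel('cleaned_data.xlsx', index=False)
```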
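The effect of `index=False` is easiest to see with a round trip through an in-memory buffer. In this sketch with hypothetical data, the frame written without its index reads back unchanged, while the default `index=True` adds an `Unnamed: 0` column on reload.

```python
import io

import pandas as pd

# Hypothetical cleaned data (illustrative only)
df = pd.DataFrame({'Department': ['Sales', 'Engineering'],
                   'Salary_in_k': [52.0, 61.5]})

# Write to an in-memory buffer instead of 'cleaned_data.csv'
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading it back reproduces the original frame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))

# With the default index=True, the row labels come back as an extra, unnamed column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'Department', 'Salary_in_k']
```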

7. Key Points

  • Python is powerful for data cleaning, exploration, and visualization.

  • Pandas is the core library for tabular data.

  • NumPy handles numerical operations efficiently.

  • Matplotlib and Seaborn are essential for plotting and visualization.

  • SciPy and Statsmodels add statistical tests (such as t-tests) and regression modeling.

  • Python integrates well with databases, Excel, and big data tools.
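As an illustration of database integration, the sketch below uses Python's built-in `sqlite3` module with an in-memory database (a stand-in for a real server) and hypothetical data; the table name `employees` is made up. pandas accepts a `sqlite3` connection directly, while for other databases it expects a SQLAlchemy connectable such as the engine created with `create_engine`.

```python
import sqlite3

import pandas as pd

# Hypothetical employee data (illustrative only)
df = pd.DataFrame({'Department': ['Sales', 'Sales', 'Engineering'],
                   'Salary': [50000, 46000, 70000]})

# In-memory SQLite database from the standard library
conn = sqlite3.connect(':memory:')

# Write the DataFrame to a new table named 'employees'
df.to_sql('employees', conn, index=False)

# Let the database do the aggregation, then load the result as a DataFrame
result = pd.read_sql_query(
    'SELECT Department, AVG(Salary) AS avg_salary '
    'FROM employees GROUP BY Department ORDER BY Department',
    conn,
)
print(result)
conn.close()
```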

