NumPy Basics for Data Science

📘 Python for Data Science 📅 Nov 14, 2025

NumPy (Numerical Python) is the fundamental Python library for scientific computing and Data Science. It provides fast, efficient operations on arrays, matrices, and numerical data. Many Data Science and Machine Learning libraries either build on NumPy (Pandas, Scikit-Learn) or interoperate closely with its arrays (TensorFlow, PyTorch).


Why Is NumPy Important in Data Science?

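  1. Fast Computation: array operations run in compiled code, typically much faster than looping over Python lists

  2. Efficient Memory Usage: elements share one data type and are stored compactly

  3. Supports Vectorized Operations (no explicit Python loops needed)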

  4. Foundation for Pandas, ML, Deep Learning

  5. Easy mathematical and statistical operations
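To make the memory point concrete, here is a rough sketch that compares the storage of 1,000 integers in a Python list and in a NumPy array. The size `n`, the variable names, and the way the list size is estimated are illustrative assumptions; exact byte counts for the list depend on the Python build.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Bytes used by the array's data buffer: 8 bytes per int64 element.
array_bytes = np_arr.nbytes

# Rough size of the list: the list object plus every int object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print(array_bytes)               # 8000
print(list_bytes > array_bytes)  # True on CPython
```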


1. Creating NumPy Arrays

import numpy as np

arr = np.array([1, 2, 3, 4])

Multi-dimensional Array

matrix = np.array([[1, 2], [3, 4]])

2. NumPy Array Attributes

arr.ndim   # Number of dimensions (1 for arr, 2 for matrix)
arr.shape  # Size along each dimension: (4,) for arr, (2, 2) for matrix
arr.size   # Total number of elements
arr.dtype  # Data type of the elements
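As a quick check, the attributes can be printed for the two arrays created in section 1. This is a minimal sketch; the exact integer `dtype` depends on the platform, so it is printed without an expected value.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
matrix = np.array([[1, 2], [3, 4]])

print(arr.ndim, arr.shape, arr.size)           # 1 (4,) 4
print(matrix.ndim, matrix.shape, matrix.size)  # 2 (2, 2) 4
print(arr.dtype)  # an integer type; the exact width is platform dependent
```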

3. Array Initialization Methods

np.zeros((3, 3))      # 3x3 matrix of zeros
np.ones((2, 2))       # 2x2 matrix of ones
np.arange(1, 10, 2)   # From 1 up to (not including) 10 in steps of 2: 1, 3, 5, 7, 9
np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, endpoints included
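The calls above can be run as follows to see what they return. The variable names are illustrative only.

```python
import numpy as np

zeros = np.zeros((3, 3))
ones = np.ones((2, 2))
odds = np.arange(1, 10, 2)
grid = np.linspace(0, 1, 5)

print(zeros.shape, zeros.dtype)  # (3, 3) float64
print(odds)                      # [1 3 5 7 9]
print(grid)                      # [0.   0.25 0.5  0.75 1.  ]
```

Note that `np.zeros` and `np.ones` produce floating-point arrays unless a `dtype` is passed.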

4. Indexing and Slicing

Indexing

arr[0]        # First element
matrix[1, 1]  # Row 2, column 2 (indices start at 0)

Slicing

arr[1:4]      # 2nd to 4th element (the stop index 4 is excluded)
matrix[:, 0]  # All rows, first column
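Here is a short sketch of the indexing and slicing expressions above with concrete results. It also illustrates one behavior worth knowing: a basic slice is a view, so writing through it changes the original array. The variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
matrix = np.array([[1, 2], [3, 4]])

first = arr[0]            # 1
corner = matrix[1, 1]     # 4
middle = arr[1:4]         # array([2, 3, 4])
first_col = matrix[:, 0]  # array([1, 3])

# A basic slice shares memory with the original array.
middle[0] = 99
print(arr)  # [ 1 99  3  4]
```

If an independent array is needed, calling `.copy()` on the slice avoids this sharing.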

5. Vectorized Operations (Very Important)

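NumPy applies an operation to every element of an array at once, so no explicit Python loop is needed; the looping happens inside NumPy's compiled code.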

arr = np.array([1, 2, 3, 4])
arr + 5   # Adds 5 to every element
arr * 2   # Multiplies every element by 2
arr ** 2  # Squares every element
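To show what "without loops" means in practice, this sketch computes the same result twice: once with an explicit Python loop and once with a single vectorized expression. The formula `x ** 2 + 5` is an arbitrary example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Explicit Python loop over the elements
looped = np.array([x ** 2 + 5 for x in arr])

# Vectorized: one expression applied to the whole array
vectorized = arr ** 2 + 5

print(vectorized)                          # [ 6  9 14 21]
print(np.array_equal(looped, vectorized))  # True
```

Each vectorized operation returns a new array; `arr` itself is left unchanged.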

6. Mathematical Functions

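np.sum(arr)   # Sum of all elements
np.mean(arr)  # Arithmetic mean
np.max(arr)   # Largest element
np.min(arr)   # Smallest element
np.std(arr)   # Standard deviation (population form by default)
np.sqrt(arr)  # Element-wise square root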
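The sketch below evaluates these functions on the arrays from section 1. It also uses the `axis` argument, which the list above does not show, to reduce a matrix along columns or rows.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
matrix = np.array([[1, 2], [3, 4]])

print(np.sum(arr))   # 10
print(np.mean(arr))  # 2.5
print(np.std(arr))   # about 1.118

col_sums = np.sum(matrix, axis=0)  # sum down each column: [4 6]
row_sums = np.sum(matrix, axis=1)  # sum across each row:  [3 7]
print(col_sums, row_sums)
```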

7. Array Reshaping

arr = np.arange(12)
arr.reshape(3, 4)  # Returns a 3x4 matrix; arr itself is unchanged

Flattening:

arr.flatten()  # Returns a 1-D copy of the array
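A small sketch of reshaping and flattening in sequence. It also uses `-1`, which lets NumPy infer one dimension from the total size, and checks that `flatten()` returns a copy. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(12)

grid = arr.reshape(3, 4)   # 3 rows, 4 columns
auto = arr.reshape(2, -1)  # -1 is inferred from the size: shape (2, 6)

flat = grid.flatten()      # back to 1-D, as a copy
flat[0] = 100              # does not affect grid

print(grid.shape, auto.shape, flat.shape)  # (3, 4) (2, 6) (12,)
print(grid[0, 0])                          # 0
```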

8. Joining and Splitting Arrays

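arr1 = np.array([1, 2, 3])  # example inputs
arr2 = np.array([4, 5, 6])
np.concatenate([arr1, arr2])  # Join end to end: [1 2 3 4 5 6]
np.vstack((arr1, arr2))       # Vertical stack: shape (2, 3)
np.hstack((arr1, arr2))       # Horizontal stack: [1 2 3 4 5 6]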
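The following sketch runs the joining functions on two small example arrays and adds `np.split`, since the section title mentions splitting but shows no call for it. The values of `arr1` and `arr2` are assumptions chosen for illustration.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

joined = np.concatenate([arr1, arr2])  # [1 2 3 4 5 6]
stacked = np.vstack((arr1, arr2))      # 2 rows, 3 columns
side = np.hstack((arr1, arr2))         # same as concatenate for 1-D inputs

# Split the joined array back into 2 equal parts.
first, second = np.split(joined, 2)

print(stacked.shape)  # (2, 3)
print(first, second)  # [1 2 3] [4 5 6]
```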

9. NumPy with Real Data (Data Science Use)

Reading CSV file:

data = np.genfromtxt('data.csv', delimiter=',')

Handling Missing Values:

np.nanmean(data)     # Mean that ignores NaN values
np.nan_to_num(data)  # Replaces NaN with 0 (and infinities with large finite numbers)

Normalization:

normalized = (data - np.min(data)) / (np.max(data) - np.min(data))
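The three steps above can be tried without a file on disk. This sketch assumes a tiny in-memory CSV (the `csv_text` contents are made up for illustration) with one empty field, which `np.genfromtxt` reads as NaN. Because `np.min` and `np.max` return NaN when the data contains NaN, the sketch swaps in `np.nanmin` and `np.nanmax` for the normalization.

```python
import io
import numpy as np

# Hypothetical CSV content; the middle value of the second row is missing.
csv_text = "1.0,2.0,3.0\n4.0,,6.0\n7.0,8.0,9.0"
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

print(np.isnan(data[1, 1]))  # True: the empty field became NaN
print(np.nanmean(data))      # 5.0, the mean of the 8 known values

filled = np.nan_to_num(data)  # NaN replaced by 0.0

# Min-max normalization that ignores NaN when finding the range.
lo, hi = np.nanmin(data), np.nanmax(data)
normalized = (data - lo) / (hi - lo)
print(normalized[0, 0], normalized[2, 2])  # 0.0 1.0
```

The missing entry stays NaN in `normalized`; whether to fill it before or after scaling depends on the dataset.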

10. NumPy in Machine Learning

NumPy is used in ML for:

✔ Feature scaling
✔ Distance measurement
✔ Matrix multiplication
✔ Loss functions
✔ Gradient descent
✔ Vectorized model predictions
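Several items in this list fit into one short sketch: vectorized predictions, a mean squared error loss, and gradient descent for a one-weight linear model. The data, learning rate, and iteration count are assumptions chosen so the example converges; it is an illustration, not a recommended training setup.

```python
import numpy as np

# Toy data following y = 2 * x (made up for this example)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = np.zeros(1)        # single weight, no bias term
learning_rate = 0.05

for _ in range(200):
    predictions = X @ w                   # vectorized predictions, shape (4,)
    errors = predictions - y
    gradient = 2 * X.T @ errors / len(y)  # gradient of the mean squared error
    w -= learning_rate * gradient

loss = np.mean((X @ w - y) ** 2)
print(w)     # close to [2.]
print(loss)  # close to 0
```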

Example: Dot Product (very important!)

np.dot(vector1, vector2)
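With concrete vectors (the values here are assumptions for illustration), the dot product and a related distance measurement look like this:

```python
import numpy as np

vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

dot = np.dot(vector1, vector2)     # 1*4 + 2*5 + 3*6 = 32
same = np.sum(vector1 * vector2)   # element-wise product, then sum

# Euclidean distance between the two vectors
distance = np.linalg.norm(vector1 - vector2)

print(dot, same)  # 32 32
print(distance)   # about 5.196
```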

Matrix multiplication:

np.matmul(A, B)
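A minimal sketch with two 2x2 example matrices (the values of `A` and `B` are assumptions). It also contrasts matrix multiplication with the element-wise `*` operator, which is easy to confuse with it.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product = np.matmul(A, B)  # rows of A times columns of B
same = A @ B               # the @ operator does the same thing
elementwise = A * B        # NOT matrix multiplication

print(product)      # [[19 22]
                    #  [43 50]]
print(elementwise)  # [[ 5 12]
                    #  [21 32]]
```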

Summary

NumPy is the foundation of Data Science in Python:

Feature            Why Important
Fast arrays        Much faster than lists
Vectorization      Removes loops
Matrix operations  Core of ML & AI
Broadcasting       Operates on different shapes
Integration        Works with Pandas, ML, AI libraries
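Broadcasting appears in the summary but has no example in the sections above, so here is a brief sketch. NumPy stretches the smaller array across the larger one when their shapes are compatible; the arrays below are assumptions for illustration.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])         # shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)
col = np.array([[100], [200]])    # shape (2, 1)

plus_row = m + row  # row is added to every row of m
plus_col = m + col  # col is added to every column of m

print(plus_row)  # [[11 22 33]
                 #  [14 25 36]]
print(plus_col)  # [[101 102 103]
                 #  [204 205 206]]
```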
