NumPy Basics for Data Science

📘 Python for Data Science 📅 Nov 14, 2025

NumPy (Numerical Python) is the fundamental Python library for scientific computing and Data Science. It provides fast, memory-efficient operations on arrays, matrices, and other numerical data. Much of the Data Science and Machine Learning ecosystem depends on it: Pandas and Scikit-Learn are built on top of NumPy arrays, while TensorFlow and PyTorch use their own tensor types but convert to and from NumPy arrays.

Why Is NumPy Important in Data Science?

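Fast Computation – array operations run in compiled code, typically much faster than looping over Python lists
Efficient Memory Usage – elements are stored in a compact buffer with a single data type
Supports Vectorized Operations (no explicit Python loops needed)
Foundation for Pandas, ML, and Deep Learning libraries
Easy mathematical and statistical operations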
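
To make the memory point concrete, here is a minimal sketch that inspects how many bytes an array occupies. The array length and the dtypes are arbitrary choices for illustration.

```python
import numpy as np

# One million 64-bit integers stored in a single typed buffer
arr = np.arange(1_000_000, dtype=np.int64)

print(arr.itemsize)   # bytes per element: 8
print(arr.nbytes)     # total bytes for the data: 8000000

# A smaller dtype halves the memory when the values fit in it
small = arr.astype(np.int32)
print(small.nbytes)   # 4000000
```

A Python list, by contrast, stores a reference to a separate integer object for each element, so its per-element cost is higher.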

1. Creating NumPy Arrays


import numpy as np

arr = np.array([1, 2, 3, 4])

Multi-dimensional Array


matrix = np.array([[1, 2], [3, 4]])

2. NumPy Array Attributes


arr.ndim      # Number of dimensions
arr.shape     # Size along each dimension: (4,) for arr, (2, 2) for matrix
arr.size      # Total elements
arr.dtype     # Data type
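
As a quick illustration, the attributes can be printed for a small 2-D array. The dtype is set explicitly in this sketch because the default integer type can differ between platforms.

```python
import numpy as np

# 2 rows, 3 columns, stored as 64-bit floats
matrix = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(matrix.ndim)    # 2
print(matrix.shape)   # (2, 3)
print(matrix.size)    # 6
print(matrix.dtype)   # float64
```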

3. Array Initialization Methods


np.zeros((3,3))           # 3x3 matrix of zeros
np.ones((2,2))            # 2x2 matrix of ones
np.arange(1, 10, 2)        # Range array: 1,3,5,7,9
np.linspace(0, 1, 5)       # 5 equally spaced values from 0 to 1 (both ends included)
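
The difference between `np.arange` and `np.linspace` is easiest to see from their output. In this short sketch, `arange` takes a step size and excludes the stop value, while `linspace` takes a count and includes it.

```python
import numpy as np

zeros = np.zeros((3, 3))        # float64 by default
evens = np.arange(0, 10, 2)     # step of 2; stop value 10 is excluded
points = np.linspace(0, 1, 5)   # 5 values; stop value 1 is included

print(zeros.shape)       # (3, 3)
print(evens.tolist())    # [0, 2, 4, 6, 8]
print(points.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```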

4. Indexing and Slicing

Indexing


arr[0]      # First element
matrix[1,1] # Row 2, Column 2

Slicing


arr[1:4]         # 2nd to 4th element
matrix[:, 0]     # All rows, first column
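
The sketch below applies the same indexing and slicing rules to a small 3x3 array (the values are made up so that each one encodes its row and column). It also shows one behaviour worth knowing: a slice is a view, so writing through it changes the original array.

```python
import numpy as np

grid = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])

print(grid[1, 1])                # 21 (row index 1, column index 1)
print(grid[:, 0].tolist())       # [10, 20, 30] (all rows, first column)
print(grid[0:2, 1:3].tolist())   # [[11, 12], [21, 22]]
print(grid[-1].tolist())         # [30, 31, 32] (last row)

# A slice is a view: writing through it changes the original array
first_row = grid[0]
first_row[0] = 99
print(grid[0, 0])                # 99
```

Call `.copy()` on a slice when an independent array is needed.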

5. Vectorized Operations (Very Important)

NumPy applies an operation to every element of an array without an explicit Python loop; the looping happens in compiled code.


arr = np.array([1, 2, 3, 4])

arr + 5       # adds 5 to every element
arr * 2       # multiplies every element by 2
arr ** 2      # squares each element
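
For comparison, here is the same squaring operation written once as a Python loop and once as a vectorized expression. The second array `other` is a made-up example showing that operators also work element by element between two arrays of the same shape.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# Explicit Python loop over a list
squared_loop = [v ** 2 for v in values]

# Vectorized: one expression, the loop runs inside NumPy
squared_vec = arr ** 2

print(squared_loop)            # [1, 4, 9, 16]
print(squared_vec.tolist())    # [1, 4, 9, 16]

# Element-wise operations between two arrays of the same shape
other = np.array([10, 20, 30, 40])
print((arr + other).tolist())  # [11, 22, 33, 44]
print((arr * other).tolist())  # [10, 40, 90, 160]
```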

6. Mathematical Functions


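np.sum(arr)      # Sum of all elements
np.mean(arr)     # Arithmetic mean
np.max(arr)      # Largest element
np.min(arr)      # Smallest element
np.std(arr)      # Standard deviation (population form, ddof=0 by default)
np.sqrt(arr)     # Element-wise square root; returns an array, not a single number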
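
A short sketch of these functions on concrete numbers follows. It also shows two options that often matter in practice: `ddof=1` for the sample standard deviation and `axis` for reducing along rows or columns of a 2-D array.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0, 4.0])

print(np.sum(arr))    # 10.0
print(np.mean(arr))   # 2.5

# np.std divides by N by default; ddof=1 divides by N - 1 instead
pop_std = np.std(arr)
sample_std = np.std(arr, ddof=1)

# np.sqrt works element by element and returns an array
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))
print(roots.tolist())   # [1.0, 2.0, 3.0]

# axis selects the direction of a reduction on a 2-D array
matrix = np.array([[1, 2], [3, 4]])
print(np.sum(matrix, axis=0).tolist())   # [4, 6] (column sums)
print(np.sum(matrix, axis=1).tolist())   # [3, 7] (row sums)
```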

7. Array Reshaping


arr = np.arange(12)
matrix = arr.reshape(3, 4)   # 3x4 matrix; arr itself keeps its 1-D shape

Flattening:


matrix.flatten()             # Back to 1-D; flatten always returns a copy
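
The sketch below walks through reshaping and flattening step by step. It shows that `reshape` does not modify the original array, that `-1` lets NumPy infer one dimension, and that `flatten` returns a copy.

```python
import numpy as np

arr = np.arange(12)

matrix = arr.reshape(3, 4)
print(matrix.shape)   # (3, 4)
print(arr.shape)      # (12,) because the original is unchanged

# -1 asks NumPy to infer that dimension from the total size
inferred = arr.reshape(2, -1)
print(inferred.shape)   # (2, 6)

# flatten returns a copy, so edits do not reach the source array
flat = matrix.flatten()
flat[0] = 100
print(matrix[0, 0])   # 0
```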

8. Joining and Splitting Arrays


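arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

np.concatenate([arr1, arr2])   # Join end to end: [1 2 3 4 5 6]
np.vstack((arr1, arr2))        # Vertical stack: shape (2, 3)
np.hstack((arr1, arr2))        # Horizontal stack: [1 2 3 4 5 6]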
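
The heading also mentions splitting, which the snippet above does not cover. Here is a minimal sketch using `np.split` and `np.array_split`, with two small made-up arrays.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

joined = np.concatenate([arr1, arr2])
print(joined.tolist())                  # [1, 2, 3, 4, 5, 6]
print(np.vstack((arr1, arr2)).shape)    # (2, 3)
print(np.hstack((arr1, arr2)).shape)    # (6,)

# np.split needs an even division; it returns a list of arrays
parts = np.split(joined, 3)
print([p.tolist() for p in parts])      # [[1, 2], [3, 4], [5, 6]]

# np.array_split tolerates uneven sizes
uneven = np.array_split(joined, 4)
print([p.tolist() for p in uneven])     # [[1, 2], [3, 4], [5], [6]]
```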

9. NumPy with Real Data (Data Science Use)

Reading CSV file:


data = np.genfromtxt('data.csv', delimiter=',')   # Empty or non-numeric fields become nan
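
To try this without a file on disk, the sketch below reads CSV text from an in-memory buffer. The column names and values are made up, and `skip_header=1` is used on the assumption that the first line is a header row.

```python
import io
import numpy as np

# In-memory stand-in for a CSV file (made-up contents)
csv_text = "height,weight\n1.70,65\n1.82,\n1.60,54\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(data.shape)             # (3, 2)
print(np.isnan(data[1, 1]))   # True: the empty field was read as nan
```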

Handling Missing Values:


np.nanmean(data)       # Mean that ignores NaN values
np.nan_to_num(data)    # Replaces NaN with 0.0 (and infinities with large finite numbers)
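
The effect of these functions is clearer on a tiny array with one missing value. The mean-imputation step at the end is one common option, shown here as an illustration rather than a recommendation.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

print(np.mean(data))                  # nan: a single NaN propagates into the plain mean
print(np.nanmean(data))               # 2.0
print(np.isnan(data).sum())           # 1 (number of missing values)
print(np.nan_to_num(data).tolist())   # [1.0, 0.0, 3.0]

# Alternative: fill gaps with the mean of the observed values
filled = np.where(np.isnan(data), np.nanmean(data), data)
print(filled.tolist())                # [1.0, 2.0, 3.0]
```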

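Min-Max Normalization (rescales values to the range 0 to 1):


# If data still contains NaN, use np.nanmin / np.nanmax instead
normalized = (data - np.min(data)) / (np.max(data) - np.min(data))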
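
The formula above uses one global minimum and maximum. For a table of features, it is usually more useful to scale each column separately; the sketch below assumes rows are samples and columns are features, with made-up values.

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# axis=0 computes one min and one max per column (per feature)
col_min = data.min(axis=0)
col_max = data.max(axis=0)

normalized = (data - col_min) / (col_max - col_min)
print(normalized.tolist())   # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

A column whose values are all equal would make the denominator zero, so real data may need a guard for that case.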

10. NumPy in Machine Learning

NumPy is used in ML for:

✔ Feature scaling
✔ Distance measurement
✔ Matrix multiplication
✔ Loss functions
✔ Gradient descent
✔ Vectorized model predictions
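
Several of these uses can be seen together in a toy example. The following is a minimal sketch of gradient descent fitting a straight line with a mean squared error loss; the data, learning rate, and step count are assumptions chosen for illustration, not tuned values.

```python
import numpy as np

# Toy data generated from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # initial parameters
learning_rate = 0.1      # arbitrary choice for this toy problem

for _ in range(1000):
    pred = w * x + b                    # vectorized predictions
    error = pred - y
    loss = np.mean(error ** 2)          # mean squared error
    grad_w = 2.0 * np.mean(error * x)   # d(loss)/dw
    grad_b = 2.0 * np.mean(error)       # d(loss)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(float(w), 3), round(float(b), 3))   # 2.0 1.0
```

Every line inside the loop operates on whole arrays, so the only Python loop left is the one over training steps.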

Example: Dot Product (very important!)


vector1, vector2 = np.array([1, 2, 3]), np.array([4, 5, 6])
np.dot(vector1, vector2)   # 1*4 + 2*5 + 3*6 = 32

Matrix multiplication:


A, B = np.array([[1, 2], [3, 4]]), np.array([[5, 6], [7, 8]])
np.matmul(A, B)            # Same result as A @ B
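
As a worked example, the sketch below multiplies two small matrices and then uses the same operation for vectorized model predictions. The feature matrix `X` and the `weights` vector are hypothetical values for a linear model without a bias term.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# The @ operator is equivalent to np.matmul for these arrays
product = A @ B
print(product.tolist())       # [[19, 22], [43, 50]]

# Vectorized linear-model predictions: (3 samples, 2 features) @ (2 weights)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
weights = np.array([0.5, -1.0])
predictions = X @ weights
print(predictions.tolist())   # [-1.5, -2.5, -3.5]
```

The inner dimensions must match: an (m, n) array can be multiplied by an (n, p) array, giving an (m, p) result.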

Summary

NumPy is the foundation of Data Science in Python:

Feature	Why It Matters
Fast arrays	Typically much faster than Python lists
Vectorization	Removes explicit Python loops
Matrix operations	Core of ML & AI
Broadcasting	Combines arrays of different but compatible shapes
Integration	Works with Pandas, ML, AI libraries
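
Broadcasting appears in the table but not in the sections above, so here is a brief sketch. The shapes are chosen so that a row and a column are each stretched to match a 2x3 matrix.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)
column = np.array([[100], [200]])       # shape (2, 1)

# The row is stretched across both rows of the matrix
print((matrix + row).tolist())      # [[11, 22, 33], [14, 25, 36]]

# The column is stretched across all three columns
print((matrix + column).tolist())   # [[101, 102, 103], [204, 205, 206]]
```

Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.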
