Module 2: Python for Machine Learning · Beginner · 25 min
NumPy Basics
What is NumPy?
NumPy (Numerical Python) is the core library for numerical computing in Python. It provides:
- A powerful N-dimensional array object (`ndarray`)
- Vectorized math: operations on entire arrays without writing Python loops
- Broadcasting: automatic arithmetic between arrays of different (compatible) shapes
- The foundation that Pandas and scikit-learn build on, and the array format PyTorch converts to and from
```python
import numpy as np  # convention: always alias as np
```
Creating Arrays
```python
import numpy as np

# From a Python list
arr = np.array([1, 2, 3, 4, 5])
print(arr)        # [1 2 3 4 5]
print(type(arr))  # <class 'numpy.ndarray'>

# 2D array (matrix)
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(matrix.shape)  # (3, 3): 3 rows, 3 columns
```
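Every element of an `ndarray` shares a single `dtype`. The sketch below uses illustrative values. It shows NumPy upcasting a mixed int/float list to floats, and how to request a dtype explicitly with the `dtype` argument:

```python
import numpy as np

# NumPy picks one dtype that can hold every element: mixed ints/floats -> float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64

# You can also request a dtype explicitly, e.g. 32-bit floats to save memory
weights = np.array([1, 2, 3], dtype=np.float32)
print(weights.dtype)  # float32
```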
Special Arrays
```python
import numpy as np

# Zeros
np.zeros((3, 4))  # 3×4 array of 0.0
# Ones
np.ones((2, 5))  # 2×5 array of 1.0
# Range
np.arange(0, 10, 2)  # [0 2 4 6 8] (start, stop, step; stop is excluded)
# Evenly spaced
np.linspace(0, 1, 5)  # [0.   0.25 0.5  0.75 1.  ] (start, stop, num; stop is included)
# Random floats in [0, 1)
np.random.rand(3, 3)  # 3×3 matrix of random floats
# Random integers
np.random.randint(0, 10, size=(4, 4))  # 4×4 matrix of random ints 0-9 (high is exclusive)
# Identity matrix (1s on the diagonal)
np.eye(3)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```
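The `np.random.rand` and `np.random.randint` calls above use NumPy's legacy global random state. For new code, NumPy's documentation recommends a `Generator` from `np.random.default_rng`. Here is a minimal sketch. The seed `42` is an arbitrary choice that makes the draws reproducible:

```python
import numpy as np

# default_rng returns a Generator; a fixed seed makes the "random" values reproducible
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))             # floats in [0, 1), like np.random.rand(3, 3)
ints = rng.integers(0, 10, size=(4, 4))  # ints 0-9 (high is exclusive), like randint

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((3, 3))))  # True
```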
Array Properties
```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)  # (2, 3): 2 rows, 3 cols
print(arr.ndim)   # 2: number of dimensions
print(arr.size)   # 6: total elements
print(arr.dtype)  # int64 on most platforms (int32 on Windows before NumPy 2.0)
```
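Two more properties help when estimating memory use. `itemsize` gives the bytes per element, and `nbytes` gives the total bytes, equal to `size * itemsize`. This sketch fixes `dtype=np.int64` so the numbers don't depend on the platform default:

```python
import numpy as np

# Explicit dtype so the byte counts don't depend on the platform default
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(arr.itemsize)  # 8  (bytes per int64 element)
print(arr.nbytes)    # 48 (size * itemsize = 6 * 8)
print(arr.T.shape)   # (3, 2): the transpose swaps rows and columns
```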
Vectorized Math
NumPy's biggest advantage is vectorization: an operation applies to every element of an array at once, with no explicit Python loop.
```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

# Element-wise operations
print(a + b)   # [11 22 33 44 55]
print(a * b)   # [ 10  40  90 160 250]
print(b / a)   # [10. 10. 10. 10. 10.]
print(a ** 2)  # [ 1  4  9 16 25]

# Scalar operations
print(a * 3)    # [ 3  6  9 12 15]
print(a + 100)  # [101 102 103 104 105]
```
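Adding a scalar is the simplest case of broadcasting, which the introduction mentions. More generally, NumPy compares shapes starting from the last dimension. Two dimensions are compatible when they are equal or when one of them is 1, and the size-1 dimension is stretched to match. A small illustration with made-up values:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)
col = np.array([[100], [200]])   # shape (2, 1)

# The row is stretched across both rows of the matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is stretched across all three columns
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]
```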
Comparison: Python loop vs NumPy
```python
import time
import numpy as np

large = np.random.rand(1_000_000)

# Python loop way
start = time.perf_counter()
result = [x ** 2 for x in large]
print(f"Python loop: {time.perf_counter() - start:.3f}s")  # e.g. ~0.15s (varies by machine)

# NumPy way
start = time.perf_counter()
result = large ** 2
print(f"NumPy: {time.perf_counter() - start:.3f}s")  # e.g. ~0.002s, roughly 75× faster
```
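NumPy's core operations run as compiled C loops over a buffer of same-typed values. This avoids the per-element overhead of the Python interpreter, which is where the speedup comes from.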
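A single `time.time()` measurement is noisy. A slightly more careful sketch uses the standard-library `timeit` module to average several runs, and first checks that both versions compute the same values. The 3 repetitions and the seed `0` are arbitrary choices, and the printed numbers will vary by machine:

```python
import timeit
import numpy as np

large = np.random.default_rng(0).random(1_000_000)

# Both approaches must compute the same values before comparing speed
loop_result = np.array([x ** 2 for x in large])
assert np.allclose(loop_result, large ** 2)

# timeit.timeit returns the total time for `number` runs; divide for a per-run average
loop_time = timeit.timeit(lambda: [x ** 2 for x in large], number=3) / 3
numpy_time = timeit.timeit(lambda: large ** 2, number=3) / 3
print(f"loop: {loop_time:.4f}s, NumPy: {numpy_time:.4f}s, "
      f"speedup ~{loop_time / numpy_time:.0f}x")
```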
Aggregation Functions
```python
import numpy as np

data = np.array([23, 45, 12, 67, 34, 89, 56])
print(np.sum(data))     # 326
print(np.mean(data))    # ≈ 46.57
print(np.median(data))  # 45.0
print(np.std(data))     # ≈ 24.55 (population standard deviation, ddof=0)
print(np.min(data))     # 12
print(np.max(data))     # 89
print(np.argmax(data))  # 5 (index of the maximum)
```
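By default, `np.std` computes the population standard deviation (`ddof=0`, dividing by n). This sketch checks that result against the formula sqrt(Σ(xᵢ − x̄)² / n). It also shows the sample version (`ddof=1`, dividing by n − 1), which many statistics texts report instead:

```python
import numpy as np

data = np.array([23, 45, 12, 67, 34, 89, 56])

# Population standard deviation (np.std's default, ddof=0): divide by n
manual_pop = np.sqrt(np.sum((data - data.mean()) ** 2) / data.size)
print(round(manual_pop, 2), round(np.std(data), 2))  # 24.55 24.55

# Sample standard deviation: divide by n - 1 instead (ddof=1)
print(round(np.std(data, ddof=1), 2))  # 26.51
```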
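For 2D arrays, pass `axis` to choose the direction. `axis=0` collapses the rows (one result per column), and `axis=1` collapses the columns (one result per row):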
```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.sum(m, axis=0))  # [5 7 9]: sum of each column
print(np.sum(m, axis=1))  # [ 6 15]: sum of each row
```
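Combining `axis` with broadcasting gives per-feature standardization (z-scores), a common ML preprocessing step. The dataset below is hypothetical, with samples as rows and features as columns:

```python
import numpy as np

# Hypothetical dataset: 4 samples (rows) x 2 features (columns)
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

col_means = X.mean(axis=0)  # one mean per feature: [2.5, 500.0]
col_stds = X.std(axis=0)    # one standard deviation per feature

# Broadcasting applies each column's mean/std to every row
X_scaled = (X - col_means) / col_stds

# Each column of X_scaled now has mean close to 0 and std close to 1
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```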
Indexing and Slicing
```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr[0])     # 10
print(arr[-1])    # 50
print(arr[1:4])   # [20 30 40]
print(arr[::2])   # [10 30 50]
```
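One gotcha with slicing: a basic slice such as `arr[1:4]` is a *view* that shares memory with the original array, so writing to it also changes `arr`. Call `.copy()` when you need an independent array:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]   # a view: shares memory with arr
view[0] = 999
print(arr)        # [ 10 999  30  40  50]  (the original changed too)

safe = arr[1:4].copy()  # an independent copy
safe[0] = -1
print(arr)        # [ 10 999  30  40  50]  (unchanged this time)
```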
2D Indexing
```python
import numpy as np

img = np.array([
    [255, 128, 0],
    [64, 32, 16],
    [200, 100, 50],
])
print(img[0, 0])  # 255: row 0, col 0
print(img[1, 2])  # 16: row 1, col 2
print(img[:, 0])  # [255  64 200]: all rows, col 0
print(img[0, :])  # [255 128   0]: row 0, all cols
```
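Slices work on both axes at once, and a list of indices selects specific rows. Using the same `img` array:

```python
import numpy as np

img = np.array([
    [255, 128, 0],
    [64, 32, 16],
    [200, 100, 50],
])

# Rows 0-1 and columns 1-2 (stop indices are exclusive, as in 1D slicing)
print(img[0:2, 1:3])
# [[128   0]
#  [ 32  16]]

# A list of indices picks specific rows (this returns a copy, not a view)
print(img[[0, 2]])
# [[255 128   0]
#  [200 100  50]]
```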
Boolean Indexing
```python
import numpy as np

scores = np.array([78, 92, 55, 88, 43, 96, 71])

# Which scores pass (≥ 70)?
mask = scores >= 70
print(mask)     # [ True  True False  True False  True  True]
passing = scores[mask]
print(passing)  # [78 92 88 96 71]

# One-liner version
high = scores[scores >= 90]
print(high)     # [92 96]
```
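To filter on more than one condition, combine masks with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>=` and `<`. `np.where` then turns a mask into a new array of labels:

```python
import numpy as np

scores = np.array([78, 92, 55, 88, 43, 96, 71])

# Combine masks with & (and), | (or), ~ (not); the parentheses are required
b_range = scores[(scores >= 70) & (scores < 90)]
print(b_range)  # [78 88 71]

# np.where(condition, x, y) picks x where the condition is True, else y
labels = np.where(scores >= 70, "pass", "fail")
print(labels)   # ['pass' 'pass' 'fail' 'pass' 'fail' 'pass' 'pass']
```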
Reshaping Arrays
In ML, you frequently need to reshape data:
```python
import numpy as np

# A flat vector of 784 pixel values → 28×28 image
flat_image = np.random.randint(0, 256, size=784)
image_2d = flat_image.reshape(28, 28)
print(image_2d.shape)  # (28, 28)

# Batch of 100 images (28×28) → flat
batch = np.random.rand(100, 28, 28)
flat_batch = batch.reshape(100, -1)  # -1 means "infer this dimension" (28 * 28 = 784)
print(flat_batch.shape)  # (100, 784)
```
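`reshape` never changes the number of elements. The new shape must therefore multiply to the same size, and at most one dimension can be `-1`. A short sketch with a 12-element array:

```python
import numpy as np

v = np.arange(12)              # 12 elements: 0..11

print(v.reshape(3, 4).shape)   # (3, 4)
print(v.reshape(2, -1).shape)  # (2, 6): -1 is inferred as 12 / 2

# The new shape must hold exactly the same number of elements
try:
    v.reshape(5, -1)           # 12 is not divisible by 5
except ValueError as err:
    print("ValueError:", err)

# Flatten back to 1D (ravel returns a view when it can)
print(v.reshape(3, 4).ravel().shape)  # (12,)
```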
Summary
| Operation | Example | Result |
|---|---|---|
| Create | `np.array([1, 2, 3])` | 1D array |
| Shape | `arr.shape` | `(3,)` for the array above |
| Add | `a + b` | element-wise sum |
| Mean | `np.mean(arr)` | scalar |
| Reshape | `arr.reshape(2, 3)` | 2×3 array (needs exactly 6 elements) |
| Boolean filter | `arr[arr > 5]` | elements where the condition is `True` |
You have a NumPy array `a = np.array([10, 20, 30, 40, 50])`. What does `a[a > 25]` return?