Module 2: Python for Machine Learning (beginner, 25 min)

NumPy Basics

What is NumPy?

NumPy (Numerical Python) is the core library for numerical computing in Python. It provides:

  • A powerful N-dimensional array (ndarray) object
  • Vectorized math: operations on entire arrays without explicit Python loops
  • Broadcasting: arithmetic between arrays of different but compatible shapes
  • The foundation that Pandas and scikit-learn build on, and an array model that PyTorch tensors closely mirror
import numpy as np  # convention: always alias as np
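
Broadcasting is easiest to grasp from a small case. The sketch below is illustrative only (the names `matrix`, `row`, and `col` are not from the lesson): a 1D row and a single-column array are each combined with a 2×3 matrix, and NumPy stretches the smaller operand to match.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

print(matrix + row)   # row is added to every row of matrix
# [[11 22 33]
#  [14 25 36]]

print(matrix + col)   # col is added to every column of matrix
# [[101 102 103]
#  [204 205 206]]
```

Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or one of them is 1.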

Creating Arrays

import numpy as np

# From a Python list
arr = np.array([1, 2, 3, 4, 5])
print(arr)          # [1 2 3 4 5]
print(type(arr))    # <class 'numpy.ndarray'>

# 2D array (matrix)
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(matrix.shape)  # (3, 3) — 3 rows, 3 columns

Special Arrays

# Zeros
np.zeros((3, 4))          # 3×4 array of 0.0

# Ones
np.ones((2, 5))           # 2×5 array of 1.0

# Range
np.arange(0, 10, 2)       # [0 2 4 6 8]  (start, stop, step)

# Evenly spaced
np.linspace(0, 1, 5)      # [0.   0.25 0.5  0.75 1.  ]

# Random floats in the half-open interval [0, 1)
np.random.rand(3, 3)      # 3×3 matrix of random floats

# Random integers
np.random.randint(0, 10, size=(4, 4))  # 4×4 matrix of random ints 0-9

# Identity matrix (diagonal of 1s)
np.eye(3)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
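
The random examples above produce different values on every run. When reproducibility matters, one option is NumPy's newer Generator interface. In this sketch the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded generator: same numbers every run

uniform = rng.random((3, 3))                  # floats in [0, 1)
integers = rng.integers(0, 10, size=(4, 4))   # ints from 0 to 9 (upper bound excluded)

# Re-creating the generator with the same seed reproduces the same values
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((3, 3))))  # True
```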

Array Properties

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)    # (2, 3)  — 2 rows, 3 cols
print(arr.ndim)     # 2  — number of dimensions
print(arr.size)     # 6  — total elements
print(arr.dtype)    # int64 on most platforms (data type; can be int32 on Windows with NumPy versions before 2.0)
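
The dtype is usually inferred from the input, but it can also be set explicitly or converted later. A brief sketch, with illustrative variable names:

```python
import numpy as np

ints = np.array([1, 2, 3])                       # integer dtype inferred from the list
floats = np.array([1, 2, 3], dtype=np.float32)   # dtype requested explicitly
converted = ints.astype(np.float64)              # astype returns a converted copy

print(floats.dtype)       # float32
print(converted.dtype)    # float64
print(floats.itemsize)    # 4  (bytes per element)
```

Smaller dtypes such as float32 halve the memory of float64, which is one reason they are common for large ML datasets.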

Vectorized Math

The biggest advantage of NumPy: arithmetic applies to whole arrays at once, so no explicit Python loops are needed.

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

# Element-wise operations
print(a + b)      # [11 22 33 44 55]
print(a * b)      # [ 10  40  90 160 250]
print(b / a)      # [10. 10. 10. 10. 10.]
print(a ** 2)     # [ 1  4  9 16 25]

# Scalar operations
print(a * 3)      # [ 3  6  9 12 15]
print(a + 100)    # [101 102 103 104 105]
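
The same element-wise style extends to NumPy's math functions and to typical ML formulas. As an illustration, the sketch below computes a mean squared error without a loop; `y_true` and `y_pred` are made-up values, not data from the lesson.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])   # made-up targets
y_pred = np.array([2.0, 5.0, 4.0, 8.0])   # made-up predictions

errors = y_pred - y_true        # element-wise differences: -1, 0, 2, 1
mse = np.mean(errors ** 2)      # (1 + 0 + 4 + 1) / 4
rmse = np.sqrt(mse)             # square root of the MSE

print(mse)    # 1.5
print(rmse)
```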

Comparison: Python loop vs NumPy

import time
import numpy as np

large = np.random.rand(1_000_000)

# Python loop way
start = time.perf_counter()
result = [x ** 2 for x in large]
print(f"Python loop: {time.perf_counter() - start:.3f}s")  # e.g. ~0.15s; varies by machine

# NumPy way
start = time.perf_counter()
result = large ** 2
print(f"NumPy: {time.perf_counter() - start:.3f}s")  # e.g. ~0.002s, roughly 75× faster in this run

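NumPy's array operations run in compiled C loops over contiguous, typed memory, so they avoid the per-element overhead of the Python interpreter. That is why the vectorized version is so much faster.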
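
A single timing run can be noisy. A somewhat more robust comparison, sketched here with the standard-library `timeit` module, repeats each measurement and keeps the best time. The helper names `python_loop` and `numpy_vectorized` are hypothetical, and the array is smaller than above only to keep the run short.

```python
import timeit
import numpy as np

large = np.random.rand(100_000)

def python_loop(values):
    # Squares each element one at a time in the interpreter
    return [x ** 2 for x in values]

def numpy_vectorized(values):
    # Squares the whole array in one vectorized call
    return values ** 2

# Both approaches should produce the same numbers
assert np.allclose(python_loop(large), numpy_vectorized(large))

# Best of 3 runs for each approach
loop_time = min(timeit.repeat(lambda: python_loop(large), number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_vectorized(large), number=1, repeat=3))
print(f"loop: {loop_time:.4f}s, numpy: {numpy_time:.4f}s")
```

The exact ratio depends on the machine, the array size, and the NumPy version, so treat any single figure as indicative.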


Aggregation Functions

data = np.array([23, 45, 12, 67, 34, 89, 56])

print(np.sum(data))     # 326
print(np.mean(data))    # 46.57  (rounded)
print(np.median(data))  # 45.0
print(np.std(data))     # 24.55  (rounded; population standard deviation)
print(np.min(data))     # 12
print(np.max(data))     # 89
print(np.argmax(data))  # 5  (index of the maximum)
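
Note that `np.std` computes the population standard deviation by default (it divides by n). If the sample standard deviation (dividing by n - 1) is wanted, the `ddof` argument can be passed, as in this short sketch using the same data:

```python
import numpy as np

data = np.array([23, 45, 12, 67, 34, 89, 56])

population_std = np.std(data)        # divides by n (default, ddof=0)
sample_std = np.std(data, ddof=1)    # divides by n - 1

print(f"{population_std:.2f}")  # 24.55
print(f"{sample_std:.2f}")      # 26.51
```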

For 2D arrays, use axis to aggregate over rows or columns:

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.sum(m, axis=0))  # [5 7 9]  — sum each column
print(np.sum(m, axis=1))  # [ 6 15]  — sum each row
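
In ML, axis-wise aggregation is commonly used to standardize each feature column. A minimal sketch, assuming a made-up matrix `X` whose rows are samples and whose columns are features:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])   # made-up data: 3 samples, 2 features

mean = X.mean(axis=0)          # one mean per column
std = X.std(axis=0)            # one standard deviation per column
X_scaled = (X - mean) / std    # each column now has mean 0 and std 1

print(X_scaled.mean(axis=0))   # approximately 0 for each column
print(X_scaled.std(axis=0))    # approximately 1 for each column
```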

Indexing and Slicing

arr = np.array([10, 20, 30, 40, 50])

print(arr[0])     # 10
print(arr[-1])    # 50
print(arr[1:4])   # [20 30 40]
print(arr[::2])   # [10 30 50]

2D Indexing

img = np.array([
    [255, 128, 0],
    [64,  32,  16],
    [200, 100, 50],
])

print(img[0, 0])    # 255  — row 0, col 0
print(img[1, 2])    # 16   — row 1, col 2
print(img[:, 0])    # [255  64 200]  — all rows, col 0
print(img[0, :])    # [255 128   0]  — row 0, all cols
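
Slices can also select a rectangular block, and a slice is a view onto the original data rather than a copy. The sketch below uses a fresh array named `grid` (not from the lesson) to show both points:

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

block = grid[0:2, 1:3]      # rows 0-1, cols 1-2
print(block)
# [[2 3]
#  [5 6]]

block[0, 0] = 99            # writing through the view changes grid as well
print(grid[0, 1])           # 99

safe = grid[0:2, 1:3].copy()   # .copy() gives independent data
safe[0, 0] = -1
print(grid[0, 1])           # still 99
```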

Boolean Indexing

scores = np.array([78, 92, 55, 88, 43, 96, 71])

# Which scores pass (≥70)?
mask = scores >= 70
print(mask)           # [ True  True False  True False  True  True]

passing = scores[mask]
print(passing)        # [78 92 88 96 71]

# One-liner version
high = scores[scores >= 90]
print(high)           # [92 96]
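
Masks can be combined, and `np.where` can turn a condition into labels. A short sketch using the same `scores` array (the "pass" and "fail" labels are illustrative):

```python
import numpy as np

scores = np.array([78, 92, 55, 88, 43, 96, 71])

# Combine conditions with & (and) or | (or); the parentheses are required
mid_range = scores[(scores >= 70) & (scores < 90)]
print(mid_range)      # [78 88 71]

# np.where picks between two values element by element
labels = np.where(scores >= 70, "pass", "fail")
print(labels)         # ['pass' 'pass' 'fail' 'pass' 'fail' 'pass' 'pass']

# Counting matches: each True counts as 1
print(np.sum(scores >= 70))   # 5
```

Python's `and` and `or` keywords do not work element-wise on arrays, which is why the bitwise operators are used here.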

Reshaping Arrays

In ML, you frequently need to reshape data:

# A flat vector of 784 pixel values → 28×28 image
flat_image = np.random.randint(0, 256, size=784)
image_2d = flat_image.reshape(28, 28)
print(image_2d.shape)  # (28, 28)

# Batch of 100 images (28×28) → flat
batch = np.random.rand(100, 28, 28)
flat_batch = batch.reshape(100, -1)   # -1: infer this dimension from the total size (28 * 28 = 784)
print(flat_batch.shape)  # (100, 784)
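
Two more reshape patterns come up often: turning a 1D vector into a single column (scikit-learn, for example, expects 2D input of shape (n_samples, n_features)), and flattening back to 1D. The sketch below uses a small illustrative array and also shows what happens when the sizes do not match:

```python
import numpy as np

values = np.arange(6)              # [0 1 2 3 4 5]

column = values.reshape(-1, 1)     # 6 rows, 1 column
print(column.shape)                # (6, 1)

grid = values.reshape(2, 3)
print(grid.reshape(-1).shape)      # (6,)  back to a flat vector

# The total number of elements must match, otherwise NumPy raises ValueError
try:
    values.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```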

Summary

Operation        Example              Result
Create           np.array([1,2,3])    1D array
Shape            arr.shape            (3,)
Add              a + b                element-wise
Mean             np.mean(arr)         scalar
Reshape          arr.reshape(2,3)     new shape
Boolean filter   arr[arr > 5]         filtered array

Knowledge Check

You have a NumPy array `a = np.array([10, 20, 30, 40, 50])`. What does `a[a > 25]` return?