Module 2 — Python for Machine Learning beginner 20 min
Lists & Dictionaries
Why Data Structures Matter in ML
In machine learning you constantly handle collections of data: a list of training images, a set of labels, a dictionary of model hyperparameters. Python’s built-in data structures make this easy.
Lists [] — Ordered Collections
A list is an ordered, mutable collection of items: you can add, remove, or change items after the list is created. A single list can hold items of any type.
# Creating lists
scores = [0.92, 0.87, 0.95, 0.91, 0.88]
labels = ["cat", "dog", "bird", "cat", "dog"]
pixel_values = [128, 255, 0, 64, 200]
# Mixed types (valid, but uncommon in ML)
mixed = [1, "hello", True, 3.14]
print(scores) # [0.92, 0.87, 0.95, 0.91, 0.88]
print(len(scores)) # 5
Accessing Items
Python uses zero-based indexing — the first item is at index 0.
fruits = ["apple", "banana", "cherry", "date"]
# index:  0        1         2         3
print(fruits[0]) # apple
print(fruits[2]) # cherry
print(fruits[-1]) # date (negative = from the end)
print(fruits[-2]) # cherry
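Indexing past the end of a list raises an error rather than returning an empty value. The short sketch below reuses the `fruits` list from above to show that behavior, along with two related lookups (`in` and `list.index`); the variable names `error_message`, `has_banana`, and `cherry_position` are introduced here only for illustration.

```python
fruits = ["apple", "banana", "cherry", "date"]

# Valid indices are 0..3 (or -4..-1); anything else raises IndexError
try:
    fruits[4]
except IndexError as err:
    error_message = str(err)
    print(f"IndexError: {error_message}")  # IndexError: list index out of range

# `in` checks membership without needing an index
has_banana = "banana" in fruits
print(has_banana)  # True

# list.index() returns the position of the first matching item
cherry_position = fruits.index("cherry")
print(cherry_position)  # 2
```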
Slicing
Get a subset of a list with [start:end] (end is exclusive):
numbers = [10, 20, 30, 40, 50, 60, 70]
print(numbers[1:4]) # [20, 30, 40] — index 1 to 3
print(numbers[:3]) # [10, 20, 30] — first 3
print(numbers[4:]) # [50, 60, 70] — from index 4 onward
print(numbers[::2]) # [10, 30, 50, 70] — every 2nd item
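One place slicing shows up in ML is splitting data into training and validation portions. This is a minimal sketch, assuming a hypothetical list `samples` and an 80/20 split chosen purely for illustration; real projects usually shuffle the data first.

```python
# Hypothetical dataset of 10 sample IDs (illustrative only)
samples = list(range(10))

# Assumed 80/20 split; the ratio is an example, not a recommendation
split_index = int(len(samples) * 0.8)

train_samples = samples[:split_index]  # items before split_index
val_samples = samples[split_index:]    # items from split_index onward

print(train_samples)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val_samples)    # [8, 9]
```

Because the end of a slice is exclusive, the two slices share the same boundary without overlapping or dropping an item.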
Modifying Lists
model_scores = [0.80, 0.85, 0.88]
# Add to end
model_scores.append(0.91)
print(model_scores) # [0.8, 0.85, 0.88, 0.91]
# Remove the first item equal to the given value
model_scores.remove(0.85)
print(model_scores) # [0.8, 0.88, 0.91]
# Sort ascending (in place)
model_scores.sort()
print(model_scores) # [0.8, 0.88, 0.91]
# Sort descending (in place)
model_scores.sort(reverse=True)
print(model_scores) # [0.91, 0.88, 0.8]
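The `sort()` method changes the list itself. When the original order still matters (for example, scores recorded per epoch), the built-in `sorted()` returns a new list instead. A small sketch of the difference, using illustrative values and the hypothetical names `ranked` and `result`:

```python
model_scores = [0.88, 0.80, 0.91]

# sorted() returns a new list and leaves the original untouched
ranked = sorted(model_scores, reverse=True)
print(ranked)        # [0.91, 0.88, 0.8]
print(model_scores)  # [0.88, 0.8, 0.91]

# list.sort() reorders the list in place and returns None
result = model_scores.sort()
print(result)        # None
print(model_scores)  # [0.8, 0.88, 0.91]
```

A common slip is writing `model_scores = model_scores.sort()`, which replaces the list with `None`.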
Common List Operations in ML
training_losses = [1.2, 0.95, 0.78, 0.65, 0.52, 0.41]
print(f"First loss: {training_losses[0]}") # 1.2
print(f"Final loss: {training_losses[-1]}") # 0.41
print(f"Min loss: {min(training_losses)}") # 0.41
print(f"Max loss: {max(training_losses)}") # 1.2
print(f"Avg loss: {sum(training_losses)/len(training_losses):.3f}") # 0.752
# Check if loss improved
improved = training_losses[-1] < training_losses[0]
print(f"Model improved: {improved}") # True
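Beyond the minimum loss itself, it is often useful to know when it occurred. The sketch below combines `min()` with `list.index()` to find that position, and uses `enumerate()` to pair each loss with its epoch index. It assumes each list position corresponds to one epoch, which the original example does not state explicitly.

```python
training_losses = [1.2, 0.95, 0.78, 0.65, 0.52, 0.41]

# Position of the lowest loss (zero-based; first match if there are ties)
best_loss = min(training_losses)
best_epoch = training_losses.index(best_loss)
print(f"Best loss {best_loss} at epoch index {best_epoch}")
# Best loss 0.41 at epoch index 5

# enumerate() yields (index, value) pairs
for epoch, loss in enumerate(training_losses):
    print(f"epoch {epoch}: loss {loss}")
```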
Dictionaries {} — Key-Value Pairs
A dictionary maps unique keys to values. Think of a real dictionary: you look up a word (key) to find its definition (value).
# Creating a dictionary
model_config = {
    "architecture": "ResNet-50",
    "learning_rate": 0.001,
    "batch_size": 32,
    "num_epochs": 100,
    "dropout": 0.5,
}
print(model_config["architecture"]) # ResNet-50
print(model_config["learning_rate"]) # 0.001
Accessing & Modifying
hyperparams = {
    "lr": 0.01,
    "momentum": 0.9,
    "weight_decay": 1e-4,
}
# Access
print(hyperparams["lr"]) # 0.01
# Safer access (doesn't crash if key missing)
val = hyperparams.get("dropout", 0.0) # returns 0.0 if key not found
print(val) # 0.0
# Add / update
hyperparams["dropout"] = 0.3
hyperparams["lr"] = 0.005 # update existing key
# Delete
del hyperparams["momentum"]
print(hyperparams)
# {'lr': 0.005, 'weight_decay': 0.0001, 'dropout': 0.3}
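To make the difference between `[]` and `.get()` concrete, the sketch below shows what happens when a key is missing, how to test for a key with `in`, and how `update()` sets several entries at once. The `batch_size` entry and the name `missing_key_message` are added here only for illustration.

```python
hyperparams = {"lr": 0.005, "weight_decay": 1e-4, "dropout": 0.3}

# `in` tests for a key without raising an error
print("lr" in hyperparams)        # True
print("momentum" in hyperparams)  # False

# Square-bracket access raises KeyError when the key is missing
try:
    hyperparams["momentum"]
except KeyError as err:
    missing_key_message = str(err)
    print(f"Missing key: {missing_key_message}")  # Missing key: 'momentum'

# update() adds new keys and overwrites existing ones in a single call
hyperparams.update({"lr": 0.001, "batch_size": 64})
print(hyperparams)
# {'lr': 0.001, 'weight_decay': 0.0001, 'dropout': 0.3, 'batch_size': 64}
```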
Iterating Over a Dictionary
metrics = {
    "accuracy": 0.94,
    "precision": 0.92,
    "recall": 0.95,
    "f1_score": 0.935,
}
# Keys only
for key in metrics:
    print(key)
# Values only
for val in metrics.values():
    print(f"{val:.2f}")
# Both key and value
for key, val in metrics.items():
    print(f"{key}: {val:.3f}")
# accuracy: 0.940
# precision: 0.920
# recall: 0.950
# f1_score: 0.935
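Iteration also works with built-ins such as `max()` and `sorted()`. As a sketch, the code below picks the highest-scoring metric and ranks all of them; `best_name` and `ranked` are names chosen for this example.

```python
metrics = {
    "accuracy": 0.94,
    "precision": 0.92,
    "recall": 0.95,
    "f1_score": 0.935,
}

# max() iterates over the keys; key=metrics.get compares them by value
best_name = max(metrics, key=metrics.get)
print(best_name, metrics[best_name])  # recall 0.95

# Sort (key, value) pairs by value, highest first
ranked = sorted(metrics.items(), key=lambda pair: pair[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.3f}")
# recall: 0.950
# accuracy: 0.940
# f1_score: 0.935
# precision: 0.920
```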
Combining Lists and Dictionaries
In ML, you often combine both — for example, a list of dicts:
# Dataset as list of dicts
dataset = [
    {"image": "img_001.jpg", "label": "cat", "confidence": 0.98},
    {"image": "img_002.jpg", "label": "dog", "confidence": 0.87},
    {"image": "img_003.jpg", "label": "cat", "confidence": 0.91},
]
# Access
print(dataset[0]["label"]) # cat
print(dataset[1]["confidence"]) # 0.87
# Filter
cats = [item for item in dataset if item["label"] == "cat"]
print(len(cats)) # 2
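The same list-of-dicts layout supports simple aggregation. The sketch below counts how many samples carry each label, using `.get()` with a default of 0, and collects the image names above a confidence threshold. The 0.9 threshold is an arbitrary value picked for the example.

```python
dataset = [
    {"image": "img_001.jpg", "label": "cat", "confidence": 0.98},
    {"image": "img_002.jpg", "label": "dog", "confidence": 0.87},
    {"image": "img_003.jpg", "label": "cat", "confidence": 0.91},
]

# Count samples per label; .get(label, 0) handles labels not seen yet
label_counts = {}
for item in dataset:
    label = item["label"]
    label_counts[label] = label_counts.get(label, 0) + 1
print(label_counts)  # {'cat': 2, 'dog': 1}

# Image names with confidence at or above an assumed threshold of 0.9
confident_images = [item["image"] for item in dataset if item["confidence"] >= 0.9]
print(confident_images)  # ['img_001.jpg', 'img_003.jpg']
```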
List Comprehensions (Powerful Shortcut)
# The long way
squared = []
for x in [1, 2, 3, 4, 5]:
    squared.append(x ** 2)
# The list comprehension way
squared = [x ** 2 for x in [1, 2, 3, 4, 5]]
print(squared) # [1, 4, 9, 16, 25]
# With condition
losses = [1.2, 0.8, 2.1, 0.5, 1.8]
high_losses = [loss for loss in losses if loss > 1.0]
print(high_losses) # [1.2, 2.1, 1.8]
# Scale values by the maximum (largest value becomes 1.0)
raw = [10, 20, 30, 40, 50]
max_val = max(raw)
normalized = [x / max_val for x in raw]
print(normalized) # [0.2, 0.4, 0.6, 0.8, 1.0]
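Dividing by the maximum maps the largest value to 1.0, but the smallest value lands at 0.2 rather than 0. If the full 0 to 1 range is wanted, min-max scaling is one option. A minimal sketch with the same `raw` values follows; the names `span` and `min_max` and the fallback for a constant list are assumptions made for this example.

```python
raw = [10, 20, 30, 40, 50]

min_val, max_val = min(raw), max(raw)
span = max_val - min_val

# Min-max scaling: (x - min) / (max - min).
# If every value is identical, span is 0, so fall back to all zeros
# (an assumed convention for this sketch) to avoid dividing by zero.
if span:
    min_max = [(x - min_val) / span for x in raw]
else:
    min_max = [0.0 for _ in raw]

print(min_max)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```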
Summary
| Structure | Syntax | Key trait | ML Use |
|---|---|---|---|
| List | [1, 2, 3] | Ordered, indexed | Dataset samples, losses |
| Dict | {"key": val} | Key-value mapping | Configs, metrics |
You have a list `losses = [0.9, 0.7, 0.5, 0.3]`. What does `losses[-1]` return?