Lists & Dictionaries

Why Data Structures Matter in ML

In machine learning you constantly handle collections of data: a list of training images, a sequence of labels, a dictionary of model hyperparameters. Python’s built-in lists and dictionaries cover most of these cases without any extra libraries.

Lists `[]` — Ordered Collections

A list is an ordered, mutable (changeable in place) collection of items. Items can be of any type, and the same value may appear more than once.

# Creating lists
scores = [0.92, 0.87, 0.95, 0.91, 0.88]
labels = ["cat", "dog", "bird", "cat", "dog"]
pixel_values = [128, 255, 0, 64, 200]

# Mixed types (valid, but uncommon in ML)
mixed = [1, "hello", True, 3.14]

print(scores)    # [0.92, 0.87, 0.95, 0.91, 0.88]
print(len(scores))  # 5
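
Because lists are changeable in place, assigning a list to a second name does not copy it. The short sketch below illustrates the difference between an alias and a copy; the variable names `alias` and `snapshot` are only illustrative.

```python
scores = [0.92, 0.87, 0.95]

alias = scores            # same list object, not a copy
alias[0] = 0.50
print(scores)             # [0.5, 0.87, 0.95]

snapshot = scores.copy()  # independent (shallow) copy
snapshot[1] = 0.0
print(scores)             # [0.5, 0.87, 0.95]
print(snapshot)           # [0.5, 0.0, 0.95]
```

Copying before modifying is a reasonable habit when the original values, such as raw scores, are needed again later.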

Accessing Items

Python uses zero-based indexing — the first item is at index 0.

fruits = ["apple", "banana", "cherry", "date"]
#          index 0    index 1    index 2   index 3

print(fruits[0])   # apple
print(fruits[2])   # cherry
print(fruits[-1])  # date  (negative = from the end)
print(fruits[-2])  # cherry
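
An index outside the list raises `IndexError` instead of returning a default value. One possible way to guard against that is sketched below; `safe_get` is a hypothetical helper written for this example, not a Python built-in.

```python
fruits = ["apple", "banana", "cherry", "date"]

def safe_get(items, index, default=None):
    """Return items[index], or default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default

print(safe_get(fruits, 1))    # banana
print(safe_get(fruits, 10))   # None
print(safe_get(fruits, -1))   # date
```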

Slicing

Get a sub-list with `[start:end]`; the item at index `end` is not included. An optional third value, `[start:end:step]`, picks every `step`-th item:

numbers = [10, 20, 30, 40, 50, 60, 70]

print(numbers[1:4])   # [20, 30, 40]  — index 1 to 3
print(numbers[:3])    # [10, 20, 30]  — first 3
print(numbers[4:])    # [50, 60, 70]  — from index 4 onward
print(numbers[::2])   # [10, 30, 50, 70]  — every 2nd item
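
Slicing is a simple way to split data into training and validation parts. The sketch below assumes an 80/20 split and uses made-up file names; real projects usually shuffle the data first.

```python
# Hypothetical sample names, used only for illustration
samples = ["img_0", "img_1", "img_2", "img_3", "img_4",
           "img_5", "img_6", "img_7", "img_8", "img_9"]

split = int(len(samples) * 0.8)   # boundary index for an 80/20 split
train = samples[:split]           # items before the boundary
val = samples[split:]             # items from the boundary onward

print(len(train), len(val))       # 8 2
print(val)                        # ['img_8', 'img_9']

# A slice is a new list: changing it leaves the original untouched
train[0] = "changed"
print(samples[0])                 # img_0
```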

Modifying Lists

model_scores = [0.80, 0.85, 0.88]

# Add to end
model_scores.append(0.91)
print(model_scores)  # [0.8, 0.85, 0.88, 0.91]

# Remove the first item equal to 0.85
model_scores.remove(0.85)
print(model_scores)  # [0.8, 0.88, 0.91]

# Sort ascending, in place (already in order here, so nothing changes)
model_scores.sort()
print(model_scores)  # [0.8, 0.88, 0.91]

# Sort descending
model_scores.sort(reverse=True)
print(model_scores)  # [0.91, 0.88, 0.8]
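
A few details of these methods are easy to miss: `sort()` changes the list itself while `sorted()` returns a new one, `remove()` deletes only the first match, and `pop()` removes by position. A small sketch with made-up scores:

```python
epoch_scores = [0.88, 0.80, 0.91, 0.80]

# sorted() returns a new list and leaves the original order intact
ranked = sorted(epoch_scores, reverse=True)
print(ranked)         # [0.91, 0.88, 0.8, 0.8]
print(epoch_scores)   # [0.88, 0.8, 0.91, 0.8]

# remove() deletes only the first matching value
epoch_scores.remove(0.80)
print(epoch_scores)   # [0.88, 0.91, 0.8]

# pop() removes the last item (or the item at a given index) and returns it
last = epoch_scores.pop()
print(last)           # 0.8
print(epoch_scores)   # [0.88, 0.91]

# remove() raises ValueError when the value is not in the list
try:
    epoch_scores.remove(0.5)
except ValueError:
    print("0.5 is not in the list")
```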

Common List Operations in ML

training_losses = [1.2, 0.95, 0.78, 0.65, 0.52, 0.41]

print(f"First loss: {training_losses[0]}")    # 1.2
print(f"Final loss: {training_losses[-1]}")   # 0.41
print(f"Min loss: {min(training_losses)}")    # 0.41
print(f"Max loss: {max(training_losses)}")    # 1.2
print(f"Avg loss: {sum(training_losses)/len(training_losses):.3f}")  # 0.752

# Check if loss improved
improved = training_losses[-1] < training_losses[0]
print(f"Model improved: {improved}")  # True
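
The same built-ins can also tell you where the best value occurred and how the loss changed from one epoch to the next. A minimal sketch, assuming epochs are numbered from 0 to match list indices:

```python
training_losses = [1.2, 0.95, 0.78, 0.65, 0.52, 0.41]

best_loss = min(training_losses)
best_epoch = training_losses.index(best_loss)   # position of the first minimum
print(f"Best epoch: {best_epoch}, loss: {best_loss}")  # Best epoch: 5, loss: 0.41

# Change in loss compared with the previous epoch (negative = improvement)
changes = []
for epoch in range(1, len(training_losses)):
    changes.append(training_losses[epoch] - training_losses[epoch - 1])

for epoch, change in enumerate(changes, start=1):
    print(f"Epoch {epoch}: {change:+.2f}")
```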

Dictionaries `{}` — Key-Value Pairs

A dictionary maps unique keys to values. Think of a real dictionary: you look up a word (key) to find its definition (value).

# Creating a dictionary
model_config = {
    "architecture": "ResNet-50",
    "learning_rate": 0.001,
    "batch_size": 32,
    "num_epochs": 100,
    "dropout": 0.5,
}

print(model_config["architecture"])   # ResNet-50
print(model_config["learning_rate"])  # 0.001
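
Two consequences of "unique keys" are worth seeing once. Repeating a key keeps only the last value, and keys must be hashable, so a tuple works as a key but a list does not. The values below are made up for illustration.

```python
# Repeating a key in a literal keeps only the last value
config = {"batch_size": 32, "batch_size": 64}
print(config)        # {'batch_size': 64}
print(len(config))   # 1

# Tuples are hashable, so they can serve as keys
grid = {("lr", 0.01): 0.91, ("lr", 0.001): 0.94}
print(grid[("lr", 0.001)])   # 0.94

# Lists are mutable and unhashable, so using one as a key raises TypeError
try:
    bad = {["lr", 0.01]: 0.91}
except TypeError:
    print("lists cannot be dictionary keys")
```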

Accessing & Modifying

hyperparams = {
    "lr": 0.01,
    "momentum": 0.9,
    "weight_decay": 1e-4,
}

# Access
print(hyperparams["lr"])          # 0.01

# Safer access: .get() returns a default instead of raising KeyError
val = hyperparams.get("dropout", 0.0)  # returns 0.0 if key not found
print(val)  # 0.0

# Add / update
hyperparams["dropout"] = 0.3
hyperparams["lr"] = 0.005  # update existing key

# Delete
del hyperparams["momentum"]

print(hyperparams)
# {'lr': 0.005, 'weight_decay': 0.0001, 'dropout': 0.3}
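
A common pattern built on these operations is to start from default settings and apply a few overrides. The sketch below uses dictionary unpacking for the merge; the names `defaults` and `overrides` and their values are illustrative.

```python
defaults = {"lr": 0.01, "momentum": 0.9, "weight_decay": 1e-4}
overrides = {"lr": 0.005, "dropout": 0.3}

# Later entries win, so overrides replace matching defaults
config = {**defaults, **overrides}
print(config)
# {'lr': 0.005, 'momentum': 0.9, 'weight_decay': 0.0001, 'dropout': 0.3}

# The "in" operator checks keys, not values
print("dropout" in config)   # True
print("epochs" in config)    # False

# Direct indexing raises KeyError for a missing key
try:
    config["epochs"]
except KeyError as err:
    print(f"Missing key: {err}")   # Missing key: 'epochs'
```

Building a new dictionary leaves `defaults` unchanged, which helps when the same defaults are reused across several experiments.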

Iterating Over a Dictionary

metrics = {
    "accuracy": 0.94,
    "precision": 0.92,
    "recall": 0.95,
    "f1_score": 0.935,
}

# Keys only
for key in metrics:
    print(key)

# Values only
for val in metrics.values():
    print(f"{val:.2f}")

# Both key and value
for key, val in metrics.items():
    print(f"{key}: {val:.3f}")
# accuracy: 0.940
# precision: 0.920
# recall: 0.950
# f1_score: 0.935
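
Iteration also combines with built-ins such as `max()` and `sorted()`. The sketch below picks the metric with the highest value and prints all metrics from lowest to highest.

```python
metrics = {"accuracy": 0.94, "precision": 0.92, "recall": 0.95, "f1_score": 0.935}

# max() iterates over the keys; key=metrics.get ranks them by their values
best = max(metrics, key=metrics.get)
print(best, metrics[best])   # recall 0.95

# Sort the (key, value) pairs by value, lowest first
ranked = sorted(metrics.items(), key=lambda pair: pair[1])
for name, value in ranked:
    print(f"{name:<10} {value:.3f}")
```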

Combining Lists and Dictionaries

In ML, you often combine both — for example, a list of dicts:

# Dataset as list of dicts
dataset = [
    {"image": "img_001.jpg", "label": "cat", "confidence": 0.98},
    {"image": "img_002.jpg", "label": "dog", "confidence": 0.87},
    {"image": "img_003.jpg", "label": "cat", "confidence": 0.91},
]

# Access
print(dataset[0]["label"])  # cat
print(dataset[1]["confidence"])  # 0.87

# Filter
cats = [item for item in dataset if item["label"] == "cat"]
print(len(cats))  # 2
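
With a list of dicts you can also summarize the data, for example by counting samples per label. A minimal sketch that reuses the same three records:

```python
dataset = [
    {"image": "img_001.jpg", "label": "cat", "confidence": 0.98},
    {"image": "img_002.jpg", "label": "dog", "confidence": 0.87},
    {"image": "img_003.jpg", "label": "cat", "confidence": 0.91},
]

# Count samples per label; .get() supplies 0 the first time a label is seen
label_counts = {}
for item in dataset:
    label = item["label"]
    label_counts[label] = label_counts.get(label, 0) + 1
print(label_counts)   # {'cat': 2, 'dog': 1}

# Find the record with the highest confidence
most_confident = max(dataset, key=lambda item: item["confidence"])
print(most_confident["image"])   # img_001.jpg
```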

List Comprehensions (Powerful Shortcut)

# The long way
squared = []
for x in [1, 2, 3, 4, 5]:
    squared.append(x ** 2)

# The list comprehension way
squared = [x ** 2 for x in [1, 2, 3, 4, 5]]
print(squared)  # [1, 4, 9, 16, 25]

# With condition
losses = [1.2, 0.8, 2.1, 0.5, 1.8]
high_losses = [loss for loss in losses if loss > 1.0]
print(high_losses)  # [1.2, 2.1, 1.8]

# Scale values by the maximum so the largest becomes 1.0
# (the smallest stays above 0 unless it is 0 itself)
raw = [10, 20, 30, 40, 50]
max_val = max(raw)
normalized = [x / max_val for x in raw]
print(normalized)  # [0.2, 0.4, 0.6, 0.8, 1.0]
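
To map values onto the full 0 to 1 range, subtract the minimum before dividing (often called min-max scaling). The same comprehension syntax with curly braces builds a dictionary. This sketch assumes the values are not all equal, since that would cause a division by zero.

```python
raw = [10, 20, 30, 40, 50]

# Min-max scaling: smallest value maps to 0.0, largest to 1.0
lo, hi = min(raw), max(raw)
scaled = [(x - lo) / (hi - lo) for x in raw]
print(scaled)   # [0.0, 0.25, 0.5, 0.75, 1.0]

# Dict comprehension: map each class name to an integer index
class_names = ["cat", "dog", "bird"]
label_to_index = {name: i for i, name in enumerate(class_names)}
print(label_to_index)   # {'cat': 0, 'dog': 1, 'bird': 2}
```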

Summary

| Structure | Syntax | Key trait | ML use |
|---|---|---|---|
| List | `[1, 2, 3]` | Ordered, indexed | Dataset samples, losses |
| Dict | `{"key": val}` | Key-value mapping | Configs, metrics |

Knowledge Check

You have a list `losses = [0.9, 0.7, 0.5, 0.3]`. What does `losses[-1]` return?