Module 2 — Python for Machine Learning beginner 18 min
Variables & Data Types
Why Python?
Python has become the most widely used language for machine learning and data science. Here’s why:
- 🧩 Simple syntax — reads almost like English
- 📦 Rich ecosystem — NumPy, Pandas, scikit-learn, PyTorch, TensorFlow
- 🔬 Dominant in research — most ML papers include Python code
- 🌐 Huge community — endless tutorials, help, and libraries
Let’s start from absolute zero.
Variables: Storing Information
A variable is a named container that holds a value. Think of it like labeling a box.
# Assign a value to a variable using =
name = "Alice"
age = 28
height = 5.6 # in feet
is_student = True
# Print the variables
print(name) # Alice
print(age) # 28
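To extend the box analogy, the label can be moved to a new value at any time. The short sketch below shows reassignment and updating a variable from its own current value. The name `epochs_done` is a hypothetical example, not something defined elsewhere in this lesson.

```python
# A variable can be reassigned; the name then refers to the new value.
best_accuracy = 0.0
print(best_accuracy)   # 0.0

best_accuracy = 0.91   # the old value is replaced
print(best_accuracy)   # 0.91

# A variable can also be updated from its own current value.
epochs_done = 0                 # hypothetical counter
epochs_done = epochs_done + 1   # now 1
epochs_done += 1                # shorthand for the line above, now 2
print(epochs_done)     # 2
```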
Naming Rules
- Use lowercase letters and underscores: learning_rate, num_samples
- Don’t start with a number: ❌ 1_model
- No spaces: ❌ my model → my_model ✓
- Case-sensitive: Model ≠ model
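One way to try these rules out is Python's built-in string method `isidentifier()`, which reports whether a piece of text follows the character rules for a name. This is only a rough check: it does not flag reserved words such as `class`. The values assigned to `Model` and `model` are made up for illustration.

```python
# isidentifier() checks the character rules for variable names.
print("learning_rate".isidentifier())  # True
print("1_model".isidentifier())        # False (starts with a number)
print("my model".isidentifier())       # False (contains a space)
print("my_model".isidentifier())       # True

# Names are case-sensitive: these are two separate variables.
Model = "ResNet-50"   # illustrative value
model = "VGG-16"      # illustrative value
print(Model == model)  # False
```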
The Four Basic Data Types
1. Integer (int) — Whole numbers
num_epochs = 100
batch_size = 32
num_features = 784 # 28×28 pixels in MNIST
print(type(num_epochs)) # <class 'int'>
2. Float (float) — Decimal numbers
learning_rate = 0.001
accuracy = 0.9423
pi = 3.14159
# Floats are used constantly in ML (probabilities, weights, losses)
dropout_rate = 0.5
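Two float details tend to come up early in ML code, sketched here. Very small values are often written in scientific notation, and floats are stored in binary, so some decimal results are only approximate. The name `weight_decay` is a hypothetical example.

```python
import math

# Scientific notation: 1e-3 means 1 x 10^-3
learning_rate = 1e-3
weight_decay = 1e-4    # hypothetical hyperparameter
print(learning_rate == 0.001)  # True
print(type(weight_decay))      # <class 'float'>

# Floats are approximate, so compare with a tolerance instead of ==
print(0.1 + 0.2)                     # 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```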
3. String (str) — Text
model_name = "ResNet-50"
dataset_path = "/data/images/"
# Strings use single or double quotes
greeting = 'Hello, ML!'
message = "I love Python"
# f-strings: embed variables in text
epoch = 5
loss = 0.342
print(f"Epoch {epoch}: loss = {loss}") # Epoch 5: loss = 0.342
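f-strings also accept a format specifier after a colon, which is handy for keeping training logs tidy. The sketch below reuses the example values from this lesson and stores each formatted line in a variable before printing it.

```python
epoch = 5
loss = 0.342
accuracy = 0.9423

epoch_text = f"Epoch {epoch:03d}"            # pad with zeros to 3 digits
loss_text = f"loss = {loss:.4f}"             # 4 digits after the decimal point
acc_text = f"accuracy = {accuracy:.2%}"      # show as a percentage

print(epoch_text)  # Epoch 005
print(loss_text)   # loss = 0.3420
print(acc_text)    # accuracy = 94.23%
```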
4. Boolean (bool) — True or False
is_training = True
has_gpu = False
model_loaded = True
# Booleans are often used for flags and conditions
if is_training:
    print("Running training loop...")
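Booleans are not only typed in by hand. Comparisons produce them, and `and`, `or`, and `not` combine them. The following is a minimal sketch using illustrative values; the names `improved` and `use_gpu` are hypothetical.

```python
accuracy = 0.9423
best_accuracy = 0.91
is_training = True
has_gpu = False

# A comparison evaluates to True or False
improved = accuracy > best_accuracy
print(improved)       # True

# and / or / not combine booleans
use_gpu = has_gpu and is_training
print(use_gpu)        # False
print(not has_gpu)    # True

if improved:
    print("New best accuracy!")
```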
Arithmetic Operations
a = 10
b = 3
print(a + b) # 13 — addition
print(a - b) # 7 — subtraction
print(a * b) # 30 — multiplication
print(a / b) # 3.333... — division (always float)
print(a // b) # 3 — floor division (integer result)
print(a % b) # 1 — modulo (remainder)
print(a ** b) # 1000 — exponentiation (10³)
These are used everywhere in ML:
total_steps = num_epochs * steps_per_epoch # multiplication
mean = total / num_samples # division
loss_change = current_loss - previous_loss # subtraction
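The three lines above rely on variables that would be defined elsewhere in a script. A self-contained version might look like the sketch below, where every number is an assumed value chosen only to make the arithmetic easy to follow.

```python
# All values below are assumptions for illustration.
num_epochs = 100
steps_per_epoch = 500
total_steps = num_epochs * steps_per_epoch   # multiplication
print(total_steps)    # 50000

total = 450.0         # e.g. a sum of per-sample values
num_samples = 100
mean = total / num_samples                   # division
print(mean)           # 4.5

current_loss = 0.25
previous_loss = 0.5
loss_change = current_loss - previous_loss   # subtraction
print(loss_change)    # -0.25 (negative means the loss went down)
```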
Type Conversion
Sometimes you need to convert between types:
# String to number
s = "42"
n = int(s) # 42 (integer)
f = float(s) # 42.0 (float)
# Number to string
accuracy = 0.95
print("Accuracy: " + str(accuracy)) # "Accuracy: 0.95"
# Be careful: this will crash!
# print("Result: " + 42) ← TypeError!
# print("Result: " + str(42)) ← Works! ✓
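A few conversion details are worth testing for yourself. The sketch below shows that `int()` cuts off the decimal part rather than rounding, and that `int()` refuses a string containing a decimal point. It uses `try`/`except`, which is not covered in this lesson, only so that the script keeps running after the error.

```python
# int() truncates; round() rounds to the nearest whole number
truncated = int(3.9)
rounded = round(3.9)
print(truncated)  # 3
print(rounded)    # 4

# int() cannot parse a string with a decimal point directly
try:
    value = int("3.9")
except ValueError:
    value = None
    print("int('3.9') raised ValueError")

# Going through float() first works
via_float = int(float("3.9"))
print(via_float)  # 3
```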
Variables in ML Context
Here’s what variables look like in a real ML script:
# ── Hyperparameters ────────────────────────────────────
learning_rate = 0.001 # how fast the model learns
num_epochs = 50 # number of full passes over the training data
batch_size = 64 # samples processed at once
dropout_rate = 0.3 # fraction of units randomly dropped (regularization)
# ── Dataset info ──────────────────────────────────────
num_classes = 10 # e.g., digits 0-9
input_size = 784 # 28×28 flattened image
train_size = 50000
test_size = 10000
# ── Model state ───────────────────────────────────────
best_accuracy = 0.0
is_training = True
model_saved = False
checkpoint_path = "models/checkpoint.pth"
print(f"Training for {num_epochs} epochs with lr={learning_rate}")
# Training for 50 epochs with lr=0.001
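Variables like these are usually combined to derive other quantities. As a rough sketch, the arithmetic operators from earlier can work out how many batches one epoch contains. The names `full_batches`, `leftover`, `steps_per_epoch`, and `total_steps` are hypothetical, and counting the final partial batch as a step is an assumption that depends on how the data loader is configured.

```python
import math

train_size = 50000
batch_size = 64
num_epochs = 50

full_batches = train_size // batch_size   # batches with exactly 64 samples
leftover = train_size % batch_size        # samples left for a final, smaller batch
print(full_batches)   # 781
print(leftover)       # 16

# Assumption: the smaller last batch still counts as a training step
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch)  # 782
print(total_steps)      # 39100
```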
Summary Table
| Type | Example | In ML |
|---|---|---|
| int | 32, 100 | Batch size, epochs |
| float | 0.001, 0.95 | Learning rate, accuracy |
| str | "ResNet" | Model names, file paths |
| bool | True | Training flags |
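To confirm which type a value has, `type()` and `isinstance()` can be applied to one example from each row of the table. This is a small sketch; the variable names simply mirror the ones used earlier in the lesson.

```python
batch_size = 32
learning_rate = 0.001
model_name = "ResNet"
is_training = True

print(type(batch_size))     # <class 'int'>
print(type(learning_rate))  # <class 'float'>
print(type(model_name))     # <class 'str'>
print(type(is_training))    # <class 'bool'>

# isinstance() answers a yes/no question about a value's type
print(isinstance(learning_rate, float))  # True
print(isinstance(model_name, int))       # False
```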
What data type would you use to store the learning rate 0.0001 in Python?