Variables & Data Types

Why Python?

Python has become one of the most widely used languages for machine learning and data science. Here’s why:

🧩 Simple syntax — reads almost like English
📦 Rich ecosystem — NumPy, Pandas, scikit-learn, PyTorch, TensorFlow
🔬 Dominant in research — most ML papers include Python code
🌐 Huge community — endless tutorials, help, and libraries

Let’s start from absolute zero.

Variables: Storing Information

A variable is a named container that holds a value. Think of it like labeling a box.

# Assign a value to a variable using =
name = "Alice"
age = 28
height = 5.6  # in feet
is_student = True

# Print the variable
print(name)    # Alice
print(age)     # 28
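
A variable can also be reassigned later: the label simply moves to a new value. A minimal sketch, where the name `best_accuracy` and its values are made up for illustration:

```python
# Illustrative values only
best_accuracy = 0.0          # starting value
print(best_accuracy)         # 0.0

best_accuracy = 0.91         # reassign: the name now refers to a new value
print(best_accuracy)         # 0.91

# A variable can be used to compute another variable
error_rate = 1 - best_accuracy
print(round(error_rate, 2))  # 0.09
```

The old value is not kept anywhere; after reassignment only the latest value is reachable through the name.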

Naming Rules

Use lowercase letters and underscores: learning_rate, num_samples
Don’t start with a number: ~~1_model~~ ❌
No spaces: ~~my model~~ ❌ → my_model ✓
Case-sensitive: Model ≠ model
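
These rules can be checked from Python itself. The sketch below uses the built-in `str.isidentifier()` method and the standard-library `keyword` module; the candidate names are only examples.

```python
import keyword

# str.isidentifier() reports whether a string is a syntactically valid name
print("learning_rate".isidentifier())  # True
print("1_model".isidentifier())        # False (starts with a digit)
print("my model".isidentifier())       # False (contains a space)

# Reserved words pass isidentifier() but still cannot be used as variable names
print("class".isidentifier())          # True
print(keyword.iskeyword("class"))      # True

# Names are case-sensitive: these are two different variables
Model = "ResNet-50"
model = "VGG-16"
print(Model == model)                  # False
```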

The Four Basic Data Types

1. Integer (`int`) — Whole numbers

num_epochs = 100
batch_size = 32
num_features = 784  # 28×28 pixels in MNIST

print(type(num_epochs))  # <class 'int'>
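
Two details worth knowing about Python integers: underscores can be used as visual separators in long literals (Python 3.6 and later), and integers do not overflow at a fixed size. A short sketch with illustrative values:

```python
train_size = 50_000          # same value as 50000, easier to read
print(train_size)            # 50000
print(train_size == 50000)   # True

# Python ints grow as needed instead of overflowing
big = 2 ** 100
print(big)                   # 1267650600228229401496703205376
print(type(big))             # <class 'int'>
```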

2. Float (`float`) — Decimal numbers

learning_rate = 0.001
accuracy = 0.9423
pi = 3.14159

# Floats are used constantly in ML (probabilities, weights, losses)
dropout_rate = 0.5
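
One caveat: floats are stored in binary with limited precision, so some decimal values are only approximated. When comparing floats, a tolerance-based check such as the standard-library `math.isclose` is usually safer than `==`. A minimal sketch:

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# Scientific notation is handy for very small values like learning rates
learning_rate = 1e-3
print(learning_rate == 0.001)    # True
```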

3. String (`str`) — Text

model_name = "ResNet-50"
dataset_path = "/data/images/"

# Strings use single or double quotes
greeting = 'Hello, ML!'
message = "I love Python"

# f-strings: embed variables in text
epoch = 5
loss = 0.342
print(f"Epoch {epoch}: loss = {loss}")  # Epoch 5: loss = 0.342
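
f-strings also accept a format specifier after a colon, which is useful for controlling how many decimals are printed. A small sketch (the values are illustrative):

```python
epoch = 5
loss = 0.342187
accuracy = 0.9423

# :02d -> integer padded to 2 digits
# :.3f -> fixed-point with 3 decimals
# :.1% -> percentage with 1 decimal
line = f"Epoch {epoch:02d}: loss = {loss:.3f}, accuracy = {accuracy:.1%}"
print(line)  # Epoch 05: loss = 0.342, accuracy = 94.2%
```

Formatting only changes how the value is displayed; the variables themselves keep their full precision.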

4. Boolean (`bool`) — True or False

is_training = True
has_gpu = False
model_loaded = True

# Booleans are often used for flags and conditions
if is_training:
    print("Running training loop...")
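
Booleans are also what comparisons produce, and they can be combined with `and`, `or`, and `not`. A sketch with made-up values (the names `is_improved` and `use_gpu_training` are hypothetical):

```python
accuracy = 0.95
best_accuracy = 0.91
has_gpu = False

# A comparison evaluates to True or False
is_improved = accuracy > best_accuracy
print(is_improved)                 # True

# Combine booleans with and / or / not
use_gpu_training = is_improved and has_gpu
print(use_gpu_training)            # False
print(not has_gpu)                 # True

if not has_gpu:
    print("No GPU found, running on CPU")
```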

Arithmetic Operations

a = 10
b = 3

print(a + b)   # 13  — addition
print(a - b)   # 7   — subtraction
print(a * b)   # 30  — multiplication
print(a / b)   # 3.333...  — division (always float)
print(a // b)  # 3   — floor division (integer result)
print(a % b)   # 1   — modulo (remainder)
print(a ** b)  # 1000  — exponentiation (10³)

These are used everywhere in ML:

total_steps = num_epochs * steps_per_epoch      # multiplication
mean = total / num_samples                      # division
loss_change = current_loss - previous_loss      # subtraction
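
The three lines above assume that names such as `steps_per_epoch`, `total`, and `current_loss` already exist. A runnable version, with placeholder values chosen only for illustration, might look like this:

```python
# Placeholder values for illustration
num_epochs = 100
steps_per_epoch = 500
total = 450.0
num_samples = 100
current_loss = 0.25
previous_loss = 0.5

total_steps = num_epochs * steps_per_epoch   # 50000
mean = total / num_samples                   # 4.5
loss_change = current_loss - previous_loss   # -0.25 (negative means the loss went down)

print(total_steps, mean, loss_change)        # 50000 4.5 -0.25
```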

Type Conversion

Sometimes you need to convert between types:

# String to number
s = "42"
n = int(s)          # 42 (integer)
f = float(s)        # 42.0 (float)

# Number to string
accuracy = 0.95
print("Accuracy: " + str(accuracy))  # Accuracy: 0.95

# Be careful: + cannot join a string and a number, so this raises an error
# print("Result: " + 42)       ← TypeError!
# print("Result: " + str(42))  ← Works! ✓
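
Two more conversions that commonly trip people up: `int()` truncates a float toward zero rather than rounding, and it rejects strings that do not look like whole numbers. A short sketch:

```python
print(int(3.9))        # 3  (truncates, does not round)
print(round(3.9))      # 4
print(int(-3.9))       # -3 (toward zero)

# int() cannot parse a decimal string directly
try:
    int("3.14")
except ValueError as err:
    print("ValueError:", err)

# Convert via float first
value = int(float("3.14"))
print(value)           # 3
```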

Variables in ML Context

Here’s what variables look like in a real ML script:

# ── Hyperparameters ────────────────────────────────────
learning_rate = 0.001       # step size for each weight update
num_epochs = 50             # number of full passes over the training data
batch_size = 64             # samples processed per update step
dropout_rate = 0.3          # fraction of units randomly dropped during training

# ── Dataset info ──────────────────────────────────────
num_classes = 10            # e.g., digits 0-9
input_size = 784            # 28×28 flattened image
train_size = 50000
test_size = 10000

# ── Model state ───────────────────────────────────────
best_accuracy = 0.0
is_training = True
model_saved = False
checkpoint_path = "models/checkpoint.pth"

print(f"Training for {num_epochs} epochs with lr={learning_rate}")
# Training for 50 epochs with lr=0.001
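
Variables like these are typically combined to derive other quantities. As a sketch, the number of batches per epoch can be computed from `train_size` and `batch_size`. The names `steps_per_epoch` and `total_steps` are introduced here for illustration, and the calculation assumes the last, partially filled batch is kept.

```python
import math

train_size = 50000
batch_size = 64
num_epochs = 50

# Round up so the final partial batch counts as a step
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch)  # 782
print(total_steps)      # 39100
```

If the partial batch were dropped instead, floor division (`train_size // batch_size`) would give 781 steps per epoch.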

Summary Table

Type	Example	Typical use in ML
`int`	`32`, `100`	Batch size, epochs
`float`	`0.001`, `0.95`	Learning rate, accuracy
`str`	`"ResNet"`	Model names, file paths
`bool`	`True`	Training flags

Knowledge Check

What data type would you use to store the learning rate 0.0001 in Python?