Module 2 — Python for Machine Learning beginner 18 min

Variables & Data Types

Why Python?

Python has become one of the most widely used languages for machine learning and data science. Here’s why:

  • 🧩 Simple syntax — reads almost like English
  • 📦 Rich ecosystem — NumPy, Pandas, scikit-learn, PyTorch, TensorFlow
  • 🔬 Dominant in research — most ML papers include Python code
  • 🌐 Huge community — endless tutorials, help, and libraries

Let’s start from absolute zero.


Variables: Storing Information

A variable is a named container that holds a value. Think of it like labeling a box.

# Assign a value to a variable using =
name = "Alice"
age = 28
height = 5.6  # in feet
is_student = True

# Print the variable
print(name)    # Alice
print(age)     # 28

Naming Rules

  • Use lowercase letters and underscores: learning_rate, num_samples
  • Don’t start with a number: 1_model ❌ → model_1
  • No spaces: my model ❌ → my_model
  • Case-sensitive: Model ≠ model
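
The rules above can be checked directly in Python. The sketch below runs the built-in str.isidentifier() method on the example names from the list. The candidates list and the results dictionary are illustrative names only, and isidentifier() checks syntax rather than style, so it also accepts names with uppercase letters.

```python
# Example names taken from the naming rules above
candidates = ["learning_rate", "num_samples", "1_model", "my model", "my_model"]

# str.isidentifier() reports whether a string is a syntactically valid name
results = {name: name.isidentifier() for name in candidates}
for name, ok in results.items():
    print(f"{name!r:16} {'valid' if ok else 'invalid'}")

# Names are case-sensitive: these are two different variables
Model = "upper"
model = "lower"
print(Model == model)  # False
```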

The Four Basic Data Types

1. Integer (int) — Whole numbers

num_epochs = 100
batch_size = 32
num_features = 784  # 28×28 pixels in MNIST

print(type(num_epochs))  # <class 'int'>

2. Float (float) — Decimal numbers

learning_rate = 0.001
accuracy = 0.9423
pi = 3.14159

# Floats are used constantly in ML (probabilities, weights, losses)
dropout_rate = 0.5

3. String (str) — Text

model_name = "ResNet-50"
dataset_path = "/data/images/"

# Strings use single or double quotes
greeting = 'Hello, ML!'
message = "I love Python"

# f-strings: embed variables in text
epoch = 5
loss = 0.342
print(f"Epoch {epoch}: loss = {loss}")  # Epoch 5: loss = 0.342
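
f-strings also accept a format specifier after a colon, which is handy for controlling how many decimals a loss or accuracy shows. A minimal sketch, reusing epoch and loss from above; the variable named line is illustrative:

```python
epoch = 5
loss = 0.342
accuracy = 0.9423

# :.2f rounds to two decimals; :.1% multiplies by 100 and appends a percent sign
line = f"Epoch {epoch}: loss = {loss:.2f}, accuracy = {accuracy:.1%}"
print(line)  # Epoch 5: loss = 0.34, accuracy = 94.2%
```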

4. Boolean (bool) — True or False

is_training = True
has_gpu = False
model_loaded = True

# Booleans are often used for flags and conditions
if is_training:
    print("Running training loop...")
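
type() works the same way for all four types, not only int. The short sketch below checks one value of each type, reusing variable names from the sections above:

```python
num_epochs = 100
learning_rate = 0.001
model_name = "ResNet-50"
is_training = True

# type() reports the data type of whatever value the variable holds
print(type(num_epochs))     # <class 'int'>
print(type(learning_rate))  # <class 'float'>
print(type(model_name))     # <class 'str'>
print(type(is_training))    # <class 'bool'>
```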

Arithmetic Operations

a = 10
b = 3

print(a + b)   # 13  — addition
print(a - b)   # 7   — subtraction
print(a * b)   # 30  — multiplication
print(a / b)   # 3.333...  — division (always float)
print(a // b)  # 3   — floor division (integer result)
print(a % b)   # 1   — modulo (remainder)
print(a ** b)  # 1000  — exponentiation (10³)

These are used everywhere in ML:

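# Illustrative values so the three lines below run on their own
steps_per_epoch, total, num_samples = 500, 4250.0, 5000
current_loss, previous_loss = 0.342, 0.401
total_steps = num_epochs * steps_per_epoch      # multiplication
mean = total / num_samples                      # division
loss_change = current_loss - previous_loss      # subtraction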
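
Two details of the division operators are easy to miss: / returns a float even when the result is a whole number, and // rounds down (toward negative infinity) rather than simply dropping the decimals. A small sketch with illustrative values:

```python
even = 10 / 5
print(even)        # 2.0  (a float, even though 10 divides evenly)
print(10 // 5)     # 2    (an int)
print(7 // 2)      # 3
negative = -7 // 2
print(negative)    # -4   (floor division rounds down, not toward zero)
mixed = 7.0 // 2
print(mixed)       # 3.0  (a float operand gives a float result)
```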

Type Conversion

Sometimes you need to convert between types:

# String to number
s = "42"
n = int(s)          # 42 (integer)
f = float(s)        # 42.0 (float)

# Number to string
accuracy = 0.95
print("Accuracy: " + str(accuracy))  # "Accuracy: 0.95"

# Be careful: this will crash!
# print("Result: " + 42)  ← TypeError!
# print("Result: " + str(42))  ← Works! ✓
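
Conversions can also fail or silently lose information. The sketch below shows two cases worth knowing, using only the built-in int(), float(), and round() functions; the variable names are illustrative.

```python
# int() on a float truncates toward zero; it does not round
truncated = int(3.9)
print(truncated)     # 3

# round() rounds to the nearest integer instead
print(round(3.9))    # 4

# int() cannot parse a decimal string directly
try:
    int("3.14")
except ValueError as err:
    print("ValueError:", err)

# Convert via float first if the text may contain a decimal point
via_float = int(float("3.14"))
print(via_float)     # 3
```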

Variables in ML Context

Here’s what variables look like in a real ML script:

# ── Hyperparameters ────────────────────────────────────
learning_rate = 0.001       # how fast the model learns
num_epochs = 50             # how many times to train on all data
batch_size = 64             # samples processed at once
dropout_rate = 0.3          # regularization strength

# ── Dataset info ──────────────────────────────────────
num_classes = 10            # e.g., digits 0-9
input_size = 784            # 28×28 flattened image
train_size = 50000
test_size = 10000

# ── Model state ───────────────────────────────────────
best_accuracy = 0.0
is_training = True
model_saved = False
checkpoint_path = "models/checkpoint.pth"

print(f"Training for {num_epochs} epochs with lr={learning_rate}")
# Training for 50 epochs with lr=0.001
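
As a small worked example, the values above are enough to work out how many batches one epoch contains. This sketch assumes each epoch walks through the training set once in batches of batch_size. The names full_batches, leftover, steps_per_epoch, and total_steps are illustrative and not part of any library.

```python
train_size = 50000
batch_size = 64
num_epochs = 50

full_batches = train_size // batch_size   # complete batches per epoch
leftover = train_size % batch_size        # samples left for a final, smaller batch

# Count the partial batch as one extra step if there are leftover samples
steps_per_epoch = full_batches + (1 if leftover > 0 else 0)
total_steps = num_epochs * steps_per_epoch

print(f"{full_batches} full batches, {leftover} samples left over")
# 781 full batches, 16 samples left over
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
# 782 steps per epoch, 39100 steps in total
```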

Summary Table

Type    Example        In ML
int     32, 100        Batch size, epochs
float   0.001, 0.95    Learning rate, accuracy
str     "ResNet"       Model names, file paths
bool    True           Training flags

Knowledge Check

What data type would you use to store the learning rate 0.0001 in Python?