# What is a Neuron?

## Inspired by the Brain
The human brain contains ~86 billion neurons. Each neuron:
- Receives signals from other neurons (inputs)
- Adds them up (weighted sum)
- Fires or doesn’t fire based on the total (activation)
Artificial neural networks mimic this structure.
## The Artificial Neuron (Perceptron)

An artificial neuron computes:

$$\text{output} = \text{activation}\left(\sum_i w_i x_i + b\right)$$

Where:
- $x_i$ = inputs (features)
- $w_i$ = weights (how much each input matters)
- $b$ = bias (offset)
- activation = a function that transforms $z = \sum_i w_i x_i + b$ into the output
```python
import numpy as np

def neuron(inputs, weights, bias, activation="relu"):
    # Step 1: Weighted sum
    z = np.dot(inputs, weights) + bias
    # Step 2: Apply activation
    if activation == "relu":
        return max(0, z)
    elif activation == "sigmoid":
        return 1 / (1 + np.exp(-z))
    elif activation == "tanh":
        return np.tanh(z)
    else:
        return z  # linear
```
```python
# Example: detect if a tumor is malignant
# Input features: [cell_size, cell_shape, clump_thickness]
inputs = np.array([0.8, 0.7, 0.9])
weights = np.array([0.5, 0.3, 0.2])
bias = -0.4

z = np.dot(inputs, weights) + bias
output = 1 / (1 + np.exp(-z))  # sigmoid

print(f"z = {z:.4f}")
print(f"Output (probability of malignant): {output:.4f}")
```
## Activation Functions
The activation function is what makes neural networks powerful. Without it, a stack of neurons is just linear regression.
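This claim can be checked directly: with no non-linearity between them, two stacked layers collapse into a single linear map. A minimal sketch (shapes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# Two "layers" with no activation in between
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
two_layers = W2 @ (W1 @ x + b1) + b2

# ...are equivalent to one linear layer
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

No matter how many linear layers you stack, the result is still linear in `x` — the activation is what breaks this collapse.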
### ReLU (Rectified Linear Unit) — Most Common
```python
x = np.linspace(-3, 3, 100)
relu = np.maximum(0, x)
# Use for hidden layers — simple, fast, avoids vanishing gradients
```
- Zero for negative inputs, linear for positive
- Computationally very cheap
- Used in almost all modern deep networks
### Sigmoid
- Output in (0, 1) — perfect for binary classification output
- Prone to vanishing gradients (problem in deep networks)
- Use in the final layer for binary classification
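For symmetry with the ReLU snippet above, a short sketch of the sigmoid (sample points are illustrative) showing the flat tails where gradients vanish:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(z))
# Near 0 for large negative z, exactly 0.5 at z = 0, near 1 for large
# positive z — the nearly flat regions at both ends are where the
# gradient vanishes.
```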
### Softmax (for multi-class output)

```python
def softmax(x):
    e_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e_x / e_x.sum()

logits = np.array([2.0, 1.0, 0.5])  # raw outputs for 3 classes
probs = softmax(logits)
print(probs)  # [0.629, 0.231, 0.140] — sum = 1.0
# → 62.9% probability for class 0, 23.1% for class 1
```
### Comparison
| Activation | Range | Used in |
|---|---|---|
| ReLU | [0, ∞) | Hidden layers |
| Sigmoid | (0, 1) | Binary output |
| Softmax | (0, 1), sum=1 | Multi-class output |
| Tanh | (-1, 1) | RNNs, some hidden layers |
| Linear | (-∞, ∞) | Regression output |
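The ranges in the table can be verified numerically (a quick sanity check, not part of the original lesson):

```python
import numpy as np

z = np.linspace(-5, 5, 1001)

relu = np.maximum(0, z)
sig = 1 / (1 + np.exp(-z))
tanh = np.tanh(z)

print(relu.min())            # ReLU never goes below 0
print(sig.min(), sig.max())  # sigmoid stays strictly inside (0, 1)
print(tanh.min(), tanh.max())  # tanh stays strictly inside (-1, 1)
```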
## Weights and Learning
Initially, weights are random. During training, the network adjusts them to reduce error:
```python
# Visualize how weight affects output
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 100)

plt.figure(figsize=(12, 4))
for w in [-2.0, -0.5, 0.5, 2.0]:
    z = w * x  # single input, no bias
    output = 1 / (1 + np.exp(-z))  # sigmoid output
    plt.plot(x, output, label=f"w={w}")
plt.xlabel("Input x")
plt.ylabel("Output (sigmoid)")
plt.title("Effect of Weight on Neuron Output")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```
A larger positive weight makes the neuron more sensitive to that input. A negative weight makes the neuron suppress that input.
## From One Neuron to a Network
A single neuron can only learn linearly separable patterns — a neuron with a sigmoid activation is exactly logistic regression. The real power comes from connecting many neurons in layers:
```
Input Layer     Hidden Layer 1    Hidden Layer 2    Output Layer

x₁ ──────┐        ○ ○ ○ ○            ○ ○ ○               ○
x₂ ──────┼──→     ○ ○ ○ ○      →     ○ ○ ○      →   (prediction)
x₃ ──────┘        ○ ○ ○ ○            ○ ○ ○
```
Each neuron in each layer takes all outputs from the previous layer as inputs. This creates a deep neural network — the key idea behind “deep learning.”
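A minimal sketch of that wiring in NumPy — layer sizes match the diagram (4, then 3 neurons, then 1 output), but the weights here are random and illustrative, not trained:

```python
import numpy as np

rng = np.random.default_rng(42)

def layer(x, W, b):
    # Each neuron: weighted sum over ALL previous-layer outputs, then ReLU
    return np.maximum(0, W @ x + b)

x = np.array([0.8, 0.7, 0.9])                    # 3 input features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer 1: 4 neurons
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)    # hidden layer 2: 3 neurons
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)    # output layer: 1 neuron

h1 = layer(x, W1, b1)
h2 = layer(h1, W2, b2)
out = 1 / (1 + np.exp(-(W3 @ h2 + b3)))          # sigmoid on the output
print(out.shape)  # (1,)
```

Each `W @ x` computes every neuron's weighted sum in that layer at once — this is why neural networks are, at their core, chains of matrix multiplications.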
A neuron computes z = 0.5×2 + 0.3×3 + (-0.4) = 1.5. After applying ReLU, what is the output?
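You can check your answer in code (rounding to sidestep floating-point noise):

```python
z = 0.5 * 2 + 0.3 * 3 + (-0.4)
output = max(0.0, z)  # ReLU passes positive values through unchanged
print(round(z, 4), round(output, 4))  # 1.5 1.5
```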