Activation Functions

Nonlinear functions applied to neuron outputs. Without them, a stack of layers collapses into a single affine (linear) transformation, no matter how deep the network is. The choice of activation affects training dynamics and model capacity; a quick numerical check of the collapse is sketched below.
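
A minimal NumPy sketch of the collapse, assuming a toy 2-layer network (shapes and names here are illustrative, not from any particular library):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
    W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
    x = rng.normal(size=3)

    # Two linear layers with no activation in between.
    two_linear_layers = W2 @ (W1 @ x + b1) + b2

    # The same map written as ONE affine layer: W = W2 W1, b = W2 b1 + b2.
    W, b = W2 @ W1, W2 @ b1 + b2
    one_linear_layer = W @ x + b

    assert np.allclose(two_linear_layers, one_linear_layer)

    # Inserting a ReLU between the layers breaks the equivalence,
    # which is what gives depth its extra capacity.
    with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
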

Key Types

  • ReLU — max(0, x), the most common choice; simple, but units can “die” (stuck at zero output with zero gradient for negative inputs)
  • Sigmoid — maps to (0, 1), used in binary output layers
  • Tanh — maps to (-1, 1), zero-centered
  • GELU — smooth ReLU variant, used in Transformers
  • Swish — x * sigmoid(x), self-gated
  • Softmax — converts logits to a probability distribution (multi-class output); sketches of each function follow below
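
A minimal NumPy sketch of the functions above. The GELU uses the common tanh approximation, and the softmax subtracts the max logit for numerical stability; function names here are illustrative:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        return np.tanh(x)

    def gelu(x):
        # tanh approximation of GELU, widely used in Transformer implementations
        return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

    def swish(x):
        # self-gated: the input scales its own sigmoid gate
        return x * sigmoid(x)

    def softmax(logits, axis=-1):
        # subtract the max logit before exponentiating to avoid overflow
        z = logits - np.max(logits, axis=axis, keepdims=True)
        e = np.exp(z)
        return e / np.sum(e, axis=axis, keepdims=True)

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(x), sigmoid(x), tanh(x), gelu(x), swish(x), sep="\n")
    print(softmax(np.array([1.0, 2.0, 3.0])))  # sums to 1
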

deep-learning activation-functions