Residual Connections
← Back to Neural Network Fundamentals
Skip connections that add a layer's input directly to its output: output = F(x) + x. Because the identity term passes gradients through unchanged, they enable training of very deep networks (100+ layers). Introduced in ResNet; now used in virtually all modern architectures, including Transformers.
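A minimal sketch of the idea in NumPy (the two-layer F(x) and the weight shapes are illustrative assumptions, not a specific ResNet configuration): when the learned transform F outputs zero, the block reduces to the identity, which is why stacking many such blocks stays trainable.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # F(x): a simplified two-layer transform with a ReLU in between
    f = relu(x @ W1) @ W2
    # Skip connection: add the input directly to the transformed output
    return f + x

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((4, 4)) * 0.1
W2 = rng.standard_normal((4, 4)) * 0.1

y = residual_block(x, W1, W2)

# With all-zero weights, F(x) = 0 and the block is exactly the identity
z = residual_block(x, np.zeros((4, 4)), np.zeros((4, 4)))
```

The identity behavior under zero weights is the key property: each block only needs to learn a residual correction on top of what the input already carries.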
Related
- Backpropagation (residual connections improve gradient flow)
- CNN Architectures (ResNet)
- Transformers (use residual connections in every block)