Layer Normalization
Normalizes across the feature dimension for each individual example (rather than across the batch): the mean and variance are computed over an example's features, the features are standardized, and a learned per-feature scale (gamma) and shift (beta) are applied. Preferred in Transformers and RNNs because its statistics do not depend on batch size and it works well with variable-length sequences.
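The per-example computation can be sketched as follows; this is a minimal NumPy version (the names `layer_norm`, `gamma`, `beta`, and the example array are illustrative, not from any particular library):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Mean and variance are computed per example over the
    # feature dimension (last axis), independent of the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize features
    return gamma * x_hat + beta              # learned scale and shift

# Each row is one example; normalization is per-row, so the
# result for one row is unaffected by the other rows.
x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
out = layer_norm(x, gamma=np.ones(3), beta=np.zeros(3))
```

With `gamma = 1` and `beta = 0`, every row of `out` has (approximately) zero mean and unit variance, regardless of the row's original scale.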
Related
- Batch Normalization (normalizes across batch instead)
- Transformers (use layer normalization as their standard normalization)