Scaling Laws
← Back to Transformers
Empirical relationships showing that model loss decreases predictably as a power law of model size, dataset size, and training compute. The Chinchilla scaling laws (Hoffmann et al., 2022) showed that most large models were undertrained — given their parameter counts, they had seen too few tokens — and that compute-optimal training scales parameters and tokens roughly in proportion (about 20 tokens per parameter). This drove the shift toward smaller models trained on more data.
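
A minimal sketch of the compute-optimal split this implies, assuming the common C ≈ 6·N·D estimate of training FLOPs and the ~20 tokens-per-parameter rule of thumb from Chinchilla; the function name and the example budgets are illustrative, not from the paper.

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Approximate compute-optimal (params, tokens) for a FLOP budget.

    Assumes training compute C ~= 6 * N * D and D = tokens_per_param * N,
    which gives N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # ~5.8e23 FLOPs is roughly the budget of Chinchilla's 70B / 1.4T-token run.
    for c in (1e21, 1e23, 5.8e23):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.1e} FLOPs -> ~{n/1e9:.1f}B params, ~{d/1e12:.2f}T tokens")
```

For the largest budget this recovers roughly 70B parameters and 1.4T tokens, matching the Chinchilla configuration; under the earlier Kaplan-style allocation, the same compute would have gone to a larger model trained on fewer tokens.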
Related
- Decoder-Only Models (primary beneficiaries of scaling)
- Training at Scale (infrastructure for scaling)