Scaling Laws

Back to Transformers

Scaling laws are empirical relationships showing that model performance (typically test loss) improves predictably as a power law of model size, dataset size, and compute budget. The Chinchilla scaling laws (Hoffmann et al., 2022) showed that most large models were undertrained relative to their size: for a fixed compute budget, parameter count and training tokens should grow roughly in proportion (about 20 tokens per parameter), which drove the shift toward smaller models trained on more data.
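
A minimal sketch of how these fits are used in practice, assuming the parametric loss reported in the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β with fitted values E ≈ 1.69, A ≈ 406.4, B ≈ 410.7, α ≈ 0.34, β ≈ 0.28, plus the rough 20-tokens-per-parameter rule and the standard C ≈ 6ND FLOPs approximation; the exact constants depend on the fit and data, so treat the numbers as illustrative:

```python
import math

# Fitted constants reported in the Chinchilla paper (Hoffmann et al., 2022);
# illustrative values, since the exact fit depends on the data and setup.
E, A, B = 1.69, 406.4, 410.7   # irreducible loss and power-law coefficients
ALPHA, BETA = 0.34, 0.28       # fitted exponents for params (N) and tokens (D)

TOKENS_PER_PARAM = 20          # Chinchilla's rough compute-optimal ratio

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Parametric fit L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def compute_optimal(flops: float) -> tuple[float, float]:
    """Split a compute budget C into (params, tokens).

    Uses the approximation C ~= 6 * N * D together with the ~20 tokens
    per parameter rule, so C ~= 6 * 20 * N**2.
    """
    n_params = math.sqrt(flops / (6 * TOKENS_PER_PARAM))
    return n_params, TOKENS_PER_PARAM * n_params

if __name__ == "__main__":
    # The paper's ~5.76e23 FLOP budget (Gopher's compute) recovers roughly
    # Chinchilla's actual configuration: ~70B params on ~1.4T tokens.
    n, d = compute_optimal(5.76e23)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
    print(f"predicted loss ~ {predicted_loss(n, d):.3f}")
```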


deep-learning transformers scaling-laws