Scaling Laws

Back to Transformers

Scaling laws are empirical relationships showing that model performance (typically test loss) improves predictably as a power law of model size, dataset size, and compute budget. The Chinchilla scaling laws (Hoffmann et al., 2022) showed that most large models were undertrained relative to their size: for a fixed compute budget, parameter count and training tokens should grow roughly in proportion (about 20 tokens per parameter), which drove the shift toward smaller models trained on more data.
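
A minimal sketch of how these fits are used in practice, assuming the parametric loss reported in the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β with fitted values E ≈ 1.69, A ≈ 406.4, B ≈ 410.7, α ≈ 0.34, β ≈ 0.28, plus the rough 20-tokens-per-parameter rule and the standard C ≈ 6ND FLOPs approximation; the exact constants depend on the fit and data, so treat the numbers as illustrative:

```python
import math

# Fitted constants reported in the Chinchilla paper (Hoffmann et al., 2022);
# illustrative values, since the exact fit depends on the data and setup.
E, A, B = 1.69, 406.4, 410.7   # irreducible loss and power-law coefficients
ALPHA, BETA = 0.34, 0.28       # fitted exponents for params (N) and tokens (D)

TOKENS_PER_PARAM = 20          # Chinchilla's rough compute-optimal ratio

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Parametric fit L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def compute_optimal(flops: float) -> tuple[float, float]:
    """Split a compute budget C into (params, tokens).

    Uses the approximation C ~= 6 * N * D together with the ~20 tokens
    per parameter rule, so C ~= 6 * 20 * N**2.
    """
    n_params = math.sqrt(flops / (6 * TOKENS_PER_PARAM))
    return n_params, TOKENS_PER_PARAM * n_params

if __name__ == "__main__":
    # The paper's ~5.76e23 FLOP budget (Gopher's compute) recovers roughly
    # Chinchilla's actual configuration: ~70B params on ~1.4T tokens.
    n, d = compute_optimal(5.76e23)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
    print(f"predicted loss ~ {predicted_loss(n, d):.3f}")
```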


deep-learning transformers scaling-laws