Decoder-Only Models
Transformer models that use only the decoder stack, with causal (left-to-right) attention: each token can attend only to itself and previous tokens. Trained via next-token prediction, they are best suited to generation tasks and are the dominant architecture for modern LLMs: GPT-4, Claude, LLaMA, Gemini, Mistral.
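A minimal NumPy sketch of the causal masking idea (function names are illustrative, not from any particular library): a lower-triangular mask zeroes out attention from each position to all later positions, so generation can only condition on the past.

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular boolean mask: position i may attend to positions <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention with causal masking (minimal sketch).
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Disallowed (future) positions get -inf, so softmax assigns them weight 0.
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, d = 4, 8
rng = np.random.default_rng(0)
q = rng.normal(size=(seq_len, d))
k = rng.normal(size=(seq_len, d))
v = rng.normal(size=(seq_len, d))
out = masked_attention(q, k, v, causal_mask(seq_len))
```

Because the first token can attend only to itself, its output row is exactly its own value vector; later rows mix only earlier positions.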
Related
- Encoder-Only Models (bidirectional understanding)
- Autoregressive Models (decoder-only models are autoregressive)
- Scaling Laws (decoder-only models scale predictably)