Decoder-Only Models
Transformer models that use only the decoder stack, with causal (left-to-right) attention: each token can attend only to itself and previous tokens. Trained via next-token prediction, they are best suited to generation tasks and are the dominant architecture for modern LLMs: GPT-4, Claude, LLaMA, Gemini, Mistral.
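A minimal NumPy sketch of the causal masking idea (function names are illustrative, not from any particular library): a lower-triangular mask zeroes out attention from each position to all later positions, so generation can only condition on the past.

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular boolean mask: position i may attend to positions <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention(q, k, v, mask):
    # Scaled dot-product attention with causal masking (minimal sketch).
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Disallowed (future) positions get -inf, so softmax assigns them weight 0.
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, d = 4, 8
rng = np.random.default_rng(0)
q = rng.normal(size=(seq_len, d))
k = rng.normal(size=(seq_len, d))
v = rng.normal(size=(seq_len, d))
out = masked_attention(q, k, v, causal_mask(seq_len))
```

Because the first token can attend only to itself, its output row is exactly its own value vector; later rows mix only earlier positions.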
Related
- Encoder-Only Models (bidirectional understanding)
- Autoregressive Models (decoder-only models are autoregressive)
- Scaling Laws (decoder-only models scale predictably)