Encoder-Decoder Architecture
The original Transformer architecture (Vaswani et al., 2017, “Attention Is All You Need”). The encoder processes the input sequence with bidirectional self-attention; the decoder generates the output autoregressively, combining causal self-attention over its own prefix with cross-attention over the encoder’s output. Used in T5 and BART for sequence-to-sequence tasks such as translation and summarization.
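The two attention patterns are easy to see in code. Below is a minimal sketch using PyTorch’s built-in `nn.Transformer` as a stand-in (not T5’s or BART’s actual implementation); the encoder attends bidirectionally because it gets no mask, while the decoder gets a causal mask and cross-attends to the encoder output internally. All sizes are illustrative.

```python
# Minimal encoder-decoder sketch with PyTorch's nn.Transformer.
# Dimensions (d_model=512, nhead=8, 6 layers) mirror the paper's base
# config but are illustrative here, not a faithful reimplementation.
import torch
import torch.nn as nn

d_model = 512
model = nn.Transformer(
    d_model=d_model,
    nhead=8,
    num_encoder_layers=6,   # encoder: bidirectional self-attention
    num_decoder_layers=6,   # decoder: causal self-attention + cross-attention
    batch_first=True,
)

batch, src_len, tgt_len = 2, 10, 7
src = torch.randn(batch, src_len, d_model)  # already-embedded source tokens
tgt = torch.randn(batch, tgt_len, d_model)  # already-embedded target prefix

# Causal mask: decoder position i may only attend to positions <= i.
# The encoder receives no such mask, so its attention is bidirectional.
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_len)

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # (batch, tgt_len, d_model) -> torch.Size([2, 7, 512])
```

At inference time this forward pass runs once per generated token: the decoder prefix grows by one token each step, while the encoder output is computed once and reused via cross-attention.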
Related
- Encoder-Only Models (BERT family)
- Decoder-Only Models (GPT family)