Encoder-Decoder Architecture

Back to Transformers

The original Transformer architecture (Vaswani et al., 2017, “Attention Is All You Need”) pairs an encoder with a decoder. The encoder processes the input sequence with bidirectional self-attention; the decoder generates the output autoregressively, using causal self-attention over its own prefix plus cross-attention to the encoder's output. Used in T5 and BART for sequence-to-sequence tasks such as translation and summarization.
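A minimal sketch of the encoder/decoder split, using PyTorch's built-in `nn.Transformer` (the dimensions and layer counts here are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

# Encoder reads the full source bidirectionally; decoder attends causally
# to its own prefix and cross-attends to the encoder output.
d_model, nhead = 64, 4  # assumed toy sizes, not the paper's 512/8
model = nn.Transformer(
    d_model=d_model, nhead=nhead,
    num_encoder_layers=2, num_decoder_layers=2,
    batch_first=True,
)

src = torch.randn(1, 10, d_model)  # embedded source sequence (e.g., sentence to translate)
tgt = torch.randn(1, 7, d_model)   # embedded (shifted) target prefix generated so far

# Causal mask: decoder position i may only attend to positions <= i.
causal_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))

out = model(src, tgt, tgt_mask=causal_mask)  # shape (1, 7, d_model)
```

No mask is passed for the encoder, so its self-attention stays bidirectional; the `tgt_mask` enforces causality only on the decoder side. Cross-attention (decoder queries over encoder outputs) happens inside each decoder layer automatically.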


deep-learning transformers encoder-decoder