Encoder-Only Models
Transformer models that use only the encoder stack, with bidirectional self-attention: every token attends to every other token in the sequence, so each position's representation draws on both left and right context. Best suited to understanding tasks: classification, named entity recognition (NER), semantic similarity. Key models: BERT, RoBERTa, DeBERTa.
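To make the bidirectional part concrete, here is a minimal sketch using the Hugging Face transformers library; the checkpoint name and the [CLS] pooling choice are illustrative assumptions, not part of this note:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; any encoder-only model would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(
    "Encoder-only models read the whole sentence at once.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one contextual vector per token, each built
# from attention over the full sequence (no causal mask).
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)

# The [CLS] vector is a common (if crude) sentence-level summary used
# for classification or similarity.
cls_vec = outputs.last_hidden_state[:, 0]
```

Because there is no causal mask, these models produce representations rather than left-to-right generations, which is why they pair with understanding tasks rather than text generation.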
Related
- Decoder-Only Models (autoregressive generation)
- Encoder-Decoder Architecture (original Transformer)
- Masked Language Modeling (BERT’s pretraining objective; see the sketch below)
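A minimal sketch of masked language modeling in practice, using the transformers fill-mask pipeline (the checkpoint choice is again an illustrative assumption):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the hidden token from both left and right context.
for pred in fill_mask("Paris is the [MASK] of France."):
    print(pred["token_str"], round(pred["score"], 3))
```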