Encoder-Only Models
Transformer models that use only the encoder stack, with bidirectional self-attention: every token attends to every other token in the sequence, so each position's representation draws on both left and right context. Best suited to understanding tasks: classification, named entity recognition (NER), semantic similarity. Key models: BERT, RoBERTa, DeBERTa.
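To make the bidirectional part concrete, here is a minimal sketch using the Hugging Face transformers library; the checkpoint name and the [CLS] pooling choice are illustrative assumptions, not part of this note:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint; any encoder-only model would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(
    "Encoder-only models read the whole sentence at once.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one contextual vector per token, each built
# from attention over the full sequence (no causal mask).
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)

# The [CLS] vector is a common (if crude) sentence-level summary used
# for classification or similarity.
cls_vec = outputs.last_hidden_state[:, 0]
```

Because there is no causal mask, these models produce representations rather than left-to-right generations, which is why they pair with understanding tasks rather than text generation.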
Related
- Decoder-Only Models (autoregressive generation)
- Encoder-Decoder Architecture (original Transformer)
- Masked Language Modeling (BERT’s pretraining objective; see the sketch below)
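A minimal sketch of masked language modeling in practice, using the transformers fill-mask pipeline (the checkpoint choice is again an illustrative assumption):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the hidden token from both left and right context.
for pred in fill_mask("Paris is the [MASK] of France."):
    print(pred["token_str"], round(pred["score"], 3))
```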