Positional Encoding
Injects sequence order information into the model, since self-attention is otherwise permutation-invariant. The original Transformer adds fixed sinusoidal encodings to the token embeddings; modern models often use learned positional embeddings, rotary positional embeddings (RoPE), or ALiBi instead.
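A minimal sketch of the sinusoidal scheme from the original Transformer, assuming NumPy and an even d_model; the resulting matrix is simply added to the token embeddings before the first attention layer.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]          # (1, d_model / 2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)   # one frequency per dimension pair
    angles = positions * angle_rates                        # (seq_len, d_model / 2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

# Example usage: encode 4 positions for an 8-dimensional model
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
token_embeddings = np.random.randn(4, 8)   # placeholder embeddings for illustration
inputs = token_embeddings + pe             # order information is now part of the input
```

Because each dimension pair uses a different fixed frequency, every position gets a distinct pattern, and nearby positions produce similar encodings.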
Related
- Self-Attention (position-agnostic without encoding)
- Encoder-Decoder Architecture (uses positional encoding)