Transformers
Back: Deep Learning
The dominant architecture in modern deep learning, based on self-attention rather than recurrence or convolution. Transformers process all positions of a sequence in parallel and capture long-range dependencies, powering LLMs, vision models, and multi-modal systems.
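A minimal sketch of single-head scaled dot-product self-attention, the core operation the concepts below build on. The toy dimensions, weight matrices, and function names are illustrative assumptions, not taken from any particular model or library:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model). Returns (seq_len, d_v) contextualized vectors."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # One matrix product lets every position attend to every other position,
    # which is how long-range dependencies are captured in parallel.
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) attention logits
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V

# Tiny example with made-up sizes.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (5, 4)
```

Multi-head attention repeats this operation with several independent projection sets and concatenates the results; positional encoding adds order information that the attention operation itself ignores.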
Concepts
- Self-Attention
- Multi-Head Attention
- Positional Encoding
- Encoder-Decoder Architecture
- Encoder-Only Models
- Decoder-Only Models
- Scaling Laws
- Flash Attention
- Mixture of Experts