
Transformers

Feb 10, 2026 · 1 min read

  • deep-learning
  • transformers
  • attention


Back: Deep Learning

The dominant architecture in modern deep learning, built on self-attention. Unlike recurrent models, transformers process all sequence positions in parallel and capture long-range dependencies directly, powering LLMs, vision models, and multi-modal systems.
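As a minimal NumPy sketch of the core mechanism (the names `self_attention`, `w_q`, `w_k`, `w_v` are illustrative, not taken from the linked notes): a single matrix multiply scores every position against every other, which is why the architecture parallelizes across the sequence and links distant tokens in one step.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over x of shape (seq_len, d_model)."""
    q = x @ w_q                                     # queries
    k = x @ w_k                                     # keys
    v = x @ w_v                                     # values
    # Every position scores every other position in one matmul: (seq_len, seq_len).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over key positions
    return weights @ v                              # weighted sum of values for all positions at once

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                         # toy input: 4 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)       # (4, 8)
```

Multi-head attention (linked below) runs several such maps in parallel with smaller per-head widths and concatenates the results.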

Concepts

  • Self-Attention
  • Multi-Head Attention
  • Positional Encoding
  • Encoder-Decoder Architecture
  • Encoder-Only Models
  • Decoder-Only Models
  • Scaling Laws
  • Flash Attention
  • Mixture of Experts




Backlinks

  • Software Engineering - Map of Content
  • Decoder-Only Models
  • Encoder-Decoder Architecture
  • Encoder-Only Models
  • Flash Attention
  • LSTM
  • Layer Normalization
  • Mixture of Experts
  • Multi-Head Attention
  • Positional Encoding
  • Residual Connections
  • Scaling Laws
  • Self-Attention
  • Deep Learning
  • Video Understanding
  • Vision Transformers
