
Flash Attention

Feb 10, 2026 · 1 min read

  • deep-learning
  • transformers
  • flash-attention
  • optimization


← Back to Transformers

FlashAttention is an IO-aware, exact attention algorithm that reduces attention memory usage from O(n^2) to O(n) in sequence length and significantly speeds up training. It achieves this by tiling the attention computation into blocks that fit in fast on-chip SRAM, minimizing reads and writes to GPU high-bandwidth memory (HBM). It is now standard in modern Transformer training.
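To make the tiling idea concrete, here is a minimal single-head NumPy sketch of the online-softmax recurrence that the tiled computation relies on. Function names and the block size are illustrative; the real kernel runs fused on the GPU with each block held in SRAM, whereas this sketch only demonstrates that per-block accumulation matches full attention without ever materializing the n×n score matrix.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference: materializes the full (n x n) score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def tiled_attention(Q, K, V, block_size=64):
    """Processes K/V in blocks with an online softmax, keeping only
    O(n * d) running state instead of the O(n^2) score matrix."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)             # accumulated (unnormalized) output
    m = np.full(n, -np.inf)          # running row-wise max of scores
    l = np.zeros(n)                  # running softmax denominator
    for start in range(0, K.shape[0], block_size):
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        S = Q @ Kb.T * scale                      # scores for this block
        m_new = np.maximum(m, S.max(axis=-1))     # updated running max
        P = np.exp(S - m_new[:, None])            # block softmax numerators
        correction = np.exp(m - m_new)            # rescale previous state
        l = l * correction + P.sum(axis=-1)
        O = O * correction[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

The key trick is the `correction` factor: when a new block raises the running max, previously accumulated numerators and outputs are rescaled, so the final result is exactly softmax attention, not an approximation.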

Related

  • Self-Attention (what Flash Attention optimizes)
  • Mixed Precision Training (complementary optimization)



