Multi-Head Attention

Feb 10, 2026 · 1 min read

  • deep-learning
  • transformers
  • multi-head-attention

← Back to Transformers

Run multiple self-attention operations in parallel with different learned projections. Each head can attend to different aspects of the input (syntax, semantics, position). Outputs are concatenated and linearly projected.
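
A minimal sketch of this pattern in PyTorch; the module name `MultiHeadAttention` and the `d_model` / `num_heads` arguments are illustrative choices, not a reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection per role; each is split into `num_heads` slices.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)  # final linear projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.w_q(x))
        k = split_heads(self.w_k(x))
        v = split_heads(self.w_v(x))

        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / (self.d_head ** 0.5)
        weights = F.softmax(scores, dim=-1)
        out = weights @ v  # (batch, num_heads, seq_len, d_head)

        # Concatenate heads and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(batch, seq_len, -1)
        return self.w_o(out)
```

For example, `MultiHeadAttention(d_model=512, num_heads=8)` applied to a `(batch, seq_len, 512)` tensor returns a tensor of the same shape.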

Related

  • Self-Attention (single attention head)
  • Layer Normalization (applied around attention)
