
Mixed Precision Training

Feb 10, 2026 · 1 min read

  • deep-learning
  • distributed-training
  • mixed-precision


← Back to Training at Scale

Run most computations in lower precision (FP16 or BF16) while keeping FP32 master weights for the optimizer updates. This roughly doubles training throughput and halves activation memory with minimal accuracy loss. BF16 is preferred because it keeps FP32's exponent range, so it avoids the gradient underflow that forces FP16 to rely on loss scaling. Now standard practice.
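As a concrete illustration, here is a minimal PyTorch AMP training-loop sketch (not part of the original note). The model, data, and hyperparameters are placeholders; BF16 is used when the GPU supports it, with a fallback to FP16 plus loss scaling.

```python
# Minimal mixed-precision training loop sketch (PyTorch AMP).
# Assumes a CUDA device; the model, data, and loss below are illustrative placeholders.
import torch

model = torch.nn.Linear(1024, 1024).cuda()            # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

use_bf16 = torch.cuda.is_bf16_supported()             # prefer BF16 when available
amp_dtype = torch.bfloat16 if use_bf16 else torch.float16
# GradScaler is only needed for FP16, where small gradients can underflow.
scaler = torch.cuda.amp.GradScaler(enabled=not use_bf16)

for step in range(100):
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")

    optimizer.zero_grad(set_to_none=True)
    # Forward/backward math runs in low precision; parameters (the FP32
    # master weights) and optimizer state remain in FP32.
    with torch.autocast(device_type="cuda", dtype=amp_dtype):
        out = model(x)
        loss = torch.nn.functional.mse_loss(out, target)

    # Loss scaling guards FP16 gradients against underflow; a no-op for BF16.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```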

Related

  • Flash Attention (complementary optimization)
  • Gradient Accumulation (both reduce memory pressure)


