Data Parallelism (DL)

Feb 10, 2026 · 1 min read

  • deep-learning
  • distributed-training
  • data-parallelism


← Back to Training at Scale

Replicate the entire model on each GPU and split the training data across GPUs. Each GPU computes gradients on its own data shard, then the gradients are averaged across GPUs (typically with an all-reduce) so that every replica applies the identical update. This is the simplest and most common form of distributed training.
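A minimal sketch of this pattern with PyTorch `DistributedDataParallel`, one process per GPU; the toy model, dataset, and hyperparameters are placeholders for illustration, not from this note:

```python
# Data parallelism sketch: every rank holds a full model copy, the sampler
# shards the data, and backward() all-reduces (averages) gradients.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Full model replica on this GPU, wrapped for gradient synchronization.
    model = torch.nn.Linear(32, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # DistributedSampler gives each rank a disjoint shard of the dataset.
    data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            # backward() triggers the all-reduce that averages gradients
            # across ranks, so every replica takes the same optimizer step.
            loss.backward()
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=4 train.py`, this runs four replicas that each see a quarter of the data per epoch while staying numerically in sync.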

Related

  • Model Parallelism (split the model instead)
  • Gradient Accumulation (simulate larger batches)
