Model Parallelism

Feb 10, 2026 · 1 min read

  • deep-learning
  • distributed-training
  • model-parallelism


← Back to Training at Scale

Split the model across multiple GPUs when it is too large to fit on a single device. Two main types: tensor parallelism (shard individual layers, e.g. a weight matrix, across GPUs) and pipeline parallelism (assign contiguous groups of layers to different GPUs and pass activations between stages).
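A minimal PyTorch sketch of both flavors, assuming two GPUs visible as `cuda:0` and `cuda:1`; the names `TwoStageMLP` and `column_parallel_linear` are illustrative, not from any framework:

```python
import torch
import torch.nn as nn

# Pipeline parallelism: assign whole layers (stages) to different GPUs.
class TwoStageMLP(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU()).to("cuda:0")
        self.stage2 = nn.Linear(dim, dim).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage1(x.to("cuda:0"))
        # The activation crosses the device boundary between stages.
        return self.stage2(x.to("cuda:1"))

# Tensor parallelism: split one layer's weight column-wise across GPUs.
def column_parallel_linear(x, w0, w1):
    # w0 and w1 each hold half of the output columns, on different devices.
    y0 = x.to("cuda:0") @ w0                     # [batch, out/2] on cuda:0
    y1 = x.to("cuda:1") @ w1                     # [batch, out/2] on cuda:1
    return torch.cat([y0, y1.to("cuda:0")], -1)  # gather the two halves
```

Real systems (e.g., Megatron-LM, DeepSpeed) add micro-batch scheduling so pipeline stages overlap in time, and use collectives (all-gather/reduce-scatter) instead of the naive copy-and-`cat` above.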

Related

  • Data Parallelism (split data instead of model)
  • Distributed Training Frameworks (implement model parallelism)




