Distributed Training Frameworks
Tools and libraries for training across multiple GPUs and machines.
Key Frameworks
- DeepSpeed — Microsoft's library built on PyTorch; its ZeRO optimizer shards optimizer states, gradients, and parameters across data-parallel ranks to fit very large models (see the config sketch after this list)
- FSDP (Fully Sharded Data Parallel) — PyTorch's native sharding; partitions model parameters, gradients, and optimizer states across ranks, gathering full parameters only when a layer needs them (see the usage sketch after this list)
- Megatron-LM — NVIDIA's framework for LLM training with efficient tensor and pipeline parallelism
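A minimal sketch of a DeepSpeed ZeRO setup, expressed as the Python dict passed to `deepspeed.initialize()`. The model, batch sizes, and learning rate are illustrative placeholders, and the script is assumed to be launched with the `deepspeed` launcher or `torchrun`.

```python
import deepspeed
import torch.nn as nn

# Illustrative ZeRO stage-2 config: shard optimizer states and gradients,
# keep a full copy of the parameters on each rank (stage 3 would shard those too).
ds_config = {
    "train_batch_size": 64,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,            # 1 = optimizer states, 2 = + gradients, 3 = + parameters
        "overlap_comm": True,  # overlap gradient communication with backward compute
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

model = nn.Linear(1024, 1024)  # placeholder model

# Wraps the model in a DeepSpeed engine that applies the config:
# sharding, mixed precision, and gradient accumulation.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```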
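And a minimal sketch of wrapping a model with PyTorch FSDP. It assumes launch via `torchrun` (e.g. `torchrun --nproc_per_node=8 train.py`), which sets the environment variables read by `init_process_group` and `LOCAL_RANK`; the toy model and training loop are placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # gathering full parameters only for the layers currently being computed.
    model = FSDP(model)

    # Create the optimizer after wrapping so it sees the sharded parameters.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 1024, device="cuda")
        loss = model(x).square().mean()  # dummy loss on random data
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```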
Related
- Deep Learning Frameworks (underlying frameworks)
- Model Parallelism (the parallelism strategies these tools implement)