Training at Scale
Techniques and infrastructure for training large models across multiple GPUs and machines — essential for modern foundation models, whose compute requirements far exceed a single device.
Concepts
- Data Parallelism
- Model Parallelism
- Mixed Precision Training
- Gradient Accumulation
- Distributed Training Frameworks
- Deep Learning Frameworks
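Several of the concepts above rest on the same observation: the gradient of a large batch can be reconstructed from gradients of smaller pieces. Below is a minimal, framework-free sketch of gradient accumulation for a 1-D linear model with mean-squared-error loss; all names (`grad_mse`, `accumulated_grad`) are illustrative, not from any particular library.

```python
def grad_mse(w, xs, ys):
    """Mean gradient of (w*x - y)^2 with respect to w over a batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average per-micro-batch gradients, as gradient accumulation does
    when memory cannot hold the full batch at once."""
    grads = []
    for i in range(0, len(xs), micro_batch_size):
        grads.append(grad_mse(w, xs[i:i + micro_batch_size],
                              ys[i:i + micro_batch_size]))
    return sum(grads) / len(grads)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
# With equal-sized micro-batches, the accumulated gradient matches the
# full-batch gradient, so the resulting update step is identical.
assert abs(full - accum) < 1e-9
```

The same averaging idea underlies data parallelism: each worker computes a gradient on its shard, and an all-reduce averages the shards, recovering the full-batch gradient.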