Gradient Accumulation
Simulate larger batch sizes by accumulating gradients over multiple forward/backward passes before performing a single weight update. Scale each micro-batch loss by the number of accumulation steps so the summed gradient matches a true large-batch gradient. Enables effective large-batch training on limited GPU memory.
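A minimal PyTorch sketch of the idea; the model, data, learning rate, and step counts below are illustrative assumptions, not part of this note. Gradients from several small forward/backward passes are summed in place, and the optimizer steps only once per group.

```python
import torch
from torch import nn

# Illustrative setup: a tiny model and synthetic micro-batches stand in for a real loop.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accum_steps = 4  # micro-batches accumulated per weight update (assumed value)
micro_batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches):
    loss = loss_fn(model(x), y)
    # Scale the loss so the summed gradients match one large-batch gradient.
    (loss / accum_steps).backward()

    if (step + 1) % accum_steps == 0:
        optimizer.step()       # weight update after accum_steps micro-batches
        optimizer.zero_grad()  # clear accumulated gradients for the next group
```

With a micro-batch size of 8 and `accum_steps = 4`, each update behaves like a batch of 32 while only one micro-batch resides in GPU memory at a time.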
Related
- Data Parallelism (alternative way to increase effective batch size)
- Mixed Precision Training (complementary memory optimization)