The Autograd Engine

Training a neural network requires answering one question for each of the model’s parameters: if I nudge this number up a tiny bit, does the model get better or worse?

The answer is the parameter’s gradient: how quickly, and in which direction, the model’s error changes as that parameter increases. Stepping a parameter a small amount against its gradient reduces the error. Computing gradients by hand for tens of thousands of interconnected parameters would be impractical. Instead, we use automatic differentiation: we build a record of every computation the model performs, then trace backwards through that record, applying the chain rule at each step, to compute all gradients in a single pass.
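To make the idea of recording computations and tracing backwards concrete, here is a minimal sketch of a scalar autograd value in Python. It is an illustration under assumptions, not necessarily the engine this chapter builds: the class name `Value`, its attributes, and the choice to support only addition and multiplication are all hypothetical.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical name)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                 # the number itself
        self.grad = 0.0                  # d(output)/d(this value), filled in by backward()
        self._parents = parents          # the Values this one was computed from
        self._local_grads = local_grads  # d(this value)/d(each parent)

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order the recorded graph so every value comes after its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)

        # Walk the record in reverse, applying the chain rule at each step.
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# A tiny "model": loss = w * x + b
w, x, b = Value(3.0), Value(2.0), Value(1.0)
loss = w * x + b
loss.backward()
print(w.grad, x.grad, b.grad)  # 2.0 3.0 1.0
```

Here `w.grad` is 2.0 because nudging `w` up by a small amount changes `loss` by twice that amount (the value of `x`). Gradients are accumulated with `+=` so that a value used more than once in the computation collects a contribution from each use.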
