Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

The Autograd Engine

Training a neural network requires answering one question for each of the model’s parameters: if I nudge this number up a tiny bit, does the model get better or worse?

The answer is the parameter’s gradient: the direction and magnitude of change that would reduce the model’s error. Computing gradients by hand for tens of thousands of interconnected parameters would be impossible. Instead, we use automatic differentiation: we build a record of every computation the model performs, then trace backwards through that record to compute all gradients at once.