Neural Network Primitives
On top of the autograd engine from the previous chapter, we build three operations that the model uses throughout. These are general-purpose building blocks, the same operations found in every neural network framework. A neural network only needs a surprisingly small toolkit:
-
A way to combine inputs. Take several numbers, multiply each by a learned weight, and add the results together. This is just a weighted sum, the same idea as computing a course grade from weighted assignment scores. That is what
lineardoes. -
A way to make decisions. The model needs to express confidence: “I think the next word is 70% likely to be cat and 20% likely to be dog.” Raw scores can be any number, but probabilities must be positive and sum to
- That is what
softmaxdoes. It turns arbitrary scores into a proper probability distribution.
- That is what
-
A way to stay stable. Numbers that pass through dozens of weighted sums tend to either explode toward infinity or shrink toward zero. That is what
rmsnormdoes. It rescales a vector so the values stay in a reasonable range, similar to normalizing a set of exam scores to have a consistent spread.