## The Pipeline
Here is the path from raw text to a model that generates new sentences:
```
training data -> tokenizer -> model -> training -> saved weights
                                                         |
               tokenizer -> model -> generation <- load weights
```
We build each piece from scratch, in order. Each chapter introduces one component and ends with a Complete Code page containing the finished source for that stage.
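Sketched in TypeScript, the two phases above compose roughly like this. Everything here is a placeholder stub for illustration — the names and the toy character-level tokenizer are assumptions, not the APIs the chapters actually define:

```typescript
// Hypothetical sketch of how the pipeline's pieces fit together.
// Every name below is a placeholder, not the API the chapters define.

type TokenId = number;

// Stub tokenizer: maps each distinct character of the corpus to an id.
function makeTokenizer(corpus: string) {
  const vocab = Array.from(new Set(corpus)).sort();
  const toId = new Map(vocab.map((ch, i) => [ch, i] as [string, number]));
  return {
    encode: (text: string): TokenId[] => Array.from(text, (ch) => toId.get(ch)!),
    decode: (ids: TokenId[]): string => ids.map((i) => vocab[i]).join(""),
  };
}

// Phase 1: training — text becomes token ids, which the model trains on.
const corpus = "the cat sat";
const tokenizer = makeTokenizer(corpus);
const tokens = tokenizer.encode(corpus);
// train(model, tokens); saveModel(model, path)  // built in later chapters

// Phase 2: generation — a loaded model emits ids, decoded back to text.
const output = tokenizer.decode(tokens);
console.log(output); // "the cat sat"
```

Both phases share the tokenizer: training needs it to turn text into ids, and generation needs the same vocabulary to turn ids back into text — which is why it sits at the front of both rows of the diagram.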
## What You Will Build
| Chapter | You will create | Role in the pipeline |
|---|---|---|
| The Tokenizer | `tokenizer.ts` | Turns words into numbers and back |
| The Autograd Engine | `autograd.ts` | Automatic differentiation (makes training possible) |
| Neural Network Primitives | `nn.ts` | Linear layers, softmax, normalization |
| The Model | `model.ts`, `rng.ts` | The GPT architecture: config, weights, forward pass |
| Training | `train.ts` | The training loop and optimizer |
| Saving the Model | `saveModel`, `loadModel` | Serialize trained weights to disk and load them back |
| Generation | `generate.ts` | Inference: turning a trained model into sentences |
| Smoke Test | `phrases-train.ts`, `phrases-generate.ts` | Entry points to train and generate |
| Fine-Tuning | `phrases-fine-tune.ts` | Adapt a trained model to new data |
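Of these pieces, the autograd engine is the one that makes everything downstream possible, so it is worth a preview of its core idea: build a graph of operations, then walk it backwards to compute gradients. The toy `Value` class below is purely illustrative — the real `autograd.ts` API may look different:

```typescript
// Minimal reverse-mode autodiff sketch: each operation records its
// inputs and a rule for pushing gradients back to them.
class Value {
  grad = 0;
  private parents: Value[] = [];
  private localBackward: () => void = () => {};
  constructor(public data: number) {}

  add(other: Value): Value {
    const out = new Value(this.data + other.data);
    out.parents = [this, other];
    out.localBackward = () => {
      this.grad += out.grad;  // d(a+b)/da = 1
      other.grad += out.grad; // d(a+b)/db = 1
    };
    return out;
  }

  mul(other: Value): Value {
    const out = new Value(this.data * other.data);
    out.parents = [this, other];
    out.localBackward = () => {
      this.grad += other.data * out.grad; // d(a*b)/da = b
      other.grad += this.data * out.grad; // d(a*b)/db = a
    };
    return out;
  }

  backward(): void {
    // Topological order: finish a node's gradient before propagating it.
    const order: Value[] = [];
    const seen = new Set<Value>();
    const visit = (v: Value) => {
      if (seen.has(v)) return;
      seen.add(v);
      for (const p of v.parents) visit(p);
      order.push(v);
    };
    visit(this);
    this.grad = 1; // d(out)/d(out) = 1
    for (const v of order.reverse()) v.localBackward();
  }
}

// y = a*b + a, so dy/da = b + 1 and dy/db = a.
const a = new Value(2);
const b = new Value(3);
const y = a.mul(b).add(a);
y.backward();
console.log(a.grad, b.grad); // 4 3 -> no: prints 4 2
```

Note that `backward` sorts the graph topologically before propagating: because `a` feeds `y` through two paths (the product and the direct add), its gradient contributions must be accumulated with `+=`, not overwritten — the same diamond-graph subtlety the autograd chapter has to handle.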