  1. Introduction
  2. Building the Model
  3. Prerequisites
    1. Software
    2. The Training Data
    3. The Pipeline
  4. The Tokenizer
    1. Building the Vocabulary
    2. Encoding and Decoding
    3. Why Word-Level Tokens?
    4. Complete Code
  5. The Autograd Engine
    1. The Math You Need
    2. The Value Class
    3. Derived Operations
    4. The Computation Graph
    5. Backward Pass
    6. Complete Code
  6. Neural Network Primitives
    1. Linear
    2. Softmax
    3. RMSNorm
    4. Complete Code
  7. The Model
    1. Configuration
    2. Parameters
    3. Embeddings
    4. Attention
    5. MLP
    6. Residual Connections
    7. The KV Cache
    8. Creating the Model
    9. Running the Model
    10. Complete Code
  8. Training and Inference
  9. Training
    1. The Training Configuration
    2. The Training Loop
    3. Watching It Learn
    4. Complete Code
  10. Saving the Model
    1. What Gets Saved
    2. Loading the Model
  11. Generation
    1. The Generation Loop
    2. Sampling Strategies
    3. The KV Cache
    4. Example Output
    5. Complete Code
  12. Putting It to Work
  13. Smoke Test
    1. Train the Model
    2. Generate Sentences
    3. What You Have Built
    4. Complete Code
  14. Fine-Tuning
    1. The Question Dataset
    2. Run the Fine-Tuning
    3. Generate Questions
    4. Catastrophic Forgetting
    5. Complete Code