
Configuration

The configuration defines the shape of the model:

const model = createModel({
  nLayer: 2,       // number of transformer layers
  nEmbd: 32,       // embedding dimension (size of internal vectors)
  blockSize: 16,   // maximum sequence length (longest sentence we can process)
  nHead: 4,        // number of attention heads
  headDim: 8,      // dimension per attention head (nEmbd / nHead)
  vocabSize: 597,  // our tokenizer's vocabulary size
});

These are small numbers. Production models use nEmbd in the thousands and stack dozens of layers, but the architecture is the same; ours just fits in memory and trains in minutes instead of months.
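To make "small" concrete, here is a back-of-the-envelope parameter count for this configuration. This is a sketch assuming a standard GPT-style layout (roughly 4·nEmbd² for the attention projections and 8·nEmbd² for a 4× MLP per layer, plus token and positional embeddings); the actual model may differ in biases and output-layer weight tying.

```javascript
// Rough parameter count for the config above (GPT-style assumptions, not exact).
const nLayer = 2, nEmbd = 32, blockSize = 16, vocabSize = 597;

const tokenEmbed = vocabSize * nEmbd;   // 597 * 32 = 19104
const posEmbed   = blockSize * nEmbd;   // 16 * 32 = 512
const perLayer   = 12 * nEmbd * nEmbd;  // 4*nEmbd^2 (attention) + 8*nEmbd^2 (MLP)

const approxParams = tokenEmbed + posEmbed + nLayer * perLayer;
console.log(approxParams); // on the order of 44k parameters
```

Compare that to production models, which run into the billions of parameters.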

A note on nHead: with 32 embedding dimensions, 4 heads is a good balance. Each head gets 32 / 4 = 8 dimensions to work with. Two heads would give 16 dims each (fewer distinct attention patterns), and 8 heads would give 4 dims each (very little room per head at this small scale).
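The divisibility constraint implied here (headDim = nEmbd / nHead) is worth enforcing explicitly. A minimal sketch; the `headDims` helper is hypothetical, not part of the source's API:

```javascript
// Hypothetical helper: derive the per-head dimension and reject
// configurations where nEmbd does not split evenly across heads.
function headDims(nEmbd, nHead) {
  if (nEmbd % nHead !== 0) {
    throw new Error(`nEmbd (${nEmbd}) must be divisible by nHead (${nHead})`);
  }
  return nEmbd / nHead;
}

console.log(headDims(32, 4)); // 8 dims per head, as in the config above
console.log(headDims(32, 2)); // 16 dims per head: fewer, wider heads
console.log(headDims(32, 8)); // 4 dims per head: many, narrow heads
```

Failing fast on an invalid split is cheaper than debugging shape mismatches deep inside the attention computation.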