A GPT-2 style transformer language model implemented from scratch in Rust for educational purposes. Companion code for the blog series Building an LLM From Scratch in Rust.
Feste is the fool in Shakespeare's Twelfth Night, known for his wordplay and wit. The model trains on Shakespeare's complete works and generates text in his style, making the name a natural fit.
A complete, trainable transformer that shows how language models work by building every component up from basic tensor operations. No deep learning frameworks are used.
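As an illustration of the kind of primitive everything is built on (this is a sketch, not the repo's actual tensor API), a naive row-major matrix multiply in plain Rust looks like:

```rust
// Naive matmul over flat row-major slices: out (m x n) = a (m x k) * b (k x n).
// Illustrative only; the repo's own tensor types may differ.
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                out[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    out
}

fn main() {
    // Multiplying by the 2x2 identity returns the matrix unchanged.
    let id = [1.0, 0.0, 0.0, 1.0];
    let x = [3.0, 4.0, 5.0, 6.0];
    assert_eq!(matmul(&id, &x, 2, 2, 2), x.to_vec());
}
```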
The implementation trains on Shakespeare's works and generates text in a similar style, with clear perplexity improvements as training progresses.
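Perplexity, the metric tracked during training, is just the exponential of the mean per-token cross-entropy loss. A minimal sketch (the function name is illustrative, not the repo's API):

```rust
// Perplexity = exp(mean cross-entropy loss over tokens).
// Lower is better; a uniform guess over a vocabulary of size V
// gives loss ln(V) and therefore perplexity V.
fn perplexity(token_losses: &[f64]) -> f64 {
    let mean = token_losses.iter().sum::<f64>() / token_losses.len() as f64;
    mean.exp()
}

fn main() {
    // Mean loss here is 2.0, so perplexity is e^2.
    let losses = [2.1, 1.9, 2.0];
    println!("{:.2}", perplexity(&losses)); // prints 7.39
}
```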
Each part of the blog series has a companion document in `docs/` with configuration details and an implementation reference.
```shell
# Get training data
curl -o shakespeare.txt https://www.gutenberg.org/files/100/100-0.txt

# Train a small model (10-15 minutes)
cargo run --release --example 06_train_shakespeare_small
```

The configurable training example lets you reproduce any experiment from the Part 5 blog post using named presets:
```shell
# List available presets
cargo run --release --example train -- --list-presets

# Run a preset
cargo run --release --example train -- --preset pocket-bard

# Override parameters
cargo run --release --example train -- --preset spider --steps 10000

# Fully custom configuration
cargo run --release --example train -- \
    --embd 256 --layers 6 --heads 12 --context 448 --vocab 8192
```

See docs/05_TRAINING_EXAMPLES.md for the full preset table, transfer learning instructions, and details on all training examples.
- `01_train_tokenizers` - BPE tokenization at multiple vocab sizes
- `02_tensor_operations` - Matrix multiplication and operations
- `03_model_architecture` - Transformer architecture exploration
- `04_training_infrastructure` - Training loop components
- `05_train_shakespeare_tiny` - 50K parameters, 2-5 minutes
- `06_train_shakespeare_small` - 200K parameters, 10-20 minutes
- `07_train_shakespeare_medium` - 4M parameters, 1-2 hours
- `08_train_shakespeare_gpt2` - 163M parameters (GPT-2 Small), 24-30 hours
- `train` - Configurable training with blog experiment presets
Apache 2.0