minimal transformer inference in c. ~400 lines total.
make
./download.sh
./tinyllm models/stories15M.bin "Once upon a time"

usage: ./tinyllm <model.bin> [prompt] [-t temp] [-p topp] [-n steps] [-s seed] [-z tokenizer]
runs llama-style models (tinyllamas, llama2.c format). implements:
- rmsnorm
- rotary position embeddings (rope)
- grouped query attention (gqa)
- swiglu ffn
- kv cache
- top-p sampling
- bpe tokenizer
no dependencies except libc and libm.