██╗   ██╗██████╗ ██████╗ ██████╗  █████╗ ███████╗████████╗    ██╗     ███╗   ███╗
██║   ██║██╔══██╗██╔══██╗██╔══██╗██╔══██╗██╔════╝╚══██╔══╝    ██║     ████╗ ████║
██║   ██║██████╔╝██║  ██║██████╔╝███████║█████╗     ██║       ██║     ██╔████╔██║
██║   ██║██╔═══╝ ██║  ██║██╔══██╗██╔══██║██╔══╝     ██║       ██║     ██║╚██╔╝██║
╚██████╔╝██║     ██████╔╝██║  ██║██║  ██║██║        ██║       ███████╗██║ ╚═╝ ██║
 ╚═════╝ ╚═╝     ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝        ╚═╝       ╚══════╝╚═╝     ╚═╝
Version 2.0.0 · Built from scratch · Production-Ready Framework
Updraft-LM is a from-scratch implementation of a causal language model framework inspired by the LLaMA architecture. Designed for research and production workflows, it uses LLaMA-style transformer components: Rotary Positional Embeddings (RoPE), SwiGLU activation, RMSNorm, and Grouped-Query Attention (GQA).
git clone https://github.com/yourusername/updraft-lm.git
cd updraft-lm
pip install -r requirements.txt

Run the demo using the built-in minimal model instantiation:
./quickstart.sh

Updraft-LM uses a rich terminal user interface for an enhanced monitoring experience during training and generation.
python main.py generate \
  --checkpoint checkpoints/pretrained_model.pt \
  --prompt "Once upon a time" \
  --max-length 100 \
  --temperature 0.8 \
  --top-k 50

Or programmatically in Python:
from generator import load_model_for_inference

# Load the checkpoint once, then reuse the generator for multiple prompts.
generator = load_model_for_inference('checkpoints/pretrained_model.pt')

outputs = generator.generate(
    "The future of artificial intelligence",
    max_length=100,
    temperature=0.8,
    top_k=50,
)
print(outputs[0])  # outputs is indexable; print the first completion

python main.py train \
  --dataset wikitext \
  --epochs 5 \
  --batch-size 64 \
  --learning-rate 2.5e-4 \
  --max-seq-len 512

python main.py interactive \
  --checkpoint checkpoints/pretrained_model.pt \
  --temperature 0.8

┌───────────────────────────────┐
│        INPUT TOKEN IDS        │
└───────────────┬───────────────┘
                │
                ▼
       ┌─────────────────┐
       │ Token Embedding │
       └────────┬────────┘
                │
╔═══════════════╧═══════════════╗
║     Transformer Block × 12    ║
║  ┌─────────────────────────┐  ║
║  │         RMSNorm         │  ║
║  └────────────┬────────────┘  ║
║               ▼               ║
║  ┌─────────────────────────┐  ║
║  │ Grouped Query Attention │  ║
║  │ (12 q_heads, 4 kv_heads)│  ║
║  │ + RoPE on queries/keys  │  ║
║  └────────────┬────────────┘  ║
║               │               ║
║               ▼               ║
║  ┌─────────────────────────┐  ║
║  │         RMSNorm         │  ║
║  └────────────┬────────────┘  ║
║               ▼               ║
║  ┌─────────────────────────┐  ║
║  │  Feed-Forward Network   │  ║
║  │   (SwiGLU Activation)   │  ║
║  └─────────────────────────┘  ║
╚═══════════════╤═══════════════╝
                │
                ▼
       ┌─────────────────┐
       │     RMSNorm     │
       └────────┬────────┘
                │
                ▼
       ┌─────────────────┐
       │     LM Head     │
       └────────┬────────┘
                │
                ▼
┌───────────────────────────────┐
│            LOGITS             │
│     (vocabulary: 50,257)      │
└───────────────────────────────┘
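
The attention box in the diagram combines two ideas that are easier to see in code: 12 query heads sharing 4 key/value heads, and rotary position encoding. The following is a minimal plain-Python sketch, not the project's implementation. The function names `kv_head_for` and `rope_rotate` are hypothetical, and the contiguous head grouping, the interleaved pair layout, and the base of 10000 are assumptions taken from common LLaMA-style practice.

```python
import math

N_Q_HEADS = 12                      # query heads, from the diagram
N_KV_HEADS = 4                      # key/value heads, from the diagram
D_MODEL = 768                       # embedding dimension
HEAD_DIM = D_MODEL // N_Q_HEADS     # 64 dimensions per head


def kv_head_for(q_head: int) -> int:
    """Return the KV head a query head reads from (contiguous grouping assumed)."""
    group_size = N_Q_HEADS // N_KV_HEADS  # 3 query heads per KV head
    return q_head // group_size


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


print([kv_head_for(h) for h in range(N_Q_HEADS)])
# → [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

# Attention scores after rotation depend only on the relative offset (3 in both cases).
q = [0.1 * (i + 1) for i in range(8)]
k = [0.05 * (8 - i) for i in range(8)]
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 13), rope_rotate(k, 10))
print(abs(score_near - score_far) < 1e-9)
```

Sharing each key/value head across three query heads cuts the key/value projections and cache to a third of their multi-head size, while the rotation makes the query-key score a function of token distance rather than absolute position.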
| Component | Value | Details |
|---|---|---|
| Layers | 12 | LLaMA-style decoder blocks |
| Attention Heads | 12 | Query heads |
| KV Heads | 4 | Grouped Query Attention (GQA) |
| Embedding Dimension | 768 | d_model |
| Feed-Forward Dimension | 3072 | SwiGLU hidden projection |
| Context Length | 512 | Default, in tokens; longer windows are configurable via RoPE |
| Vocabulary Size | 50,257 | Number of token IDs; configurable |
| RMSNorm Epsilon | 1e-5 | Numerical-stability constant in RMSNorm |
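
As a rough cross-check, the values in the table can be turned into a parameter estimate. This sketch makes several assumptions the table does not confirm: linear layers have no bias terms, the SwiGLU feed-forward uses three matrices (gate, up, down), each block has two RMSNorm weight vectors, and the LM head shares its weights with the token embedding. Treat the totals as estimates under those assumptions.

```python
# Values taken from the configuration table.
n_layers, n_q_heads, n_kv_heads = 12, 12, 4
d_model, d_ff, vocab_size = 768, 3072, 50_257

head_dim = d_model // n_q_heads      # 64
kv_dim = n_kv_heads * head_dim       # 256: K and V projections are narrower under GQA

# Assumed layout: bias-free projections, three SwiGLU matrices, two norms per block.
attn = 2 * d_model * d_model + 2 * d_model * kv_dim   # W_q and W_o, then W_k and W_v
ffn = 3 * d_model * d_ff                              # gate, up, and down projections
norms = 2 * d_model                                   # two RMSNorm weight vectors
per_layer = attn + ffn + norms

embedding = vocab_size * d_model
tied_total = n_layers * per_layer + embedding + d_model   # + final RMSNorm
untied_total = tied_total + embedding                     # separate LM head matrix

print(f"{tied_total:,}")    # → 142,425,600
print(f"{untied_total:,}")  # → 181,022,976
```

Under these assumptions the model lands near 142M parameters with tied embeddings, with the feed-forward matrices accounting for most of each block.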
For detailed architectural logic, please view the mathematical derivations in MATH.md.
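
The `--temperature` and `--top-k` options used in the generation commands control how the next token is drawn from the model's logits. The sketch below shows the usual technique in plain Python. It illustrates the general method rather than the code in `generator`, and the function name `sample_top_k` and its `rng` parameter are hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Sample one token id after temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Keep only the k highest-scoring token ids.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Lower temperatures sharpen the distribution; higher ones flatten it.
    scaled = [logits[i] / temperature for i in top_ids]

    # Softmax weights, shifted by the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    return rng.choices(top_ids, weights=weights, k=1)[0]


# Toy logits over a five-token vocabulary; a seeded RNG keeps the run repeatable.
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
rng = random.Random(0)
print([sample_top_k(toy_logits, temperature=0.8, top_k=3, rng=rng) for _ in range(10)])
```

With `top_k=1` the function reduces to greedy decoding, since only the highest-scoring token survives the filter.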