A complete transformer implementation from scratch using only NumPy — no PyTorch, no TensorFlow. Every matrix multiplication, every gradient, every optimization step is explicit and inspectable.
Built for learning and interpretability research: understand exactly how transformers work by building one from raw math.
- Full GPT-architecture transformer — token embeddings, positional encoding, multi-head attention, feed-forward networks, layer norm, residual connections
- Complete backpropagation — hand-derived gradients for every component
- Adam optimizer — from scratch with weight decay, warmup, cosine scheduling
- Interpretability probes — attention pattern analysis, head classification, induction head detection, logit attribution, activation caching
- Training pipeline — cross-entropy loss, gradient clipping, perplexity tracking
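As an illustration of the optimizer piece, here is a minimal sketch of an AdamW-style update with linear warmup and cosine decay. The function names, default hyperparameters, and schedule shape are assumptions for illustration, not the repo's exact API:

```python
import numpy as np

def lr_schedule(step, base_lr=1e-3, warmup_steps=100, total_steps=1000):
    """Linear warmup, then cosine decay to zero (illustrative names/values)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + np.cos(np.pi * progress))

def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.01):
    """One decoupled-weight-decay Adam update; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v
```

The decoupled weight-decay term is applied directly to the parameter rather than folded into the gradient, which keeps the decay strength independent of the adaptive scaling.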
Token IDs → Embedding → Positional Encoding
→ [TransformerBlock × N]
→ LayerNorm → MultiHeadAttention → Residual
→ LayerNorm → FeedForward (GELU) → Residual
→ Final LayerNorm → Output Projection → Logits
Every component has .forward(), .backward(), and .parameters — fully differentiable, fully inspectable.
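To show what a hand-derived `.forward()`/`.backward()` pair looks like, here is a minimal LayerNorm in that style. This is a sketch of the pattern, not the repo's exact class; the backward pass uses the standard closed-form layer-norm gradient:

```python
import numpy as np

class LayerNorm:
    """Layer norm over the last axis, with an explicit hand-derived backward."""
    def __init__(self, dim, eps=1e-5):
        self.gamma = np.ones(dim)
        self.beta = np.zeros(dim)
        self.eps = eps

    def forward(self, x):
        self.var = x.var(-1, keepdims=True)
        self.xhat = (x - x.mean(-1, keepdims=True)) / np.sqrt(self.var + self.eps)
        return self.gamma * self.xhat + self.beta

    def backward(self, dy):
        # dx = (dxhat - mean(dxhat) - xhat * mean(dxhat * xhat)) / std
        dxhat = dy * self.gamma
        self.dgamma = (dy * self.xhat).sum(0)
        self.dbeta = dy.sum(0)
        std = np.sqrt(self.var + self.eps)
        return (dxhat - dxhat.mean(-1, keepdims=True)
                - self.xhat * (dxhat * self.xhat).mean(-1, keepdims=True)) / std
```

Because every intermediate (`xhat`, `var`) is cached on the instance, the gradient can be checked against finite differences, which is exactly the kind of inspection this repo is built for.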
Classify attention heads by behavior:
- Positional heads — attend to fixed offsets (previous token, etc.)
- Content heads — attend based on token meaning
- Induction heads — implement in-context learning by copying patterns
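A simple classifier in this spirit can be built from two statistics of a head's attention matrix: the mean row entropy (how diffuse the head is) and the attended offset pattern. The thresholds and return labels below are illustrative guesses, not the repo's exact rules:

```python
import numpy as np

def classify_head(attn, entropy_threshold=1.0, offset_std_threshold=1.0):
    """Classify one head from its (seq, seq) row-stochastic attention matrix."""
    seq = attn.shape[0]
    p = np.clip(attn, 1e-12, 1.0)
    entropy = -(p * np.log2(p)).sum(-1).mean()       # mean row entropy in bits
    q = np.arange(seq)[:, None]
    k = np.arange(seq)[None, :]
    avg_dist = (attn * np.abs(q - k)).sum(-1).mean() # mean query→key distance
    if entropy < entropy_threshold:
        offsets = (attn * (q - k)).sum(-1)           # expected offset per query
        if offsets.std() < offset_std_threshold:     # same offset everywhere
            return "positional", entropy, avg_dist
        return "mixed", entropy, avg_dist
    return "content", entropy, avg_dist
```

A previous-token head scores near-zero entropy with a constant offset of 1; a head attending broadly over content scores high entropy regardless of position.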
Decompose a prediction into per-layer contributions. Which transformer block is responsible for predicting the next token?
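The core trick is that the residual stream is additive: each block's contribution to a logit is the dot product of what that block wrote to the stream with the target token's unembedding column. A hedged sketch (the function name and the simplification of ignoring the final LayerNorm are mine, not the repo's):

```python
import numpy as np

def logit_attribution(block_outputs, W_out, token_id):
    """Attribute one target logit to per-block residual-stream writes.

    block_outputs: residual-stream states after each block, each shape (d_model,),
    taken at the position of interest. W_out: (d_model, vocab) unembedding.
    The final LayerNorm is ignored here for simplicity, so the parts sum exactly.
    """
    contribs = []
    prev = np.zeros_like(block_outputs[0])
    for out in block_outputs:
        delta = out - prev                          # what this block added
        contribs.append(float(delta @ W_out[:, token_id]))
        prev = out
    return contribs
```

Because the deltas telescope, the per-block contributions sum to the final logit, which is what lets numbers like the +4.80 / -8.95 breakdown below be read as a complete decomposition.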
Record intermediate activations at every layer. Track residual stream norms, detect dead neurons, measure saturation.
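Given a cache of recorded activations, these diagnostics reduce to a few array reductions. The dict layout, field names, and thresholds below are assumptions for illustration:

```python
import numpy as np

def activation_stats(cache):
    """cache: dict of layer name -> (batch, d) activations (e.g. post-GELU).

    Returns per-layer norm, dead-neuron, and saturation diagnostics.
    The 1e-6 dead threshold and 3.0 saturation threshold are illustrative.
    """
    stats = {}
    for name, act in cache.items():
        stats[name] = {
            "mean_norm": float(np.linalg.norm(act, axis=-1).mean()),
            "dead": np.all(np.abs(act) < 1e-6, axis=0),      # never fires
            "saturation": float((np.abs(act) > 3.0).mean()), # fraction extreme
        }
    return stats
```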
git clone https://github.com/BabyChrist666/transformer-lab.git
cd transformer-lab
pip install -r requirements.txt
# Run tests (61 passing)
pytest tests/ -v
# Run the full experiment: train + generate + analyze
python -m experiments.train_and_analyze

Training a 152K-parameter, 3-layer, 4-head transformer on Shakespeare:
| Metric | Value |
|---|---|
| Parameters | 152,320 |
| Final Loss | 3.24 |
| Final Perplexity | 25.5 |
| Training Time | 10.4s |
| Layer | Head | Type | Entropy | Avg Distance |
|---|---|---|---|---|
| 0 | 0 | mixed | 0.511 | 18.0 |
| 0 | 1 | mixed | 0.906 | 9.0 |
| 1 | 0 | content | 4.006 | 9.5 |
| 1 | 1 | content | 3.976 | 9.0 |
| 2 | 0 | content | 3.999 | 9.4 |
| 2 | 1 | content | 4.011 | 9.8 |
Layer 0 heads are focused (low entropy, mixed behavior), while layers 1-2 develop broad content-based attention.
For predicting 'b' at position 6 in "To be, or not to be":
- Block 1 contributes +4.80 (promotes correct prediction)
- Block 0 contributes -8.95 (suppresses)
- Block 2 contributes -3.99 (fine-tunes)
transformer_lab/
├── attention.py # Multi-head attention + causal masking
├── embeddings.py # Token embeddings + sinusoidal positional encoding
├── model.py # Full transformer + LayerNorm + FeedForward
├── trainer.py # Training loop + Adam optimizer + loss
└── probe.py # Interpretability: attention probes, activation cache, logit attribution
experiments/
└── train_and_analyze.py # Full training + generation + analysis pipeline
tests/ # 61 tests
├── test_attention.py
├── test_model.py
├── test_probe.py
└── test_trainer.py
Using PyTorch or TensorFlow hides the mechanics behind abstractions. This implementation makes every operation visible:
- See exactly how `Q @ K.T / sqrt(d_k)` computes attention scores
- Trace gradients through softmax, layer norm, and GELU by hand
- Understand why residual connections and layer norm matter for training stability
- Inspect how each attention head develops different behaviors
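The attention-score computation above fits in a few lines of plain NumPy. This sketch mirrors the repo's explicit style but is not its exact code:

```python
import numpy as np

def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq, seq) similarities
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)            # block attention to future
    scores -= scores.max(-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(-1, keepdims=True)        # softmax over keys
    return weights @ V, weights
```

Each query row of `weights` is a probability distribution over keys at or before it, so position 0 can only attend to itself.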
- Python 3.10+
- NumPy — all matrix operations
- pytest — 61 tests
MIT