
Transformer Lab

A complete transformer implementation from scratch using only NumPy — no PyTorch, no TensorFlow. Every matrix multiplication, every gradient, every optimization step is explicit and inspectable.

Built for learning and interpretability research: understand exactly how transformers work by building one from raw math.

What's Inside

  • Full GPT-architecture transformer — token embeddings, positional encoding, multi-head attention, feed-forward networks, layer norm, residual connections
  • Complete backpropagation — hand-derived gradients for every component
  • Adam optimizer — from scratch with weight decay, warmup, cosine scheduling
  • Interpretability probes — attention pattern analysis, head classification, induction head detection, logit attribution, activation caching
  • Training pipeline — cross-entropy loss, gradient clipping, perplexity tracking
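To make the optimizer bullet concrete, here is a minimal sketch of a warmup-plus-cosine learning-rate schedule of the kind described above. The function name and parameter names (`max_lr`, `warmup_steps`, `total_steps`, `min_lr`) are illustrative, not the repo's actual API:

```python
import numpy as np

def lr_schedule(step, max_lr=1e-3, warmup_steps=100, total_steps=1000, min_lr=1e-4):
    """Linear warmup to max_lr, then cosine decay down to min_lr.

    A sketch of the schedule shape only; the repo's trainer.py may use
    different names and defaults.
    """
    if step < warmup_steps:
        # Linear ramp from ~0 up to max_lr over the warmup phase.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + np.cos(np.pi * progress))
```

The schedule peaks exactly at the end of warmup and bottoms out at `min_lr` at `total_steps`.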

Architecture

Token IDs → Embedding → Positional Encoding
    → [TransformerBlock × N]
        → LayerNorm → MultiHeadAttention → Residual
        → LayerNorm → FeedForward (GELU) → Residual
    → Final LayerNorm → Output Projection → Logits

Every component has .forward(), .backward(), and .parameters — fully differentiable, fully inspectable.
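The `.forward()` / `.backward()` / `.parameters` contract can be sketched with a single linear layer. This is an illustrative example of the interface shape, not the repo's actual implementation; class and attribute names are assumptions:

```python
import numpy as np

class Linear:
    """Minimal layer following the forward/backward/parameters contract."""

    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (d_in, d_out))
        self.b = np.zeros(d_out)

    def forward(self, x):
        self.x = x                       # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        self.dW = self.x.T @ grad_out    # gradient w.r.t. weights
        self.db = grad_out.sum(axis=0)   # gradient w.r.t. bias
        return grad_out @ self.W.T       # gradient w.r.t. input

    @property
    def parameters(self):
        return [self.W, self.b]
```

Caching the input in `forward` is what lets `backward` compute all three gradients with no framework support.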

Interpretability Tools

Attention Pattern Analysis

Classify attention heads by behavior:

  • Positional heads — attend to fixed offsets (previous token, etc.)
  • Content heads — attend based on token meaning
  • Induction heads — implement in-context learning by copying patterns
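One signal behind this classification is attention entropy: focused heads (e.g. previous-token heads) have low-entropy rows, while content heads spread attention broadly. A minimal sketch, assuming a `(seq, seq)` attention matrix with rows summing to 1; the threshold and labels here are illustrative, not the repo's actual rule:

```python
import numpy as np

def head_entropy(attn):
    """Mean entropy (in nats) of the attention rows; low = focused head."""
    eps = 1e-12  # avoid log(0) on masked/zero entries
    row_entropy = -(attn * np.log(attn + eps)).sum(axis=-1)
    return row_entropy.mean()

def classify_head(attn, focused_threshold=1.0):
    """Toy two-way split: sharp heads vs. broad content-based heads."""
    return "positional" if head_entropy(attn) < focused_threshold else "content"
```

A head attending uniformly over `n` tokens scores `log(n)` nats, while a head attending to a single position scores near zero.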

Logit Attribution

Decompose a prediction into per-layer contributions. Which transformer block is responsible for predicting the next token?
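Because each block writes an additive update into the residual stream, a prediction's logit can be split into per-block terms by projecting each update through the unembedding. A sketch of that idea, ignoring the final LayerNorm's nonlinearity for simplicity; the function and argument names are assumptions, not the repo's probe.py API:

```python
import numpy as np

def logit_attribution(block_updates, W_U, token_id):
    """Per-block contribution to the logit of `token_id`.

    block_updates: list of residual-stream updates (d_model vectors) written
                   by each block at the position being predicted.
    W_U:           (d_model, vocab) unembedding / output projection.
    """
    return [delta @ W_U[:, token_id] for delta in block_updates]
```

By linearity, the contributions sum exactly to the logit of the summed residual updates, which is what makes the decomposition meaningful.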

Activation Caching

Record intermediate activations at every layer. Track residual stream norms, detect dead neurons, measure saturation.
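A dead neuron is one whose activation stays (near) zero across every recorded sample. A minimal cache sketch showing how that detection can work; the class and method names are illustrative, not the repo's probe.py interface:

```python
import numpy as np

class ActivationCache:
    """Record named activations across batches and query them afterwards."""

    def __init__(self):
        self.store = {}

    def record(self, name, acts):
        # acts: (batch, neurons) array for one forward pass
        self.store.setdefault(name, []).append(np.asarray(acts))

    def dead_neurons(self, name, threshold=1e-6):
        """Indices of neurons whose |activation| never exceeds threshold."""
        acts = np.concatenate(self.store[name], axis=0)
        return np.where(np.abs(acts).max(axis=0) < threshold)[0]
```

The same per-name store extends naturally to residual-stream norms or saturation statistics.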

Quick Start

git clone https://github.com/BabyChrist666/transformer-lab.git
cd transformer-lab
pip install -r requirements.txt

# Run tests (61 passing)
pytest tests/ -v

# Run the full experiment: train + generate + analyze
python -m experiments.train_and_analyze

Experiment Results

Training a 152K-parameter, 3-layer, 4-head transformer on Shakespeare:

Metric            Value
Parameters        152,320
Final Loss        3.24
Final Perplexity  25.5
Training Time     10.4s

Head Analysis

Layer  Head  Type     Entropy  Avg Distance
0      0     mixed    0.511    18.0
0      1     mixed    0.906    9.0
1      0     content  4.006    9.5
1      1     content  3.976    9.0
2      0     content  3.999    9.4
2      1     content  4.011    9.8

Layer 0 heads are focused (low entropy, mixed behavior), while layers 1-2 develop broad content-based attention.

Logit Attribution

For predicting 'b' at position 6 in "To be, or not to be":

  • Block 1 contributes +4.80 (promotes correct prediction)
  • Block 0 contributes -8.95 (suppresses)
  • Block 2 contributes -3.99 (fine-tunes)

Project Structure

transformer_lab/
├── attention.py      # Multi-head attention + causal masking
├── embeddings.py     # Token embeddings + sinusoidal positional encoding
├── model.py          # Full transformer + LayerNorm + FeedForward
├── trainer.py        # Training loop + Adam optimizer + loss
└── probe.py          # Interpretability: attention probes, activation cache, logit attribution

experiments/
└── train_and_analyze.py    # Full training + generation + analysis pipeline

tests/                      # 61 tests
├── test_attention.py
├── test_model.py
├── test_probe.py
└── test_trainer.py

Why NumPy Only?

Using PyTorch or TensorFlow hides the mechanics behind abstractions. This implementation makes every operation visible:

  • See exactly how Q @ K.T / sqrt(d_k) computes attention scores
  • Trace gradients through softmax, layer norm, and GELU by hand
  • Understand why residual connections and layer norm matter for training stability
  • Inspect how each attention head develops different behaviors
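The first bullet can be written out directly. Here is causally masked scaled dot-product attention for a single head in plain NumPy, matching the `Q @ K.T / sqrt(d_k)` expression above (the function name and single-head shapes are for illustration):

```python
import numpy as np

def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.

    Q, K, V: (seq, d_k) arrays. Returns (output, attention_weights).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (seq, seq)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)  # future positions
    scores = np.where(mask, -np.inf, scores)               # block lookahead
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

Subtracting the row max before `exp` keeps the softmax stable, and `exp(-inf) = 0` guarantees zero weight on future tokens.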

Tech Stack

  • Python 3.10+
  • NumPy — all matrix operations
  • pytest — 61 tests

License

MIT

