11L Int5-MLP + TTT-SGD + SmearGate + SWA (1.1455 BPB)#264

Open
stukenov wants to merge 1 commit into openai:main from stukenov:submission/11L-int5-ttt

Conversation

@stukenov

Summary

  • val_bpb: 1.1455 (seed 1337, single seed — 3-seed validation in progress)
  • Artifact: 15.94 MB (int5-MLP + int6-attn + zstd-22)

Techniques

| Technique | Source | Impact |
| --- | --- | --- |
| 11 layers (vs. 9-layer baseline) | Funded by int5 savings | More model capacity |
| Int5 MLP [-16, 15] + int6 attention [-32, 31] | Inspired by #180 | Saves ~1.9 MB, funds the 11th layer |
| Full-model SGD TTT (2 epochs) | Inspired by #152 | ~0.005 BPB at eval time |
| SmearGate + BigramHash | From #102/#135 | Bigram context injection |
| SWA (30 checkpoints) | From #89 | Better generalization |
| OrthoInit + muP scaling | From #162 | Stable training |
| Muon WD=0.04 | From #60 | Quantization-friendly weights |
| Sliding-window eval, stride=64 | From #50 | Full context per token |
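For readers unfamiliar with the mixed int5/int6 scheme, here is a minimal NumPy sketch of symmetric fake-quantization into the [-16, 15] and [-32, 31] ranges the table describes. This is not the submission's code; names and the per-tensor scaling choice are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, qmin: int, qmax: int):
    """Symmetric fake-quantization of a weight tensor to an integer range.

    Returns the integer codes and the scale needed to dequantize.
    Per-tensor scaling is an assumption; the PR may use per-channel scales.
    """
    scale = np.abs(w).max() / max(abs(qmin), qmax)  # map largest |weight| onto the range
    q = np.clip(np.round(w / scale), qmin, qmax).astype(np.int8)
    return q, scale

# Int5 for MLP weights, int6 for attention weights, matching the PR's split.
w_mlp = np.random.randn(512, 1536).astype(np.float32)
q_mlp, s_mlp = quantize_symmetric(w_mlp, -16, 15)     # 5-bit range
w_attn = np.random.randn(512, 512).astype(np.float32)
q_attn, s_attn = quantize_symmetric(w_attn, -32, 31)  # 6-bit range

w_deq = q_mlp.astype(np.float32) * s_mlp  # dequantized weights used at eval
```

Storing 5-bit codes instead of 6-bit ones for the (much larger) MLP matrices is where the ~1.9 MB saving comes from.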

Architecture

11L / 512d / 8h / 4kv (GQA) / MLP 3x / relu^2 / 2048 seq_len / 26.67M params
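Spelled out as a config, the shorthand above reads as follows. This is a sketch; the field names are illustrative, not the repo's actual hyperparameter names.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Field names are illustrative; the repo's config may differ.
    n_layers: int = 11
    d_model: int = 512
    n_heads: int = 8
    n_kv_heads: int = 4       # GQA: 8 query heads share 4 KV heads
    mlp_ratio: int = 3        # MLP hidden size = 3 * d_model = 1536
    activation: str = "relu^2"
    seq_len: int = 2048

cfg = ModelConfig()
assert cfg.n_heads % cfg.n_kv_heads == 0  # each KV head serves 2 query heads
```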

Results

| Stage | val_bpb |
| --- | --- |
| End of training (5197 steps) | 1.1583 |
| After int5/int6 quantization + sliding-window eval | 1.1507 |
| After TTT (2 epochs of SGD, lr=0.002) | 1.1455 |

Trained on 8xH100 SXM, 600s wallclock, 115ms/step.

Test plan

  • Single seed run (1337): 1.1455 BPB
  • 3-seed validation (1337, 42, 2025) — in progress
  • Artifact under 16MB: 15.99 MB total
  • Training under 10 min: 600s on 8xH100
  • Eval under 10 min: ~696s (TTT 422s + sliding eval 273s)
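The sliding-window eval in the plan above can be sketched as follows: windows advance by `stride` tokens, and only the freshly uncovered tokens of each window are scored, so every scored token sees at least `window - stride` tokens of left context. `logprob_fn` is a stand-in for the model, assumed to return per-token log2-probabilities; it is not the repo's API.

```python
import numpy as np

def sliding_window_bits(logprob_fn, tokens, window=2048, stride=64):
    """Mean bits per token under a sliding-window eval (stride=64 here).

    Only the last `stride` tokens of each window after the first are
    counted, trading extra forward passes for near-full context.
    """
    n = len(tokens)
    nll = np.empty(n)
    first = min(window, n)
    nll[:first] = -logprob_fn(tokens[:first])  # first window scores everything
    pos = first
    while pos < n:
        take = min(stride, n - pos)
        start = pos + stride - window          # keep the window at full length
        lp = logprob_fn(tokens[start : pos + take])
        nll[pos : pos + take] = -lp[-take:]    # count only the fresh tokens
        pos += take
    return nll.mean()
```

For a byte-level tokenizer, bits per token and bits per byte (BPB) coincide.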

11-layer model with mixed int5/int6 quantization, full-model SGD
test-time training, SmearGate, BigramHash, SWA, and OrthoInit.

Single seed result (1337): val_bpb=1.1455
Artifact: 15.94 MB (under 16MB limit)
3-seed validation in progress.
@mohosy

mohosy commented Mar 21, 2026

TTT with int5 MLP is a sick combo. How long does your TTT take during eval? Trying to figure out if 3 epochs is worth it over 2.

@stukenov
Author

@notapplica I need RunPod credits for the 3-seed validation.

HyperPotatoNeo added a commit to HyperPotatoNeo/parameter-golf that referenced this pull request Mar 21, 2026
Stacks XSA (PR openai#265), EMA weight averaging (PR openai#287), Int5-MLP (PR openai#264),
MuonWD=0.04 tuned from PR openai#162, seq_len=2048, 11 layers, BigramHash(2048),
SmearGate, OrthoInit (PR openai#135), Late-K FP16 on final layer.
Single-seed result (seed=1337), ~8903 steps on 8xH100.
ThomAub pushed a commit to ThomAub/parameter-golf that referenced this pull request Mar 22, 2026
Many TTT submissions (openai#136, openai#152, openai#254, openai#264, openai#338, openai#398, openai#417, openai#421, openai#442)
flagged as potentially invalid for adapting on eval tokens BEFORE scoring them.
Added correct score-then-adapt protocol with implementation guide.
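The flagged protocol issue above is about ordering: each eval chunk must be scored with the current weights before the model adapts on it. A minimal sketch of the score-then-adapt loop, assuming chunked `(input, target)` pairs and plain SGD (the thread's implementation guide may differ in detail):

```python
import torch

def score_then_adapt(model, chunks, lr=2e-3):
    """Score each chunk FIRST, then take a gradient step on it.

    Adapting before scoring leaks the chunk's own tokens into its score,
    which is what the flagged TTT submissions were doing.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total_nll, total_tokens = 0.0, 0
    for x, y in chunks:  # x: input ids [1, T], y: targets [1, T]
        model.eval()
        with torch.no_grad():  # 1) score with weights not yet adapted on this chunk
            logits = model(x)
            total_nll += torch.nn.functional.cross_entropy(
                logits.flatten(0, 1), y.flatten(), reduction="sum"
            ).item()
            total_tokens += y.numel()
        model.train()          # 2) only now adapt on the chunk just scored
        loss = torch.nn.functional.cross_entropy(
            model(x).flatten(0, 1), y.flatten()
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return total_nll / total_tokens  # mean NLL in nats; divide by ln(2) for bits
```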
