Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB) on 11L Production Stack by anantdgoel · Pull Request #487 · openai/parameter-golf

anantdgoel · 2026-03-23T01:13:29Z

val_bpb: 1.1720 | 19.4 MB (unlimited compute) | 1xA6000, 9500 steps, 14.5hr

Summary

Value Residual (ResFormer, arXiv:2410.17897): caches the layer-0 V vectors and mixes them into subsequent layers via learnable scalars. -0.015 BPB, 22 params added.
Gated Attention (arXiv:2505.06708): applies a per-head sigmoid gate after SDPA, aimed at eliminating attention sinks. -0.003 BPB, ~37K params added.
The techniques stack nearly additively (-0.0172 combined vs. -0.0183 for the sum of the individual deltas), validated via controlled ablation on a 9L baseline.
Full community meta-stack: 11L MLP3x + SmearGate + BigramHash(2048) + OrthoInit + WD0.04 + XSA(4) + EMA(0.997) + Partial RoPE + LN Scale + Logit Softcap.
Both techniques independently adopted by 5+ community submissions, including a record-tier entry (1.1101 BPB).
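
For readers unfamiliar with the two techniques, here is a minimal pure-Python sketch of the ideas as described in the summary. It is not the code from train_gpt.py. The names `mix_value_residual`, `gate_head_output`, `alpha`, `beta`, `w_gate`, and `b_gate` are hypothetical, and two details are assumptions: that the value mix is a two-scalar blend per layer, and that the gate is computed from the token's input, as in the cited paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mix_value_residual(v_layer, v_first, alpha, beta):
    """Blend this layer's V with the cached layer-0 V.

    alpha and beta stand in for learnable scalars (assumed parameterization).
    """
    return [
        [alpha * a + beta * b for a, b in zip(row, row_first)]
        for row, row_first in zip(v_layer, v_first)
    ]

def causal_sdpa(q, k, v):
    """Single-head causal scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for t, q_t in enumerate(q):
        # Token t attends only to positions 0..t.
        scores = [scale * sum(a * b for a, b in zip(q_t, k[s])) for s in range(t + 1)]
        weights = softmax(scores)
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out

def gate_head_output(x, head_out, w_gate, b_gate):
    """Scale one head's SDPA output by sigmoid(x_t . w_gate + b_gate) per token."""
    gated = []
    for x_t, o_t in zip(x, head_out):
        g = sigmoid(sum(a * b for a, b in zip(x_t, w_gate)) + b_gate)
        gated.append([g * o for o in o_t])
    return gated

# Toy usage: two tokens, head dimension 2.
x = [[1.0, 0.0], [0.0, 1.0]]
v0 = [[2.0, 2.0], [2.0, 2.0]]   # cached layer-0 values
v = [[1.0, 0.0], [0.0, 1.0]]    # current layer's values
v_mixed = mix_value_residual(v, v0, alpha=0.5, beta=0.5)
head = causal_sdpa(x, x, v_mixed)
gated = gate_head_output(x, head, w_gate=[0.0, 0.0], b_gate=0.0)
```

With a zero gate projection the gate sits at sigmoid(0) = 0.5, so every head output is halved. A trained gate can push individual heads toward 0, which gives a head a way to output nothing without routing attention mass to a sink token.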

Ablation (9L v1024, 1000 steps, 131K batch, 1x3090)

Config	val_bpb	Delta
Control	1.4697	—
+ Gated Attention	1.4665	-0.0032
+ Value Residual	1.4546	-0.0151
+ Both	1.4525	-0.0172
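
The Delta column can be re-derived from the val_bpb values, which also shows how close the combined gain is to the sum of the individual gains. The sketch below uses only the numbers in the table; the helper name `deltas_vs_control` is hypothetical.

```python
# val_bpb values copied from the ablation table above.
VAL_BPB = {
    "Control": 1.4697,
    "+ Gated Attention": 1.4665,
    "+ Value Residual": 1.4546,
    "+ Both": 1.4525,
}

def deltas_vs_control(results, control="Control"):
    """Return each config's val_bpb minus the control's, rounded to 4 decimals."""
    base = results[control]
    return {
        name: round(bpb - base, 4)
        for name, bpb in results.items()
        if name != control
    }

deltas = deltas_vs_control(VAL_BPB)
sum_individual = round(deltas["+ Gated Attention"] + deltas["+ Value Residual"], 4)
# Positive interaction: the combined gain is slightly smaller than the sum.
interaction = round(deltas["+ Both"] - sum_individual, 4)
```

The sum of the two individual deltas is -0.0183 against a measured -0.0172 for both together, so about 0.0011 BPB of the gains overlap in this single 1000-step run.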

Production Results

Metric	Value
Pre-quant val_bpb	1.1710
Post-quant val_bpb	1.1720
Quant gap	0.0010
Artifact	19.4 MB
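
As a reading aid, the sketch below recomputes the quant gap from the two val_bpb rows and shows the usual conversion from cross-entropy loss to bits per byte. The function names are hypothetical, and the conversion assumes the standard definition (mean loss in nats per token, divided by ln 2, times tokens per byte); it is not taken from train_gpt.py.

```python
import math

def bits_per_byte(mean_loss_nats, num_tokens, num_bytes):
    """Convert mean per-token cross-entropy (nats) to bits per byte of text."""
    bits_per_token = mean_loss_nats / math.log(2)
    return bits_per_token * (num_tokens / num_bytes)

def quant_gap(pre_quant_bpb, post_quant_bpb):
    """Return the absolute and relative val_bpb increase caused by quantization."""
    gap = post_quant_bpb - pre_quant_bpb
    return gap, gap / pre_quant_bpb

# Values from the Production Results table.
gap, relative_gap = quant_gap(1.1710, 1.1720)
```

The 0.0010 gap corresponds to a relative degradation of roughly 0.085% from quantization.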

Files

README.md — full writeup with ablations and reproducibility command
submission.json — metadata
train_gpt.py — training script
train.log — complete training log

…) on 11L production stack Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB…

be5355d


notapplica mentioned this pull request Mar 23, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open


anantdgoel wants to merge 1 commit into openai:main from anantdgoel:non-record-vr-ga-production
