
11L + XSA + VRL + SWA + seq4096 + cross-doc TTT - val_bpb 1.1839 #457

Open
carlesonielfa wants to merge 1 commit into openai:main from carlesonielfa:submission/2026-03-22_11L_XSA_VRL_SWA_seq4096

Conversation

@carlesonielfa

Stacks several wins on the 11L dim=512 base:

  • seq_len=4096: long-context training (single largest contributor)
  • Exclusive Self-Attention (XSA): removes value-aligned component from attention output on deepest 4 layers
  • Value Residual Learning (VRL): per-layer learnable residual from layer-0 value vectors
  • SmearGate: learned token-blending gate at embedding layer
  • SWA: 24 checkpoints averaged from last 40% of warmdown
  • Cross-doc TTT: rank-8 LoRA adapters trained per document at eval time
  • Warmdown-QAT: quantization-aware training during warmdown, for a near-zero post-quantization penalty
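
Of the items above, Value Residual Learning is simple to sketch: each layer's attention values are blended with the value vectors computed at layer 0 through a learnable per-layer gate. A minimal numpy version; the scalar-sigmoid gate parameterization is an assumption, since the PR does not show its exact form:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def value_residual(v_layer, v_layer0, lam):
    """Blend this layer's value vectors with the layer-0 values.

    v_layer, v_layer0: (seq, d_head) arrays; lam: learnable per-layer scalar.
    lam -> -inf recovers the plain values; lam -> +inf copies layer 0.
    """
    g = sigmoid(lam)
    return (1.0 - g) * v_layer + g * v_layer0
```

At lam = 0 the blend is an even 50/50 mix, so a fresh layer starts halfway between its own values and the layer-0 shortcut.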

Results (seed=1337, 8xH100, 600s):

  • post-quant (int8+zlib): 1.2192
  • post-quant + TTT: 1.1839
  • model size: 15.35 MB

…=1.1839

11 layers, seq_len=4096, Exclusive Self-Attention (deepest 4 layers),
Value Residual Learning, SmearGate, SWA (24 ckpts), cross-doc TTT.
Post-quant: 1.2192. With TTT: 1.1839. Model size: 15.35 MB.
13137 steps on 8xH100 in 600s.
ThomAub pushed a commit to ThomAub/parameter-golf that referenced this pull request Mar 22, 2026
…, and PR openai#457 analysis

Comprehensive analysis of 4 TTC techniques for Parameter Golf:
- Sliding window eval (stride<seq_len for better context)
- Depth recurrence (shared layers, more loops at eval)
- Longer context eval with NTK RoPE scaling
- Checkpoint/depth ensemble strategies
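
The sliding-window idea from the first bullet can be planned as below; the function name and return shape are illustrative, not from the commit. With stride < seq_len, each window overlaps the previous one, so every scored token carries up to seq_len - stride tokens of left context while still being scored exactly once:

```python
def plan_windows(n_tokens, seq_len, stride):
    """Plan sliding-window eval so every token is scored exactly once.

    Each window covers up to seq_len tokens; only tokens not yet scored
    (the trailing `stride` tokens, after the first window) are counted.
    Returns (window_start, score_start, window_end) triples.
    """
    plans = []
    scored_to = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + seq_len, n_tokens)
        plans.append((begin, scored_to, end))
        scored_to = end
        if end == n_tokens:
            break
    return plans
```

For example, plan_windows(10, 4, 2) scores tokens 0-3 in the first window, then two new tokens per subsequent window, each seen with two tokens of prior context.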

Includes detailed analysis of PR openai#457's techniques (XSA, VRL, SmearGate, SWA,
cross-doc TTT), which achieves 1.1839 BPB. Cross-doc TTT is identified as the
single biggest TTC win (a 0.035 BPB improvement).
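
The cross-doc TTT piece — rank-8 LoRA adapters fit per document at eval time — can be outlined as follows. The zero-init of B (so the adapter is a no-op before any per-document steps) is the standard LoRA convention; everything else here is an illustrative assumption:

```python
import numpy as np

RANK = 8  # rank-8 adapters, per the PR description

def init_lora(d_in, d_out, rank=RANK, seed=0):
    """Standard LoRA init: A random, B zero, so W + A @ B == W at start."""
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.02, size=(d_in, rank))
    B = np.zeros((rank, d_out))
    return A, B

def lora_forward(x, W, A, B):
    """Frozen base weight W plus the low-rank per-document update A @ B."""
    return x @ W + (x @ A) @ B
```

At eval time only A and B would be updated on each document, leaving the quantized base weights untouched.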

https://claude.ai/code/session_01M5XTtyz2Zdq5BDeh9qNn9y
ThomAub pushed a commit to ThomAub/parameter-golf that referenced this pull request Mar 22, 2026
… budgets

Side-by-side comparison of 4 architectures:
- Baseline dense (17.1M, 1.224 BPB)
- Enhanced dense with PR#180/openai#457 techniques (~20.3M)
- Zero-cost MoE (same params, fewer FLOPs)
- Expanded MoE (34M params via int5/int6 compression)

Includes ASCII architecture diagrams, per-component parameter budgets,
quantization byte accounting, and step speed estimates.

https://claude.ai/code/session_01M5XTtyz2Zdq5BDeh9qNn9y
