11L + XSA + VRL + SWA + seq4096 + cross-doc TTT - val_bpb 1.1839 #457
Open
carlesonielfa wants to merge 1 commit into openai:main from
…=1.1839: 11 layers, seq_len=4096, Exclusive Self-Attention (XSA) on the deepest 4 layers, Value Residual Learning (VRL), SmearGate, SWA (24 checkpoints), and cross-doc TTT. Post-quant val_bpb: 1.2192; with TTT: 1.1839. Model size: 15.35 MB. 13137 steps on 8xH100 in 600 s.
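The commit message above lists SWA over 24 checkpoints but does not say how or when the checkpoints are collected. As a rough illustration only, the sketch below shows plain uniform checkpoint averaging in pure Python, with each checkpoint assumed to be a dict of parameter name to flat list of floats. The names `State` and `swa_average` are hypothetical and not taken from the PR; a real implementation would operate on tensors (for example a PyTorch `state_dict`).

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flattened weights.
State = Dict[str, List[float]]


def swa_average(checkpoints: List[State]) -> State:
    """Uniformly average checkpoints with the running-mean update
    avg += (w - avg) / k, which also works incrementally as checkpoints
    arrive during training."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    # Start from a copy of the first checkpoint so inputs are not mutated.
    avg = {name: list(w) for name, w in checkpoints[0].items()}
    for k, ckpt in enumerate(checkpoints[1:], start=2):
        for name, weights in ckpt.items():
            acc = avg[name]
            for i, w in enumerate(weights):
                acc[i] += (w - acc[i]) / k
    return avg


if __name__ == "__main__":
    ckpts = [{"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}, {"w": [4.0, 6.0]}]
    print(swa_average(ckpts))
```

The running-mean form avoids keeping all checkpoints in memory at once; with 24 checkpoints it gives the same result as summing and dividing by 24, up to floating-point rounding.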
ThomAub pushed a commit to ThomAub/parameter-golf that referenced this pull request on Mar 22, 2026
…, and PR openai#457 analysis

Comprehensive analysis of 4 TTC techniques for Parameter Golf:
- Sliding window eval (stride < seq_len for better context)
- Depth recurrence (shared layers, more loops at eval)
- Longer context eval with NTK RoPE scaling
- Checkpoint/depth ensemble strategies

Includes a detailed analysis of PR openai#457's techniques (XSA, VRL, SmearGate, SWA, cross-doc TTT), which reach 1.1839 BPB. Cross-doc TTT is identified as the single biggest TTC win (0.035 BPB lower: 1.2192 → 1.1839).

https://claude.ai/code/session_01M5XTtyz2Zdq5BDeh9qNn9y
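"Sliding window eval (stride < seq_len)" is only named in the commit message above. A minimal sketch of one common way to lay out such windows follows, assuming a single flat token stream: consecutive windows overlap by `seq_len - stride` tokens, and only the tokens not already scored are scored, so each token counts exactly once and every window after the first gives its scored tokens at least `seq_len - stride` tokens of context. The function name and the exact scheme are illustrative assumptions, not the referenced analysis's code.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, seq_len: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, score_start) triples: the model sees tokens
    [start, end) and only tokens [score_start, end) are scored."""
    if not 0 < stride <= seq_len:
        raise ValueError("require 0 < stride <= seq_len")
    windows = []
    scored_upto = 0  # every token before this index has been scored
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows


if __name__ == "__main__":
    for w in sliding_windows(10, seq_len=4, stride=2):
        print(w)
```

Smaller strides give each scored token more context but need roughly `seq_len / stride` times as many forward passes, which is the test-time compute trade-off the list above refers to.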
ThomAub pushed a commit to ThomAub/parameter-golf that referenced this pull request on Mar 22, 2026
… budgets

Side-by-side comparison of 4 architectures:
- Baseline dense (17.1M params, 1.224 BPB)
- Enhanced dense with PR#180/openai#457 techniques (~20.3M params)
- Zero-cost MoE (same params, fewer FLOPs)
- Expanded MoE (34M params via int5/int6 compression)

Includes ASCII architecture diagrams, per-component parameter budgets, quantization byte accounting, and step speed estimates.

https://claude.ai/code/session_01M5XTtyz2Zdq5BDeh9qNn9y
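The commit above mentions quantization byte accounting for int5/int6 weights without showing the arithmetic. Below is a minimal sketch of raw storage cost, assuming tightly bit-packed weights plus one fp16 (2-byte) scale per matrix row. The per-row scale layout and the `quantized_bytes` helper are hypothetical, and the count ignores any entropy coding applied to the final artifact.

```python
def quantized_bytes(n_params: int, bits: int, n_rows: int = 0, scale_bytes: int = 2) -> int:
    """Bytes to store n_params weights at `bits` bits each, bit-packed,
    plus `scale_bytes` per row for quantization scales (fp16 by default).
    No entropy coding is assumed."""
    payload = (n_params * bits + 7) // 8  # round up to whole bytes
    return payload + n_rows * scale_bytes


if __name__ == "__main__":
    # A hypothetical 512x512 matrix at int6 with one fp16 scale per row.
    print(quantized_bytes(512 * 512, bits=6, n_rows=512))
```

For example, 34,000,000 parameters at 5 bits come to 21,250,000 payload bytes before scales or compression, so whether such a model fits a given byte budget depends on how well the packed weights compress afterwards.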
Stacks several wins on the 11L dim=512 base:
Results (seed=1337, 8xH100, 600s):