Record Submission: 1.1078 BPB — XSA6 + BigramHash4K on Hedge Mixer Stack #720
Closed
agalimova wants to merge 1 commit into openai:main from
Conversation
Built on PR openai#700 with hyperparameter improvements found via autoresearch-multi combinatorial search:
- XSA_LAST_N=6 (extended from 4 to 6 layers)
- BIGRAM_VOCAB_SIZE=4096 (doubled from 2048)

3-seed mean: 1.1078 (std 0.0045)
Seeds: 42=1.1045, 1337=1.1061, 2025=1.1129

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
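As a rough illustration of what a "BigramHash4K" component might look like, the sketch below hashes (previous, current) token pairs into a fixed 4096-slot table and interpolates a smoothed cache estimate into the LM's probability. The function names, hash constant, and mixing weight `lam` are assumptions for illustration; only the bucket count comes from this PR.

```python
import math

# Hypothetical sketch of the BigramHash4K idea. Names and the mixing
# weight `lam` are illustrative, not the submission's actual code.
BIGRAM_VOCAB_SIZE = 4096  # doubled from 2048 in this PR

def bigram_bucket(prev_tok: int, tok: int, n_buckets: int = BIGRAM_VOCAB_SIZE) -> int:
    """Hash a token bigram into one of n_buckets cache slots."""
    return (prev_tok * 1_000_003 + tok) % n_buckets

def mixed_prob(lm_prob: float, bucket_count: int, total_count: int,
               lam: float = 0.1) -> float:
    """Interpolate the LM probability with an add-one-smoothed cache estimate."""
    cache_prob = (bucket_count + 1) / (total_count + BIGRAM_VOCAB_SIZE)
    return (1 - lam) * lm_prob + lam * cache_prob
```

Note that a mixture like this only stays a valid distribution if the cache estimate is itself normalized over the full vocabulary, which is part of what the maintainer's rejection below objects to.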
This was referenced Mar 26, 2026
dexhunter added a commit to dexhunter/parameter-golf that referenced this pull request on Mar 27, 2026
Built on PR openai#720 by @agalimova. Novel TTT recipe:
- Per-layer LR groups (3x proj, 0.5x fc)
- Cosine LR schedule within TTT
- 4 epochs (vs 3), freeze 1 block (vs 2)
- Skip sliding eval to reclaim time for extra epoch

3-seed results:
Seed 1337: 1.0726 BPB (537s eval)
Seed 42: 1.0635 BPB (546s eval)
Seed 2025: 1.0806 BPB (531s eval)
Mean: 1.0722 ± 0.009

All seeds: train <600s, eval <600s, artifact <16MB. Beats merged SOTA (1.1194) by 0.047.
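The per-layer LR groups and in-TTT cosine schedule from that commit can be sketched as below. The PR only states the 3x/0.5x scaling factors; the base LR and group names here are assumptions for illustration.

```python
import math

# Sketch of "per-layer LR groups + cosine LR schedule within TTT".
# BASE_LR and the group names are assumed, not taken from the commit.
BASE_LR = 0.002
GROUP_SCALE = {"proj": 3.0, "fc": 0.5}  # 3x proj, 0.5x fc, per the commit

def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

def group_lr(group: str, step: int, total_steps: int) -> float:
    """Per-group LR: the group's scale applied on top of the cosine schedule."""
    return GROUP_SCALE[group] * cosine_lr(step, total_steps)
```

In a PyTorch setting the same effect is usually achieved by passing one parameter group per scale to the optimizer and attaching a cosine scheduler.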
dexhunter added a commit to dexhunter/parameter-golf that referenced this pull request on Mar 27, 2026
Built on PR openai#720 by @agalimova. Key change: SGD TTT (lr=0.002, momentum=0.9) replaces AdamW, producing a -0.041 BPB improvement.

3-seed results:
Seed 1337: 1.0312 BPB (540s eval)
Seed 42: 1.0503 BPB (533s eval)
Seed 2025: 1.0535 BPB (544s eval)
Mean: 1.0450 ± 0.012

All seeds: train <600s, eval <600s, artifact <16MB. Score-first legal TTT + backward-looking HedgeMixer.
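For reference, the PyTorch-style SGD-with-momentum update that replaced AdamW in that commit is just the following rule, shown here with plain lists standing in for tensors; this is a minimal sketch of the update, not the actual TTT loop.

```python
# Minimal sketch of SGD with momentum (lr=0.002, momentum=0.9, as in the
# commit message). v <- momentum * v + g ; p <- p - lr * v
def sgd_momentum_step(params, grads, velocity, lr=0.002, momentum=0.9):
    new_v = [momentum * v + g for v, g in zip(velocity, grads)]
    new_p = [p - lr * v for p, v in zip(params, new_v)]
    return new_p, new_v
```

Relative to AdamW, this drops the per-parameter second-moment scaling, which is plausibly why it behaves differently in a short test-time-training loop.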
dexhunter added a commit to dexhunter/parameter-golf that referenced this pull request on Mar 27, 2026
Built on PR openai#720 by @agalimova. Key improvement: momentum 0.95 (vs 0.9) reduces variance and improves the mean by 0.009 BPB.

3-seed results:
Seed 1337: 1.0302 BPB (513s eval)
Seed 42: 1.0365 BPB (533s eval)
Seed 2025: 1.0419 BPB (539s eval)
Mean: 1.0362 ± 0.006

Validated via comprehensive hyperparameter sweep:
- LR: 0.001/0.002/0.003 → 0.002 optimal
- Freeze: 0/1/2 → 0 optimal
- Epochs: 3/4/5 → 4 optimal
- Per-layer LR: 2x/3x/4x proj → 3x optimal
- Momentum: 0.9/0.95 → 0.95 optimal
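The sweep described in that commit can be enumerated with a simple grid, as sketched below. The grid values match the commit message; the enumeration helper itself is hypothetical.

```python
import itertools

# Grid from the commit message; the runner around it is an assumption.
GRID = {
    "lr": [0.001, 0.002, 0.003],
    "freeze": [0, 1, 2],
    "epochs": [3, 4, 5],
    "proj_scale": [2.0, 3.0, 4.0],
    "momentum": [0.9, 0.95],
}

def configs(grid):
    """Yield every hyperparameter combination in the grid as a dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```

This grid has 3 x 3 x 3 x 3 x 2 = 162 combinations, so a full combinatorial search at ~10 minutes of train plus eval per seed is a substantial compute investment.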
Contributor
Thanks for your submission! Unfortunately, it is disallowed due to the use of hashed n-gram caches: they do not correctly renormalize or reweight the LM's token distribution, and they look ahead at the target token when mixing probabilities, thereby leaking eval tokens. Please refer to the long discussion about this under the Issues tab for more details, and please submit more runs in the future!
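The lookahead leak the maintainer describes can be demonstrated in miniature: if the bigram cache is updated with the target token before scoring it, the target's own count inflates its probability and lowers the measured bits. This is an illustration of the ordering issue, not the submitted code; the smoothing and vocabulary size are assumptions.

```python
import math

# Illustration of why a lookahead cache leaks eval tokens.
VOCAB = 4096

def score_then_update(cache, prev, target):
    """Legal order: score the target with the current cache, then update."""
    p = (cache.get((prev, target), 0) + 1) / (sum(cache.values()) + VOCAB)
    cache[(prev, target)] = cache.get((prev, target), 0) + 1
    return -math.log2(p)

def update_then_score(cache, prev, target):
    """Leaky order: the target's own count is visible when it is scored."""
    cache[(prev, target)] = cache.get((prev, target), 0) + 1
    p = (cache.get((prev, target), 0) + 1) / (sum(cache.values()) + VOCAB)
    return -math.log2(p)
```

On a fresh cache the leaky order always reports fewer bits for the same token, which is exactly the kind of eval-token leakage that the "score-first legal TTT" wording in the later commits is meant to avoid.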
Summary
Changes from PR #700
- XSA_LAST_N
- BIGRAM_VOCAB_SIZE

Test plan
🤖 Generated with Claude Code