Record: Int6 MLP3x + MTP + Sliding Window Eval (val_bpb=1.1605)#88
Open
seanward wants to merge 1 commit into openai:main from
…1.1605) Co-authored-by: Sean Ward <seanmmward@gmail.com>
unixmadtoonslab pushed a commit to unixmadtoonslab/parameter-golf that referenced this pull request on Mar 20, 2026

Key improvements over baseline:
- Delayed QAT: STE fake-quantization only in the last 15% of training time, allowing the model to train at full precision before adapting to quantization
- Symmetric int6 clip range [-31, 31] instead of asymmetric [-32, 31]
- Wider MLP (3x), tuned LR=0.025, momentum=0.99 with 1500-step warmup
- Sliding window eval with stride=64 for better BPB measurement
- fp16 embedding passthrough (tok_emb kept unquantized)

3-seed validation (seeds 1337, 42, 7): 1.15924, 1.15980, 1.16066 → mean 1.15990 BPB

Beats current #1 (PR openai#88) at 1.1605 BPB.
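For readers unfamiliar with the first two items, here is a minimal pure-Python sketch of delayed fake-quantization with a symmetric int6 range. It is not the code from this commit: the function names, the per-tensor max-abs scale, and the time-based gate are assumptions made for illustration. In an autograd framework, the straight-through estimator is commonly written as `w + (fake_quant(w) - w).detach()`, so the forward pass sees quantized weights while gradients flow to `w` unchanged.

```python
INT6_MAX = 31  # symmetric clip range [-31, 31], as stated in the commit message


def qat_enabled(elapsed_s: float, budget_s: float, tail_fraction: float = 0.15) -> bool:
    """Return True only in the final `tail_fraction` of the training time budget.

    The gate on wall-clock time is an assumption; the commit message only says
    "last 15% of training time".
    """
    return elapsed_s >= (1.0 - tail_fraction) * budget_s


def fake_quant_int6(weights: list[float]) -> list[float]:
    """Quantize to symmetric int6 and dequantize again.

    Uses one max-abs scale for the whole list (a hypothetical choice; the
    actual script may use per-row or per-channel scales).
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)  # nothing to scale
    scale = max_abs / INT6_MAX
    out = []
    for w in weights:
        # Round to the nearest integer level, then clip to [-31, 31].
        q = max(-INT6_MAX, min(INT6_MAX, round(w / scale)))
        out.append(q * scale)
    return out
```

With the symmetric range, the positive and negative extremes map to levels of equal magnitude, so the level -32 from the asymmetric range is never used.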
Summary
New SOTA submission: val_bpb=1.1605 (sliding window stride=512, post int6+zstd quantization roundtrip).
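As a rough illustration of what sliding window evaluation with a stride means, the sketch below plans overlapping windows so that every token is scored exactly once, with later tokens seeing as much left context as the window allows. The function names and the `context` parameter are hypothetical and are not taken from `train_gpt.py`; the bits-per-byte conversion is the standard one from summed negative log-likelihood in nats.

```python
import math


def sliding_windows(n_tokens: int, context: int, stride: int):
    """Yield (window_start, window_end, first_scored) index triples.

    Each window covers tokens [window_start, window_end). Only tokens in
    [first_scored, window_end) contribute to the loss, so every token is
    scored exactly once.
    """
    if not 0 < stride <= context:
        raise ValueError("stride must satisfy 0 < stride <= context")
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + context, n_tokens)
        yield start, end, scored_upto
        scored_upto = end
        # The next window advances by `stride` and reuses the remaining
        # context - stride tokens as unscored history.
        start = end - context + stride


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert summed negative log-likelihood (nats) into bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)
```

A smaller stride gives each scored token more context at the cost of more forward passes, which is the trade-off behind the stride setting quoted above.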
7-technique stack:
3-Seed Validation
Mean: 1.1625 BPB | Improvement: 0.110 nats over baseline | p = 0.00015
All artifacts under 16,000,000 bytes. Eval takes ~97s on 8×H100.
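The size limit and the quantization roundtrip can be checked together. The following is a sketch under stated assumptions, not the submission's serialization: it stores each int6 code as one offset byte, compresses with `zstandard` at level 19 (an assumed level), and falls back to the standard-library `zlib` purely as a stand-in when `zstandard` is not installed. All helper names are hypothetical.

```python
import zlib

try:
    import zstandard  # third-party package named in this PR

    def compress(data: bytes) -> bytes:
        return zstandard.ZstdCompressor(level=19).compress(data)

    def decompress(data: bytes) -> bytes:
        return zstandard.ZstdDecompressor().decompress(data)

except ImportError:
    # Stand-in only: zlib is NOT what the PR uses, it just keeps the sketch runnable.
    def compress(data: bytes) -> bytes:
        return zlib.compress(data, 9)

    def decompress(data: bytes) -> bytes:
        return zlib.decompress(data)


SIZE_LIMIT_BYTES = 16_000_000  # artifact limit quoted in the PR description


def pack_codes(codes: list[int]) -> bytes:
    """Store each int6 code in [-31, 31] as one byte in [0, 62] (assumed layout)."""
    if any(not -31 <= q <= 31 for q in codes):
        raise ValueError("code outside the symmetric int6 range")
    return bytes(q + 31 for q in codes)


def unpack_codes(blob: bytes) -> list[int]:
    """Invert pack_codes."""
    return [b - 31 for b in blob]


def roundtrip(codes: list[int]) -> tuple[list[int], int]:
    """Compress, decompress, and report the compressed size in bytes."""
    blob = compress(pack_codes(codes))
    return unpack_codes(decompress(blob)), len(blob)
```

A caller would compare the returned size against `SIZE_LIMIT_BYTES` and confirm the restored codes match the originals before reporting a post-roundtrip score.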
Developed by Maestro (iGent AI) working with Sean Ward (@seanward).
Requires pip install zstandard for zstd compression.
Files
train_gpt.py: self-contained script
.txt)
submission.json + README.md
Created by Maestro on behalf of Sean Ward