Record Submission: 1.1570 BPB - 73.7M Ternary U-Net + NeoMuon + 4x relu² MLP + Factored Tied Emb + Poly5 Softcap + YaRN2048 + 8192BPE + FP8QAT + Bitmask-LZMA + Stride-16 Sliding #640
Merged
0hq merged 1 commit into openai:main on Mar 25, 2026
… relu² 4xMLP FP8)
Collaborator: Really excellent work!
0hq approved these changes (Mar 25, 2026).
This is incredible work! Exactly what I hoped would happen when I submitted #139. The factored embedding and FP8 QAT for non-ternary params are really clever. Congrats on the record.
Mistobaan pushed a commit to Mistobaan/parameter-golf that referenced this pull request (Mar 25, 2026): … relu² 4xMLP FP8) (openai#640) Co-authored-by: Ciprian-Florin Ifrim <ciprian-florin.ifrim@Ciprians-Mac-Studio-M1-Max.local>
TimS-ml referenced this pull request in TimS-ml/parameter-golf-autoresearch (Mar 26, 2026) with the same commit message (#640).
nedcut pushed a commit to nedcut/parameter-golf that referenced this pull request (Mar 26, 2026) with the same commit message.
nvemuri4649 pushed a commit to thanushpatlolla/parameter-golf that referenced this pull request (Mar 27, 2026) with the same commit message.
anish-krishnan pushed a commit to anish-krishnan/parameter-golf that referenced this pull request (Mar 30, 2026) with the same commit message.
theLightArchitect added a commit to theLightArchitect/parameter-golf that referenced this pull request (Mar 30, 2026):
Proper experimental methodology: start from a proven base and add ONE thing.
1. config_pr640_exact.sh: ZERO changes from PR openai#640 (control)
2. config_pr640_plus_eval.sh: + T=0.85 + stride=64 (eval only)
3. config_pr640_plus_brotli.sh: + brotli compression + eval tricks
Lesson from Config A failure (3.0 BPB): adding AOL + hash + brotli simultaneously with ternary STE caused divergence. Incremental ablation is required to identify which innovations are ternary-compatible.
Co-Authored-By: Kevin Tan <kft@lightarchitects.io>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
theLightArchitect added a commit to theLightArchitect/parameter-golf that referenced this pull request (Mar 30, 2026):
Root cause of 1.1851 BPB: the config was based on PR openai#640 (ternary) hyperparameters applied to an int6 model. ALL top 5 merged entries use:
- SEQ_LEN=2048 (we had 1024)
- BATCH=786K (we had 524K)
- MUON_MOMENTUM=0.99 (we had 0.95)
- NS5=5 steps (we had 3-4)
- SWA=1 + EMA=0.997 (we had SWA=0)
- Weight decay 0.04 (we had 0.0)
Projected improvement: 1.1851 → ~1.14 BPB.
Co-Authored-By: Kevin Tan <kft@lightarchitects.io>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Itssshikhar pushed a commit to Itssshikhar/parameter-golf that referenced this pull request (Mar 31, 2026) with the same commit message.
jimezsa pushed a commit to jimezsa/parameter-golf that referenced this pull request (Apr 2, 2026) with the same commit message.
Record: 1.1570 BPB — 73.7M Ternary U-Net Transformer
BitNet b1.58 + 10L + NeoMuon + 4x relu² MLP + Factored Tied Embedding + Poly5 Softcap + YaRN 2048 + 8192 BPE + FP8 QAT + Base-3 LZMA + Stride-16 Sliding Eval
val_bpb: 1.1570 (3-seed mean, sliding-window eval, std 0.0007) | 15.99 MB max artifact | 8×H100 SXM, 599s
The results document (linked here and in my repo) covers all methods and sweeps applied to both binary and ternary BitNets. Unfortunately, these are incompatible with many techniques, including Tversky layers, EMA, Muon weight decay, LM logit-head ranking and many more. The scaling ratios and the lists of applicable and rejected techniques may be useful for other submissions too.
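For readers unfamiliar with ternary BitNets, here is a minimal pure-Python sketch of the absmean quantizer from the BitNet b1.58 paper. Weights are divided by their mean absolute value, rounded, and clipped to {-1, 0, +1}. The single per-tensor scale and the function names are assumptions made for illustration; this page does not say how the submission groups its scales. During training, the forward pass would use the dequantized weights while gradients flow straight through to latent full-precision weights (STE). That part is not shown.

```python
def ternary_quantize(weights, eps=1e-8):
    """Absmean ternary quantization (BitNet b1.58 style), one scale per tensor (assumption).

    Returns (codes, scale): codes are in {-1, 0, +1} and weights ~= code * scale.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Scale, round to the nearest integer, then clip into [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map ternary codes back to real values for the forward pass."""
    return [c * scale for c in codes]

w = [0.9, -0.05, 0.2, -0.6, 0.01, 0.4]
codes, scale = ternary_quantize(w)
print(codes)  # [1, 0, 1, -1, 0, 1]
print(dequantize(codes, scale))
```

Small weights such as -0.05 and 0.01 collapse to 0. Large weights such as 0.9 saturate at the scale, so each tensor stores only a trit per weight plus one float.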
Results (3 seeds, 8×H100 SXM)
Architecture
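The "4x relu² MLP" in the title presumably means a feed-forward block whose hidden width is 4 times the model width, with relu(x)² as the activation. The following is a hypothetical pure-Python sketch under that assumption. It omits biases and uses a toy width of 8 with random weights; none of these values come from the submission.

```python
import random

def relu_squared(x):
    """relu(x)**2: zero for negative inputs, x*x otherwise."""
    return x * x if x > 0 else 0.0

def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def mlp_relu2(x, w_up, w_down):
    """d -> 4d -> d feed-forward block with relu² activation (no biases, an assumption)."""
    hidden = [relu_squared(h) for h in matvec(w_up, x)]
    return matvec(w_down, hidden)

d = 8                 # toy model width (hypothetical)
hidden_dim = 4 * d    # the "4x" expansion
rng = random.Random(0)
w_up = [[rng.gauss(0.0, d ** -0.5) for _ in range(d)] for _ in range(hidden_dim)]
w_down = [[rng.gauss(0.0, hidden_dim ** -0.5) for _ in range(hidden_dim)] for _ in range(d)]
y = mlp_relu2([1.0] * d, w_up, w_down)
print(len(y))  # 8
```

Negative pre-activations still map to exactly zero, as with ReLU. The derivative of the squared branch (2x) goes to zero at the origin instead of jumping.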
Key Techniques
Architecture
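A factored tied embedding, as the name suggests, replaces the V×d embedding table with the product of a V×r matrix A and an r×d matrix B. The output projection reuses the same two factors. The sketch below assumes exactly that layout. V = 8192 matches the 8192 BPE vocabulary, but the widths in the parameter-count example (d = 768, r = 256) are hypothetical, not taken from this submission.

```python
def embed(token_id, A, B):
    """Input embedding: row token_id of A (length r) times B (r x d)."""
    a = A[token_id]
    return [sum(a[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]

def tied_logits(h, A, B):
    """Output logits reusing the same factors: A @ (B @ h)."""
    z = [sum(B[k][j] * h[j] for j in range(len(h))) for k in range(len(B))]  # length r
    return [sum(row[k] * z[k] for k in range(len(z))) for row in A]           # length V

def param_counts(vocab, dim, rank):
    """Return (full V x d table, factored V x r + r x d) parameter counts."""
    return vocab * dim, vocab * rank + rank * dim

# Toy factors: V = 3 tokens, r = 2, d = 3.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
h = [1.0, 0.0, 0.0]
print(embed(2, A, B))                 # [1.0, 3.0, 3.0]
print(tied_logits(h, A, B))           # [1.0, 0.0, 1.0]
print(param_counts(8192, 768, 256))   # (6291456, 2293760)
```

Because the factors are tied, logit i equals the dot product of h with embed(i). Under the hypothetical widths, the table shrinks by roughly 2.7x.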
Training
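NeoMuon's exact changes relative to Muon are not described on this page. As background only, here is a pure-Python sketch of the standard Muon orthogonalization step. It is a quintic Newton-Schulz iteration run for 5 steps, using the coefficients (3.4445, -4.7750, 2.0315) from the public Muon reference implementation. The helper names are illustrative.

```python
def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def newton_schulz5(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G with Muon's quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315
    # Work in the wide orientation so X @ X^T is the smaller Gram matrix.
    tall = len(G) > len(G[0])
    X = transpose(G) if tall else [row[:] for row in G]
    # Frobenius normalization puts every singular value at or below 1.
    norm = sum(v * v for row in X for v in row) ** 0.5
    X = [[v / (norm + eps) for v in row] for row in X]
    for _ in range(steps):
        A = matmul(X, transpose(X))
        AA = matmul(A, A)
        B = [[b * A[i][j] + c * AA[i][j] for j in range(len(A))] for i in range(len(A))]
        BX = matmul(B, X)
        # X <- aX + (bA + cA^2)X: each singular value s maps to a*s + b*s^3 + c*s^5.
        X = [[a * X[i][j] + BX[i][j] for j in range(len(X[0]))] for i in range(len(X))]
    return transpose(X) if tall else X

out = newton_schulz5([[1.0, 0.0], [0.0, 6.0]])
print(out)
```

For diag(1, 6), the two diagonal entries start about 6x apart. After five steps both land between 0.5 and 1.5 rather than exactly at 1. Muon accepts this approximate orthogonalization because the iteration is cheap.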
Evaluation
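A stride-16 sliding evaluation can be sketched as follows, assuming the usual scheme. The first window scores all of its positions. Each later window advances by `stride` tokens and scores only the `stride` newest target positions. As a result, every token is scored exactly once, and apart from the first window, each scored token has at least context - stride tokens of preceding context. The helper name and the toy sizes are hypothetical; the submission's eval context length is not stated on this page.

```python
def sliding_windows(num_tokens, context, stride):
    """Return (start, end, score_from) triples.

    Tokens [start, end) are fed to the model; only predictions for
    target positions [score_from, end) are added to the loss.
    """
    if not 0 < stride <= context:
        raise ValueError("stride must be in (0, context]")
    windows = []
    start, scored_up_to = 0, 0
    while scored_up_to < num_tokens:
        end = min(start + context, num_tokens)
        windows.append((start, end, scored_up_to))
        scored_up_to = end
        start += stride
    return windows

windows = sliding_windows(num_tokens=100, context=64, stride=16)
print(windows)  # [(0, 64, 0), (16, 80, 64), (32, 96, 80), (48, 100, 96)]
```

With stride 16, evaluation needs roughly num_tokens / 16 forward passes. That is far more than non-overlapping windows, but nearly every scored token sees close to a full context.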
Compression
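The title says Bitmask-LZMA while the description says Base-3 LZMA, and this page does not spell out the byte layout. One common way to store ternary codes compactly is to pack five trits per byte and then compress with LZMA. This works because 3^5 = 243 ≤ 256, giving 1.6 bits per weight against the log2(3) ≈ 1.585-bit minimum. The sketch below does this with Python's standard `lzma` module. It is an illustration, not the submission's format.

```python
import lzma
import random

def pack_trits(codes):
    """Pack ternary codes in {-1, 0, +1} five per byte (3**5 = 243 <= 256)."""
    out = bytearray()
    for i in range(0, len(codes), 5):
        value = 0
        # Iterate in reverse so codes[i] becomes the lowest base-3 digit.
        for c in reversed(codes[i:i + 5]):
            value = value * 3 + (c + 1)  # map -1, 0, +1 to digits 0, 1, 2
        out.append(value)
    return bytes(out)

def unpack_trits(data, n):
    """Inverse of pack_trits; n is the original number of codes."""
    codes = []
    for byte in data:
        for _ in range(5):
            codes.append(byte % 3 - 1)
            byte //= 3
    return codes[:n]

rng = random.Random(0)
codes = [rng.choice((-1, 0, 1)) for _ in range(1003)]
packed = pack_trits(codes)
compressed = lzma.compress(packed)
restored = unpack_trits(lzma.decompress(compressed), len(codes))
print(len(codes), len(packed), len(compressed))
```

Uniformly random trits are already near their entropy, so LZMA gains little here. Real trained weights, with skewed code frequencies and repeated patterns, are where the extra compression stage can pay off.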
Setup and Run
Full run command
Compliance