Record: 11L EMA + Int6 + XSA + LeakyReLU² + Partial RoPE (val_bpb: 1.1309) #493
Open
parinzee wants to merge 1 commit into openai:main from
…1309)

3-seed validation results:
- Seed 42: val_bpb=1.13109, artifact=15,764,564 bytes
- Seed 1337: val_bpb=1.13085, artifact=15,626,741 bytes
- Seed 2024: val_bpb=1.13067, artifact=15,923,256 bytes
- Mean: 1.13087 (std: 0.00017)

Key techniques: 11 layers, GQA (8H/4KV), XSA on last 4 layers, LeakyReLU(0.5)², Partial RoPE (16/64), EMA (0.997), int6 quantization, zstd-22 compression, BigramHash(2048,128), warmdown_iters=4500.

Built on baseline by @thwu1 (PR openai#180).
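The int6 quantization and EMA (0.997) entries above can be illustrated with a minimal pure-Python sketch. It assumes symmetric per-row quantization with a clip range of ±31 and an EMA shadow copy of the weights updated once per step; the function names, the per-row scaling, and the rounding scheme are assumptions for illustration, not the code in this PR.

```python
# Hypothetical sketch: symmetric per-row int6 quantization plus an EMA weight update.
# The +/-31 clip range and per-row scaling are assumptions, not taken from the PR.


def quantize_int6_row(row, qmax=31):
    """Map a row of floats to integers in [-qmax, qmax] with one shared scale."""
    amax = max(abs(v) for v in row)
    scale = amax / qmax if amax > 0 else 1.0
    # round to the nearest grid point, then clamp to the signed 6-bit range
    q = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from int6 codes and the row scale."""
    return [v * scale for v in q]


def ema_update(ema, weights, decay=0.997):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


if __name__ == "__main__":
    row = [0.5, -1.0, 0.25]
    codes, scale = quantize_int6_row(row)
    print(codes, scale)
    print(dequantize_row(codes, scale))
    print(ema_update([0.0, 1.0], [1.0, 1.0]))
```

With round-to-nearest, each dequantized value lies within half a quantization step (scale / 2) of the original weight.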
sofiabod added a commit to sofiabod/parameter-golf that referenced this pull request on Mar 23, 2026:
- replace relu(x)^2 with leaky_relu(x, 0.5)^2
- PR openai#493 reaches 1.1309 with a partial stack using this activation
- untried on the full openai#414 stack; could give -0.002 to -0.005 BPB
- zero param cost, zero speed overhead
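The activation swap in the first bullet can be sketched in pure Python to show why it keeps gradients for negative inputs: for x < 0, relu(x)² has zero derivative, while leaky_relu(x, 0.5)² = 0.25x² has derivative 0.5x. The function names below are illustrative; in PyTorch the forward pass would be `F.leaky_relu(x, negative_slope=0.5).square()`.

```python
def leaky_relu(x, negative_slope=0.5):
    """Identity for x >= 0, scaled by negative_slope for x < 0."""
    return x if x >= 0 else negative_slope * x


def leaky_relu_sq(x, negative_slope=0.5):
    """LeakyReLU(x)^2: always non-negative, like relu(x)^2."""
    y = leaky_relu(x, negative_slope)
    return y * y


def leaky_relu_sq_grad(x, negative_slope=0.5):
    """d/dx [leaky_relu(x)^2] = 2 * leaky_relu(x) * leaky_relu'(x)."""
    slope = 1.0 if x >= 0 else negative_slope
    return 2.0 * leaky_relu(x, negative_slope) * slope


def relu_sq_grad(x):
    """d/dx [relu(x)^2]: zero for every negative input."""
    return 2.0 * x if x > 0 else 0.0


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.5, 2.0):
        print(x, leaky_relu_sq(x), leaky_relu_sq_grad(x), relu_sq_grad(x))
```

For negative pre-activations the squared ReLU passes no gradient, whereas the leaky variant passes a gradient proportional to x, at no extra parameter cost.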
mrdavtan added a commit to mrdavtan/parameter-golf that referenced this pull request on Mar 23, 2026:
Key changes from studying PR openai#505 (1.1181) and openai#486 (1.0887):
- train_batch_tokens: 524K → 786K (all top entries use this)
- bigram_hash_buckets: 4096 → 8192 (PR openai#505 uses 8192, openai#493 uses 10240)
- grad_clip_norm: 0.3 → 0.0 (PR openai#505 disables clipping)
- Star-ReLU and TrigramHash enabled in all run scripts
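The bigram_hash_buckets setting controls how many rows a hashed bigram embedding table has. The sketch below assumes that BigramHash maps each (previous token, current token) pair to a bucket with a simple multiplicative hash and looks up one vector per bucket; the hash constant, the padding id used for the first position, and all names are hypothetical, not the implementation used in these PRs.

```python
import random


def bigram_bucket_ids(tokens, num_buckets, mult=1_000_003, pad_id=0):
    """Hash each (previous, current) token pair into [0, num_buckets)."""
    ids = []
    prev = pad_id  # hypothetical: the first token is paired with a padding id
    for tok in tokens:
        ids.append((prev * mult + tok) % num_buckets)
        prev = tok
    return ids


def make_bigram_table(num_buckets, dim, seed=0):
    """Randomly initialised rows, one per bucket (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_buckets)]


def bigram_features(tokens, table):
    """Look up one embedding row per position from its hashed bigram."""
    ids = bigram_bucket_ids(tokens, len(table))
    return [table[i] for i in ids]


if __name__ == "__main__":
    table = make_bigram_table(num_buckets=64, dim=4)
    feats = bigram_features([5, 7, 5, 7], table)
    print(len(feats), len(feats[0]))
```

Raising the bucket count (for example 4096 → 8192) reduces how often distinct bigrams share a row, at the cost of more embedding parameters in the artifact.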
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request on Mar 23, 2026
RoyiRa added a commit to RoyiRa/parameter-golf that referenced this pull request on Mar 23, 2026:
…bpb 1.1178
3-seed mean: 1.1178 BPB (std 0.0005), ~15.75 MB artifact, 8×H100 SXM.
Novel contribution: Late Soft-Round QAT. It replaces the STE identity surrogate with a sigmoid soft-round in the backward pass during the final 2% of training, giving bin-aware gradients that settle weights onto int6 grid points.
Built on PR openai#414 (base model), PR openai#461 (TTT recipe), PR openai#493 (LeakyReLU²).
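To make the Late Soft-Round QAT idea concrete, here is a pure-Python sketch of one possible sigmoid soft-round surrogate, floor(x) + sigmoid(alpha * (frac(x) - 0.5)), and its derivative, compared with the straight-through estimator's constant derivative of 1. The exact formula, the sharpness alpha, and how the commit schedules it over the final 2% of training are assumptions; the source only says a sigmoid soft-round replaces the STE identity in the backward pass.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_round(x, alpha):
    """Sigmoid soft-round: floor(x) + sigmoid(alpha * (frac(x) - 0.5)).

    x is a weight expressed in quantization-grid units (weight / scale).
    Larger alpha makes the curve closer to hard rounding.
    """
    base = math.floor(x)
    return base + sigmoid(alpha * (x - base - 0.5))


def soft_round_grad(x, alpha):
    """Derivative of soft_round with respect to x (hypothetical surrogate gradient)."""
    base = math.floor(x)
    s = sigmoid(alpha * (x - base - 0.5))
    return alpha * s * (1.0 - s)


def ste_grad(x):
    """Straight-through estimator: treat rounding as the identity when backpropagating."""
    return 1.0


if __name__ == "__main__":
    for x in (2.0, 2.25, 2.5, 2.75):
        print(x, round(x), soft_round(x, 20.0), soft_round_grad(x, 20.0), ste_grad(x))
```

In this form the derivative peaks at alpha / 4 halfway between grid points and is close to zero right at them, so the surrogate gradient depends on where a weight sits inside its quantization bin, whereas the STE passes every gradient through unchanged.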
This was referenced Mar 25, 2026
Fraser-Greenlee pushed a commit to Fraser-Greenlee/parameter-golf that referenced this pull request on Mar 25, 2026:
- Interleaved draft tokens: soft predictions placed between real tokens for 1-2 token lookahead via standard causal attention
- SmearGate and BigramHash naturally gain future context on the interleaved sequence
- Bigram noise curriculum: drafts anneal from ground truth to realistic noise
- Two-pass eval: pass 1 generates drafts, pass 2 refines with interleaving
- LeakyReLU(0.5)² activation toggle (free -0.003 BPB from PR openai#493)
- W&B logging (opt-in via WANDB_PROJECT env var)
- Sweep runner with 13 configs covering baselines, draft variants, and ablations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
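The interleaved layout in the first bullet can be pictured with a small, hypothetical helper: real tokens and draft slots alternate, so under an ordinary causal mask the real token at position 2i+2 can attend to the draft slot at position 2i+1 just before it. Here drafts are plain placeholders; in the commit they are soft predictions, and how they are produced and embedded is not shown.

```python
def interleave(real, drafts):
    """Return [r0, d0, r1, d1, ...] and a mask marking which positions are real tokens."""
    if len(real) != len(drafts):
        raise ValueError("expected one draft slot per real token")
    seq, is_real = [], []
    for r, d in zip(real, drafts):
        seq.extend([r, d])
        is_real.extend([True, False])
    return seq, is_real


def causal_context(seq, pos):
    """Items visible at position `pos` under a standard causal mask."""
    return seq[: pos + 1]


if __name__ == "__main__":
    seq, is_real = interleave(["t0", "t1", "t2"], ["d0", "d1", "d2"])
    print(seq)
    print(causal_context(seq, 2))  # the real token t1 also sees draft d0
```

The `is_real` mask is one way to restrict the loss to real-token positions while draft slots only supply extra context.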
This was referenced Mar 25, 2026
srchandrupatla added a commit to srchandrupatla/parameter-golf that referenced this pull request on Mar 25, 2026:
LeakyReLU(0.5)²: preserves negative gradient flow through the MLP while keeping the output non-negative. ~0.003 BPB improvement per PR openai#493.
Legal TTT (test-time training): at eval time, split the validation tokens into 32K-token chunks, score each chunk under inference_mode(), then train on the already-scored chunk with SGD. Gives ~0.0025 BPB improvement per PR openai#461. The score-first protocol guarantees that no future information leaks into scored tokens.
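The score-first protocol described above can be checked with a toy stand-in. The sketch replaces the neural model and SGD with an add-one-smoothed unigram counter (an assumption made purely so the example runs without dependencies) but keeps the ordering the commit describes: each chunk is scored with the current model before that chunk is used for any update.

```python
import math
from collections import Counter


def chunk_bits(counts, total, vocab_size, chunk):
    """Total bits for `chunk` under an add-one-smoothed unigram model (toy stand-in)."""
    bits = 0.0
    for tok in chunk:
        p = (counts[tok] + 1) / (total + vocab_size)
        bits -= math.log2(p)
    return bits


def score_first_ttt(tokens, chunk_size, vocab_size):
    """Score each chunk with the current model, then update the model on that chunk."""
    counts, total, per_chunk = Counter(), 0, []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        # 1) score first: the model has only seen chunks before `start`
        per_chunk.append(chunk_bits(counts, total, vocab_size, chunk))
        # 2) then adapt on the already-scored chunk (counting stands in for SGD)
        counts.update(chunk)
        total += len(chunk)
    return per_chunk


if __name__ == "__main__":
    print(score_first_ttt([0] * 8, chunk_size=4, vocab_size=2))
```

Because updates only happen after scoring, chunk i's score equals the score from a model trained only on chunks 0..i-1, which is the property the no-leak argument relies on.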
Mistobaan pushed a commit to Mistobaan/parameter-golf that referenced this pull request on Mar 25, 2026
TimS-ml referenced this pull request in TimS-ml/parameter-golf-autoresearch on Mar 26, 2026
nedcut pushed a commit to nedcut/parameter-golf that referenced this pull request on Mar 26, 2026
This was referenced Mar 26, 2026
This was referenced Mar 26, 2026
This was referenced Mar 27, 2026
nvemuri4649 pushed a commit to thanushpatlolla/parameter-golf that referenced this pull request on Mar 27, 2026
anish-krishnan pushed a commit to anish-krishnan/parameter-golf that referenced this pull request on Mar 30, 2026
Itssshikhar pushed a commit to Itssshikhar/parameter-golf that referenced this pull request on Mar 31, 2026
Summary
3-Seed Results
Key Changes from Baseline
Run Command
Built on SOTA baseline by @thwu1 (PR #180).