Record: OrthoInit + Int6 MLP3x + BigramHash + SmearGate (val_bpb: 1.1539) by unnir · Pull Request #135 · openai/parameter-golf

unnir · 2026-03-19T22:14:03Z

OrthoInit + Int6 MLP3x + BigramHash + SmearGate

Score: val_bpb = 1.1539 (sliding window, stride=64)

Approach

Six orthogonal improvements stacked on the baseline 9-layer, 512-dim GPT:

1. Orthogonal + muP-scaled Weight Initialization

All large weight matrices initialized with orthogonal init (gain=1.0)
Output projections (attn.proj, mlp.proj) scaled by 1/sqrt(2 * num_layers) following muP
Accelerates early convergence — the model starts closer to a well-conditioned point, giving Muon a head start
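
The initialization above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code in `train_gpt.py`: the helper names `orthogonal_matrix` and `output_proj_scale` are hypothetical, the matrix is tiny and square, and Gram-Schmidt stands in for what a PyTorch script would more likely do with `torch.nn.init.orthogonal_`.

```python
import math
import random

def orthogonal_matrix(n, gain=1.0, seed=0):
    """Return an n x n matrix whose rows are orthonormal (times gain)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Modified Gram-Schmidt: strip components along rows accepted so far.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return [[gain * a for a in row] for row in rows]

def output_proj_scale(num_layers):
    """Scale applied to attn.proj / mlp.proj weights, per the description."""
    return 1.0 / math.sqrt(2 * num_layers)

NUM_LAYERS = 9  # baseline depth stated in the PR
w = orthogonal_matrix(8)  # toy size; the real matrices are 512-wide
scale = output_proj_scale(NUM_LAYERS)
w_proj = [[scale * a for a in row] for row in w]
print(f"output projection scale for {NUM_LAYERS} layers: {scale:.4f}")
```

With 9 layers the scale is 1/sqrt(18), so output projections start at roughly a quarter of the norm of the other matrices.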

2. Int6 Mixed Quantization + zstd-22

Per-row int6 quantization ([-32,31]) on MLP and attention weight matrices
FP16 passthrough for tied embeddings and last 2 layers' Key projections (quantization-sensitive)
zstd level 22 compression (better ratio than zlib-9 on int6 data)
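
As a rough illustration of per-row int6 quantization, the sketch below maps each row to integers in [-32, 31] with one scale per row. The symmetric max-abs scaling and the function names are assumptions; the PR text does not say how scales are chosen or how the values are packed, and the zstd step is left out here.

```python
def quantize_row_int6(row):
    """Quantize one row to integers in [-32, 31] with a single per-row scale."""
    max_abs = max(abs(x) for x in row)
    # Assumption: symmetric scaling so the largest magnitude maps to 31.
    scale = max_abs / 31.0 if max_abs > 0 else 1.0
    q = [max(-32, min(31, round(x / scale))) for x in row]
    return q, scale

def dequantize_row(q, scale):
    """Recover approximate float values from the int6 codes."""
    return [v * scale for v in q]

# Toy weight matrix with two rows of very different magnitude.
weights = [[0.12, -0.50, 0.31, 0.02],
           [1.50, -0.75, 0.00, 0.60]]

quantized = [quantize_row_int6(row) for row in weights]
restored = [dequantize_row(q, s) for q, s in quantized]

for row, (q, s), back in zip(weights, quantized, restored):
    worst = max(abs(a - b) for a, b in zip(row, back))
    print(q, f"scale={s:.5f}", f"max_err={worst:.5f}")
```

Because each row carries its own scale, a row with small weights is not forced onto the coarse grid of a row with large ones; the rounding error per element is bounded by half that row's scale.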

3. 3x MLP Expansion

MLP hidden dimension 1536 (3x model_dim), up from baseline 1024 (2x)
Budget freed by int6 quantization pays for the extra parameters
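
The parameter arithmetic behind this trade can be checked with a few lines. The sketch assumes each MLP block has two bias-free matrices (model_dim x hidden and hidden x model_dim), which the PR text does not spell out. The 8-bit figure is only a comparison point, since the source does not state the baseline's storage precision, and the byte counts are raw bit-packed sizes before any compression.

```python
MODEL_DIM = 512
NUM_LAYERS = 9

def mlp_params(hidden_dim, model_dim=MODEL_DIM, num_layers=NUM_LAYERS):
    # Assumption: two bias-free matrices per block (up and down projection).
    return num_layers * 2 * model_dim * hidden_dim

def raw_bytes(num_params, bits):
    # Bit-packed size with no per-row scales and no compression.
    return num_params * bits // 8

baseline = mlp_params(1024)  # 2x expansion
expanded = mlp_params(1536)  # 3x expansion
extra = expanded - baseline

print(f"2x MLP params: {baseline:,}")
print(f"3x MLP params: {expanded:,} (+{extra:,})")
print(f"3x MLP at 6 bits: {raw_bytes(expanded, 6):,} bytes")
print(f"2x MLP at 8 bits: {raw_bytes(baseline, 8):,} bytes")
```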

4. Tuned Optimizer Hyperparameters

matrix_lr=0.02, scalar_lr=0.02, tied_embed_lr=0.03 (halved from baseline)
muon_momentum=0.99 with warmup from 0.92 over 1500 steps
warmdown_iters=3000, grad_clip_norm=0.3
AdamW with weight_decay=0.01 for embedding/scalar params
beta1=0.9, beta2=0.95
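
The two schedules named above might look like the sketch below. Both shapes are assumptions: the PR gives the endpoints (0.92 to 0.99 over 1500 steps, warmdown_iters=3000) but not the interpolation, and since the run is capped at 600s the actual script may schedule by wall-clock time rather than by step count. The function names are hypothetical.

```python
def muon_momentum(step, start=0.92, end=0.99, warmup_steps=1500):
    """Linearly ramp Muon momentum from start to end, then hold (assumed shape)."""
    frac = min(step / warmup_steps, 1.0)
    return (1.0 - frac) * start + frac * end

def lr_scale(step, total_steps, warmdown_iters=3000):
    """Multiplier on the base LR: flat, then linear decay to 0 (assumed shape)."""
    warmdown_start = max(total_steps - warmdown_iters, 0)
    if step < warmdown_start:
        return 1.0
    return max(total_steps - step, 0) / warmdown_iters

TOTAL_STEPS = 7201  # step count reported under Key Metrics
for step in (0, 750, 1500, 4201, 5701, 7201):
    print(step, round(muon_momentum(step), 4), round(lr_scale(step, TOTAL_STEPS), 4))
```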

5. SmearGate + Bigram Hash Embedding

SmearGate: learned gate blending each token's embedding with the previous token's (~512 params)
Bigram Hash: 4096-bucket hash table (dim=128, projected to 512) injecting token-pair info
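
One way to read the two components is sketched below on plain Python lists. Everything beyond the sizes is an assumption: the hash constant, the bucket used for the first position, and the per-channel sigmoid gate (one logit per channel would account for the ~512 parameters at dim 512) are illustrative choices, not taken from the PR's code.

```python
import math

NUM_BUCKETS = 4096  # bucket count stated in the PR

def bigram_bucket(prev_token, token, num_buckets=NUM_BUCKETS):
    # Hypothetical mixing function; the PR's actual hash is not shown.
    return (prev_token * 1000003 + token) % num_buckets

def bigram_ids(tokens, num_buckets=NUM_BUCKETS):
    # Assumption: position 0 has no predecessor and uses bucket 0.
    pairs = zip(tokens, tokens[1:])
    return [0] + [bigram_bucket(p, t, num_buckets) for p, t in pairs]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def smear_gate(embeddings, gate_logits):
    """Blend each token's embedding with the previous token's, per channel."""
    gates = [sigmoid(g) for g in gate_logits]
    out = [list(embeddings[0])] if embeddings else []
    for prev, cur in zip(embeddings, embeddings[1:]):
        out.append([(1.0 - g) * c + g * p for g, c, p in zip(gates, cur, prev)])
    return out

tokens = [17, 905, 42, 42]
print(bigram_ids(tokens))

emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(smear_gate(emb, gate_logits=[0.0, 0.0]))  # gate 0.5: plain average
```

In a model, each bucket id would index a 4096 x 128 table whose output is projected to 512 and added to the token embedding, matching the dimensions given above.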

6. Training + Evaluation Setup

train_seq_len=2048, train_batch_tokens=786432
Sliding window evaluation with stride=64 at 2048-token windows
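
The sliding-window scheme can be sketched as a window planner: the first window scores all of its tokens, and every later window advances by the stride and scores only the tokens not yet covered. The function name and the exact bookkeeping are assumptions; the PR states only the window length and stride.

```python
def sliding_windows(num_tokens, window=2048, stride=64):
    """Return (start, end, score_from) triples.

    The model sees tokens[start:end] and only the predictions for
    tokens[score_from:end] count toward the metric.
    """
    spans = []
    scored_upto = 0
    end = min(window, num_tokens)
    while True:
        start = max(end - window, 0)
        spans.append((start, end, scored_upto))
        scored_upto = end
        if end >= num_tokens:
            break
        end = min(end + stride, num_tokens)
    return spans

spans = sliding_windows(num_tokens=5000)
print(len(spans), spans[:3], spans[-1])
```

Under this plan every token is scored exactly once, and once the stream is at least one window long, each token scored after the first window has at least 2048 - 64 = 1984 tokens of context. That extra context is the likely reason the sliding-window score is better than the plain roundtrip score, at the cost of many more forward passes.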

Configuration

torchrun --standalone --nproc_per_node=8 train_gpt.py

Key Metrics

Training: 7201 steps in 600s (83.33ms/step)
Model params: 22,368,841
Pre-quant: val_bpb: 1.1696
Int6+zstd roundtrip: val_bpb: 1.1748
Sliding window (stride=64): val_bpb: 1.1539
Artifact: 15,162,375 bytes (under 16MB by 837,625 bytes)
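
A quick consistency check on these numbers, using only values reported above. The stated headroom implies the 16MB limit is counted as 16,000,000 bytes; the variable names are illustrative.

```python
LIMIT_BYTES = 16_000_000  # decimal 16MB, consistent with the stated headroom
artifact_bytes = 15_162_375
headroom = LIMIT_BYTES - artifact_bytes

pre_quant, roundtrip, sliding = 1.1696, 1.1748, 1.1539
quant_penalty = round(roundtrip - pre_quant, 4)  # cost of int6 + zstd roundtrip
sliding_gain = round(roundtrip - sliding, 4)     # gain from stride-64 evaluation

steps, batch_tokens = 7201, 786_432
tokens_seen = steps * batch_tokens

print(f"headroom: {headroom:,} bytes")
print(f"quantization penalty: {quant_penalty} bpb")
print(f"sliding-window gain: {sliding_gain} bpb")
print(f"training tokens processed: {tokens_seen:,}")
```

The quantization roundtrip costs about 0.005 bpb, while sliding-window evaluation recovers about 0.021 bpb, so the reported score is better than the pre-quantization figure.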


saml212 · 2026-03-19T22:19:53Z

This is cool. Good work.

Stacks XSA (PR openai#265), EMA weight averaging (PR openai#287), Int5-MLP (PR openai#264), MuonWD=0.04 tuned from PR openai#162, seq_len=2048, 11 layers, BigramHash(2048), SmearGate, OrthoInit (PR openai#135), Late-K FP16 on final layer. Single-seed result (seed=1337), ~8903 steps on 8xH100.

Commit af51720: Record: OrthoInit + Int6 MLP3x + BigramHash + SmearGate (val_bpb: 1.1539)

notapplica mentioned this pull request Mar 19, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

dexhunter mentioned this pull request Mar 20, 2026

Community Tool: Parameter Golf Leaderboard Monitor (CLI + Claude Code Skill) #158

Closed

Julz19 mentioned this pull request Mar 20, 2026

Add ContextFuse-2048-BigramSmear submission #174

Open

stukenov mentioned this pull request Mar 20, 2026

11L Int5-MLP + TTT-SGD + SmearGate + SWA (1.1455 BPB) #264

Open

5 tasks

brn-mwai mentioned this pull request Mar 20, 2026

Record: 11L Int6 + SmearGate + BigramHash + Depth Recurrence #268

Open

6 tasks

HyperPotatoNeo mentioned this pull request Mar 21, 2026

11L + XSA4 + EMA(0.997) + seq2048 + Int5-MLP + MuonWD=0.04 + LateK-FP16 | val_bpb=1.1361 #372

Closed

unnir wants to merge 1 commit into openai:main from unnir:v27-submission
