
Record: 9L XSA-all + LeakyReLU² + 5-gram eval cache — val_bpb 1.0909 (3-seed mean)#740

Closed
resouer wants to merge 2 commits into openai:main from resouer:submission/9L-XSA-ngram

Conversation


@resouer resouer commented Mar 25, 2026

Summary

3-seed mean val_bpb: 1.0909 (std=0.0011) | 14.7 MB | 8xH100 SXM

Results

Seed    Pre-ngram BPB   Post-ngram BPB        Artifact
1337    1.1700          1.0898                14.68 MB
42      1.1701          1.0909                14.69 MB
7       1.1700          1.0920                14.68 MB
Mean    1.1700          1.0909 (std 0.0011)

Key Techniques

Training (9L/512d, 17.6M params)

  • 9L transformer, 512d, 8H/4KV GQA, MLP 2x, LeakyReLU(0.5)²
  • XSA (Exclusive Self-Attention) on all 9 layers
  • SmearGate, BigramHash(4096), OrthoInit, LN Scale, Partial RoPE (25%)
  • Muon optimizer, seq2048, batch 786K, warmdown 3500
  • Int8 per-row quantization + zstd-22 (near-zero quant degradation)
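The int8 per-row step can be sketched as follows. This is a minimal pure-Python illustration of symmetric per-row quantization (one float scale per weight-matrix row), not the PR's actual code; `zlib` level 9 stands in for the zstd-22 stage, and the function names are hypothetical.

```python
import zlib

def quantize_row(row):
    """Symmetric int8 quantization: one scale per row, values clipped to [-127, 127]."""
    scale = max(abs(x) for x in row) / 127.0 or 1.0  # guard all-zero rows
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale

def dequantize_row(q, scale):
    return [v * scale for v in q]

row = [0.5, -1.2, 0.03, 2.54, -0.77]
q, scale = quantize_row(row)
restored = dequantize_row(q, scale)
err = max(abs(a - b) for a, b in zip(row, restored))  # worst case is scale / 2

# Pack the int8 codes to bytes and compress (zlib as a stand-in for zstd-22).
blob = zlib.compress(bytes(v & 0xFF for v in q), 9)
print(err < scale)  # → True
```

Because the round-trip error is bounded by half a quantization step per row, the "near-zero quant degradation" claim is plausible for weights whose rows have comparable dynamic range.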

Eval: Online 5-gram Cache (-0.079 BPB)

  • Hashed 5-gram frequency table (4M buckets) from scored tokens
  • Fixed-weight linear mixing: mixed = 0.8 * p_model + 0.2 * p_ngram
  • Score-first, backward-looking, no target-aware gating
  • 132s eval time (well within 600s budget)
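The eval-cache mechanics above can be sketched like this. A hedged toy version, assuming vocab-sized probability vectors: the bucket count, hash, and names (`observe`, `mix`) are illustrative, not the PR's exact implementation.

```python
NUM_BUCKETS = 1 << 22   # stand-in for the 4M-bucket table
ALPHA = 0.2             # fixed mixing weight for the n-gram distribution
N = 5                   # 5-grams: context is the previous N-1 = 4 tokens
VOCAB = 256             # toy vocabulary size

counts = {}             # sparse stand-in: bucket -> {token: count}

def bucket(context):
    return hash(context) % NUM_BUCKETS

def observe(context, token):
    """Score-first: called only after `token` has already been scored."""
    hist = counts.setdefault(bucket(context), {})
    hist[token] = hist.get(token, 0) + 1

def mix(p_model, context):
    """mixed = 0.8 * p_model + 0.2 * p_ngram, falling back to the model alone."""
    hist = counts.get(bucket(context))
    if not hist:
        return p_model
    total = sum(hist.values())
    return [(1 - ALPHA) * p + ALPHA * hist.get(t, 0) / total
            for t, p in enumerate(p_model)]

ctx = ("the", "quick", "brown", "fox")  # backward-looking 4-token context
observe(ctx, 42); observe(ctx, 42); observe(ctx, 7)
p_model = [1.0 / VOCAB] * VOCAB
mixed = mix(p_model, ctx)
print(mixed[42] > mixed[7] > mixed[0])  # → True
```

Since the table only ever sees tokens after they are scored and the mixing weight is fixed, there is no target-aware gating: the cache cannot peek at the label it is about to be evaluated on.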

Reproduce

SEED=1337 NUM_LAYERS=9 MLP_MULT=2 QUANT_BITS=8 GPTQ_ENABLED=0 PRUNE_PCT=0 NGRAM_ENABLED=1 \
  torchrun --nproc_per_node=8 train_gpt.py

Credits

N-gram eval cache concept: @deanbrr (PR #659), @newjordan (PR #674)

…(3-seed)

3-seed validation:
  seed 1337: val_bpb=1.0898, 14.68 MB
  seed 42:   val_bpb=1.0909, 14.69 MB
  seed 7:    val_bpb=1.0920, 14.68 MB
  mean:      1.0909 (std=0.0011)

Architecture: 9L/512d, XSA-all, LeakyReLU(0.5)², SmearGate, BigramHash(4096),
OrthoInit, LN Scale, Partial RoPE. Int8 quantization + zstd-22.
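One plausible reading of the LeakyReLU(0.5)² activation, sketched below: square the output of a LeakyReLU with negative slope 0.5, analogous to the squared-ReLU activation used in earlier speedrun records. This is an assumption about the notation, not confirmed by the PR's code.

```python
def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    """Squared LeakyReLU: y = LeakyReLU(x, slope); return y * y."""
    y = x if x >= 0 else slope * x
    return y * y

print(leaky_relu_sq(2.0), leaky_relu_sq(-2.0))  # → 4.0 1.0
```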

Key technique: Online hashed 5-gram eval cache with fixed-weight linear mixing
(alpha=0.20). Gives -0.079 BPB improvement at eval time. 132s eval time.

Training: 8xH100 SXM, 600s wallclock, ~6900 steps at 87ms/step.
Improved from 1.0909 to 1.0238 with:
- Multi-order backoff (orders 2-7) with separate per-order hash tables
- Entropy-adaptive alpha: 0.05 + 0.55*sigmoid(2*(H-4.0))
- Alpha=0.40 base (was 0.20)
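The entropy-adaptive alpha stated above can be written out directly; here H is assumed to be the model's predictive entropy in bits at the current position, and the function name is hypothetical.

```python
import math

def adaptive_alpha(entropy_bits: float) -> float:
    """alpha(H) = 0.05 + 0.55 * sigmoid(2 * (H - 4.0))."""
    sig = 1.0 / (1.0 + math.exp(-2.0 * (entropy_bits - 4.0)))
    return 0.05 + 0.55 * sig
```

The shape is intuitive: at low entropy (a confident model) alpha approaches 0.05 and the mix leans on the model; at high entropy it approaches 0.60 and leans on the n-gram tables; at exactly H = 4 bits it sits at the midpoint, 0.325.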

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@resouer resouer closed this Apr 1, 2026