
SmearGate + BigramHash + Int6 + SWA + U-Net Skips (1.1518 BPB)#289

Open
integrate-your-mind wants to merge 1 commit into openai:main from integrate-your-mind:submission/2026-03-20_SmearGate_SwiGLU_Int6

Conversation

@integrate-your-mind

Summary

  • val_bpb: 1.1518 (int6 sliding window, stride=64, seed 1337)
  • 11-layer GPT, 26.8M params, 15.2MB artifact (int6+zstd-22)
  • Trained in 600s on 8×H100 SXM (9,906 steps at 60.6ms/step)

Key Techniques

  • Per-row int6 quantization (MLP + attention) + zstd-22 compression
  • 3× MLP expansion (hidden=1536) with relu² activation
  • SmearGate: learned token-predecessor blending at input
  • BigramHash embedding: 2048-bucket hash table (dim=128) for token-pair context
  • U-Net skip connections: encoder→decoder with learned per-dimension weights
  • Muon optimizer with WD=0.04, momentum warmup 0.92→0.99
  • SWA: 7 snapshots every 200 steps during warmdown
  • Sliding window eval (stride=64) as primary score
  • TTT LoRA eval also included (1.1535 BPB)
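Per-row int6 quantization keeps one scale per weight row and stores each value as a 6-bit code, which is what makes the 15.2MB artifact possible before zstd. A minimal round-trip sketch of the symmetric variant (codes in [-31, 31]); function names are illustrative, not taken from the submission:

```python
def quantize_row_int6(row):
    """Quantize one weight row to symmetric int6 codes in [-31, 31].

    One fp scale per row; dequantize with code * scale.
    """
    amax = max(abs(x) for x in row) or 1.0
    scale = amax / 31.0
    codes = [max(-31, min(31, round(x / scale))) for x in row]
    return scale, codes

def dequantize_row_int6(scale, codes):
    """Reconstruct the approximate row from (scale, codes)."""
    return [c * scale for c in codes]

row = [0.5, -1.0, 0.25]
scale, codes = quantize_row_int6(row)
approx = dequantize_row_int6(scale, codes)
# round-trip error per element is bounded by scale / 2
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(row, approx))
```

Per-row (rather than per-tensor) scales matter because one outlier row would otherwise crush the resolution of every other row.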
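SmearGate blends each input embedding with its predecessor through a learned per-dimension gate. The PR does not show its exact parameterization; a plausible minimal form is `out[t] = x[t] + sigmoid(g) * x[t-1]`:

```python
import math

def smear_gate(embeddings, gate_logits):
    """Blend each token embedding with its predecessor via a learned
    per-dimension sigmoid gate (illustrative form; the submission's
    exact parameterization is not shown in this PR).

    out[0] = x[0];  out[t] = x[t] + sigmoid(g) * x[t-1]
    """
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    out = [list(embeddings[0])]
    for t in range(1, len(embeddings)):
        out.append([x + g * p for x, g, p in
                    zip(embeddings[t], gates, embeddings[t - 1])])
    return out
```

Because the gate sits at the input, the model gets cheap one-token-back context in every layer without any extra attention cost.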
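The BigramHash embedding maps each (previous token, current token) pair into one of 2048 buckets and adds that bucket's dim-128 embedding to the input. A sketch of the bucket-id computation; the multiplier and XOR mix here are illustrative, not the submission's actual hash:

```python
def bigram_hash_ids(tokens, n_buckets=2048, mult=1000003):
    """Map each (prev, cur) token pair to a hash bucket in
    [0, n_buckets). The mixing constants are illustrative; the PR
    does not show the actual hash. Assumes id 0 for the position
    before the first token.
    """
    ids = []
    prev = 0
    for t in tokens:
        ids.append(((prev * mult) ^ t) % n_buckets)
        prev = t
    return ids
```

With 2048 buckets at dim 128 this table costs only ~262K parameters, yet gives the model direct token-pair features that attention would otherwise have to reconstruct.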
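The U-Net skips route each encoder layer's activation to the matching decoder layer, combined with learned per-dimension weights. A minimal additive combine rule, assuming `out = x + w * skip` (the PR does not show the exact rule):

```python
def unet_skip_merge(decoder_x, encoder_skip, skip_weights):
    """Merge a decoder-layer input with the matching encoder
    activation via learned per-dimension weights (illustrative;
    the submission's exact combine rule is not shown):

        out = x + w * skip
    """
    return [x + w * s
            for x, w, s in zip(decoder_x, skip_weights, encoder_skip)]
```

Initializing the weights near zero lets training start from a plain residual stack and open the skips only where they help.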
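SWA here means averaging 7 parameter snapshots taken every 200 steps during warmdown. The averaging itself is just a uniform running mean over snapshots (class and method names are illustrative):

```python
class SWAAverager:
    """Uniform running average of parameter snapshots, as in
    stochastic weight averaging. The PR takes 7 snapshots, one
    every 200 steps, during LR warmdown; names are illustrative.
    """

    def __init__(self):
        self.count = 0
        self.avg = None

    def add_snapshot(self, params):
        """Fold one flat parameter snapshot into the running mean."""
        self.count += 1
        if self.avg is None:
            self.avg = list(params)
        else:
            # incremental mean: avg += (p - avg) / n
            self.avg = [a + (p - a) / self.count
                        for a, p in zip(self.avg, params)]
```

The incremental form avoids storing all 7 snapshots at once, which matters when each snapshot is a full copy of the model.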

Eval Results (seed 1337)

| Method | val_loss | val_bpb | eval_time |
| --- | --- | --- | --- |
| Pre-quantization | 1.9841 | 1.1751 | n/a |
| Int6 roundtrip | 2.0027 | 1.1861 | 1.9s |
| Int6 sliding (stride=64) | 1.9448 | 1.1518 | 97s |
| Int6 TTT LoRA | 1.9476 | 1.1535 | 83s |
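The sliding-window numbers come from scoring each token with near-full left context: every forward pass sees up to seq_len tokens but only the tokens not covered by the previous pass count toward the loss. A planning sketch of the window layout (no model code; window and stride match the reported seq_len=1024 and stride=64):

```python
def sliding_window_spans(n_tokens, window=1024, stride=64):
    """Plan forward passes for stride-based eval.

    Each span is (begin, end, first_scored): the pass runs on
    tokens[begin:end] but only positions first_scored..end count
    toward the loss, so every token is scored exactly once with
    up to `window` tokens of left context. Illustrative sketch,
    not the submission's eval code.
    """
    spans = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + window, n_tokens)
        spans.append((begin, end, prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return spans
```

This is why the sliding eval takes 97s instead of ~2s for the plain roundtrip: it runs roughly window/stride (here 16×) more forward passes over the validation set in exchange for the better 1.1518 BPB.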

Submission Checklist

  • Artifact < 16MB (15,202,515 bytes)
  • Trains in < 10 min on 8×H100 (600s)
  • Eval < 10 min (sliding 97s + TTT 83s = 180s)
  • train_gpt.py compiles (1191 lines, under 1500 limit)
  • README.md with technique descriptions
  • submission.json with metadata
  • Training log included

Note on Seeds

Single-seed submission. The margin relative to the current #3 (1.1502) is modest, but the submission includes independently developed techniques (U-Net skips, a seq_len=1024 tradeoff for more layers/steps) that may be of interest to the community. Additional seeds are available on request.

Differences from Existing Submissions

Developed independently from PR #162 (raahilshah). Key architectural differences:

  • 11 layers (vs 9) with seq_len=1024 (vs 2048) — more layers + more steps
  • U-Net skip connections with learned weights
  • BigramHash 2048 buckets (vs 4096)
  • SWA every 200 steps (vs 50)
  • Includes TTT LoRA as alternative eval

…l_bpb=1.1518)

11-layer GPT with per-row int6 quantization + zstd-22 compression (15.2MB artifact).
Key techniques: SmearGate, BigramHash(2048), 3x MLP with relu², U-Net skip connections,
Muon WD=0.04, SWA (7 snapshots), sliding window eval (stride=64).

Seed 1337: val_bpb=1.1518 (sliding), 1.1861 (roundtrip), 1.1535 (TTT LoRA).
Trained in 600s on 8xH100 SXM, 9906 steps at 60.6ms/step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
