Record: 11L Int6 QAT + SmearGate + WD 0.038 (val_bpb=1.1502) by baudrillardsgh0st · Pull Request #192 · openai/parameter-golf

baudrillardsgh0st · 2026-03-20T07:59:19Z

Summary

val_bpb: 1.1502 (single seed 1337), 15.50MB artifact
11-layer GPT with int6 QAT (STE), SmearGate, decoupled Muon WD=0.038
Int6-in-int8 containers + zstd-22 compression
Sliding window eval (stride=64, batch=32)
7,723 steps at 77ms/step on 8×H100 SXM
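
The sliding-window evaluation listed above can be sketched as follows. This is a minimal illustration, not the PR's evaluation code: the function name `sliding_windows` and the `context_len` value are assumptions (the PR states only stride=64 and batch=32). The idea is that each window advances by `stride` tokens and only the newly exposed tokens are scored, so every token is scored exactly once with as much left context as the window allows.

```python
def sliding_windows(n_tokens, context_len, stride):
    """Yield (start, end, score_from) windows covering tokens [0, n_tokens).

    Tokens in [score_from, end) are scored in this window; tokens in
    [start, score_from) are fed to the model only as context.
    Hypothetical helper for illustration; not taken from the PR.
    """
    if not (0 < stride <= context_len):
        raise ValueError("need 0 < stride <= context_len")
    # The first window scores everything it contains.
    end = min(context_len, n_tokens)
    yield 0, end, 0
    # Each later window slides forward by `stride` and scores only new tokens.
    while end < n_tokens:
        new_end = min(end + stride, n_tokens)
        start = max(0, new_end - context_len)
        yield start, new_end, end
        end = new_end


if __name__ == "__main__":
    # context_len=128 is an assumed value chosen to keep the printout short.
    for window in sliding_windows(n_tokens=300, context_len=128, stride=64):
        print(window)
```

A smaller stride gives each scored token more context at the cost of more forward passes, which is presumably why the windows are batched (batch=32 in the PR).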

Key Techniques

11 layers (512 dim, 8 heads, 4 KV heads, MLP 3x) — more depth enabled by int6 compression
Int6 QAT: fake quantization with a straight-through estimator (STE) in the forward pass; nearly eliminates post-quantization degradation
SmearGate: Learned gate blending current + previous token embedding (~513 params)
Decoupled Muon WD=0.038: Keeps weights small for better int6 quantization
Int6-in-int8 + zstd-22: 26.5M params compressed to 15.5MB
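
The int6 QAT and int6-in-int8 storage described above can be illustrated with a small pure-Python sketch. Several details here are assumptions rather than facts from the PR: the per-row scale, the signed range [-32, 31], and all function names are hypothetical, and `zlib` stands in for zstd-22 so the example needs only the standard library. The STE is shown as a hand-written backward function that passes gradients through the rounding step unchanged.

```python
import zlib
from array import array

QMIN, QMAX = -32, 31  # assumed signed 6-bit range


def row_scale(row):
    """Per-row scale mapping the largest magnitude onto the int6 grid (assumed scheme)."""
    peak = max(abs(w) for w in row)
    return peak / QMAX if peak > 0 else 1.0


def quantize_row(row, scale):
    """Round to the nearest int6 level and clamp to [QMIN, QMAX]."""
    return [min(QMAX, max(QMIN, round(w / scale))) for w in row]


def fake_quant_forward(row):
    """QAT forward pass: return the dequantized weights the layer would use."""
    scale = row_scale(row)
    return [q * scale for q in quantize_row(row, scale)]


def fake_quant_backward(grad_out):
    """STE backward pass: treat rounding as identity, so gradients pass through."""
    return list(grad_out)


def pack_int6_in_int8(row):
    """Store each int6 value in its own signed byte, then compress the bytes."""
    scale = row_scale(row)
    raw = array("b", quantize_row(row, scale)).tobytes()
    return scale, zlib.compress(raw, 9)  # zlib is a stand-in for zstd-22


def unpack_int6_in_int8(scale, blob):
    """Invert pack_int6_in_int8 and return dequantized weights."""
    q = array("b")
    q.frombytes(zlib.decompress(blob))
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.1, 0.0, -0.5, 0.33]
    scale, blob = pack_int6_in_int8(weights)
    print(fake_quant_forward(weights))
    print(unpack_int6_in_int8(scale, blob))
```

Because each byte only ever takes one of 64 values, the int8 container wastes two bits per weight before compression; the entropy coder is what recovers that slack. Training against `fake_quant_forward` means the weights saved to the artifact are the same ones the model was optimized with, which is the usual motivation for QAT.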

Test plan

Single seed run (1337) completed
Additional seeds for statistical significance
Artifact under 16MB (15,495,792 bytes)
Training under 10 minutes (600s wall clock)
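
The budget figures in the test plan can be cross-checked with a few lines of arithmetic. This sketch uses only numbers quoted in the PR; the step time (77 ms) and parameter count (26.5M) are rounded there, so the derived values are approximate. The bits-per-parameter figure also assumes the whole artifact is weights, which the PR does not state.

```python
steps = 7_723
ms_per_step = 77             # rounded figure from the PR
artifact_bytes = 15_495_792
params = 26.5e6              # approximate parameter count from the PR

# Wall-clock training time implied by steps x step time (about 594.7 s).
train_seconds = steps * ms_per_step / 1000

# Size of the weights at exactly 6 bits each, before any entropy coding.
raw_int6_mb = params * 6 / 8 / 1e6

# Effective storage cost per parameter in the compressed artifact
# (assumes the artifact holds nothing but weights).
bits_per_param = artifact_bytes * 8 / params

print(f"training time   ~ {train_seconds:.1f} s (limit 600 s)")
print(f"raw int6 size   ~ {raw_int6_mb:.2f} MB")
print(f"artifact size   = {artifact_bytes / 1e6:.2f} MB")
print(f"bits per param  ~ {bits_per_param:.2f}")
```

Under these assumptions the compressed artifact costs fewer than 6 bits per parameter, meaning the compression stage removes redundancy beyond the two padding bits of the int8 container.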

🤖 Generated with Claude Code

Key improvements over prior submission (openai#192, 1.1502):
- Per-dimension SmearGate (sigmoid(Parameter(dim))) vs scalar gate
- Stochastic Weight Averaging every 50 steps over last 50% of training
- Result: 1.1453 BPB, beating current SOTA (1.1458)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
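
The difference between a scalar gate and a per-dimension gate can be sketched in plain Python. This is an illustration under stated assumptions, not the submitted code: the blend form `(1 - g) * current + g * previous`, the pass-through handling of the first token, and the name `smear` are all hypothetical. The only detail taken from the commit message is that the per-dimension gate is a sigmoid over a learned vector of size `dim`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def smear(embeddings, gate_logits):
    """Blend each token embedding with the previous token's embedding.

    embeddings:  list of T vectors, each of length dim
    gate_logits: list of dim learned logits (per-dimension gate);
                 pass [logit] * dim to emulate a single scalar gate.
    Assumed blend: out[t] = (1 - g) * x[t] + g * x[t - 1], with the
    first token passed through unchanged.
    """
    gates = [sigmoid(logit) for logit in gate_logits]
    out = []
    prev = None
    for x in embeddings:
        if prev is None:
            out.append(list(x))  # no previous token to blend with
        else:
            out.append([(1.0 - g) * xi + g * pi
                        for g, xi, pi in zip(gates, x, prev)])
        prev = x
    return out


if __name__ == "__main__":
    tokens = [[2.0, 4.0], [4.0, 8.0]]
    print(smear(tokens, [0.0, 0.0]))    # scalar-style gate: same logit in every dimension
    print(smear(tokens, [0.0, -50.0]))  # per-dimension gate: second dimension barely smeared
```

With a per-dimension gate the model can learn to carry previous-token information in some embedding channels while leaving others untouched, at a cost of `dim` parameters instead of one.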

baudrillardsgh0st and others added 2 commits March 19, 2026 23:59

Record: Int6 QAT + SmearGate + Muon WD (val_bpb=1.1669)

164befc

9L 512dim int6 QAT with STE, SmearGate, Muon weight decay 0.01, int6-in-int8 zstd22 compression. 14.77MB artifact, 9706 steps @ 61.8ms/step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Record: 11L Int6 QAT + SmearGate + WD 0.038 (val_bpb=1.1502)

dcac9b5

11-layer GPT with int6 QAT, SmearGate, and decoupled Muon weight decay 0.038. Artifact: 15.50MB (int6+zstd-22). Single seed, 7723 steps at 77ms/step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

notapplica mentioned this pull request Mar 20, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

baudrillardsgh0st mentioned this pull request Mar 20, 2026

Record: 11L Int6 QAT + SmearGate + SWA + SAM: 1.1480 BPB (3-seed mean) #194

Open

baudrillardsgh0st wants to merge 2 commits into openai:main from baudrillardsgh0st:submit/11L-qat-smeargate-wd038
