Record: Int6 + MLP 3x + STE QAT + NorMuon + sliding window (val_bpb 1.1666) #137
Open
abhishekgahlot2 wants to merge 2 commits into openai:main
Int6 mixed quantization with STE fake-int6 QAT, 3x MLP expansion, NorMuon optimizer, SWA checkpoint averaging, and sliding window eval.
what changed
MLP 3x expansion (hidden=1536): 21.8M params. Extra capacity paid for by int6 quantization.
STE fake-int6 QAT: weights fake-quantized to int6 via straight-through estimator throughout training. Reduces quantization penalty from ~0.008 to ~0.001 BPB.
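A minimal sketch of the straight-through fake-quant step. Symmetric per-row scaling with integer levels in [-31, 31] is an assumption; the PR's exact scheme may differ. The forward pass sees quantized weights while the backward pass treats the op as identity:

```python
import torch

def ste_fake_quant_int6(w: torch.Tensor) -> torch.Tensor:
    """Fake-quantize to int6 per row; gradients pass straight through (STE)."""
    # one scale per row, mapping the row's max-abs value to level 31 (assumed)
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 31.0
    q = (w / scale).round().clamp(-31, 31) * scale
    # forward uses q; backward sees d(output)/d(w) = identity
    return w + (q - w).detach()
```

In training, this wraps each quantized weight matrix before the matmul, so the model learns weights that survive int6 rounding.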
NorMuon optimizer: per-neuron row-wise RMS normalization after Newton-Schulz orthogonalization.
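Roughly, the direction computation looks like the following sketch. The quintic Newton-Schulz coefficients are the published Muon ones; presenting this as a standalone function omits the optimizer's momentum, learning-rate, and weight-decay machinery:

```python
import torch

def newton_schulz_orth(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate orthogonalization via the quintic Newton-Schulz iteration
    (coefficients from Muon; assumes rows <= cols)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X

def normuon_direction(G: torch.Tensor) -> torch.Tensor:
    """Orthogonalize, then normalize each output neuron's row to unit RMS."""
    O = newton_schulz_orth(G)
    rms = O.pow(2).mean(dim=1, keepdim=True).sqrt().clamp(min=1e-8)
    return O / rms
```

The row-wise step equalizes per-neuron update magnitudes, which plain Newton-Schulz alone does not guarantee.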
SWA checkpoint averaging: collects checkpoints every 200 steps during warmdown and averages them.
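A running-mean sketch of the checkpoint averaging, with numpy arrays standing in for the real parameter tensors (the every-200-steps scheduling is left to the training loop):

```python
import numpy as np

class SWAAverager:
    """Incremental mean over checkpoints collected during warmdown (sketch)."""
    def __init__(self):
        self.avg = None
        self.n = 0

    def update(self, state: dict) -> None:
        self.n += 1
        if self.avg is None:
            # accumulate in float64 to limit drift over many checkpoints
            self.avg = {k: v.astype(np.float64).copy() for k, v in state.items()}
        else:
            for k, v in state.items():
                self.avg[k] += (v - self.avg[k]) / self.n

    def result(self) -> dict:
        return {k: v.astype(np.float32) for k, v in self.avg.items()}
```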
Mixed quantization: int6 per-row on MLP and attention weights, fp16 passthrough for tied embedding, zstd-22 compression.
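The int6 per-row storage format might look like this sketch: symmetric codes with one fp16 scale per row is an assumption, and the zstd-22 compression and fp16 embedding passthrough are omitted:

```python
import numpy as np

def quant_int6_rowwise(w: np.ndarray):
    """Per-row symmetric int6: codes in [-31, 31] plus one fp16 scale per row."""
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True) / 31.0, 1e-8)
    scale = scale.astype(np.float16)
    codes = np.clip(np.round(w / scale.astype(np.float32)), -31, 31).astype(np.int8)
    return codes, scale

def dequant_int6_rowwise(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return codes.astype(np.float32) * scale.astype(np.float32)
```

Codes here occupy one int8 each for clarity; a real serializer would bit-pack 6-bit codes before compression.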
Sliding window eval (stride=64): each token scored with nearly full context.
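One common way to implement stride-64 scoring is to advance the window by `stride` tokens and score only the tokens not yet evaluated, so every token after the first window sees close to `seq_len` tokens of left context. A sketch of the span bookkeeping (assumed, not necessarily the PR's exact code):

```python
def sliding_window_spans(n_tokens: int, seq_len: int = 2048, stride: int = 64):
    """Return (ctx_start, score_start, score_end) triples: the model reads
    tokens [ctx_start, score_end) and is scored on [score_start, score_end)."""
    spans = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + seq_len, n_tokens)
        spans.append((begin, prev_end, end))
        prev_end = end
        if end == n_tokens:
            break
    return spans
```

Each token is scored exactly once, at the cost of roughly `seq_len / stride` forward passes per window-length of text, which is why the eval takes minutes rather than seconds.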
seq_len=2048, batch=786K, grad_clip=0.3, matrix_lr=0.02, Muon momentum=0.99, Muon WD=0.01, warmdown=3000 iters, logit softcap=15.
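The hyperparameters above, collected into an illustrative config (field names are assumptions; "786K" is assumed to mean 768 * 1024 = 786,432 tokens):

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # values from the PR description; field names are illustrative
    seq_len: int = 2048
    batch_tokens: int = 786_432      # "786K", assumed 768 * 1024
    grad_clip: float = 0.3
    matrix_lr: float = 0.02
    muon_momentum: float = 0.99
    muon_weight_decay: float = 0.01
    warmdown_iters: int = 3000
    logit_softcap: float = 15.0
```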
results
8xH100 80GB HBM3 (Modal, 10 min wallclock, seed 1337):
6,065 steps at 98.9ms/step. Quant loss: 0.001 BPB. Sliding window eval: 156s.
test plan
final_mixed_roundtrip_exact val_bpb: 1.18774689
final_sliding_window_exact val_bpb: 1.16658140