submission: Int6 MLP3x + Late-K Passthrough + SlidingWindow (val_bpb: 1.1605)#99

Open
takhir-iota wants to merge 4 commits into openai:main from takhir-iota:codex/seed2025-top2k-stride64-submission

Conversation


@takhir-iota takhir-iota commented Mar 19, 2026

Int6 MLP3x + Late-K Passthrough + SlidingWindow

Summary

This PR is a 10-minute submission for leaderboard placement, not a record claim.

The submitted run is the best under-cap seed on this lane:

  • seed2025
  • final_sliding_window_eval_exact (stride 64): val_loss 1.95946000, val_bpb 1.16050360
  • Total submission size (quant + zstd): 15,844,924 bytes

The lane stacks four practical improvements on the strong 9-layer, 512-dim GPT recipe:

  1. Int6 mixed quantization + zstd: .mlp., .attn.c_q., .attn.c_v., and .attn.proj. are stored in int6, then compressed with zstd.
  2. 3x MLP expansion: MLP_MULT=3 keeps the wider hidden layer that materially improves score within the byte budget.
  3. Selective K preservation: blocks.7.attn.c_k.weight and blocks.8.attn.c_k.weight stay in fp16, while the remaining c_k matrices use grouped int8 with group_size=64.
  4. Sliding-window evaluation: EVAL_STRIDE=64 gives near-full context at evaluation time and is the main improvement over the stride-256 variant.
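The stride-64 sliding-window evaluation in item 4 can be sketched as pure index bookkeeping (a minimal illustration; the function name and token counts are assumptions, not taken from the submission's code):

```python
def sliding_windows(n_tokens, seq_len=1024, stride=64):
    """Yield (start, end, first_scored) index triples.

    Each window spans at most seq_len tokens; only positions in
    [first_scored, end) contribute to the loss, so every token is
    scored exactly once while later windows see up to
    seq_len - stride tokens of left context.
    """
    scored = 0
    while scored < n_tokens:
        # The first window scores everything it covers; later windows
        # advance by `stride` and score only the newly exposed tail.
        end = min(seq_len, n_tokens) if scored == 0 else min(scored + stride, n_tokens)
        start = max(0, end - seq_len)
        yield start, end, scored
        scored = end
```

With seq_len=1024 and stride=64, every scored token after the first window has at least 960 tokens of left context, which is the intuition behind stride 64 beating the stride-256 variant at the cost of more forward passes.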

Configuration

VOCAB_SIZE=1024 NUM_LAYERS=9 MODEL_DIM=512 NUM_HEADS=8 NUM_KV_HEADS=4
MLP_MULT=3 TIE_EMBEDDINGS=1
MATRIX_LR=0.02 SCALAR_LR=0.02 TIED_EMBED_LR=0.03
MUON_MOMENTUM=0.99 MUON_MOMENTUM_WARMUP_START=0.92 MUON_MOMENTUM_WARMUP_STEPS=1500
WARMDOWN_ITERS=3000 QK_GAIN_INIT=1.7
TRAIN_BATCH_TOKENS=524288 TRAIN_SEQ_LEN=1024
LOWBIT_BITS=6 LOWBIT_STE=0
LOWBIT_NAME_PATTERNS=.mlp.,.attn.c_q.,.attn.c_v.,.attn.proj.
INT8_KEEP_FLOAT_NAME_PATTERNS=tok_emb.weight,blocks.7.attn.c_k.weight,blocks.8.attn.c_k.weight
INT8_GROUP_OVERRIDES=.attn.c_k.:64
SERIAL_COMPRESSOR=zstd
EVAL_STRIDE=64
MAX_WALLCLOCK_SECONDS=600
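As a rough illustration, the LOWBIT_NAME_PATTERNS / INT8_KEEP_FLOAT_NAME_PATTERNS settings amount to substring routing over parameter names. The sketch below assumes simple substring matching with fp16 passthrough taking precedence; the actual serializer's precedence rules are not shown in this PR:

```python
# Patterns copied from the config above; precedence order is an assumption.
KEEP_FP16 = ("tok_emb.weight",
             "blocks.7.attn.c_k.weight",
             "blocks.8.attn.c_k.weight")
INT6_PATTERNS = (".mlp.", ".attn.c_q.", ".attn.c_v.", ".attn.proj.")

def route_param(name):
    """Map a parameter name to a storage format: explicit fp16
    passthrough wins, then int6 patterns, and everything else falls
    through to grouped int8 (which the .attn.c_k.:64 override stores
    with group_size=64)."""
    if any(p in name for p in KEEP_FP16):
        return "fp16"
    if any(p in name for p in INT6_PATTERNS):
        return "int6"
    return "int8_grouped"
```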

Command

torchrun --standalone --nproc_per_node=8 train_gpt.py

Key Metrics

  • Training stopped at step 12791/20000 due to the 600s wallclock cap
  • Average training step time: 46.91ms
  • Model params: 21,778,504
  • Pre-quant eval: val_loss 2.0047, val_bpb 1.1873
  • Quantized roundtrip: val_loss 2.01675174, val_bpb 1.19443398
  • Sliding window (stride 64): val_loss 1.95946000, val_bpb 1.16050360
  • Sliding-window eval time: 70834ms
  • Code size: 37,988 bytes
  • Total submission size: 15,844,924 bytes

Quantization Strategy

The main serializer choice is to spend the remaining bytes on the attention K path rather than on broader fp16 promotion:

  • tok_emb.weight: fp16 passthrough
  • blocks.7.attn.c_k.weight and blocks.8.attn.c_k.weight: fp16 passthrough
  • remaining .attn.c_k. matrices: grouped int8, group_size=64
  • .mlp., .attn.c_q., .attn.c_v., .attn.proj.: int6
  • compressor: zstd

This keeps the artifact under 16,000,000 bytes while preserving the highest-value late-layer key projections.
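For reference, symmetric grouped int8 quantization along the lines of group_size=64 can be sketched in NumPy as follows. This is a minimal illustration under assumed conventions (symmetric scales, one scale per group of consecutive values), not the submission's serializer, which additionally applies zstd on top:

```python
import numpy as np

def quant_grouped_int8(w, group_size=64):
    """Symmetric int8 quantization with one fp32 scale per group of
    `group_size` consecutive values (total size must divide evenly)."""
    flat = w.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(flat / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequant_grouped_int8(q, scale, shape):
    """Invert quantization: per-group rescale, then restore the shape."""
    return (q.astype(np.float32) * scale).reshape(shape)
```

Per-group maxima bound the reconstruction error by half a quantization step of each group's own scale, so a single outlier only inflates the error of its 64-value group; for the two late-layer c_k matrices the submission avoids even that by keeping them in fp16 outright.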

Additional Seeds

The same lane was rerun on two more under-cap seeds:

  • seed42: val_loss 1.96035715, val_bpb 1.16103494, 15,802,877 bytes
  • seed4242: val_loss 1.96595032, val_bpb 1.16434753, 15,822,568 bytes

Maintainers can place this submission according to the project’s current eligibility and ranking criteria.

@takhir-iota takhir-iota changed the title submission: Top2K + sliding-window stride64 submission: Int6 MLP3x + Late-K Passthrough + SlidingWindow (val_bpb: 1.1605) Mar 19, 2026
@takhir-iota
Author

The submitted code snapshot is minified because code size counts toward the 16,000,000-byte submission limit. I kept the PR body and submission README detailed so the method is still reviewable, and the included logs/config describe the exact run behavior.

m0at added a commit to m0at/parameter-golf that referenced this pull request Mar 20, 2026
MLP_HIDDEN=1488, 15.93MB. 9918 steps in 570s (57ms/step).
LR tuning from PR openai#99: scalar_lr 0.04->0.02, embed_lr 0.05->0.03.
Improvement vs baseline: -0.0596 BPB.
