
Record: int5 GPTQ + 33.6M model (3-seed mean val_bpb=1.1179)#545

Closed
EthanYangTW wants to merge 4 commits into openai:main from EthanYangTW:submission/int5-gptq-33m

Conversation

@EthanYangTW

Summary

33.6M parameter model quantized to int5 with GPTQ error compensation, fitting under 16MB. First submission to achieve int5 quantization on a 33.6M model within the artifact size limit.

Architecture: 11L, 512d, MHA 8/8, MLP 3.5x (1792), BigramHash 8192, XSA all layers
Quantization: int5 per-row GPTQ (clip_range=15) + Early QAT (threshold 0.5) + EMA 0.997
TTT: Legal score-first AdamW, chunk=131072, last 2 blocks unfrozen
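A minimal sketch of the per-row int5 quantization step above (symmetric per-row scales, clip_range=15). Real GPTQ additionally compensates rounding error across remaining weight columns using second-order (Hessian) information, which is omitted here; function names are illustrative, not the submission's actual code.

```python
import torch

def quantize_int5_per_row(w: torch.Tensor, clip_range: int = 15):
    """Per-row symmetric int5 quantization: levels in [-15, 15].

    Simplified sketch; full GPTQ also propagates quantization error
    into not-yet-quantized columns, which is omitted here.
    """
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / clip_range
    q = torch.clamp(torch.round(w / scale), -clip_range, clip_range)
    return q.to(torch.int8), scale  # int5 codes held in int8 containers

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
```

With symmetric per-row scales, the worst-case reconstruction error per element is half a quantization step (0.5 × that row's scale).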

Results

| Seed | Sliding BPB | TTT BPB | Artifact |
|------|-------------|---------|----------|
| 1337 | 1.1244      | 1.1170  | 15.53 MB |
| 42   | 1.1249      | 1.1182  | 15.36 MB |
| 7    | 1.1250      | 1.1184  | 15.28 MB |
| Mean | 1.1248      | 1.1179  |          |
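The reported means are consistent with the per-seed numbers; a quick arithmetic check:

```python
sliding = [1.1244, 1.1249, 1.1250]
ttt = [1.1170, 1.1182, 1.1184]

# 3-seed means, rounded to 4 decimal places as in the table
assert round(sum(sliding) / 3, 4) == 1.1248
assert round(sum(ttt) / 3, 4) == 1.1179
```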

Logs

See log files in logs/ directory:

  • Seed 1337: de843ef6-d0df-4872-bc96-cd4600614348.txt
  • Seed 42: b6560b60-85f0-4623-be20-ec366dd9e6fb.txt
  • Seed 7: c1c18644-a5a1-4db2-8641-31900ed8057f.txt

Reproduction

pip install --break-system-packages zstandard
pip install --break-system-packages flash-attn --no-build-isolation
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 80

SEED=1337 PRUNE_PCT=0.02 TTT_EPOCHS=3 TTT_LR=0.0001 \
TTT_OPTIMIZER=adamw TTT_FREEZE_BLOCKS=2 TTT_CHUNK_TOKENS=131072 \
EVAL_STRIDE=32 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
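The TTT settings above (TTT_OPTIMIZER=adamw, TTT_FREEZE_BLOCKS=2, TTT_LR=0.0001) can be sketched as follows. This is an illustration, not the submission's code: `model.blocks`, the function name, and the (0.9, 0.95) betas are assumptions.

```python
import torch

def build_ttt_optimizer(model, lr=1e-4, unfreeze_last=2):
    """Freeze everything except the last `unfreeze_last` transformer blocks,
    then optimize the remaining trainable parameters with AdamW.
    `model.blocks` and the betas are illustrative assumptions."""
    for p in model.parameters():
        p.requires_grad_(False)
    for block in model.blocks[-unfreeze_last:]:
        for p in block.parameters():
            p.requires_grad_(True)
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(params, lr=lr, weight_decay=0.0, betas=(0.9, 0.95))
```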

33.6M params (MHA 8/8, BigramHash 8192, MLP 3.5x) quantized to int5
with GPTQ error compensation. Artifact fits under 16MB (15.3-15.5MB).
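Fitting int5 codes under the 16MB limit implies compact storage; one plausible scheme (not necessarily this submission's on-disk format) packs eight 5-bit codes into five bytes:

```python
import numpy as np

def pack_int5(codes: np.ndarray) -> bytes:
    """Pack unsigned 5-bit codes (0..31) tightly: 8 codes per 5 bytes.
    Signed int5 values would first be shifted, e.g. code = q + 15."""
    bits = np.unpackbits(codes.astype(np.uint8)[:, None], axis=1)[:, 3:]  # low 5 bits
    return np.packbits(bits.ravel()).tobytes()

def unpack_int5(buf: bytes, n: int) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(buf, dtype=np.uint8))[: n * 5].reshape(n, 5)
    full = np.concatenate([np.zeros((n, 3), dtype=np.uint8), bits], axis=1)
    return np.packbits(full, axis=1)[:, 0]
```

A zstandard pass over the packed stream (the reproduction installs `zstandard`) would shrink the artifact further.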

Seeds: 1337 (1.1170), 42 (1.1182), 7 (1.1184)
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@cocohearts
Collaborator

cocohearts commented Mar 23, 2026

Don't open a PR until your submission is ready; please mark it as a draft until then.
The train logs are great, but don't change train_gpt.py; please put everything in a self-contained folder, following the other submissions.
Will merge once this is done.

Add proper /records submission with submission.json, README, train_gpt.py, and 3-seed logs.
@EthanYangTW
Author

Closing in favor of properly formatted /records submission.

@EthanYangTW
Author

@cocohearts

Don't open a PR until your submission is ready; please mark it as a draft until then. The train logs are great, but don't change train_gpt.py; please put everything in a self-contained folder, following the other submissions. Will merge once this is done.

Sorry for the inconvenience; I have updated the new submission and closed this one.

RoyiRa added a commit to RoyiRa/parameter-golf that referenced this pull request Mar 25, 2026
- Replace relu().square() with leaky_relu(0.5).square() in MLP
  Expected: -0.0015 BPB (5 independent teams confirm)
- Switch TTT optimizer from SGD to AdamW(lr=0.0005, wd=0, betas=0.9/0.95)
  Expected: stronger TTT adaptation per openai#442/openai#503/openai#545
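The activation swap described in that commit can be sketched as below; note that squaring a leaky_relu with slope 0.5 maps negative pre-activations to 0.25·x², so gradient signal survives for negative inputs but the sign is lost:

```python
import torch
import torch.nn.functional as F

def relu_squared(x):
    # baseline MLP activation: zero output and zero gradient for x < 0
    return F.relu(x).square()

def leaky_relu_squared(x):
    # leaky_relu with negative_slope=0.5, then square: negative inputs
    # map to 0.25 * x**2, keeping gradient signal on the negative branch
    return F.leaky_relu(x, negative_slope=0.5).square()
```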
