Non-record: QAT Int5/Int6 on #1 architecture (1.14476 BPB) #306
Open

xuafeng wants to merge 1 commit into openai:main
Conversation
STE fake-quantization during training (int5 MLP, int6 attn) on top of thwu1's openai#1 entry. Best result: 1.14476 BPB (seed 1337). Key finding: post-training quantization + SWA outperforms QAT by ~0.002 BPB; the quantization noise acts as beneficial regularization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Non-record submission exploring Quantization-Aware Training (QAT) with straight-through-estimator (STE) fake-quantization on top of thwu1's #1 entry.
Best result: val_bpb = 1.14476 (seed 1337, 8xH100 SXM, 600s)
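For context, here is a minimal sketch of how STE fake-quantization is commonly implemented in PyTorch. This is not the PR's actual `train_gpt.py` code; the function name, the symmetric per-tensor scaling, and the detach-based STE trick are illustrative assumptions.

```python
import torch

def fake_quantize_ste(w: torch.Tensor, bits: int) -> torch.Tensor:
    # Hypothetical helper, not from this PR. Symmetric per-tensor fake
    # quantization: the forward pass rounds weights to a signed
    # `bits`-wide integer grid and dequantizes back to float; the
    # backward pass lets gradients flow through the rounding unchanged
    # (the straight-through estimator, via the detach trick).
    qmax = 2 ** (bits - 1) - 1                    # 15 for int5, 31 for int6
    scale = w.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale
    w_q = (w / scale).round().clamp(-qmax, qmax) * scale
    return w + (w_q - w).detach()                 # STE: d(out)/d(w) == 1

# During training, MLP weight matrices would pass through this with
# bits=5 and attention projections with bits=6 before each matmul.
```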
Key Finding
Post-training quantization + SWA outperforms QAT by ~0.002 BPB. The noise introduced by int5/int6 post-training quantization acts as beneficial regularization; QAT lets the network adapt to that noise during training, which removes the benefit.
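A sketch of the winning recipe under the same assumptions: average checkpoints (e.g. with `torch.optim.swa_utils.AveragedModel`), then round-to-nearest quantize the averaged weights once after training. The name-based routing of bit widths ("mlp" vs. attention) is a hypothetical convention, not the PR's code.

```python
import copy
import torch

@torch.no_grad()
def post_training_quantize(model: torch.nn.Module,
                           mlp_bits: int = 5,
                           attn_bits: int = 6) -> torch.nn.Module:
    # Hypothetical helper, not from this PR: applies the same symmetric
    # round-to-nearest grid as the STE sketch above, but once, to
    # already-trained (e.g. SWA-averaged) weights instead of inside
    # the training loop.
    model = copy.deepcopy(model)
    for name, p in model.named_parameters():
        if p.dim() < 2:
            continue  # keep biases and norm parameters in full precision
        bits = mlp_bits if "mlp" in name else attn_bits  # assumed naming
        qmax = 2 ** (bits - 1) - 1
        scale = p.abs().amax().clamp(min=1e-8) / qmax
        p.copy_((p / scale).round().clamp(-qmax, qmax) * scale)
    return model
```

The flow would then be: train, fold checkpoints into the SWA average, and quantize the averaged module once at the end, leaving the training loop itself free of quantization noise.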
Files
- `records/track_non_record_16mb/2026-03-21_QAT_Int5_TTT_LoRA_8xH100/README.md` - Detailed writeup with ablations
- `submission.json` - Metadata
- `train_gpt.py` - Reproducible training script

Reproducibility
🤖 Generated with Claude Code