
Non-record: QAT Int5/Int6 on #1 architecture (1.14476 BPB) #306

Open
xuafeng wants to merge 1 commit into openai:main from xuafeng:submission/qat-int5-ttt-lora

Conversation


@xuafeng xuafeng commented Mar 21, 2026

Summary

Non-record submission exploring QAT (Quantization-Aware Training) with straight-through-estimator (STE) fake quantization on top of thwu1's #1 entry.

Best result: val_bpb = 1.14476 (seed 1337, 8xH100 SXM, 600s)

Key Finding

Post-training quantization + SWA outperforms QAT by ~0.002 BPB. The quantization noise from int5/int6 post-training quantization acts as beneficial regularization that QAT removes.
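The quantize-dequantize round-trip involved here can be sketched as below. This is a minimal illustration of symmetric per-tensor fake quantization, not the submission's actual implementation (train_gpt.py may use per-channel scales or a different rounding scheme):

```python
def fake_quant(values, bits):
    """Symmetric fake quantization: map floats to a signed integer grid,
    then dequantize back to float. Under QAT with an STE, the backward
    pass treats this round-trip as the identity function, so gradients
    flow "straight through" the non-differentiable rounding."""
    qmax = 2 ** (bits - 1) - 1          # 15 for int5, 31 for int6
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    # Round each value to the nearest grid level, clamp to the signed range.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale
            for v in values]

# Example: quantizing a few weights to the int5 grid (<= 32 distinct levels).
weights = [0.8, -0.31, 0.02, -1.2]
int5_weights = fake_quant(weights, bits=5)
```

Post-training quantization applies this round-trip once after training, so the rounding error shows up as noise on the final weights; QAT instead lets the model adapt to the grid during training, which (per the finding above) removes that beneficial noise.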

Files

  • records/track_non_record_16mb/2026-03-21_QAT_Int5_TTT_LoRA_8xH100/
    • README.md — Detailed writeup with ablations
    • submission.json — Metadata
    • train_gpt.py — Reproducible script

Reproducibility

SEED=1337 TRIGRAM_VOCAB_SIZE=0 torchrun --standalone --nproc_per_node=8 train_gpt.py

🤖 Generated with Claude Code

STE fake-quantization during training (int5 MLP, int6 attn) on top
of thwu1's openai#1 entry. Best result: 1.14476 BPB (seed 1337).

Key finding: post-training quantization + SWA outperforms QAT by
~0.002 BPB — quantization noise acts as beneficial regularization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
