11L Int4 MLP QAT + BigramHash(10240) + SWA #314
Open
aravhawk wants to merge 1 commit into openai:main from
Conversation
Adds an 11th transformer layer funded by int4 MLP quantization savings, with STE quantization-aware training. Built on @thwu1's SOTA.
Summary
An 11-layer transformer using int4 quantization-aware training (QAT) for MLP weights, building on @thwu1's SOTA (10L int5 MLP, 1.14276 bpb). Switching MLP weights from int5 to int4 with STE fake quantization saves ~2MB of compressed artifact space, funding an 11th layer within the 16MB budget.
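The ~2MB figure is consistent with simple bit-count arithmetic: dropping from int5 to int4 saves one bit per MLP weight. The PR does not state the exact MLP parameter count, so the figure below is a hypothetical illustration, not the real number.

```python
# Hypothetical MLP parameter count for illustration only; the PR does not
# state the exact figure. Going from int5 to int4 saves 1 bit per weight.
mlp_params = 16_000_000                 # assumed ~16M MLP weights
saved_bits = mlp_params * (5 - 4)       # 1 bit saved per weight
saved_mb = saved_bits / 8 / 1_000_000   # bits -> bytes -> MB
print(f"{saved_mb:.1f} MB saved")       # raw savings, before zstd compression
```

Under that assumption the raw savings is 2.0 MB; the actual compressed savings depends on how well zstd 22 compresses the int4 versus int5 bit patterns.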
Core changes from SOTA (7 targeted modifications):
- `num_layers`: 10 to 11
- `CastedLinear.forward()` changed for MLP layers only
- `blocks.9` instead of `blocks.8`
- `warmdown_iters`: 3000 to 2500 (fewer total steps with deeper model)
- new `fake_quantize_per_row` function for STE QAT

All SOTA innovations preserved: SmearGate, BigramHash(10240), orthogonal init, U-Net skip connections, SWA (start_frac=0.4), sliding window eval (stride=64), zstd 22 compression.
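The PR's `fake_quantize_per_row` implementation is not shown in this excerpt; a minimal sketch of symmetric per-row STE fake quantization, under the assumption of an absmax per-row scale, could look like:

```python
import torch

def fake_quantize_per_row(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Per-row symmetric fake quantization with a straight-through estimator.

    Forward: each row of `w` is rounded to a signed `bits`-bit grid using a
    per-row absmax scale, then dequantized back to float.
    Backward: the round() is detached, so gradients pass through unchanged
    (the STE), letting the optimizer update the full-precision weights.
    """
    qmax = 2 ** (bits - 1) - 1                        # 7 for int4
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax)    # snap to the int grid
    w_q = q * scale                                   # dequantize
    # STE: forward value is w_q, but d(out)/d(w) is the identity
    return w + (w_q - w).detach()
```

Because the round/clamp live only inside a `.detach()`, the graph torch.compile sees is a handful of elementwise ops, which is consistent with the `fullgraph=True` compatibility claimed in the notes below.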
Architecture
Expected outcome
Conservative: 1.135 to 1.139 bpb (vs SOTA 1.1428). The extra layer is expected to improve bpb by ~0.004 to 0.008, while QAT minimizes int4 degradation. Pending 3-seed eval on 8xH100.
Notes
QAT pattern validated against an existing submission in PR #162 (MLP3x QAT Int6 SlidingWindow), confirming `torch.compile(fullgraph=True)` compatibility. Script is 1252 lines (under the 1500-line limit), syntax verified.