
11L Int4 MLP QAT + BigramHash(10240) + SWA #314

Open
aravhawk wants to merge 1 commit into openai:main from aravhawk:11L-int4-mlp-qat

Conversation

@aravhawk

Summary

11-layer transformer using int4 quantization-aware training (QAT) for MLP weights, building on @thwu1's SOTA (10L int5 MLP, 1.14276 bpb). Switching MLP weights from int5 to int4 with STE fake quantization saves ~2MB of compressed artifact space, funding an 11th layer within the 16MB budget.

Core changes from SOTA (7 targeted modifications):

  1. num_layers: 10 → 11
  2. MLP quantization clip range: 15 → 7 (int5 → int4)
  3. QAT via STE fake quantization in CastedLinear.forward(), for MLP layers only
  4. FP16 keep pattern adjusted for 11 layers (blocks.9 instead of blocks.8)
  5. warmdown_iters: 3000 → 2500 (fewer total steps with the deeper model)
  6. Differentiated magnitude pruning: 5% for MLP (int4 benefits more from zeros), 3% for attention
  7. New fake_quantize_per_row function for STE QAT
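Items 2, 3, and 7 hinge on straight-through fake quantization. A minimal sketch of what a per-row fake-quantize function could look like (the actual signature and details in the script may differ):

```python
import torch

def fake_quantize_per_row(w: torch.Tensor, clip: int = 7) -> torch.Tensor:
    """Fake-quantize each row of w to signed integers in [-clip, clip].

    clip=7 corresponds to int4 (item 2); clip=15 would be the old int5 range.
    """
    # Per-row scale: the largest magnitude in each row maps to `clip`.
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / clip
    dequant = (w / scale).round().clamp(-clip, clip) * scale
    # Straight-through estimator (item 3): the forward pass sees the
    # quantized weights, the backward pass treats quantization as identity.
    return w + (dequant - w).detach()
```

The `w + (dequant - w).detach()` idiom is the standard STE trick: the output equals `dequant` numerically, but gradients flow to `w` as if the function were the identity, which keeps the op free of data-dependent control flow and friendly to graph capture.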

All SOTA innovations preserved: SmearGate, BigramHash(10240), orthogonal init, U-Net skip connections, SWA (start_frac=0.4), sliding window eval (stride=64), zstd 22 compression.
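The differentiated magnitude pruning (item 6) amounts to zeroing the smallest-magnitude fraction of each weight tensor, since runs of zeros compress well under zstd and cost int4 little precision. A hedged sketch (function and variable names are illustrative, not taken from the script):

```python
import torch

def magnitude_prune_(w: torch.Tensor, frac: float) -> torch.Tensor:
    """Zero the smallest-magnitude `frac` of entries in w, in place."""
    k = int(w.numel() * frac)
    if k == 0:
        return w
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = w.abs().flatten().kthvalue(k).values
    return w.masked_fill_(w.abs() <= threshold, 0.0)

# Illustrative usage with the fractions from item 6:
#   magnitude_prune_(mlp_weight, 0.05)   # 5% for MLP
#   magnitude_prune_(attn_weight, 0.03)  # 3% for attention
```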

Architecture

|               | SOTA (thwu1)  | This submission      |
|---------------|---------------|----------------------|
| Layers        | 10            | 11                   |
| MLP quant     | int5 (clip 15)| int4 (clip 7) + QAT  |
| Attn quant    | int6 (clip 31)| int6 (clip 31)       |
| MLP pruning   | 3%            | 5%                   |
| Warmdown      | 3000          | 2500                 |
| Est. artifact | ~15.9MB       | ~15.6MB              |
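As a rough back-of-envelope check on the ~2MB pre-compression saving claimed in the summary (the actual MLP parameter count is not stated in this PR; ~16M is a hypothetical figure used only for illustration):

```python
# Hypothetical sanity check; the real MLP weight count is not given here.
mlp_params = 16_000_000                   # assumed total int-quantized MLP weights
saved_bytes = mlp_params * (5 - 4) / 8    # 1 bit/weight from int5 -> int4
print(f"~{saved_bytes / 1e6:.1f} MB saved before zstd")
```

The post-compression delta in the table (~0.3MB) is smaller because the 11th layer's weights consume most of that saving.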

Expected outcome

Conservative estimate: 1.135 to 1.139 bpb (vs. the SOTA's 1.1428). The extra layer is expected to improve bpb by ~0.004 to 0.008, while QAT minimizes int4 degradation. A 3-seed evaluation on 8xH100 is pending.

Notes

The QAT pattern is validated against an existing submission in PR #162 (MLP3x QAT Int6 SlidingWindow), confirming torch.compile(fullgraph=True) compatibility. The script is 1252 lines (under the 1500-line limit) and syntax-checked.

Adds an 11th transformer layer funded by int4 MLP quantization savings,
with STE quantization-aware training. Built on the thwu1 SOTA.
