
Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257)#379

Open
dannywillowliu-uchi wants to merge 2 commits into openai:main from dannywillowliu-uchi:submission/sdttt-gptq-1.1260

Conversation


dannywillowliu-uchi commented Mar 22, 2026

Summary

val_bpb: 1.1257 (sliding window, stride=64) | 8xH100 SXM, 600s

Built on PR #374's SOTA stack, adding GPTQ-lite: a per-layer search for the optimal clip percentile during int6 quantization.

Novel: GPTQ-lite

Standard int6 quantization uses row-wise absolute max for clipping. GPTQ-lite searches 5 clip percentiles per weight matrix (100%, 99.9%, 99.5%, 99%, 98%) and selects the one minimizing reconstruction error. This reduces quantization degradation at zero training cost.
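A minimal sketch of the clip-percentile search described above. The function name, the symmetric int6 grid convention, and the choice of NumPy are illustrative; this is not the PR's actual code:

```python
import numpy as np

def quantize_int6_gptq_lite(w, percentiles=(100.0, 99.9, 99.5, 99.0, 98.0)):
    """Int6 quantization with a per-matrix clip-percentile search.

    For each candidate percentile, clip |w| at that percentile, quantize
    to a symmetric 6-bit integer grid, and keep the candidate minimizing
    reconstruction MSE. Runs at eval time only, so training cost is zero.
    """
    levels = 31  # symmetric int6 grid: integers in [-31, 31]
    best = None
    for p in percentiles:
        clip = np.percentile(np.abs(w), p)
        if clip == 0:
            continue
        scale = clip / levels
        q = np.clip(np.round(w / scale), -levels, levels)
        err = float(np.mean((q * scale - w) ** 2))
        if best is None or err < best[2]:
            best = (q.astype(np.int8), scale, err)
    return best  # (quantized ints, per-matrix scale, reconstruction MSE)
```

Because the 100% (absmax) candidate is always in the search set, the selected clip can never reconstruct worse than the row-wise absmax baseline it replaces.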

| Metric | Value |
| --- | --- |
| Steps | 6,733 (89.1 ms/step) |
| Pre-quant val_bpb | 1.1417 |
| Sliding window val_bpb (s64) | 1.1257 |

Architecture: 11L, XSA4, Tight SWA, Partial RoPE 16/64, LN Scale, Late QAT, Value Embedding, SmearGate, BigramHash, FA3, int6+zstd-22, WD=0.04.

Full source and experiment history: https://github.com/dannywillowliu-uchi/parameter-golf-entry

mrdavtan added a commit to mrdavtan/parameter-golf that referenced this pull request Mar 22, 2026
Tries 5 clip percentiles (0.9, 0.95, 0.99, 0.999, 0.99999) per row,
keeps the one minimizing reconstruction MSE. Zero training cost.
Default ON (GPTQ_LITE=1). Inspired by PR openai#379.
anthony-maio added a commit to anthony-maio/parameter-golf that referenced this pull request Mar 22, 2026
From arXiv:2603.09078. Projects out the self-value component from
attention output, forcing the network to use contextual information.
Applied via GQA-aware zero-alloc view reshape on last 4 of 11 layers.

Both top unmerged submissions (PR openai#374 at 1.1246 and PR openai#379 at 1.1260)
use XSA as a key technique.

Full next-gen stack now includes: 11L, XSA, Partial RoPE 16/64,
Late QAT STE, Tight SWA, GPTQ-lite, LN Scale, FA3, SmearGate,
BigramHash, int6+zstd, Muon WD, OrthoInit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
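One plausible reading of "projects out the self-value component" in the commit above is subtracting each token's own attention-weighted value vector from the attention output. A minimal single-head sketch under that assumption (the commit's GQA-aware zero-alloc view-reshape variant is not reproduced here):

```python
import numpy as np

def remove_self_value(attn_weights, values):
    """Subtract each token's own value contribution from attention output.

    attn_weights: (T, T) row-stochastic attention matrix.
    values: (T, D) value vectors.
    Removing the diagonal term a_ii * v_i forces row i's output to come
    entirely from other tokens, i.e. from contextual information.
    """
    out = attn_weights @ values                # standard attention output
    diag = np.diagonal(attn_weights)[:, None]  # a_ii, one weight per token
    return out - diag * values                 # drop the self contribution
```

With identity attention (each token attends only to itself) the result is all zeros, which is the degenerate case this projection is meant to rule out.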
@dannywillowliu-uchi dannywillowliu-uchi changed the title Record: 11L GPTQ-lite + Self-Distillation TTT (val_bpb=1.1260) Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257) Mar 22, 2026
rarce added a commit to rarce/parameter-golf that referenced this pull request Mar 22, 2026
original_model.md:
- Discard depth recurrence (amplifies quant error 900×, throughput loss)
- New direction: eval-time optimization stack (PPM-C + GPTQ-lite)
- Document all our experiment results (v3, v4, v4_30m, ringgolf)
- Add TTT/XSA interaction findings (PR openai#303: mutually exclusive)
- Add PR openai#375 meta-insight (1ms overhead = 0.006 BPB)
- 4-phase execution plan targeting PPM-C as original contribution

review_pr_records_track_10min_16mb.md:
- Add 2026-03-22 update with PRs openai#374, openai#379, openai#390, openai#375, openai#303, openai#363
- New SOTA at 1.1246 (PR openai#374: Tight SWA + VE128)
- Document negative results from $500 compute spend (PR openai#375)
- Unexplored opportunities: PPM-C, Neural Cache

review_records_track_10min_16mb.md:
- Add timestamp note (17 records, no changes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
