
Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257)#379

Open
dannywillowliu-uchi wants to merge 2 commits into openai:main from dannywillowliu-uchi:submission/sdttt-gptq-1.1260

Conversation


dannywillowliu-uchi commented Mar 22, 2026

Summary

val_bpb: 1.1257 (sliding window, stride=64) | 8xH100 SXM, 600s

Built on PR #374's SOTA stack, adding GPTQ-lite: a per-layer search for the optimal clip percentile during int6 quantization.

Novel: GPTQ-lite

Standard int6 quantization uses row-wise absolute max for clipping. GPTQ-lite searches 5 clip percentiles per weight matrix (100%, 99.9%, 99.5%, 99%, 98%) and selects the one minimizing reconstruction error. This reduces quantization degradation at zero training cost.
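A minimal sketch of the clip-percentile search described above. The function name, the symmetric int6 grid convention, and the choice of NumPy are illustrative; this is not the PR's actual code:

```python
import numpy as np

def quantize_int6_gptq_lite(w, percentiles=(100.0, 99.9, 99.5, 99.0, 98.0)):
    """Int6 quantization with a per-matrix clip-percentile search.

    For each candidate percentile, clip |w| at that percentile, quantize
    to a symmetric 6-bit integer grid, and keep the candidate minimizing
    reconstruction MSE. Runs at eval time only, so training cost is zero.
    """
    levels = 31  # symmetric int6 grid: integers in [-31, 31]
    best = None
    for p in percentiles:
        clip = np.percentile(np.abs(w), p)
        if clip == 0:
            continue
        scale = clip / levels
        q = np.clip(np.round(w / scale), -levels, levels)
        err = float(np.mean((q * scale - w) ** 2))
        if best is None or err < best[2]:
            best = (q.astype(np.int8), scale, err)
    return best  # (quantized ints, per-matrix scale, reconstruction MSE)
```

Because the 100% (absmax) candidate is always in the search set, the selected clip can never reconstruct worse than the row-wise absmax baseline it replaces.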

| Metric | Value |
| --- | --- |
| Steps | 6,733 (89.1 ms/step) |
| Pre-quant val_bpb | 1.1417 |
| Sliding window val_bpb (s64) | 1.1257 |

Architecture: 11L, XSA4, Tight SWA, Partial RoPE 16/64, LN Scale, Late QAT, Value Embedding, SmearGate, BigramHash, FA3, int6+zstd-22, WD=0.04.

Full source and experiment history: https://github.com/dannywillowliu-uchi/parameter-golf-entry

mrdavtan added a commit to mrdavtan/parameter-golf that referenced this pull request Mar 22, 2026
Tries 5 clip percentiles (0.9, 0.95, 0.99, 0.999, 0.99999) per row,
keeps the one minimizing reconstruction MSE. Zero training cost.
Default ON (GPTQ_LITE=1). Inspired by PR openai#379.
anthony-maio added a commit to anthony-maio/parameter-golf that referenced this pull request Mar 22, 2026
From arXiv:2603.09078. Projects out the self-value component from
attention output, forcing the network to use contextual information.
Applied via GQA-aware zero-alloc view reshape on last 4 of 11 layers.

Both top unmerged submissions (PR openai#374 at 1.1246 and PR openai#379 at 1.1260)
use XSA as a key technique.

Full next-gen stack now includes: 11L, XSA, Partial RoPE 16/64,
Late QAT STE, Tight SWA, GPTQ-lite, LN Scale, FA3, SmearGate,
BigramHash, int6+zstd, Muon WD, OrthoInit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
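One plausible reading of "projects out the self-value component" in the commit above is subtracting each token's own attention-weighted value vector from the attention output. A minimal single-head sketch under that assumption (the commit's GQA-aware zero-alloc view-reshape variant is not reproduced here):

```python
import numpy as np

def remove_self_value(attn_weights, values):
    """Subtract each token's own value contribution from attention output.

    attn_weights: (T, T) row-stochastic attention matrix.
    values: (T, D) value vectors.
    Removing the diagonal term a_ii * v_i forces row i's output to come
    entirely from other tokens, i.e. from contextual information.
    """
    out = attn_weights @ values                # standard attention output
    diag = np.diagonal(attn_weights)[:, None]  # a_ii, one weight per token
    return out - diag * values                 # drop the self contribution
```

With identity attention (each token attends only to itself) the result is all zeros, which is the degenerate case this projection is meant to rule out.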
@dannywillowliu-uchi dannywillowliu-uchi changed the title Record: 11L GPTQ-lite + Self-Distillation TTT (val_bpb=1.1260) Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257) Mar 22, 2026
rarce added a commit to rarce/parameter-golf that referenced this pull request Mar 22, 2026
original_model.md:
- Discard depth recurrence (amplifies quant error 900×, throughput loss)
- New direction: eval-time optimization stack (PPM-C + GPTQ-lite)
- Document all our experiment results (v3, v4, v4_30m, ringgolf)
- Add TTT/XSA interaction findings (PR openai#303: mutually exclusive)
- Add PR openai#375 meta-insight (1ms overhead = 0.006 BPB)
- 4-phase execution plan targeting PPM-C as original contribution

review_pr_records_track_10min_16mb.md:
- Add 2026-03-22 update with PRs openai#374, openai#379, openai#390, openai#375, openai#303, openai#363
- New SOTA at 1.1246 (PR openai#374: Tight SWA + VE128)
- Document negative results from $500 compute spend (PR openai#375)
- Unexplored opportunities: PPM-C, Neural Cache

review_records_track_10min_16mb.md:
- Add timestamp note (17 records, no changes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
