Record: CROWN-Q + GPTQ + Legal TTT — val_bpb 1.1174 (3-seed mean)#1129
EthanYangTW wants to merge 2 commits into openai:main
Conversation
Pull request overview
Adds a new Track “10min/16MB” record bundle (V38) that captures a 3-seed run and the corresponding end-to-end training/quantization/eval script implementing sqrt warmdown, CROWN-Q, GPTQ, and post-quant score-first TTT.
Changes:
- Added V38 training script (`train_gpt.py`) implementing the sqrt warmdown schedule, CROWN-Q penalty, full Cholesky GPTQ quantization, and post-quant score-first TTT sliding-window eval.
- Added per-seed training logs and consolidated submission metadata for the 3-seed result.
- Added a README describing architecture, training, quantization/eval, and compliance claims for the record.
Reviewed changes
Copilot reviewed 3 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| records/track_10min_16mb/2026-03-30_V38_SqrtCooldown_3seed/train_gpt.py | New V38 end-to-end script (train → average → quantize → eval/TTT) used to generate the record. |
| records/track_10min_16mb/2026-03-30_V38_SqrtCooldown_3seed/train_seed1337.log | Seed 1337 run log capturing training, quantization, and TTT eval outputs. |
| records/track_10min_16mb/2026-03-30_V38_SqrtCooldown_3seed/train_seed42.log | Seed 42 run log capturing training, quantization, and TTT eval outputs. |
| records/track_10min_16mb/2026-03-30_V38_SqrtCooldown_3seed/train_seed7.log | Seed 7 run log capturing training, quantization, and TTT eval outputs. |
| records/track_10min_16mb/2026-03-30_V38_SqrtCooldown_3seed/submission.json | Submission metadata (aggregate + per-seed metrics and sizes). |
| records/track_10min_16mb/2026-03-30_V38_SqrtCooldown_3seed/README.md | Human-readable explanation/results/compliance notes for the record. |
```python
for mod in base_model.modules():
    if isinstance(mod, CastedLinear) and mod.weight.ndim == 2:
        w = mod.weight.float()
        row_max = w.abs().amax(dim=1).detach()
```
The CROWN-Q penalty currently detaches `row_max` from `mod.weight`, which makes `delta` (and thus `crownq_penalty`) constant w.r.t. the model weights. As a result, adding `args.crownq_lambda * crownq_penalty` won't produce gradients and the penalty will have no training effect. Remove the `.detach()` (or use a differentiable surrogate for the quantization step estimate) so the penalty actually shapes the weights.
```diff
-        row_max = w.abs().amax(dim=1).detach()
+        row_max = w.abs().amax(dim=1)
```
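A minimal repro of the issue, independent of the PR's code: detaching the row max severs the autograd graph, so a penalty built from it contributes nothing to the weight gradients, while the undetached version does.

```python
import torch

w = torch.randn(4, 8, requires_grad=True)

# Detached version: the graph back to w is cut, so a penalty built on
# this tensor can never produce gradients for w.
row_max_detached = w.abs().amax(dim=1).detach()
assert not row_max_detached.requires_grad

# Differentiable version: keeps the graph, so backward() populates w.grad.
row_max = w.abs().amax(dim=1)
row_max.sum().backward()
assert w.grad is not None and w.grad.abs().sum() > 0
```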
```python
        result[name + ".scale"] = s
        meta[name] = {"type": "int6"}
    elif cat in int6_cats and t.ndim >= 1:
        q, s = quantize_int6_per_row(t)
```
`mixed_quantize_int6_gptq` accepts a `clip_range` argument, but the `elif cat in int6_cats and t.ndim >= 1:` branch calls `quantize_int6_per_row(t)` without passing `clip_range`. If `QUANT_CLIP_RANGE` is set to something other than 31 (e.g., int5), 1D tensors in int6 categories would be quantized with the wrong range.
```diff
-        q, s = quantize_int6_per_row(t)
+        q, s = quantize_int6_per_row(t, clip_range=clip_range)
```
```python
    if cat in int6_cats and t.ndim >= 1:
        q, s = quantize_int6_per_row(t)
        result[name + ".q"] = q
        result[name + ".scale"] = s
```
`mixed_quantize_int6` similarly calls `quantize_int6_per_row(t)` without threading through a configurable `clip_range` (unlike the GPTQ path). If this script is intended to support `QUANT_CLIP_RANGE != 31`, this path will silently quantize with the default range instead of the configured one.
```python
        result[name + ".q"] = q
        result[name + ".scale"] = s
        meta[name] = {"type": "int8"}
    print(f"gptq_quantize: {gptq_count} GPTQ layers, {naive_count} naive layers", flush=True)
```
The `print(f"gptq_quantize: ...")` is unconditional, so under `torchrun` it will be emitted by every rank (the provided logs show repeated lines). Consider guarding it behind a rank == 0 / `master_process` check (or routing it through `log0`) to avoid noisy logs and potential performance impact from multi-process stdout contention.
```diff
-    print(f"gptq_quantize: {gptq_count} GPTQ layers, {naive_count} naive layers", flush=True)
+    rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
+    if rank == 0:
+        print(f"gptq_quantize: {gptq_count} GPTQ layers, {naive_count} naive layers", flush=True)
```
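The suggested guard can also be factored into a small helper, a minimal sketch of the `log0`-style function the comment alludes to (the name and exact behavior of the PR's `log0` are assumptions): it prints only on rank 0 and degrades gracefully to a plain print when `torch.distributed` is not initialized.

```python
import torch.distributed as dist

def log0(msg: str) -> None:
    # Print only on rank 0 under torchrun; fall back to plain printing
    # when torch.distributed is unavailable or not initialized.
    if (not dist.is_available()) or (not dist.is_initialized()) or dist.get_rank() == 0:
        print(msg, flush=True)
```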
```python
    try:
        L = torch.linalg.cholesky(H)
        Hinv = torch.cholesky_inverse(L)
    except torch._C._LinAlgError:
```
Catching `torch._C._LinAlgError` relies on a private/internal exception type that can change across PyTorch versions. Prefer catching the public `torch.linalg.LinAlgError` (or a broader `RuntimeError` with a targeted message check) so the GPTQ fallback remains robust across environments.
```diff
-    except torch._C._LinAlgError:
+    except torch.linalg.LinAlgError:
```
```markdown
- Full Cholesky GPTQ with act-order (training data calibration only)
- int6 + zstd-22 compression
```
The README claims "training data calibration only" for GPTQ, but `train_gpt.py`'s `gptq_calibrate_selfgen()` explicitly calibrates on random/self-generated token sequences (not training shards). Please update the README wording to match the implementation (or change the code to actually calibrate on training data, if that's the intent).
```markdown
- Training: 600s wallclock (hard cap)
- GPTQ calibration: training data only
- TTT: legal score-first (each token scored before any gradient update using it)
```
The compliance section says "GPTQ calibration: training data only", but the code calibrates via `gptq_calibrate_selfgen()` using random tokens. This should be corrected so the compliance statement is accurate and auditable.
- SQRT_WARMDOWN=1: use sqrt(remaining/warmdown) instead of linear
- Holds LR higher for longer, decays faster at the end
- From PR openai#1129 optimization techniques

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
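The sqrt warmdown described in the commit message can be sketched as follows (function name and argument names are illustrative): a linear schedule would return `remaining / warmdown_steps` directly; taking the square root keeps the LR higher for most of the warmdown and drops it sharply only near the end.

```python
def lr_scale(step: int, total_steps: int, warmdown_steps: int) -> float:
    # Hypothetical sketch of the sqrt warmdown: full LR until the final
    # warmdown_steps, then sqrt(remaining / warmdown) instead of the
    # linear remaining / warmdown.
    remaining = total_steps - step
    if remaining >= warmdown_steps:
        return 1.0
    return (remaining / warmdown_steps) ** 0.5
```

For example, halfway through the warmdown the linear schedule is at 0.5 while the sqrt schedule is still at about 0.707.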
Summary
11L GQA + XSA-all + full Cholesky GPTQ + score-first AdamW TTT. Sqrt cooldown schedule holds LR higher during warmdown, improving post-quantization TTT quality.
val_bpb: 1.1174 (3-seed mean, std 0.0004)
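The "score-first" TTT legality rule mentioned above (score each token before any gradient update that uses it) can be sketched like this; the function, loss, and optimizer choices here are illustrative assumptions, not the PR's actual eval loop.

```python
import torch

def score_first_ttt(model, windows, lr=1e-3):
    # Hypothetical sketch of score-first test-time training: each window
    # is scored with the CURRENT weights before the gradient step that
    # uses it, so the reported loss never benefits from having already
    # trained on the tokens being scored.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    losses = []
    for x, y in windows:
        loss = torch.nn.functional.mse_loss(model(x), y)
        losses.append(loss.item())  # record BEFORE the update
        opt.zero_grad()
        loss.backward()
        opt.step()                  # adaptation helps later windows only
    return sum(losses) / len(losses)
```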
Results
Key Techniques
- `sqrt(x)` schedule during warmdown instead of linear. Holds the LR higher for longer.

Architecture
Timing
Compliance