
Record: VRL + Full GPTQ + 5-gram Cache + Hidden-State kNN-LM (3-seed mean val_bpb=1.0970)#738

Closed
gowtham0992 wants to merge 1 commit into openai:main from gowtham0992:submission/VRL-FullGPTQ-NgramKNN-1.0970

Conversation


@gowtham0992 gowtham0992 commented Mar 25, 2026

Summary

3-seed mean val_bpb: 1.0970 (std 0.0006) | ≤15.74 MB | 8×H100 SXM, 598s training | No TTT

Key Innovations

Hidden-State kNN-LM (novel — first in this competition): stores 512-dim hidden states from already-scored tokens in a GPU ring buffer. For uncertain tokens, it finds the k=32 nearest neighbors by L2 distance and builds a non-parametric distribution via an RBF kernel, following Khandelwal et al. 2019 (ICLR 2020). This captures semantic repetition that n-grams miss, adding a further -0.007 BPB on top of the n-gram cache.
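
The mixing step described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function name `knn_lm_mix`, the temperature `tau`, and the fixed mixing weight `lam` are assumptions (the PR scales the weight adaptively by model uncertainty).

```python
# Hedged sketch of hidden-state kNN-LM interpolation (Khandelwal et al., ICLR 2020).
# All names, shapes, and default values here are illustrative assumptions.
import torch

def knn_lm_mix(hidden, keys, values, logits, k=32, tau=1.0, lam=0.25):
    """Mix model logits with a non-parametric kNN distribution.

    hidden: (d,)   query hidden state for the current token
    keys:   (N, d) ring buffer of hidden states from already-scored tokens
    values: (N,)   token id that followed each stored hidden state
    logits: (V,)   model logits for the current position
    """
    d2 = torch.cdist(hidden[None], keys).squeeze(0) ** 2    # squared L2 distances to buffer
    knn = torch.topk(-d2, k)                                # indices of the k nearest neighbors
    w = torch.softmax(-d2[knn.indices] / tau, dim=0)        # RBF-kernel weights over neighbors
    p_knn = torch.zeros_like(logits)
    p_knn.scatter_add_(0, values[knn.indices], w)           # accumulate neighbor votes per token
    p_model = torch.softmax(logits, dim=0)
    return (1 - lam) * p_model + lam * p_knn                # interpolated next-token distribution
```

Because the ring buffer holds only already-scored tokens, the lookup stays strictly backward-looking, consistent with the compliance notes below.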

Online 5-gram cache with adaptive lambda: backward-looking n-gram cache with backoff and a pre-committed confidence gate (no safety gate / oracle selection, per the #677 ruling). The adaptive lambda scales the mixing weight by model uncertainty.
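
A minimal sketch of the cache and the adaptive lambda, under stated assumptions: the class name, the backoff rule (longest matching context wins), and the `lam_max` cap are illustrative, not the PR's pre-committed values.

```python
# Hedged sketch of a backward-looking 5-gram cache with backoff and an
# uncertainty-scaled mixing weight. Names and constants are assumptions.
from collections import defaultdict

class NgramCache:
    def __init__(self, max_order=5):
        self.max_order = max_order
        # counts[n][context_tuple][next_token] = occurrence count
        self.counts = {n: defaultdict(lambda: defaultdict(int))
                       for n in range(1, max_order + 1)}

    def update(self, tokens):
        """Record contexts from tokens already scored (strictly backward-looking)."""
        for n in range(1, self.max_order + 1):
            for i in range(n, len(tokens)):
                self.counts[n][tuple(tokens[i - n:i])][tokens[i]] += 1

    def predict(self, context):
        """Back off from the longest matching context to shorter ones."""
        for n in range(self.max_order, 0, -1):
            ctx = tuple(context[-n:])
            if len(ctx) == n and ctx in self.counts[n]:
                dist = self.counts[n][ctx]
                total = sum(dist.values())
                return {tok: c / total for tok, c in dist.items()}
        return {}

def adaptive_lambda(entropy, max_entropy, lam_max=0.3):
    """More model uncertainty -> more weight on the cache distribution."""
    return lam_max * min(1.0, entropy / max_entropy)
```

The pre-committed confidence gate would then apply the cache only when `adaptive_lambda` (or an equivalent entropy threshold) exceeds a value fixed before evaluation, avoiding any oracle-style per-token selection.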

Results

| Seed | Post-cache bpb | Artifact |
|------|----------------|----------|
| 42   | 1.0976         | 15.68 MB |
| 1337 | 1.0965         | 15.74 MB |
| 2024 | 1.0970         | 15.55 MB |
| Mean | 1.0970         |          |

vs SOTA (1.1194): improvement = 0.0224 bpb

Compliance (per #677)

  • GPTQ calibration inside 600s training budget (total_train_time ~598s in all logs)
  • No safety gate / oracle selection — pre-committed confidence gate
  • No training data accessed at eval time
  • N-gram + kNN caches strictly backward-looking
  • All artifacts under 16MB, all eval under 600s

Reproduction

```shell
pip install sentencepiece zstandard
pip install flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291 --no-deps
python3 data/cached_challenge_fineweb.py --variant sp1024
SEED=42 python3 -m torch.distributed.run --nproc_per_node=8 train_gpt.py
```



## Credits

- kNN-LM: Khandelwal et al. 2019 (ICLR 2020)
- 5-gram cache concept: PR #659 by @deanbrr (our implementation uses pre-committed gate)
- VRL: arxiv:2410.17897, PR #569 by @gowtham0992
- Full GPTQ: IST-DASLab/gptq (ICLR 2023)
- LeakyReLU²: PR #493 by @parinzee
- Base stack: PR #414 by @signalrush

@gowtham0992 gowtham0992 force-pushed the submission/VRL-FullGPTQ-NgramKNN-1.0970 branch from e2b0749 to c458cef Compare March 25, 2026 16:34
MichaelMcCulloch pushed a commit to MichaelMcCulloch/parameter-golf that referenced this pull request Mar 29, 2026
- Submission train_gpt.py with all 32 techniques from the execution plan,
  each gated by environment variables (disabled by default)
- Optuna-based search framework with validate mode (per-technique smoke test)
  and search mode (TPE over joint technique + model size space)
- Ablation infrastructure (ablation.py, shell scripts) for tracking experiments
- PR source files for reference (openai#505, openai#569, openai#576, openai#727, openai#738)
- Execution plan document

Techniques span architecture (activations, HybridNorm, SmearGate, DiffAttn,
PoPE, WaveletGPT, VGA, XSA), training (EMA, SWA, QAT, MTP), quantization
(variable bit-width, OptRot, GPTQ, pruning, entropy coding), and eval-time
(TTT-LoRA, n-gram cache, kNN-LM, TurboQuant KV compression).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>