Record: VRL + Full GPTQ + 5-gram Cache + Hidden-State kNN-LM (3-seed mean val_bpb=1.0970) #738
Closed
gowtham0992 wants to merge 1 commit into openai:main from
MichaelMcCulloch pushed a commit to MichaelMcCulloch/parameter-golf that referenced this pull request on Mar 29, 2026
- Submission train_gpt.py with all 32 techniques from the execution plan, each gated by environment variables (disabled by default)
- Optuna-based search framework with validate mode (per-technique smoke test) and search mode (TPE over joint technique + model size space)
- Ablation infrastructure (ablation.py, shell scripts) for tracking experiments
- PR source files for reference (openai#505, openai#569, openai#576, openai#727, openai#738)
- Execution plan document

Techniques span architecture (activations, HybridNorm, SmearGate, DiffAttn, PoPE, WaveletGPT, VGA, XSA), training (EMA, SWA, QAT, MTP), quantization (variable bit-width, OptRot, GPTQ, pruning, entropy coding), and eval-time (TTT-LoRA, n-gram cache, kNN-LM, TurboQuant KV compression).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
3-seed mean val_bpb: 1.0970 (std 0.0006) | ≤15.74 MB | 8×H100 SXM, 598s training | No TTT
Key Innovations
Hidden-State kNN-LM (novel; first use in this competition): Stores the 512-dim hidden states of already-scored tokens in a GPU ring buffer. For tokens where the model is uncertain, it retrieves the k=32 nearest stored states by L2 distance and builds a non-parametric next-token distribution by weighting each neighbor's following token with an RBF kernel. Based on Khandelwal et al. 2019 (ICLR 2020). It captures semantic repetition that exact n-gram matches miss, and it improves val_bpb by a further 0.007 on top of the n-gram cache.
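The retrieval step above can be illustrated with a minimal pure-Python sketch. It is not the submission's code: plain lists stand in for the GPU ring buffer, and the names `KNNRingBuffer` and `mix_with_knn`, the entropy-based "uncertain token" gate, the kernel width `tau`, and the mixing weight `lam` are all hypothetical choices made for illustration. Each stored entry pairs a hidden state with the token that actually followed it, so neighbors vote for their own next tokens.

```python
import math
from collections import defaultdict


class KNNRingBuffer:
    """Hypothetical fixed-capacity store of (hidden_state, next_token) pairs.

    Plain Python lists stand in for the GPU tensors described in the PR.
    """

    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.dim = dim
        self.keys = []      # stored hidden states
        self.values = []    # token that followed each stored hidden state
        self.next_slot = 0  # slot to overwrite once the buffer is full

    def add(self, hidden, token):
        assert len(hidden) == self.dim
        if len(self.keys) < self.capacity:
            self.keys.append(list(hidden))
            self.values.append(token)
        else:
            # Ring-buffer semantics: overwrite the oldest entry.
            self.keys[self.next_slot] = list(hidden)
            self.values[self.next_slot] = token
        self.next_slot = (self.next_slot + 1) % self.capacity

    def knn_distribution(self, query, k=32, tau=1.0):
        """Return {token: prob} from the k nearest stored states.

        Distance is squared L2; each neighbor gets RBF weight exp(-d^2 / tau).
        """
        if not self.keys:
            return {}
        dists = [
            (sum((q - x) ** 2 for q, x in zip(query, key)), tok)
            for key, tok in zip(self.keys, self.values)
        ]
        dists.sort(key=lambda pair: pair[0])
        neighbors = dists[:k]
        d_min = neighbors[0][0]  # subtracted for numerical stability; cancels on normalization
        scores = defaultdict(float)
        for d2, tok in neighbors:
            scores[tok] += math.exp(-(d2 - d_min) / tau)
        total = sum(scores.values())
        return {tok: s / total for tok, s in scores.items()}


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def mix_with_knn(model_probs, knn_probs, lam=0.25, entropy_threshold=2.0):
    """Interpolate with the kNN distribution only when the model is uncertain.

    The gate looks only at the model's own distribution, never at the target token.
    """
    if not knn_probs or entropy(model_probs) < entropy_threshold:
        return list(model_probs)
    return [
        (1.0 - lam) * p + lam * knn_probs.get(tok, 0.0)
        for tok, p in enumerate(model_probs)
    ]
```

In an evaluation loop under these assumptions, each position would first be scored with the buffer as it stands (so it only contains earlier tokens), and only afterwards would `(hidden_state, true_next_token)` be added. That ordering is what keeps the cache backward-looking.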
Online 5-gram cache with adaptive lambda: A backward-looking n-gram cache with backoff, built only from already-scored tokens. The confidence gate is pre-committed (no safety gate or oracle selection, per the #677 ruling). Adaptive lambda scales the cache's mixing weight with the model's uncertainty.
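A minimal sketch of such a cache, under stated assumptions: backoff here simply falls back from the longest seen context to shorter ones, the gate requires a match of at least order 2, and lambda is set to `lam_max * (1 - max model prob)`. The names `NGramCache` and `adaptive_mix` and the values of `lam_max` and `min_order` are hypothetical; the PR does not spell out its exact backoff or lambda schedule.

```python
from collections import defaultdict


class NGramCache:
    """Hypothetical backward-looking n-gram counts over already-scored tokens."""

    def __init__(self, max_order=5):
        self.max_order = max_order
        # counts[context_tuple][next_token] -> count; context length = order - 1
        self.counts = defaultdict(lambda: defaultdict(int))
        self.history = []

    def update(self, token):
        """Record `token` after it has been scored, for every context length that fits."""
        for ctx_len in range(self.max_order):
            if ctx_len > len(self.history):
                break
            ctx = tuple(self.history[len(self.history) - ctx_len:])
            self.counts[ctx][token] += 1
        self.history.append(token)

    def predict(self):
        """Back off from the longest context to shorter ones until one has been seen.

        Returns ({token: prob}, matched_order); order 0 means nothing is cached.
        """
        for ctx_len in range(min(self.max_order - 1, len(self.history)), -1, -1):
            ctx = tuple(self.history[len(self.history) - ctx_len:])
            nexts = self.counts.get(ctx)  # .get does not insert into the defaultdict
            if nexts:
                total = sum(nexts.values())
                return {tok: c / total for tok, c in nexts.items()}, ctx_len + 1
        return {}, 0


def adaptive_mix(model_probs, cache_probs, order, lam_max=0.4, min_order=2):
    """Pre-committed gate: decided from the matched order and the model's own
    distribution only, never from the target token."""
    if order < min_order or not cache_probs:
        return list(model_probs)
    # Adaptive lambda: lean on the cache more when the model is less confident.
    lam = lam_max * (1.0 - max(model_probs))
    return [
        (1.0 - lam) * p + lam * cache_probs.get(tok, 0.0)
        for tok, p in enumerate(model_probs)
    ]
```

For example, after the token stream `1, 2, 3, 1, 2` a 3-gram cache sees the context `(1, 2)` again and predicts token 3 with probability 1; with a uniform model over 4 tokens, lambda becomes 0.3 and token 3's mixed probability is 0.7 × 0.25 + 0.3 = 0.475.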
Results
vs SOTA (1.1194): improvement = 0.0224 BPB (1.1194 - 1.0970)
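For reference, val_bpb is the total validation negative log-likelihood converted from nats to bits and normalized by the number of UTF-8 bytes, so the gap between two val_bpb scores is measured in bits per byte rather than nats. The definition below is the standard one; the exact byte accounting of the evaluation script is assumed, not shown here.

```latex
\mathrm{val\_bpb} = \frac{1}{N_{\text{bytes}} \ln 2} \sum_{t=1}^{N_{\text{tokens}}} -\ln p(x_t \mid x_{<t}),
\qquad
\Delta = 1.1194 - 1.0970 = 0.0224 \ \text{bits per byte}
```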
Compliance (per #677)
Reproduction