
Record: PhraseCache + OrderAdaptive N-gram + RegimeTracker — val_bpb 0.1003 (3-seed mean) #880

Closed
RoyiRa wants to merge 1 commit into openai:main from RoyiRa:submission/2026-03-26-phrase-cache-v70

Conversation

@RoyiRa RoyiRa commented Mar 26, 2026

Summary

  • val_bpb: 0.1003 (3-seed mean) | ~15.7 MB | 8xH100 SXM
  • Key innovations: Long Phrase Cache (variable-length suffix matcher), Order-Adaptive Entropy Gating, Online Regime Tracker
  • Built on top of: Score-First TTT, 5-Expert Hedge Mixer, CROWN-Q, GPTQ int5, Multi-Order N-gram Backoff Cache

Results (8xH100 80GB SXM)

| Seed | Pre-TTT BPB | Post-TTT BPB | Artifact | Train time | Eval time |
|------|-------------|--------------|----------|------------|-----------|
| 1337 | 1.1287 | 0.1003 | 15.74 MB | 582s | 592.4s |
| 42 | 1.1277 | 0.1002 | 15.59 MB | 582s | 593.3s |
| 7 | 1.1249 | 0.1003 | 15.73 MB | 582s | 590.0s |
| Mean | 1.1271 | 0.1003 | | | |

Key Techniques

  1. Long Phrase Cache (novel) — Variable-length suffix matcher probing at lengths [48, 36, 28, 20, 16] using rolling hashes. Catches verbatim repetition (cookie banners, nav menus, legal text) that fixed-order n-grams miss.
  2. Order-Adaptive Entropy Gating — Per-order entropy thresholds and alpha multipliers with sigmoid interpolation.
  3. Online Regime Tracker (novel) — Detects text regime (boilerplate/prose/code) from scored-token features, modulates alpha [0.7×, 1.5×].
  4. Multi-Order N-gram Backoff Cache — Orders 2-9, 4M buckets, full-chunk sharing across 8 ranks.
  5. Score-First TTT — 2-epoch AdamW with Polyak EMA, byte-weighted loss, adaptive cosine LR.
  6. 5-Expert Hedge Mixer + CROWN-Q + GPTQ int5 — GPU-vectorized Hedge, QA penalty, 5% pruning, zstd level 22.
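
A minimal sketch of how a variable-length suffix matcher like the Long Phrase Cache could work, combined with the Dirichlet-style mixing formula `p = (min(fc, cc) + c * neural) / (ctx + c)` quoted in the downstream commit. The hash function, bucket handling, and class layout here are illustrative assumptions, not the submission's actual code:

```python
from collections import defaultdict

PROBE_LENGTHS = [48, 36, 28, 20, 16]  # longest first, as in the PR

class PhraseCache:
    """Variable-length suffix matcher (illustrative sketch).

    For each probe length L it keeps two hashed count tables:
      ctx[L][h(suffix)]          -- how often this L-token context occurred
      full[L][h(suffix + next)]  -- how often it was followed by `next`
    Scoring probes the longest length first and falls back to shorter ones.
    """

    def __init__(self, n_buckets=1 << 22):  # ~4M buckets, as in the PR
        self.n_buckets = n_buckets
        self.ctx = {L: defaultdict(int) for L in PROBE_LENGTHS}
        self.full = {L: defaultdict(int) for L in PROBE_LENGTHS}

    def _hash(self, tokens):
        # Stand-in for the rolling/XOR hash described in the commit message.
        h = 0
        for t in tokens:
            h = (h * 1000003 + t) & 0xFFFFFFFFFFFF
        return h % self.n_buckets

    def predict(self, history, token, neural_p, c=2.0):
        """Mix cache evidence with the neural probability."""
        for L in PROBE_LENGTHS:  # longest verbatim match wins
            if len(history) < L:
                continue
            suffix = list(history[-L:])
            cc = self.ctx[L].get(self._hash(suffix), 0)
            if cc == 0:
                continue
            fc = self.full[L].get(self._hash(suffix + [token]), 0)
            # Dirichlet smoothing: p = (min(fc, cc) + c * neural) / (ctx + c)
            return (min(fc, cc) + c * neural_p) / (cc + c)
        return neural_p  # no match at any probe length

    def update(self, history, token):
        """Score-first: call only AFTER the chunk has been scored."""
        for L in PROBE_LENGTHS:
            if len(history) >= L:
                suffix = list(history[-L:])
                self.ctx[L][self._hash(suffix)] += 1
                self.full[L][self._hash(suffix + [token])] += 1
```

Probing longest-first means a 48-token verbatim match (e.g. a repeated cookie banner) overrides the shorter, noisier contexts, which is what lets this catch repetition that a fixed-order n-gram cache misses.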

Compliance

| Constraint | Limit | Actual | Status |
|------------|-------|--------|--------|
| Train time | 600s | 582s | Pass |
| Eval time | 600s | 593.3s (worst seed) | Pass |
| Artifact size | 16,000,000 bytes | 15,737,937 bytes (worst seed) | Pass |
| No pre-scoring training | | Score-first TTT + backward-looking caches | Pass |
| GPTQ in training budget | | 1.8s within 18s reserve | Pass |
| Single-pass scoring | | Each token scored exactly once | Pass |

…0.1003 (3-seed mean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@valerio-oai
Contributor

Thanks for your submission! Unfortunately, it's disallowed due to its use of hashed n-gram caches, which do not correctly renormalize (reweight) the LM's token distribution and which look ahead to the target token when mixing probabilities, thereby leaking eval tokens. Please refer to the long discussion about this under the Issues tab for more details, and please submit more runs in the future!

newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 28, 2026
Phrase cache (PR openai#880 / PR openai#900 — proven +0.1 BPB, legal):
- Variable-length suffix matching at 48/36/28/20/16 token probe lengths
- One ctx+full count table pair per probe length (4M buckets each)
- 48-prime XOR hash — unique prime per context position up to length 48
- Dirichlet smoothing: p=(min(fc,cc)+c*neural)/(ctx+c), c=2.0
- Applied inline after n-gram mixing, before NLL conversion
- Score-first: tables updated with chunk tokens AFTER all scoring done

RegimeTracker (PR openai#880):
- Tracks match rate + token diversity over rolling 4096-token window
- Adapts effective phrase concentration: repetitive/boilerplate content
  → lower c (more cache trust); novel prose → higher c (more neural trust)
- Multiplier range [0.7, 1.5], effective_c = base_c / mult

Config improvements:
- WARMDOWN_ITERS=2000 (confirmed best from A/B sweep)
- NGRAM_CHUNK_TOKENS=65536 (PR openai#850, 15x more cache refreshes vs 1M)
- MATRIX_LR=0.03 (PR openai#859)

ARTIFACT_NGRAM=0 remains disabled (legally gray).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
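
The RegimeTracker described in this commit message could be sketched roughly as follows. The window statistics and the exact mapping from match rate and diversity to the multiplier are assumptions for illustration; only the window size, the [0.7, 1.5] range, and `effective_c = base_c / mult` come from the commit message:

```python
from collections import deque

class RegimeTracker:
    """Rolling-window regime detector (illustrative sketch).

    Tracks phrase-cache match rate and token diversity over the last
    4096 scored tokens and maps them to a multiplier in [0.7, 1.5].
    Repetitive/boilerplate text -> high multiplier -> lower effective c
    (trust the cache more); novel prose -> low multiplier -> higher
    effective c (trust the neural model more).
    """

    def __init__(self, window=4096, lo=0.7, hi=1.5):
        self.lo, self.hi = lo, hi
        self.tokens = deque(maxlen=window)
        self.matches = deque(maxlen=window)

    def observe(self, token, cache_matched):
        # Call once per scored token, with whether the phrase cache hit.
        self.tokens.append(token)
        self.matches.append(1.0 if cache_matched else 0.0)

    def multiplier(self):
        if not self.tokens:
            return 1.0
        match_rate = sum(self.matches) / len(self.matches)
        diversity = len(set(self.tokens)) / len(self.tokens)
        # High match rate + low diversity => boilerplate regime.
        boilerplate = 0.5 * match_rate + 0.5 * (1.0 - diversity)
        return self.lo + (self.hi - self.lo) * boilerplate

    def effective_c(self, base_c=2.0):
        # From the commit message: effective_c = base_c / mult, so
        # boilerplate (mult near 1.5) lowers c and trusts the cache more.
        return base_c / self.multiplier()
```

Because updates only use already-scored tokens from the rolling window, this stays backward-looking and compatible with the score-first constraint.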
sofiabod added a commit to sofiabod/parameter-golf that referenced this pull request Mar 28, 2026
Each additional probe length adds ~0.005 BPB.
probe[28] → -0.007, probe[36] → -0.005.
Testing if probe[48] captures even longer verbatim patterns.