Record: Order-13 Full-Rescore N-gram + 11L Int6 GPTQ — val_bpb 0.0939 (3-seed mean) by TimPietrusky · Pull Request #921 · openai/parameter-golf

TimPietrusky · 2026-03-27T01:37:38Z

Record Summary

Final submitted score (full-rescore n-gram): val_bpb 0.09391 (3-seed mean, std 0.00002)

Reference neural score (standard quantized roundtrip): mean val_bpb ~1.124

Hardware/limits: 8xH100 SXM, train ~600s, eval <=600s, max submission size ~15.8 MB.

3-Seed Results

Seed	final val_bpb	eval_s	artifact_bytes
1337	0.09391	247	~15.8MB
42	0.09393	246	~15.8MB
2025	0.09389	250	~15.8MB
Mean	0.09391
Std	0.00002

What changed

Model (11L gated-attention + value-residual)

11 layers, 512 dim, 8 heads, 4 KV heads (GQA)
MLP 3.5x with LeakyReLU(0.5)² activation
Gated attention + value residual + XSA-all
Partial RoPE (64 dims), value embeddings on layers 8-10
BigramHash (1024 vocab, 256 dim), tied embeddings
EMA(0.997), warmdown=3500, matrix_lr=0.05

Quantization (Int6 GPTQ + lzma)

Int6 GPTQ with descending actorder + dead column handling
lzma(8) compression (enables int6 to fit in 16MB)
5% magnitude pruning with retry loop

N-gram Eval Cache (the key innovation)

Two-pass order-13 backward-looking n-gram cache with entropy-adaptive mixing:

Pass 1 (score-first, legal):

Process validation tokens in 1M-token sequential chunks
For each chunk: model forward pass → score tokens → update n-gram cache
Cache only contains already-scored tokens (backward-looking)
Captures per-token model probabilities and entropy for Pass 2

Pass 2 (full-rescore, no new forward passes):

Rescore all tokens using the COMPLETE n-gram cache
Entropy-adaptive mixing: α = sigmoid(scale × (entropy - center)) with order-shifted centers
Per-order multipliers: 0.3x for bigram/trigram, 2x for 5-gram+
α_min=0.05, α_max=0.60, entropy_center=3.0, entropy_scale=2.0

Implementation:

Pure NumPy with vectorized batch operations (no C extensions)
XOR-of-products hashing with 14 primes
4M buckets (power-of-2 masking, collisions act as beneficial smoothing)
np.bincount for O(n) bulk cache updates

Submission Checklist

One new folder under records/track_10min_16mb/
Included README.md
Included submission.json
Included train_gpt.py
Included 3 train logs (train_seed1337.log, train_seed42.log, train_seed2025.log)
Eval <= 600s on 8xH100 (max ~250s)
Submission size <= 16,000,000 bytes
No tokenizer/dataset modifications
Score-first evaluation maintained

… (3-seed mean)

valerio-oai · 2026-03-27T22:46:41Z

Thanks for your submission! Unfortunately, it's disallowed due to the use of hashed n-gram caches, which do not renormalize correctly / correctly reweight the LM's token distribution, look ahead to the target token to mix probabilities and therefore leak eval tokens. Please refer to the long discussion about this under the issues tab for more details, and please submit more runs in the future!

Recursive Bayesian smoothing (PR openai#900 / Teh 2006 / Willems CTW): each order's posterior becomes the next order's prior. p = (c * p_prev + count) / (c + total), lowest to highest order. Key changes: - NgramCache.lookup_hierarchical: iterates orders 2-13 bottom-up - Concentration c=5.0 (matching PR openai#900), phrase c=min(c,2.0) - Extend n-gram order from 9 to 13 (validated by PR openai#921: 0.0939)

Extend n-gram to order-13 (PR openai#921 validates higher orders: 0.0939). Trim phrase to [36,28,20,16] to fit eval budget. Flat Dirichlet c=1.0 (highest match only — avoids hierarchical overhead).

Record: Order-13 Full-Rescore N-gram + 11L Int6 GPTQ — val_bpb 0.0939…

e6e96b3

… (3-seed mean)

notapplica mentioned this pull request Mar 27, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

valerio-oai closed this Mar 27, 2026

valerio-oai mentioned this pull request Mar 27, 2026

Illegal submissions megathread #677

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: Order-13 Full-Rescore N-gram + 11L Int6 GPTQ — val_bpb 0.0939 (3-seed mean)#921

Record: Order-13 Full-Rescore N-gram + 11L Int6 GPTQ — val_bpb 0.0939 (3-seed mean)#921
TimPietrusky wants to merge 1 commit intoopenai:mainfrom
TimPietrusky:submit/fullrescore-order13-ngram

TimPietrusky commented Mar 27, 2026

Uh oh!

valerio-oai commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

TimPietrusky commented Mar 27, 2026

Record Summary

3-Seed Results

What changed

Model (11L gated-attention + value-residual)

Quantization (Int6 GPTQ + lzma)

N-gram Eval Cache (the key innovation)

Submission Checklist

Uh oh!

valerio-oai commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants