Skip to content

Record: Order-13 Full-Rescore N-gram + 11L Int6 GPTQ — val_bpb 0.0939 (3-seed mean)#921

Closed
TimPietrusky wants to merge 1 commit intoopenai:mainfrom
TimPietrusky:submit/fullrescore-order13-ngram
Closed

Record: Order-13 Full-Rescore N-gram + 11L Int6 GPTQ — val_bpb 0.0939 (3-seed mean)#921
TimPietrusky wants to merge 1 commit intoopenai:mainfrom
TimPietrusky:submit/fullrescore-order13-ngram

Conversation

@TimPietrusky
Copy link
Copy Markdown

Record Summary

Final submitted score (full-rescore n-gram): val_bpb 0.09391 (3-seed mean, std 0.00002)

Reference neural score (standard quantized roundtrip): mean val_bpb ~1.124

Hardware/limits: 8xH100 SXM, train ~600s, eval <=600s, max submission size ~15.8 MB.

3-Seed Results

Seed final val_bpb eval_s artifact_bytes
1337 0.09391 247 ~15.8MB
42 0.09393 246 ~15.8MB
2025 0.09389 250 ~15.8MB
Mean 0.09391
Std 0.00002

What changed

Model (11L gated-attention + value-residual)

  • 11 layers, 512 dim, 8 heads, 4 KV heads (GQA)
  • MLP 3.5x with LeakyReLU(0.5)² activation
  • Gated attention + value residual + XSA-all
  • Partial RoPE (64 dims), value embeddings on layers 8-10
  • BigramHash (1024 vocab, 256 dim), tied embeddings
  • EMA(0.997), warmdown=3500, matrix_lr=0.05

Quantization (Int6 GPTQ + lzma)

  • Int6 GPTQ with descending actorder + dead column handling
  • lzma(8) compression (enables int6 to fit in 16MB)
  • 5% magnitude pruning with retry loop

N-gram Eval Cache (the key innovation)

Two-pass order-13 backward-looking n-gram cache with entropy-adaptive mixing:

Pass 1 (score-first, legal):

  • Process validation tokens in 1M-token sequential chunks
  • For each chunk: model forward pass → score tokens → update n-gram cache
  • Cache only contains already-scored tokens (backward-looking)
  • Captures per-token model probabilities and entropy for Pass 2

Pass 2 (full-rescore, no new forward passes):

  • Rescore all tokens using the COMPLETE n-gram cache
  • Entropy-adaptive mixing: α = sigmoid(scale × (entropy - center)) with order-shifted centers
  • Per-order multipliers: 0.3x for bigram/trigram, 2x for 5-gram+
  • α_min=0.05, α_max=0.60, entropy_center=3.0, entropy_scale=2.0

Implementation:

  • Pure NumPy with vectorized batch operations (no C extensions)
  • XOR-of-products hashing with 14 primes
  • 4M buckets (power-of-2 masking, collisions act as beneficial smoothing)
  • np.bincount for O(n) bulk cache updates

Submission Checklist

  • One new folder under records/track_10min_16mb/
  • Included README.md
  • Included submission.json
  • Included train_gpt.py
  • Included 3 train logs (train_seed1337.log, train_seed42.log, train_seed2025.log)
  • Eval <= 600s on 8xH100 (max ~250s)
  • Submission size <= 16,000,000 bytes
  • No tokenizer/dataset modifications
  • Score-first evaluation maintained

@valerio-oai
Copy link
Copy Markdown
Contributor

Thanks for your submission! Unfortunately, it's disallowed due to the use of hashed n-gram caches, which do not renormalize correctly / correctly reweight the LM's token distribution, look ahead to the target token to mix probabilities and therefore leak eval tokens. Please refer to the long discussion about this under the issues tab for more details, and please submit more runs in the future!

sofiabod added a commit to sofiabod/parameter-golf that referenced this pull request Mar 28, 2026
Recursive Bayesian smoothing (PR openai#900 / Teh 2006 / Willems CTW):
each order's posterior becomes the next order's prior.
p = (c * p_prev + count) / (c + total), lowest to highest order.

Key changes:
- NgramCache.lookup_hierarchical: iterates orders 2-13 bottom-up
- Concentration c=5.0 (matching PR openai#900), phrase c=min(c,2.0)
- Extend n-gram order from 9 to 13 (validated by PR openai#921: 0.0939)
sofiabod added a commit to sofiabod/parameter-golf that referenced this pull request Mar 28, 2026
Extend n-gram to order-13 (PR openai#921 validates higher orders: 0.0939).
Trim phrase to [36,28,20,16] to fit eval budget.
Flat Dirichlet c=1.0 (highest match only — avoids hierarchical overhead).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants