
10L + PPM Full-Rescore Order-12 N-gram (0.3461 BPB) #912

Closed
Bortlesboat wants to merge 6 commits into openai:main from Bortlesboat:submission/v8-ppm-fullrescore

Conversation

@Bortlesboat

Record submission

val_bpb: 0.3461 (mean of 3 seeds, std 0.0015)

| Seed | val_bpb | artifact_bytes |
|------|---------|----------------|
| 42   | 0.3440  | 15,340,000     |
| 1337 | 0.3468  | 15,300,000     |
| 2024 | 0.3475  | 15,630,000     |

What's novel

PPM-style all-order blend. Instead of hard backoff, where only the highest matching order contributes, this blends ALL matching orders (2-12) using escape probabilities from PPM compression theory. Working from the highest matching order down, each order keeps a (1 - escape) share of the remaining probability mass, where escape = beta / (ctx_count + beta); the mass that escapes past the lowest order is absorbed by the neural model. More principled than single-order interpolation.
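A minimal sketch of the blend described above. The escape rule escape = beta / (ctx_count + beta) is from the PR; the function signature, argument names, and the beta default are illustrative assumptions, not the submission's actual code.

```python
def ppm_blend(model_p, order_probs, ctx_counts, beta=0.5):
    """PPM-style all-order blend: walk orders from highest to lowest,
    each order keeping a (1 - escape) share of the remaining mass.

    model_p     : neural model probability of the target token
    order_probs : per-order n-gram probability of the target, highest order first
    ctx_counts  : per-order context counts, same ordering
    beta        : escape smoothing constant (hypothetical default)
    """
    p, mass = 0.0, 1.0                   # accumulated prob, mass still unassigned
    for q, c in zip(order_probs, ctx_counts):
        escape = beta / (c + beta)       # escape = beta / (ctx_count + beta)
        p += mass * (1.0 - escape) * q   # this order claims (1 - escape) of the mass
        mass *= escape                   # remainder escapes to lower orders
    return p + mass * model_p            # neural model absorbs the final mass
```

With no matching orders (or zero counts) the escape probability is 1 at every step, so the blend degenerates to the pure neural prediction, which matches the "neural model absorbs remaining mass" behavior.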

Leave-one-out self-exclusion. Each token's own (context, target) count is subtracted from the cache before scoring, eliminating self-inclusion bias in full-rescore.
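The leave-one-out correction might look like the following sketch, assuming counts live in `Counter`s keyed by (context, target) and by context; the add-alpha smoothing and uniform fallback are illustrative assumptions, not the PR's exact scoring rule.

```python
from collections import Counter

def leave_one_out_prob(counts, ctx_totals, ctx, tgt, vocab_size, alpha=1.0):
    """Score (ctx, tgt) with the token's own occurrence removed from the
    cache, eliminating self-inclusion bias when the cache was built over
    ALL tokens (including the one being scored).

    counts     : Counter over (ctx, tgt) pairs from the full-cache build
    ctx_totals : Counter of total occurrences per context
    alpha      : add-alpha smoothing constant (hypothetical)
    """
    c = counts[(ctx, tgt)] - 1       # subtract this token's own count
    n = ctx_totals[ctx] - 1          # and its contribution to the context total
    if n <= 0:
        return 1.0 / vocab_size      # context unseen elsewhere: uniform fallback
    return (c + alpha) / (n + alpha * vocab_size)
```

Note that a context seen exactly once is, after exclusion, effectively unseen, so it contributes nothing beyond the fallback.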

Eval pipeline

  1. Pass 1 (120s): GPU sliding window, stores per-token model_p and entropy
  2. Cache build (52s): vectorized np.bincount over all tokens, orders 2-12
  3. Pass 2 (12s): PPM all-order rescore of ALL tokens with leave-one-out
  4. Total: 185s (well within 600s)
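Step 2's vectorized cache build could be sketched as below for a single order. The PR confirms only the use of np.bincount over all tokens; the context hashing scheme, the FNV-style multiplier, and the table size are assumptions for illustration.

```python
import numpy as np

def build_order_cache(tokens, order, table_size=8192):
    """Vectorized count build for one n-gram order: hash each length-`order`
    context to a bucket, then count (bucket, next_token) pairs in one
    np.bincount call over a joint index.
    """
    tokens = np.asarray(tokens, dtype=np.uint64)
    # Rolling polynomial hash of the context (assumed scheme; uint64 wraps).
    h = np.zeros(len(tokens) - order, dtype=np.uint64)
    for k in range(order):
        h = h * np.uint64(0x100000001B3) + tokens[k : len(tokens) - order + k]
    buckets = (h % np.uint64(table_size)).astype(np.int64)
    targets = tokens[order:].astype(np.int64)
    vocab = int(targets.max()) + 1
    flat = buckets * vocab + targets                 # joint (bucket, target) index
    counts = np.bincount(flat, minlength=table_size * vocab)
    return counts.reshape(table_size, vocab)
```

Repeating this for orders 2-12 gives the full cache; since every operation is a whole-array numpy call, the build stays fast even over the entire eval set.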

Architecture

  • 10L d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale, XSA last 4, Value Residual
  • Mixed int5/int6 + zstd-22, EMA(0.997), Muon(lr=0.03)
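The mixed int5/int6 export might reduce to something like this symmetric fake-quantization sketch; the per-tensor max-abs scale choice is an assumption (the PR states only the bit widths and the zstd-22 wrapper, which would compress the integer arrays afterwards).

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Symmetric per-tensor quantization to `bits`-bit signed integers
    (int5/int6 in this submission); scale set by the max-abs weight."""
    qmax = 2 ** (bits - 1) - 1                 # 15 for int5, 31 for int6
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale                            # zstd-22 would compress q bytes

def dequantize(q, scale):
    return q.astype(np.float32) * scale
```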

Compliance

  • Score-first: pass 1 stores probabilities, no n-gram blending during forward
  • Full cache built after all tokens scored
  • Leave-one-out: own count subtracted before matching
  • No target-aware gating: blending uses model entropy + matched order only
  • Artifact <= 16MB (15.3-15.6 MB)
  • Train <= 600s
  • Eval <= 600s (185s)
  • 3-seed validation

Commits

  • Explores stacking eval-time techniques (neural cache, LoRA TTT) and quantization-aware training on top of the openai#1 recipe. QAT has an export mismatch bug resulting in a high quantization penalty; submitted as non-record to document the approach for iteration.
  • Non-record submission. 10 layers, d=512, GQA 8H/4KV, mixed int5/int6 quantization + zstd-22. BigramHash(4096, dim=128), SmearGate, SWA(0.4). Mean of 3 seeds: 1.1507 +/- 0.0006 BPB. All artifacts under 16MB.
  • 10L d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale, XSA last 4, Value Residual, EMA(0.997). Mixed int5/int6 + zstd-22. Eval: multi-order hashed n-gram backoff (orders 2-7) with entropy-adaptive alpha. Mean of 3 seeds: 0.9123 +/- 0.0003 BPB.
  • Renamed to reflect the actual technique (n-gram backoff + entropy alpha). Removed old 1.1507 BPB seed logs. Added an explicit compliance/legality section per competition conventions.
  • Two-pass eval: pass 1 builds an order 2-11 n-gram cache with order-adaptive entropy gating; pass 2 rescores cold-cache early windows with the full cache. Mean of 3 seeds: 0.5863 +/- 0.0002 BPB. All artifacts under 16MB. Total eval: 331s on 8xH100.
  • PPM-style all-order blend (orders 2-12) with escape probabilities instead of hard backoff. Decoupled two-pass: pass 1 stores model_p, pass 2 rescores ALL tokens with leave-one-out against the full cache. np.bincount for fast cache build. Mean of 3 seeds: 0.3461 +/- 0.0015.
@Bortlesboat
Author

Replacing with a cleaner PR that only touches this submission's folder.
