Gravity Tokenizer: 1.0321 BPB via ablation leverage vocabulary optimization by dcrow85 · Pull Request #755 · openai/parameter-golf

dcrow85 · 2026-03-25T18:44:05Z

Summary

val_bpb: 1.0321 (3-seed mean, std 0.0011) — beats current SOTA (1.1194) by 0.0873 BPB
Replaces 659/765 merge tokens by ablation leverage scoring (β=1.0)
Vanilla 12L 384d transformer — no SmearGate, no BigramHash, no XSA, no EMA, no TTT, no sliding window eval
The vocabulary alone accounts for the entire improvement
15.6 MB artifact, ~591s training time, all constraints met with margin

3-Seed Results

Seed	val_bpb	artifact_bytes	training_time
42	1.0310	15,629,267	590,898 ms
137	1.0321	15,625,195	590,980 ms
3	1.0331	15,625,147	591,082 ms
Mean	1.0321
Std	0.0011

Approach

At 1024 vocabulary tokens, every merge slot matters. Standard BPE allocates by frequency. The Gravity Tokenizer allocates by ablation leverage — the downstream loss increase when a token is shattered back to bytes. This is a measurement of structural importance, not frequency.

The scoring pipeline uses a frozen GPT-2 reference model to measure each candidate token's leverage across 100 FineWeb contexts. The top 765 candidates by gravity score replace the BPE merge tokens. The vocabulary size stays exactly 1024.

Tokenizer Correctness

The val_bpb calculation uses the competition's own build_sentencepiece_luts() and eval_val() functions with zero modifications. The gravity tokenizer's lower compression ratio (1.05 vs 2.45 bytes/token) results in a higher tokens_per_byte multiplier, which penalizes the gravity tokenizer. The improvement is entirely in per-token prediction quality. Detailed correctness documentation included in tokenizer_scrutiny_doc.md.

Setup

bash setup.sh   # Downloads stock FineWeb + retokenizes with gravity vocabulary

The train_gpt.py is the unmodified competition baseline. All config via env vars.

Test plan

3 seeds with p << 0.01 statistical significance
All artifacts under 16,000,000 bytes
All runs under 600 seconds on 8×H100 SXM
Tokenizer correctness documented and defended
Retokenization is deterministic and reproducible from stock FineWeb

🤖 Generated with Claude Code

…zation Replaces 659/765 merge tokens by structural importance scoring. Vanilla 12L 384d transformer, no architectural novelties. 3-seed mean: 1.0321 (std 0.0011). All artifacts under 16MB, all runs under 600s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- The horizontal lensing hypothesis was tested and killed (56% RoPE artifact) - Replaced with the depth efficiency law (p=0.00005, length-matched) - Added Qwen 2.5-72B frontier probe results (80 layers, same physics) - Link to full probe data and DEPTH_EFFICIENCY.md writeup - Honest framing: reported what survived the controls and what didn't Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Eppie · 2026-03-29T21:39:58Z

This is another one that I struggled to find an issue with for a while, but upon closer inspection of the tokenization, the reported val_bpb is invalid because the total_bytes denominator is artificially inflated. This is exactly the bug described in #897 (nice find, @riccardoalberghi!).

Also, @NoesisGenesis, how does this one fit into your 4 categories?

Some additional details from Opus:

The gravity tokenizer lacks a standalone ▁ (U+2581) token. The baseline BPE tokenizer has it as token 939, so build_sentencepiece_luts() correctly strips it and counts 1 byte for the space. But when ▁ isn't in the vocabulary, SentencePiece's byte fallback decomposes it into <0xE2>, <0x96>, <0x81> — 3 bytes counted for 1 ASCII space.

Since BPB = (val_loss / ln2) × (total_tokens / total_bytes), inflating total_bytes deflates the reported BPB.

NoesisGenesis · 2026-03-30T04:18:40Z

All four information-theoretic conditions are satisfied. This submission just requires correct BPB computation.

notapplica mentioned this pull request Mar 25, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gravity Tokenizer: 1.0321 BPB via ablation leverage vocabulary optimization#755

Gravity Tokenizer: 1.0321 BPB via ablation leverage vocabulary optimization#755
dcrow85 wants to merge 2 commits intoopenai:mainfrom
dcrow85:submission/2026-03-25_GravityTokenizer_AblationLeverage

dcrow85 commented Mar 25, 2026

Uh oh!

Eppie commented Mar 29, 2026

Uh oh!

NoesisGenesis commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dcrow85 commented Mar 25, 2026

Summary

3-Seed Results

Approach

Tokenizer Correctness

Setup

Test plan

Uh oh!

Eppie commented Mar 29, 2026

Uh oh!

NoesisGenesis commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants