Non-record: Meta-TTT + Cache/OGD Eval Stacking + Tokenizer Ablation #384
Open

anantdgoel wants to merge 1 commit into openai:main
Three novel research contributions with controlled ablations:

1. MAML Meta-TTT (negative: -0.085 BPB, hyperparams too aggressive)
2. Eval-time cache mixture + OGD vocab bias (-0.004 BPB additive on TTT)
3. Tokenizer optimization (null: -5.7% tokens/byte, no BPB gain)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
val_bpb: 1.2882 (sliding window) | 13.2 MB | 1xA40 Secure, 2000 steps, 524K batch
Three novel research directions with controlled ablations, each exploring techniques no other submissions have investigated.
Contributions
1. MAML-style Meta-TTT (negative result)
First-order MAML during training to optimize the initialization for TTT adaptation. A controlled A/B test (same 524K batch, same architecture) came out negative at -0.085 BPB; the hyperparameters were too aggressive. A sketch of the training step is below.
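As a rough illustration only, here is a minimal first-order MAML (FOMAML) training step in PyTorch; the function name, the hyperparameters, and the assumption that the model is a callable returning the LM loss are all inventions for the sketch, not the submission's actual code.

```python
import copy

import torch

def fomaml_meta_ttt_step(model, outer_opt, support_batch, query_batch,
                         inner_lr=0.01, inner_steps=1):
    """One FOMAML step: adapt a clone on support data the way SGD TTT
    would at eval time, then update the real weights with the gradient
    of the post-adaptation loss on query data (no second-order terms)."""
    # Inner loop: simulate test-time training on a throwaway clone.
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        adapted(*support_batch).backward()  # assumes model(...) returns loss
        inner_opt.step()

    # Outer loss, evaluated at the adapted parameters.
    adapted.zero_grad()
    outer_loss = adapted(*query_batch)
    outer_loss.backward()

    # First-order shortcut: copy the clone's gradients onto the original
    # parameters instead of differentiating through the inner updates.
    outer_opt.zero_grad()
    for p, q in zip(model.parameters(), adapted.parameters()):
        p.grad = None if q.grad is None else q.grad.detach().clone()
    outer_opt.step()
    return outer_loss.item()
```

The first-order approximation keeps memory flat relative to full MAML, which is the usual reason it is preferred under a single-GPU budget like the A40 run here.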
2. Eval-time technique stacking (positive result)
Unigram cache mixture + online gradient descent (OGD) on a vocabulary bias, stacked on SGD TTT, for an additive -0.004 BPB on top of TTT. A sketch of the eval-time loop is below.
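A minimal sketch of how such a stacked eval loop could look, assuming per-position logits are already computed. The mixture weight, OGD learning rate, smoothing constant, and function name are placeholders, and the OGD step deliberately descends the model-only NLL gradient (softmax minus one-hot) rather than the mixture NLL, for simplicity.

```python
import torch
import torch.nn.functional as F

def eval_stacked_nll(logits, targets, vocab_size,
                     cache_lambda=0.05, ogd_lr=0.1, smoothing=1.0):
    """Sequentially score `targets` ([T]) given per-position `logits`
    ([T, V]), stacking a unigram cache mixture over previously seen
    tokens and OGD on a per-token logit bias. Returns total NLL in nats."""
    logits = logits.detach()                       # everything is closed-form
    bias = torch.zeros(vocab_size)                 # OGD-updated vocab bias
    counts = torch.full((vocab_size,), smoothing)  # smoothed unigram cache
    total_nll = 0.0
    for t in range(targets.shape[0]):
        p_model = F.softmax(logits[t] + bias, dim=-1)
        p_cache = counts / counts.sum()
        p = (1 - cache_lambda) * p_model + cache_lambda * p_cache
        y = targets[t]
        total_nll += -torch.log(p[y]).item()
        # OGD on the bias: the gradient of the model NLL w.r.t. the biased
        # logits is softmax(logits + bias) - onehot(y); take a descent step.
        grad = p_model.clone()
        grad[y] -= 1.0
        bias -= ogd_lr * grad
        counts[y] += 1.0                           # record token in cache
    return total_nll
```

Both adaptations are closed-form per token, so the stacking adds no backward passes at eval time.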
3. Tokenizer optimization (null result)
First submission to modify the tokenizer. BPE with `split_digits=False, max_len=64` gives 5.7% fewer tokens per byte at v8192, but no BPB improvement (+0.0006 worse). Longer merged tokens are harder to predict per token, offsetting the compression gains, which explains why the community converged on the stock v1024 vocabulary.
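For concreteness, a hypothetical reconstruction of the ablated configuration using the HuggingFace tokenizers library; the submission's actual tokenizer code is not shown here, so mapping `split_digits=False` to the absence of a `Digits` pre-tokenizer and `max_len=64` to `max_token_length` is an assumption, as is the training corpus path.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
# A split_digits=True setup would wrap this in pre_tokenizers.Sequence
# with pre_tokenizers.Digits(individual_digits=True); the ablation
# (split_digits=False) leaves digits free to merge into longer tokens.
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=8192,       # the v8192 setting from the ablation
    max_token_length=64,   # assumed analogue of max_len=64
    show_progress=True,
)
tokenizer.train(files=["train.txt"], trainer=trainer)  # hypothetical corpus
```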
Files

README.md — Full writeup with ablation tables
submission.json — Metadata
train_gpt.py — Modified script with cache mixture, OGD, Meta-TTT
train.log — Training log from A40 control run