[Non-record] Meta-Learned TTT + Error-Guided Adaptation Analysis (val_bpb=1.1645) by sseanliu · Pull Request #296 · openai/parameter-golf

sseanliu · 2026-03-21T00:28:41Z

Summary

Non-record research submission exploring test-time adaptation strategies for compressed language models at 16MB scale.

Key findings

- Reptile meta-learning improves SmearGate models by 0.011 BPB, roughly 10x the 0.001 BPB gain from naive TTT, partially overcoming the SmearGate/TTT redundancy.
- Error-guided TTT is a negative result: concentrating adaptation on the highest-loss tokens does not improve val_loss, indicating that these tokens are genuinely unpredictable.
- 13 layers beat 10 layers on 8xH100 (1.1884 vs 1.2090) despite 23% fewer training steps.
- Per-token loss distribution on the full 62M-token val set: the hardest 2.7% of tokens account for ~15% of total loss.
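
For readers unfamiliar with Reptile, the following is a minimal sketch of the generic Reptile outer update on a toy scalar problem. It is not the code from this PR: the task loss, the function names `inner_sgd` and `reptile`, and all hyperparameters are assumptions chosen for illustration. Each "task" is a quadratic loss with its own optimum, standing in for a chunk of text the model would adapt to at test time.

```python
import random


def inner_sgd(theta, target, lr=0.1, steps=5):
    """Adapt a copy of theta with a few SGD steps on the toy task loss
    0.5 * (phi - target) ** 2 (hypothetical stand-in for a TTT inner loop)."""
    phi = theta
    for _ in range(steps):
        grad = phi - target  # derivative of the toy quadratic loss
        phi -= lr * grad
    return phi


def reptile(task_targets, theta=0.0, meta_lr=0.5, meta_steps=200, seed=0):
    """Generic Reptile outer loop: sample a task, adapt, then move the
    meta-parameters a fraction of the way toward the adapted parameters."""
    rng = random.Random(seed)
    for _ in range(meta_steps):
        target = rng.choice(task_targets)
        phi = inner_sgd(theta, target)
        theta += meta_lr * (phi - theta)  # Reptile update
    return theta


if __name__ == "__main__":
    print(reptile([-1.0, 1.0, 3.0]))
```

The outer update only needs the difference between adapted and initial parameters, so no second-order gradients are required. With a real model, `theta` and `phi` would be full parameter sets and the inner loop would run on held-out text rather than a quadratic.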

Score

val_bpb: 1.1645 (sliding window, stride=64)
Artifact: 12.7MB
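
As a rough illustration of what "sliding window, stride=64" can mean for this score, the sketch below enumerates evaluation windows so that every token is scored exactly once with as much left context as the window allows, and converts a summed loss in nats to bits per byte. The window length of 1024, the helper names, and the exact scoring convention are assumptions; the README remains the reference for the actual evaluation.

```python
import math


def sliding_windows(n_tokens, window=1024, stride=64):
    """Yield (ctx_start, end, score_start) triples.

    Tokens in [score_start, end) are scored using context that begins at
    ctx_start. The first window scores all of its tokens; each later
    window scores only its final `stride` tokens.
    """
    score_start = 0
    while score_start < n_tokens:
        if score_start == 0:
            end = min(window, n_tokens)
        else:
            end = min(score_start + stride, n_tokens)
        ctx_start = max(0, end - window)
        yield ctx_start, end, score_start
        score_start = end


def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)


if __name__ == "__main__":
    for triple in sliding_windows(200, window=100, stride=10):
        print(triple)
```

A smaller stride gives each scored token more context on average, at the cost of more forward passes over overlapping windows.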

See README for full methodology and analysis.

…n-record)

- Add leaderboard table: jfprincz 1.1271 is new target; mohosy racing same stack
- Add Reptile meta-TTT finding (PR openai#296): 10x better than naive TTT with SmearGate; error-guided TTT is negative; 13L crossover point identified
- Add SWA checkpoint count finding (PR openai#238): 84 checkpoints reverses quant gap; explains why our WD=1200 SWA showed no effect
- Update jfprincz entry to include PR openai#287 results (1.1271)
- Add meta-lessons 10 and 11

Add MetaTTT v2: Reptile meta-learning + error-guided TTT analysis (non-record)

9605e98

notapplica mentioned this pull request Mar 21, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

Add XSA + EMA + TTT merged train_gpt.py

e3a7958

Combines PR openai#287 (XSA + EMA + Int6 QAT) with PR openai#254 TTT adaptation. Changes: FA2 fallback import, TTT hyperparameters, ttt_adapt function, TTT call before torch.compile in eval section.

This was referenced Mar 21, 2026

[Non-record] XSA + EMA + TTT: Negative interaction study (val_bpb=1.1436) #303

Open

Neural Cache: Cross-Window KV Caching for Extended Eval Context (research proposal) #318

Open

anantdgoel mentioned this pull request Mar 22, 2026

Non-record: Meta-TTT + Cache/OGD Eval Stacking + Tokenizer Ablation #384

Open

sseanliu wants to merge 2 commits into openai:main from sseanliu:submission/metattt-v2
