
[Non-record] Meta-Learned TTT + Error-Guided Adaptation Analysis (val_bpb=1.1645)#296

Open
sseanliu wants to merge 2 commits into openai:main from sseanliu:submission/metattt-v2


@sseanliu

Summary

Non-record research submission exploring test-time adaptation (test-time training, TTT) strategies for compressed language models at the 16MB artifact scale.

Key findings

  1. Reptile meta-learning improves SmearGate models by 0.011 BPB, roughly 10x the gain from naive TTT (0.001 BPB), partially overcoming the redundancy between SmearGate and TTT
  2. Error-guided TTT is a negative result: concentrating adaptation on the highest-loss tokens does not improve val_loss, indicating that these tokens are genuinely unpredictable
  3. 13 layers beat 10 layers on 8×H100 (val_bpb 1.1884 vs 1.2090) despite 23% fewer training steps
  4. Per-token loss distribution on the full 62M-token validation set: the hardest 2.7% of tokens account for ~15% of the total loss
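The Reptile update behind finding 1 can be illustrated on a toy problem. This is a minimal sketch under stated assumptions: a scalar linear model and a hypothetical family of regression tasks stand in for the PR's SmearGate language model. The inner loop plays the role of test-time adaptation (a few SGD steps on one task). The outer loop moves the shared initialization toward the adapted weights, so later adaptation starts from a better point. Function names and hyperparameters are illustrative and are not taken from the submission.

```python
import random


def adapt(w, task, lr, steps):
    """Inner loop: a few full-batch SGD steps on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in task) / len(task)
        w -= lr * grad
    return w


def sample_task(rng, n=16):
    """Hypothetical toy task family: y = a * x with a per-task slope a ~ U(1, 3)."""
    a = rng.uniform(1.0, 3.0)
    return [(x, a * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


def reptile(meta_steps=2000, inner_lr=0.1, inner_steps=5, meta_lr=0.1, seed=0):
    """Outer loop: nudge the shared initialization toward each task's adapted weights."""
    rng = random.Random(seed)
    w_init = 0.0
    for _ in range(meta_steps):
        task = sample_task(rng)
        w_task = adapt(w_init, task, inner_lr, inner_steps)
        # Reptile meta-update: phi <- phi + eps * (W_task - phi)
        w_init += meta_lr * (w_task - w_init)
    return w_init


if __name__ == "__main__":
    w0 = reptile()
    print(f"meta-learned init: {w0:.3f}")  # lands near the mean task slope (about 2)
```

In a meta-TTT setup, a "task" would presumably be a chunk of training text, and `adapt` would be the same optimizer loop used at evaluation time. That mapping is an assumption; the PR summary does not specify it.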
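The statistic in finding 4 and the token selection that error-guided TTT relies on in finding 2 both reduce to ranking per-token losses. This is a minimal sketch, assuming per-token cross-entropy values are already available as a flat list. `hardest_tokens` and `loss_share` are hypothetical helpers. The selection rule (top fraction by loss) is an assumption about how error-guided adaptation picks its tokens.

```python
def hardest_tokens(losses, frac):
    """Return indices of the top `frac` fraction of tokens by loss, hardest first."""
    k = max(1, round(len(losses) * frac))
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:k]


def loss_share(losses, indices):
    """Fraction of the total loss contributed by the given token indices."""
    return sum(losses[i] for i in indices) / sum(losses)


if __name__ == "__main__":
    per_token = [0.1, 0.2, 5.0, 0.3, 0.4]  # toy per-token losses (nats), not real data
    top = hardest_tokens(per_token, 0.2)
    print(top, round(loss_share(per_token, top), 3))  # [2] 0.833
```

Run over the full validation set with `frac=0.027`, this kind of computation gives a figure like the ~15% share reported above. An error-guided TTT step would then, under this assumption, compute its adaptation loss only over the selected indices.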

Score

  • val_bpb: 1.1645 (sliding window, stride=64)
  • Artifact: 12.7MB
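The headline score uses sliding-window evaluation with stride=64. This is a minimal sketch of the bookkeeping. It assumes a hypothetical context length `seq_len` and assumes each window scores only the positions the previous window did not. The submission's actual evaluation code may differ in details such as batching. The bits-per-byte conversion divides the summed cross-entropy in nats by ln 2 times the number of bytes covered.

```python
import math


def sliding_window_spans(n_positions, seq_len, stride):
    """Plan sliding-window evaluation.

    Returns (ctx_start, end, score_start) triples: the model is run on
    positions [ctx_start, end) and only positions [score_start, end) are scored,
    so every position is scored exactly once.
    """
    end = min(seq_len, n_positions)
    spans = [(0, end, 0)]  # the first window scores everything it sees
    scored_up_to = end
    while scored_up_to < n_positions:
        end = min(scored_up_to + stride, n_positions)
        spans.append((max(0, end - seq_len), end, scored_up_to))
        scored_up_to = end
    return spans


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert summed cross-entropy over scored tokens (nats) into bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    # prints (0, 256, 0) and then (44, 300, 256)
    for span in sliding_window_spans(n_positions=300, seq_len=256, stride=64):
        print(span)
```

When the stride is much smaller than `seq_len`, every scored position after the first window sees at least `seq_len - stride` tokens of context. This is why sliding-window scores typically come out lower than non-overlapping evaluation. The cost is roughly `seq_len / stride` times more forward passes.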

See the README for the full methodology and analysis.

mrdavtan added a commit to mrdavtan/parameter-golf that referenced this pull request Mar 21, 2026
- Add leaderboard table: jfprincz 1.1271 is new target; mohosy racing same stack
- Add Reptile meta-TTT finding (PR openai#296): 10x better than naive TTT with SmearGate;
  error-guided TTT is negative; 13L crossover point identified
- Add SWA checkpoint count finding (PR openai#238): 84 checkpoints reverses quant gap;
  explains why our WD=1200 SWA showed no effect
- Update jfprincz entry to include PR openai#287 results (1.1271)
- Add meta-lessons 10 and 11
Combines PR openai#287 (XSA + EMA + Int6 QAT) with PR openai#254 TTT adaptation.
Changes: FA2 fallback import, TTT hyperparameters, ttt_adapt function,
TTT call before torch.compile in eval section.
