Record: FarnsworthEngine v1 — TTT + 11L Int6 MLP3x, val_bpb=1.1303 by timowhite88 · Pull Request #254 · openai/parameter-golf

timowhite88 · 2026-03-20T19:15:10Z

No description provided.

timowhite88 · 2026-03-20T19:46:08Z

@0hq Ready for review — 3 seeds complete with tight reproducibility:

Seed 1337: 1.1303
Seed 42: 1.1312
Seed 7: 1.1323
Mean: 1.1313

15.88 MB artifact, 600s train, 129s eval. Full logs for all 3 seeds included. This supersedes our earlier PRs #152 and #178.
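The reported mean and the "tight reproducibility" claim can be checked directly from the three seed scores above. This is only a quick arithmetic sketch; the spread statistics (range and sample standard deviation) are not reported in the thread and are computed here for illustration.

```python
import statistics

# val_bpb per seed, copied from the comment above
val_bpb = {1337: 1.1303, 42: 1.1312, 7: 1.1323}

scores = list(val_bpb.values())
mean = statistics.mean(scores)
spread = max(scores) - min(scores)   # best-to-worst gap across seeds
stdev = statistics.stdev(scores)     # sample standard deviation (n - 1)

print(f"mean={mean:.4f} range={spread:.4f} stdev={stdev:.4f}")
# mean=1.1313 range=0.0020 stdev=0.0010
```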

timowhite88 · 2026-03-20T20:00:39Z

@notapplica 3 seeds submitted now, mean is posted, all logs included. Ready for @0hq review.

himanalot · 2026-03-20T20:24:30Z

this is aura

mohosy · 2026-03-20T23:59:31Z

Interesting that freezing early blocks during TTT helps stability. Have you experimented with freezing more or fewer blocks to see where the sweet spot is?

Matching PR #254 (1.1313 BPB) TTT approach:
- SGD optimizer instead of Adam (better for non-stationary TTT)
- 3 epochs per document (more adaptation)
- lr=0.002, momentum=0.9
- Freeze first 2 blocks' LoRA (stable features don't need adaptation)

New env vars: TTT_EPOCHS, TTT_OPTIMIZER, TTT_MOMENTUM, TTT_FREEZE_FIRST_N

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
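A minimal sketch of how the settings listed in this commit message might be read from the environment. The four env var names and the default values come from the commit message; the `TTTConfig` class, the `TTT_LR` variable, and the `trainable_block_indices` helper are hypothetical names introduced here for illustration and are not taken from any of the referenced PRs.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class TTTConfig:
    """Hypothetical container for the TTT settings named in the commit message."""
    epochs: int = 3            # TTT_EPOCHS: passes per document
    optimizer: str = "sgd"     # TTT_OPTIMIZER
    lr: float = 0.002          # TTT_LR is an assumed name; the commit only gives the value
    momentum: float = 0.9      # TTT_MOMENTUM
    freeze_first_n: int = 2    # TTT_FREEZE_FIRST_N: leading blocks left unadapted

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "TTTConfig":
        d = cls()  # instance holding the defaults
        return cls(
            epochs=int(env.get("TTT_EPOCHS", d.epochs)),
            optimizer=str(env.get("TTT_OPTIMIZER", d.optimizer)).lower(),
            lr=float(env.get("TTT_LR", d.lr)),
            momentum=float(env.get("TTT_MOMENTUM", d.momentum)),
            freeze_first_n=int(env.get("TTT_FREEZE_FIRST_N", d.freeze_first_n)),
        )


def trainable_block_indices(num_blocks: int, freeze_first_n: int) -> List[int]:
    """Indices of blocks whose parameters would receive TTT updates."""
    return list(range(min(freeze_first_n, num_blocks), num_blocks))


cfg = TTTConfig.from_env({})                      # no overrides: commit defaults
print(trainable_block_indices(11, cfg.freeze_first_n))
```

With an 11-layer model and the default of two frozen blocks, blocks 2 through 10 stay trainable. Sweeping `TTT_FREEZE_FIRST_N` is one way to probe the "sweet spot" question raised earlier in the thread.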

Combines PR openai#287 (XSA + EMA + Int6 QAT) with PR openai#254 TTT adaptation. Changes: FA2 fallback import, TTT hyperparameters, ttt_adapt function, TTT call before torch.compile in eval section.
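The "FA2 fallback import" mentioned above could look something like the following sketch. It assumes the FA3 build is importable as `flash_attn_interface` (the module name cited in issue #280 in this thread) and that FlashAttention 2 is importable as `flash_attn`; the `pick_attention_backend` helper is a hypothetical name, and the actual change in that PR may be structured differently.

```python
import importlib.util
from typing import Optional, Sequence


def pick_attention_backend(
    candidates: Sequence[str] = ("flash_attn_interface", "flash_attn"),
) -> Optional[str]:
    """Return the first importable top-level module name, or None if none is installed."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = pick_attention_backend()
# None means neither package is installed; a caller would then fall back
# to whatever default attention path the training script provides.
print(backend)
```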

sharpobject · 2026-03-21T17:55:33Z

"If it isn't abundantly obvious: You can't cheat on your test loss. You can't cheat by training on the validation set before you evaluate on the validation set. The validation language around test-time training has been confusing people: you are only allowed to test-time train on validation set tokens you've already evaluated your model on, since those tokens have already been graded!"
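The rule quoted above can be illustrated with a toy example. The sketch below uses a Laplace-smoothed byte-frequency model as a stand-in for a real language model (an assumption made purely so the example runs without any ML framework); all class and function names are hypothetical. The valid loop grades each chunk before adapting on it, while the leaky variant adapts on the whole stream first and therefore reports a lower, invalid score.

```python
import math
from collections import Counter


class UnigramByteModel:
    """Toy stand-in for a language model: Laplace-smoothed byte frequencies."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def bits(self, chunk: bytes) -> float:
        """Code length of `chunk` in bits under the current counts."""
        denom = self.total + 256
        return sum(-math.log2((self.counts[b] + 1) / denom) for b in chunk)

    def adapt(self, chunk: bytes) -> None:
        """'Training' step: fold the chunk into the counts."""
        self.counts.update(chunk)
        self.total += len(chunk)


def eval_score_then_adapt(data: bytes, chunk_size: int) -> float:
    """Valid protocol: every chunk is graded before the model sees it."""
    model = UnigramByteModel()
    total_bits = 0.0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total_bits += model.bits(chunk)  # 1) score with parameters that never saw chunk
        model.adapt(chunk)               # 2) only now adapt on the graded tokens
    return total_bits / len(data)


def eval_adapt_then_score(data: bytes) -> float:
    """Invalid protocol: adapts on the eval tokens before scoring them."""
    model = UnigramByteModel()
    model.adapt(data)
    return model.bits(data) / len(data)


data = b"abracadabra" * 50
valid_bpb = eval_score_then_adapt(data, chunk_size=64)
leaky_bpb = eval_adapt_then_score(data)
print(f"score-then-adapt: {valid_bpb:.3f} bpb, adapt-then-score: {leaky_bpb:.3f} bpb")
```

The gap between the two numbers is the leakage: the second figure only looks better because the model was fit to the tokens it is then graded on.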

@timowhite88

11L Int6 MLP3x + SmearGate + BigramHash + OrthoInit + TTT SGD 3ep

Exact reproduction of @timowhite88's FarnsworthEngine recipe. No modifications; run as-is to validate baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

himanalot · 2026-03-21T19:52:11Z

newjordan just out here blatantly copying people lol

#1 untried combination from competition commentary: TTT (from #254) + XSA (from #265) = estimated 1.117-1.121 BPB

XSA_LAST_N=3 excludes self-attention in final 3 layers. Zero extra params, frees attention capacity for cross-token focus.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- exp_a: Multi-Token Prediction (MTP_NUM_HEADS=2, excluded from export)
- exp_b: SwiGLU MLP replacing ReLU² (hidden=1024, same param count)
- exp_c: Vocab 1536 tokenizer for better bytes-per-token ratio

All based on PR #254 SOTA clone (1.1303 BPB). Priority: exp_c first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
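The "same param count" claim for exp_b can be sanity-checked with a little arithmetic. The sketch below assumes bias-free linear layers, that "MLP3x" means a hidden width of 3 × d_model, and a model width of 512; the width is not stated anywhere in this thread and is chosen here only because it makes the two counts agree. The function names are hypothetical.

```python
def relu2_mlp_params(d_model: int, mult: int = 3) -> int:
    """Two bias-free matrices: up (d -> mult*d) and down (mult*d -> d)."""
    hidden = mult * d_model
    return d_model * hidden + hidden * d_model


def swiglu_mlp_params(d_model: int, hidden: int) -> int:
    """Three bias-free matrices: gate and up (d -> hidden), down (hidden -> d)."""
    return 3 * d_model * hidden


d_model = 512  # assumed width, not given in the thread
print(relu2_mlp_params(d_model))         # 1572864
print(swiglu_mlp_params(d_model, 1024))  # 1572864
```

In general the two layouts match when 3 × d × hidden = 6 × d², that is, when hidden = 2 × d_model, so hidden=1024 is parameter-neutral only for a width of 512 under these assumptions.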

Many TTT submissions (openai#136, openai#152, openai#254, openai#264, openai#338, openai#398, openai#417, openai#421, openai#442) flagged as potentially invalid for adapting on eval tokens BEFORE scoring them. Added correct score-then-adapt protocol with implementation guide.

https://claude.ai/code/session_01M5XTtyz2Zdq5BDeh9qNn9y

hegdeadithyak · 2026-03-23T13:03:36Z

records/track_10min_16mb/2026-03-20_FarnsworthEngine_TTT_11L_Int6_MLP3x/train_gpt.py

+    restore_low_dim_params_to_fp32(eval_model)
+    eval_model.load_state_dict(deq_state, strict=True)
+
+    # TTT: adapt model on validation data before eval


I don't think this is how it should be done :)

notapplica mentioned this pull request Mar 20, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

This was referenced Mar 20, 2026

Add TTT (Test-Time Training) submission: 1.1767 BPB #152

Closed

Add Nuclear Stack submission: 1.16668 BPB (seed 2884431328) #178

Closed

Record: FarnsworthEngine v1 — TTT + 11L Int6 MLP3x (val_bpb: 1.1303, …

479b8bc

…mean: 1.1313)

timowhite88 force-pushed the farnsworth-engine-v1 branch from 18aa3cc to 479b8bc Compare March 20, 2026 20:12

timowhite88 mentioned this pull request Mar 20, 2026

Title: Record submission: FarnsworthEngine v1 — val_bpb=1.1303 (mean 1.1313, 3 seeds) #270

Closed

ibarrajo mentioned this pull request Mar 20, 2026

flash_attn_interface (FA3) missing from runpod/parameter-golf:latest image #280

Open

charmquark1984 mentioned this pull request Mar 20, 2026

Non-record: val_bpb=1.1374, FA2+SWA adaptation of Farnsworth #281

Closed

ibarrajo mentioned this pull request Mar 21, 2026

Record: 11L + Partial XSA + TTT + BatchOpt (val_bpb=1.1354) #290

Open

7 tasks

JackYoung27 mentioned this pull request Mar 21, 2026

Non-record: 11L int5/int6 + XSA + online TTT w/ decay prior (single-run val_bpb=1.1520) #302

Open

sseanliu mentioned this pull request Mar 21, 2026

[Non-record] XSA + EMA + TTT: Negative interaction study (val_bpb=1.1436) #303

Open

newjordan mentioned this pull request Mar 22, 2026

Record: Sponge Bath — TTT 8ep eval-only improvement (val_bpb: 1.1295) #390

Closed

5 tasks

leloykun mentioned this pull request Mar 22, 2026

Invalid submissions due to information leakage during TTT #402

Open

hegdeadithyak reviewed Mar 23, 2026

timowhite88 wants to merge 1 commit into openai:main from timowhite88:farnsworth-engine-v1

5 participants
