Record: FarnsworthEngine v1 — TTT + 11L Int6 MLP3x, val_bpb=1.1303 #254

Open
timowhite88 wants to merge 1 commit into openai:main from
timowhite88:farnsworth-engine-v1

Conversation

@timowhite88

No description provided.

@timowhite88
Author


@0hq Ready for review — 3 seeds complete with tight reproducibility:

  • Seed 1337: 1.1303
  • Seed 42: 1.1312
  • Seed 7: 1.1323
  • Mean: 1.1313

15.88 MB artifact, 600s train, 129s eval. Full logs for all 3 seeds included. This supersedes our earlier PRs #152 and #178.
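The posted mean is easy to sanity-check from the three seed scores:

```python
# Verify the reported per-seed val_bpb figures and their mean.
seed_bpb = {1337: 1.1303, 42: 1.1312, 7: 1.1323}
mean_bpb = sum(seed_bpb.values()) / len(seed_bpb)
spread = max(seed_bpb.values()) - min(seed_bpb.values())
print(round(mean_bpb, 4))  # 1.1313
print(round(spread, 4))    # 0.002 — tight across seeds
```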

@timowhite88
Author

@notapplica All 3 seeds are submitted, the mean is posted, and all logs are included. Ready for @0hq review.

@timowhite88 timowhite88 force-pushed the farnsworth-engine-v1 branch from 18aa3cc to 479b8bc Compare March 20, 2026 20:12
@himanalot

this is aura

@mohosy

mohosy commented Mar 20, 2026

Interesting that freezing early blocks during TTT helps stability. Have you experimented with freezing more or fewer blocks to see where the sweet spot is?

newjordan referenced this pull request in newjordan/parameter-golf Mar 21, 2026
Matching PR #254 (1.1313 BPB) TTT approach:
- SGD optimizer instead of Adam (better for non-stationary TTT)
- 3 epochs per document (more adaptation)
- lr=0.002, momentum=0.9
- Freeze first 2 blocks' LoRA (stable features don't need adaptation)

New env vars: TTT_EPOCHS, TTT_OPTIMIZER, TTT_MOMENTUM, TTT_FREEZE_FIRST_N
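A minimal sketch of what that commit describes — SGD with momentum, multiple epochs per document, and the first N blocks frozen, driven by the listed env vars. The `ttt_adapt` helper, block structure, and loss here are illustrative stand-ins, not the actual repo code (TTT_OPTIMIZER is hardcoded to SGD for brevity, and lr=0.002 is taken from the commit message):

```python
# Sketch (assumed API, not the PR's code): TTT with SGD + momentum,
# 3 epochs per document, first N blocks frozen during adaptation.
import os

import torch
import torch.nn as nn

TTT_EPOCHS = int(os.environ.get("TTT_EPOCHS", "3"))
TTT_MOMENTUM = float(os.environ.get("TTT_MOMENTUM", "0.9"))
TTT_FREEZE_FIRST_N = int(os.environ.get("TTT_FREEZE_FIRST_N", "2"))
TTT_LR = 0.002  # lr from the commit message

def ttt_adapt(blocks: nn.ModuleList, doc_inputs, doc_targets, loss_fn):
    # Freeze the first N blocks: the commit argues stable early
    # features don't need per-document adaptation.
    for i, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad_(i >= TTT_FREEZE_FIRST_N)
    trainable = [p for p in blocks.parameters() if p.requires_grad]
    # SGD instead of Adam — claimed better for non-stationary TTT.
    opt = torch.optim.SGD(trainable, lr=TTT_LR, momentum=TTT_MOMENTUM)
    for _ in range(TTT_EPOCHS):  # multiple passes over one document
        opt.zero_grad()
        x = doc_inputs
        for block in blocks:
            x = block(x)
        loss_fn(x, doc_targets).backward()
        opt.step()

# Toy usage: a stack of linear "blocks" adapting to one document.
blocks = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])
frozen_before = [p.detach().clone() for p in blocks[0].parameters()]
ttt_adapt(blocks, torch.randn(16, 8), torch.randn(16, 8), nn.MSELoss())
frozen_after = list(blocks[0].parameters())
```

The frozen blocks' weights are bitwise unchanged after adaptation, which is the property the freeze is meant to guarantee.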

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sseanliu added a commit to sseanliu/parameter-golf that referenced this pull request Mar 21, 2026
Combines PR openai#287 (XSA + EMA + Int6 QAT) with PR openai#254 TTT adaptation.
Changes: FA2 fallback import, TTT hyperparameters, ttt_adapt function,
TTT call before torch.compile in eval section.
@sharpobject

sharpobject commented Mar 21, 2026

"If it isn't abundantly obvious: You can't cheat on your test loss. You can't cheat by training on the validation set before you evaluate on the validation set. The validation language around test-time training has been confusing people: you are only allowed to test-time train on validation set tokens you've already evaluated your model on, since those tokens have already been graded!"
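The legal protocol being described is score-then-adapt: each validation chunk must be graded under the current model state *before* the model adapts on it. A toy sketch with a Laplace-smoothed byte unigram standing in for the model (the chunking and model are illustrative assumptions, not the competition harness):

```python
# Sketch of the score-then-adapt protocol: adaptation only ever sees
# tokens that have already been graded. Toy unigram byte model.
import math
from collections import Counter

def score_then_adapt(chunks):
    counts = Counter({b: 1 for b in range(256)})  # Laplace smoothing
    total_bits, total_bytes = 0.0, 0
    for chunk in chunks:
        n = sum(counts.values())
        # 1) Score the chunk under the CURRENT model state.
        total_bits += sum(-math.log2(counts[b] / n) for b in chunk)
        total_bytes += len(chunk)
        # 2) Only now adapt on the chunk — it has already been graded.
        counts.update(chunk)
    return total_bits / total_bytes  # bits per byte

bpb = score_then_adapt([b"hello world, ", b"hello again"])
```

The first chunk is scored at exactly 8 bpb (uniform prior); the second benefits from adaptation on the first, so the overall bpb drops below 8 — gains come only from already-graded tokens.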

newjordan referenced this pull request in newjordan/parameter-golf Mar 21, 2026
11L Int6 MLP3x + SmearGate + BigramHash + OrthoInit + TTT SGD 3ep
Exact reproduction of @timowhite88's FarnsworthEngine recipe.
No modifications — run as-is to validate baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@himanalot

newjordan just out here blatantly copying people lol

newjordan referenced this pull request in newjordan/parameter-golf Mar 21, 2026
#1 untried combination from competition commentary:
TTT (from #254) + XSA (from #265) = estimated 1.117-1.121 BPB
XSA_LAST_N=3 excludes self-attention in final 3 layers.
Zero extra params, frees attention capacity for cross-token focus.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan referenced this pull request in newjordan/parameter-golf Mar 21, 2026
exp_a: Multi-Token Prediction (MTP_NUM_HEADS=2, excluded from export)
exp_b: SwiGLU MLP replacing ReLU² (hidden=1024, same param count)
exp_c: Vocab 1536 tokenizer for better bytes-per-token ratio

All based on PR #254 SOTA clone (1.1303 BPB). Priority: exp_c first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ThomAub pushed a commit to ThomAub/parameter-golf that referenced this pull request Mar 22, 2026
Many TTT submissions (openai#136, openai#152, openai#254, openai#264, openai#338, openai#398, openai#417, openai#421, openai#442)
flagged as potentially invalid for adapting on eval tokens BEFORE scoring them.
Added correct score-then-adapt protocol with implementation guide.

https://claude.ai/code/session_01M5XTtyz2Zdq5BDeh9qNn9y
restore_low_dim_params_to_fp32(eval_model)
eval_model.load_state_dict(deq_state, strict=True)

# TTT: adapt model on validation data before eval
I don't think this is how it should be done :)

newjordan pushed the same three commits (FarnsworthEngine reproduction, TTT + XSA combination, and the exp_a/exp_b/exp_c experiments) to newjordan/parameter-golf-1 referencing this pull request Mar 23, 2026.
5 participants