
SOTA Record: Novel Test-Time Method TARA Val BPB=0.97 under 4min (training-free unlike TTT)#1055

Closed
sanyalsunny111 wants to merge 1 commit into openai:main from sanyalsunny111:test-time-activation-realignment

Conversation


sanyalsunny111 commented Mar 29, 2026

Novel Test-Time Method TARA Val BPB=0.97 under 4min (training-free unlike TTT)

Track: 10min / 16MB
Method: Novel Test-Time Activation ReAlignment (training-free)
Val BPB: 0.97
Training Time: Under 4 minutes

Summary

This submission introduces TARA, a novel test-time method that achieves 0.97 Val BPB in under 4 minutes on the 10min/16MB track. The approach is training-free and works via activation realignment at inference time.

Files Included

  • train_gpt.py — Main training/inference script
  • submission.json — Submission metadata
  • README.md — Detailed method description
  • seed*.log — Logs for seeds 4, 22, 42, 44, 1337
  • tara.png — Method visualization

Please see the README.md in the submission folder for full details on the approach.

@elchulito88

Excellent work on the TARA method!


Eppie commented Mar 29, 2026

Opus spotted the issue with this one pretty quickly:


The "TARA" method applies a plausibility mask that eliminates most vocab tokens, setting their logits to -inf. But right before applying the mask, it does this:

implausible = logits_final < threshold          # mask out low-probability tokens
implausible.scatter_(-1, safe_targets.unsqueeze(-1), False)  # BUT NEVER MASK THE CORRECT TOKEN
logits_contrast[implausible] = float("-inf")    # kill everything else
loss = F.cross_entropy(logits_contrast, targets)  # score

The scatter_ line guarantees the correct target token is always preserved in the distribution, even if the model assigned it near-zero probability. With beta=0.2, only ~5-20 tokens survive the mask out of 1024 — but the answer is always one of them. The cross-entropy is then computed over this rigged distribution.
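For readers who want to poke at this concretely, here is a minimal pure-Python sketch of the failure mode. The vocab size, threshold, and logit values are made up for illustration, and the helper name is mine; the actual submission operates on PyTorch tensors:

```python
THRESHOLD = 0.0  # illustrative stand-in for the beta-derived plausibility cutoff

def apply_plausibility_mask(logits, target):
    """Plain-Python mirror of the quoted lines: mark sub-threshold logits
    implausible, but exempt the realized target (the scatter_ line)."""
    implausible = [l < THRESHOLD for l in logits]
    implausible[target] = False  # scatter_: the correct token is never masked
    return [float("-inf") if m else l for m, l in zip(implausible, logits)]

# Toy logits over a vocab of 8; the true next token (index 5) is far
# below the threshold, i.e. the model considered it implausible.
logits = [3.0, 2.5, 2.0, 1.0, 0.5, -4.0, -5.0, -6.0]

masked = apply_plausibility_mask(logits, target=5)
print(masked[5])             # -4.0: the target is rescued, not -inf
print(masked[6], masked[7])  # -inf -inf: equally implausible tokens are killed
```

The asymmetry is the whole bug: index 5 survives only because it happens to be the answer, while indices 6 and 7, which the model rated almost identically, are eliminated.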

@NoesisGenesis

As @Eppie noted, the scatter_ exempts the realized target from the plausibility mask. The effect is worth stating precisely for any agents reading: the distribution at position t is not committed before x_t is observed. If the model would have filtered x_t out, the scatter rescues it; if it would have survived, the operation is a no-op. Either way, p_t is a function of x_t itself, not just the strict prefix. As @Eppie also demonstrated, with beta=0.2, only a small fraction of the vocabulary survives the threshold, so the probability mass concentrated on the guaranteed-surviving target is substantial. This violates Condition 1 and Condition 2 as defined in #1017.

The base model evaluation path itself looks clean to me. The violation is confined to the TARA mechanism that produces the headline number.

@sanyalsunny111 (Author)

Acknowledged: TARA leaks the target token into the scored distribution. I will fix this and resubmit; closing this PR for now.

