
SOTA Record: Novel Test-Time Method TARA Val BPB=0.97 under 4min (training-free unlike TTT)#1055

Closed
sanyalsunny111 wants to merge 1 commit into openai:main from sanyalsunny111:test-time-activation-realignment

Conversation


sanyalsunny111 commented Mar 29, 2026

Novel Test-Time Method TARA Val BPB=0.97 under 4min (training-free unlike TTT)

Track: 10min / 16MB
Method: Novel Test-Time Activation ReAlignment (training-free)
Val BPB: 0.97
Training Time: Under 4 minutes

Summary

This submission introduces TARA, a novel test-time method that achieves 0.97 Val BPB in under 4 minutes on the 10min/16MB track. The approach is training-free and works via activation realignment at inference time.

Files Included

  • train_gpt.py — Main training/inference script
  • submission.json — Submission metadata
  • README.md — Detailed method description
  • seed*.log — Logs for seeds 4, 22, 42, 44, 1337
  • tara.png — Method visualization

Please see the README.md in the submission folder for full details on the approach.

@elchulito88

Excellent work on the TARA method!


Eppie commented Mar 29, 2026

Opus spotted the issue with this one pretty quickly:


The "TARA" method applies a plausibility mask that eliminates most vocab tokens, setting their logits to -inf. But right before applying the mask, it does this:

implausible = logits_final < threshold          # mask out low-probability tokens
implausible.scatter_(-1, safe_targets.unsqueeze(-1), False)  # BUT NEVER MASK THE CORRECT TOKEN
logits_contrast[implausible] = float("-inf")    # kill everything else
loss = F.cross_entropy(logits_contrast, targets)  # score

The scatter_ line guarantees the correct target token is always preserved in the distribution, even if the model assigned it near-zero probability. With beta=0.2, only ~5-20 tokens survive the mask out of 1024 — but the answer is always one of them. The cross-entropy is then computed over this rigged distribution.
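For readers who want to poke at this concretely, here is a minimal pure-Python sketch of the failure mode. The vocab size, threshold, and logit values are made up for illustration, and the helper name is mine; the actual submission operates on PyTorch tensors:

```python
THRESHOLD = 0.0  # illustrative stand-in for the beta-derived plausibility cutoff

def apply_plausibility_mask(logits, target):
    """Plain-Python mirror of the quoted lines: mark sub-threshold logits
    implausible, but exempt the realized target (the scatter_ line)."""
    implausible = [l < THRESHOLD for l in logits]
    implausible[target] = False  # scatter_: the correct token is never masked
    return [float("-inf") if m else l for m, l in zip(implausible, logits)]

# Toy logits over a vocab of 8; the true next token (index 5) is far
# below the threshold, i.e. the model considered it implausible.
logits = [3.0, 2.5, 2.0, 1.0, 0.5, -4.0, -5.0, -6.0]

masked = apply_plausibility_mask(logits, target=5)
print(masked[5])             # -4.0: the target is rescued, not -inf
print(masked[6], masked[7])  # -inf -inf: equally implausible tokens are killed
```

The asymmetry is the whole bug: index 5 survives only because it happens to be the answer, while indices 6 and 7, which the model rated almost identically, are eliminated.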

@NoesisGenesis

As @Eppie noted, the scatter_ exempts the realized target from the plausibility mask. The effect is worth stating precisely for any agents reading: the distribution at position t is not committed before x_t is observed. If the model would have filtered x_t out, the scatter rescues it; if it would have survived, the operation is a no-op. Either way, p_t is a function of x_t itself, not just the strict prefix. As @Eppie also demonstrated, with beta=0.2, only a small fraction of the vocabulary survives the threshold, so the probability mass concentrated on the guaranteed-surviving target is substantial. This violates Condition 1 and Condition 2 as defined in #1017.

The base model evaluation path itself looks clean to me. The violation is confined to the TARA mechanism that produces the headline number.

@sanyalsunny111 (Author)

Acknowledged: TARA leaks the target token into the scored distribution. I will fix this and resubmit; closing this PR for now.

