Record: Loqui Auris — 10L + SWA + Standard TTT (val_bpb=1.1100)#595

Closed
LoquiAuris wants to merge 2 commits into openai:main from LoquiAuris:loqui-10L-swa-ttt

@LoquiAuris

Summary

  • val_bpb: 1.1100 (seed 1337)
  • 10L d=512, 8 heads, 4 KV heads (GQA), MLP 3x, ReLU²
  • SWA (29 checkpoints), SmearGate, BigramHash(4096), U-Net skips
  • Standard AdamW TTT: 10 epochs, lr=0.0005
  • Artifact: 15.69 MB (250 KB headroom)
  • Platform: 8xH100 SXM, ~5992 steps in 600s
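
For readers unfamiliar with the SWA bullet above, here is a minimal sketch of weight averaging over saved checkpoints, reading SWA as stochastic weight averaging. The PR does not say how the 29 checkpoints are selected or weighted, so the uniform average, the `Params` type, and the `swa_average` helper are all assumptions made for illustration, with plain Python lists standing in for tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
Params = Dict[str, List[float]]


def swa_average(checkpoints: List[Params]) -> Params:
    """Uniformly average a list of parameter snapshots, key by key.

    Assumption: every snapshot has the same keys and shapes, and all
    snapshots are weighted equally.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Params = {}
    for name, first in checkpoints[0].items():
        sums = [0.0] * len(first)
        for ckpt in checkpoints:
            for i, value in enumerate(ckpt[name]):
                sums[i] += value
        averaged[name] = [s / n for s in sums]
    return averaged


# Toy usage: 29 snapshots of a two-element "weight" vector.
snapshots = [{"w": [float(k), 2.0 * k]} for k in range(29)]
swa_weights = swa_average(snapshots)
```

The averaged weights, not the final training step's weights, would then be the ones exported as the artifact, if the submission follows the usual SWA pattern.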

Acknowledgments

@valerio-oai
Contributor

As far as I can tell here, this proposed TTT scheme trains on the validation set by reporting the score on a doc after the model's weights have adapted to it, rendering this unsound for the purposes of this competition. Specifically, you use #442's TTT scheme, which was ruled out.
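
The ordering issue raised in this comment can be illustrated with a toy example. This is not the competition's evaluation code and not the scheme from #442. `UnigramByteModel` and `eval_bpb` are hypothetical names, and a byte-frequency counter stands in for a network updated by gradient steps. The sketch only contrasts scoring a document after adapting to it with scoring it before any adaptation on that document.

```python
import math
from collections import Counter
from typing import List


class UnigramByteModel:
    """Toy adaptive model: Laplace-smoothed byte frequencies."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def bits(self, doc: bytes) -> float:
        # Code length of doc in bits under the current counts.
        return sum(
            -math.log2((self.counts[b] + 1) / (self.total + 256)) for b in doc
        )

    def adapt(self, doc: bytes) -> None:
        # Stand-in for a TTT update: absorb the document's statistics.
        self.counts.update(doc)
        self.total += len(doc)


def eval_bpb(docs: List[bytes], adapt_before_scoring: bool) -> float:
    """Bits per byte over docs under one of the two orderings."""
    model = UnigramByteModel()
    total_bits = 0.0
    total_bytes = 0
    for doc in docs:
        if adapt_before_scoring:
            model.adapt(doc)               # model has already seen doc ...
            total_bits += model.bits(doc)  # ... so this score is leaky
        else:
            total_bits += model.bits(doc)  # score doc while still unseen
            model.adapt(doc)               # adapt only afterwards
        total_bytes += len(doc)
    return total_bits / total_bytes


docs = [b"aaaaaaaa", b"bbbbbbbb", b"cccccccc"]
leaky = eval_bpb(docs, adapt_before_scoring=True)
clean = eval_bpb(docs, adapt_before_scoring=False)
```

In this toy setup `leaky` comes out lower than `clean`, because each document is scored by a model that has already absorbed it. That is the sense in which a score reported after adaptation is measuring training on the validation data.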

@LoquiAuris
Author

LoquiAuris commented Mar 24, 2026 via email

@valerio-oai
Contributor

Hi Eli, thanks for understanding, and don't take the closing of this PR as me trying to clip your wings! Compute grants are still available, they just take some time to come through :)

@LoquiAuris
Author

LoquiAuris commented Mar 24, 2026 via email

RoyiRa added a commit to RoyiRa/parameter-golf that referenced this pull request Mar 25, 2026
PR openai#595 achieves 1.1100 BPB with AdamW TTT (10ep, lr=5e-4).
Add TTT_OPTIMIZER env var to switch between SGD (default) and AdamW.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
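
A minimal sketch of how an environment-variable switch like the one in this commit message might be read. Only the variable name `TTT_OPTIMIZER`, the two optimizer choices, and the SGD default come from the commit message. The accepted spellings, the `select_ttt_optimizer` helper, and the error handling are assumptions, and the referenced commit's actual code may differ.

```python
import os
from typing import Mapping

# Hypothetical accepted spellings; the commit message only names the optimizers.
_SUPPORTED = ("sgd", "adamw")


def select_ttt_optimizer(env: Mapping[str, str] = os.environ) -> str:
    """Return the optimizer name chosen via TTT_OPTIMIZER, defaulting to SGD."""
    choice = env.get("TTT_OPTIMIZER", "sgd").strip().lower()
    if choice not in _SUPPORTED:
        raise ValueError(
            f"TTT_OPTIMIZER must be one of {_SUPPORTED}, got {choice!r}"
        )
    return choice


# Passing a plain dict instead of os.environ keeps the example side-effect free.
default_choice = select_ttt_optimizer({})
adamw_choice = select_ttt_optimizer({"TTT_OPTIMIZER": "AdamW"})
```

In a PyTorch training script the returned name could then select between `torch.optim.SGD` and `torch.optim.AdamW`. Failing loudly on an unknown value avoids silently running a different optimizer than intended.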