Add LeakyReLU² + 4ep Legal TTT submission #1039

Open

yufengli-oai wants to merge 3 commits into main from codex/parameter-golf-11l-sdpa

Conversation

@yufengli-oai yufengli-oai commented Mar 28, 2026

Summary

A solution generated by Codex; I'm not sure about its performance.

  • add a new 1.1189 bpb (3-seed mean, std 0.0006) record submission based on 2026-03-23_LeakyReLU_LegalTTT_ParallelMuon
  • increase the legal TTT learning rate to 0.0025 and run 4 epochs (see the sketch after this list)
  • skip the diagnostic pre-TTT evals to keep eval under 10 minutes
  • add eval-only checkpoint loading for fast TTT sweeps
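A minimal sketch of the bumped TTT loop, using the `TTT_LR` / `TTT_EPOCHS` names from the sweep commits further down. The tiny stand-in model and random chunks only keep the snippet runnable; the real submission loads the trained GPT checkpoint (the new eval-only path) and adapts it over the eval stream with plain SGD.

```python
import torch
import torch.nn as nn

TTT_LR = 0.0025      # previous SOTA used 0.002
TTT_EPOCHS = 4       # previous SOTA used 3 epochs

# Stand-in for the eval-only checkpoint path: the real script loads the trained
# GPT checkpoint here instead of building a fresh model.
vocab, dim = 256, 64
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))

# Fake "eval stream" chunks of token ids; the real TTT runs over the eval text.
eval_chunks = [torch.randint(0, vocab, (8, 128)) for _ in range(10)]

# Legal TTT uses plain SGD (the review below notes AdamW TTT is catastrophic).
opt = torch.optim.SGD(model.parameters(), lr=TTT_LR)
loss_fn = nn.CrossEntropyLoss()

for _ in range(TTT_EPOCHS):
    for chunk in eval_chunks:
        inputs, targets = chunk[:, :-1], chunk[:, 1:]
        logits = model(inputs)                                   # (batch, seq, vocab)
        loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
        loss.backward()
        opt.step()
        opt.zero_grad(set_to_none=True)
```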

Validation

  • seed 2025: 1.11835341 bpb, 83.8ms/step, 545.4s TTT
  • seed 1337: 1.11903472 bpb, 83.9ms/step, 548.2s TTT
  • seed 42: 1.11944510 bpb, 84.0ms/step, 541.6s TTT
  • 3-seed mean: 1.11894441 bpb, std 0.00055142
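As a quick sanity check, the reported mean and std follow from the three per-seed numbers (the std is the sample standard deviation, n − 1):

```python
import statistics

bpbs = [1.11835341, 1.11903472, 1.11944510]      # seeds 2025, 1337, 42
print(f"mean = {statistics.mean(bpbs):.8f}")     # 1.11894441
print(f"std  = {statistics.stdev(bpbs):.8f}")    # 0.00055142 (sample std, n-1)
```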

icryo added a commit to icryo/parameter-golf that referenced this pull request Mar 29, 2026
PR openai#1039 claims 1.1184 BPB with just TTT_LR=0.0025, TTT_EPOCHS=4
(vs SOTA's 0.002/3ep). This is a potential record from a 2-line change.

TTT sweep now tests 4 configs:
  A: SOTA (lr=0.002, 3ep) — baseline reproduction
  B: PR openai#1039 (lr=0.0025, 4ep) — claimed 1.1184 BPB
  C: 5 epochs (lr=0.002, 5ep) — deeper adaptation
  D: Aggressive (lr=0.003, 4ep) — higher LR + more epochs

Also from PR review:
- DeltaNet "Medusa" achieves 0.77 BPB single seed (different arch)
- Bayesian posterior packets show early TTT chunks hit 1.109 then drift
- Block 7 c_k has kurtosis 11.9 (quantization outlier)
- AdamW TTT confirmed catastrophic (SGD is correct)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
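The four-config sweep from that commit, written out as plain data; `run_ttt_eval` is a hypothetical stand-in for the repo's checkpoint-load + TTT + eval entry point, not an actual function in the scripts:

```python
# TTT sweep configurations A-D from the commit above.
SWEEP = {
    "A_sota":       {"ttt_lr": 0.002,  "ttt_epochs": 3},   # baseline reproduction
    "B_pr1039":     {"ttt_lr": 0.0025, "ttt_epochs": 4},   # this PR's setting
    "C_5_epochs":   {"ttt_lr": 0.002,  "ttt_epochs": 5},   # deeper adaptation
    "D_aggressive": {"ttt_lr": 0.003,  "ttt_epochs": 4},   # higher LR + more epochs
}

for name, cfg in SWEEP.items():
    print(name, cfg)   # on real hardware: bpb = run_ttt_eval(**cfg)
```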
icryo added a commit to icryo/parameter-golf that referenced this pull request Mar 29, 2026
PR openai#1043 found early TTT chunks achieve 1.109 BPB (below SOTA!)
but accumulated SGD updates cause drift to 1.126 by late chunks.

Fix: periodically reset model weights to the original checkpoint.
This prevents catastrophic drift while preserving local adaptation.

Implementation:
- TTT_RESET_EVERY=N: reset weights every N chunks (0=disabled)
- Resets both weights and optimizer momentum state
- Uses in-place copy (no reallocation, parameter references preserved)

H100 sweep now tests 11 configurations:
  6 temperatures × sliding eval
  5 TTT configs:
    A: SOTA baseline (lr=0.002, 3ep)
    B: PR openai#1039 (lr=0.0025, 4ep)
    C: 5 epochs (lr=0.002, 5ep)
    D: PR openai#1039 + reset/100 (anti-drift)
    E: PR openai#1039 + reset/50 (anti-drift)

If early chunks consistently hit 1.109 and reset prevents drift,
the mean across all chunks could drop from 1.119 toward 1.110-1.114.
That's record territory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
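A minimal sketch of that reset mechanism, assuming a PyTorch model adapted with momentum SGD; `TTT_RESET_EVERY` is the knob named in the commit, while the snapshot helper, stand-in model, and placeholder loss are illustrative only.

```python
import torch
import torch.nn as nn

TTT_RESET_EVERY = 100                              # reset every N chunks; 0 disables

model = nn.Linear(64, 64)                          # stand-in for the GPT checkpoint
opt = torch.optim.SGD(model.parameters(), lr=0.0025, momentum=0.9)

# Snapshot the original checkpoint weights once, before TTT starts.
snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}

def reset_to_snapshot() -> None:
    """Copy the snapshot back in place: parameter tensors keep their identity, so the
    optimizer's references stay valid; clearing opt.state drops the SGD momentum."""
    with torch.no_grad():
        for k, v in model.state_dict().items():
            v.copy_(snapshot[k])
    opt.state.clear()

chunks = [torch.randn(8, 64) for _ in range(300)]  # fake eval chunks
for i, chunk in enumerate(chunks):
    loss = model(chunk).pow(2).mean()              # placeholder TTT loss
    loss.backward()
    opt.step()
    opt.zero_grad(set_to_none=True)
    if TTT_RESET_EVERY and (i + 1) % TTT_RESET_EVERY == 0:
        reset_to_snapshot()                        # anti-drift: back to the checkpoint
```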
icryo added a commit to icryo/parameter-golf that referenced this pull request Mar 29, 2026
Competition moved while we were experimenting locally:
  PR openai#634: 1.1178 BPB (Full GPTQ + XSA-all + selective pruning)
  PR openai#1060: 1.1122 BPB (+ coprime loader + BigramHash 2816)

Our contribution: TTT periodic reset on the PR openai#1060 base.
PR openai#1060 found TTT unnecessary with Full GPTQ, but they
didn't test TTT with anti-drift reset. If TTT drift was the
reason it stopped helping, reset could unlock further gains.

Files:
  train_gpt_ours.py  — PR openai#1060 + TTT reset mechanism
  train_gpt_pr634.py — Full GPTQ reference (for study)
  train_gpt_pr1060.py — Original PR openai#1060 (for comparison)
  run_h100.sh — Train once, sweep 4 TTT configs

TTT configs tested:
  A: SOTA (lr=0.002, 3ep) — baseline TTT
  B: PR openai#1039 (lr=0.0025, 4ep) — tuned TTT
  C: B + reset/100 — anti-drift, moderate
  D: B + reset/50 — anti-drift, aggressive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yufengli-oai yufengli-oai marked this pull request as ready for review March 30, 2026 19:03