Record: SLOT + LeakyReLU² + Legal Score-First TTT + Parallel Muon — val_bpb 1.1154 (3-seed mean) #1128
val_bpb = 1.1154 (3-seed mean, std 0.0002) | ~15.9 MB | 8×H100 SXM
Conversation
…PB (1.1185, 3-seed mean)
…result on PR openai#549 stack

First SLOT (Sample-specific LM Optimization at Test-time) entry in Parameter Golf. SLOT optimizes a delta vector at the last hidden layer inside the TTT scoring loop.

SLOT results (3-seed): seed 1337: 1.1188 BPB | seed 42: 1.1185 BPB | seed 2025: 1.1183 BPB
mean: 1.1185 (std 0.0003) vs baseline 1.1193 — consistent -0.0008 improvement

Also documents CTW as a negative result across 3 implementation iterations:
v1 (naive n-gram lookup): +0.005 worse, 46 min eval
v2 (proper recursive weighting + entropy gating): not runnable in time budget
v3 (vectorized entropy gate): still worse, killed early
Root cause: signal redundancy — the transformer already captures all n-gram patterns

Base: PR openai#549 by @abaybektursun (LeakyReLU² + Legal TTT + Parallel Muon)
…4 (3-seed mean)

First SLOT (Sample-specific LM Optimization at Test-time) entry in Parameter Golf. Optimizes a 512-dim delta vector at the last hidden layer per-batch during TTT scoring. AdamW lr=0.003, 5 steps. Splits forward_logits() into forward_hidden() + compute_logits().

3-seed results (8×H100 SXM): seed 1337: 1.1153 BPB | seed 42: 1.1156 BPB | seed 2025: 1.1153 BPB
mean: 1.1154 (std 0.0002) | val_loss mean: 1.8833
vs SOTA PR openai#549: -0.0083 nats (>0.005 required) ✅

Base: PR openai#549 by @abaybektursun
SLOT paper: Hu et al., arXiv:2505.12392v2
Hi @AnubhavBharadwaaj -- a constructive observation about SLOT legality that might be worth considering. After reviewing the organizer's enforcement pattern on Issue #677, I noticed that SLOT may fall under the same "adapt on validation before the reported eval pass" pattern that led to 33+ PR closures (valerio-oai, 2026-03-27).
This differs from the legal score-first TTT in PR #549, where chunk N is scored first. No organizer has ruled on SLOT specifically, so this may be fine -- but I wanted to flag it so the community can discuss before multiple PRs build on this technique. An organizer clarification on Issue #677 or #1017 would help everyone. (We had a SLOT-based submission at 1.1015 that we self-closed for this reason: PR #1172.)
Record: SLOT + LeakyReLU² + Legal Score-First TTT + Parallel Muon — val_bpb 1.1154 (3-seed mean)
val_bpb = 1.1154 (3-seed mean, std 0.0002) | ~15.9 MB | 8×H100 SXM
3-Seed Results (8×H100 80GB SXM)
seed 1337: 1.1153 BPB | seed 42: 1.1156 BPB | seed 2025: 1.1153 BPB
mean: 1.1154 (std 0.0002) | val_loss mean: 1.8833
vs Previous SOTA (PR #549)
PR #549 baseline: 1.1194 val_bpb → this PR: 1.1154 (Δ -0.0040 bpb; reported as -0.0083 nats against the >0.005 requirement)
Key Innovation: SLOT (Sample-specific LM Optimization at Test-time)
First SLOT-based entry in Parameter Golf. SLOT optimizes a single additive vector δ ∈ ℝ^512 at the last hidden layer during TTT scoring, adapting the model's hidden-to-logit mapping per batch.
Source: Hu et al., arXiv:2505.12392v2, "SLOT: Sample-specific Language Model Optimization at Test-time" (Westlake University, 2025)
How SLOT Works
The model's forward_logits() is split into forward_hidden() + compute_logits(). During TTT Phase 1 (scoring), SLOT optimizes δ between the two.
Why SLOT Works
SLOT and TTT address complementary bottlenecks:
TTT gives SLOT better hidden states; SLOT gives TTT-adapted representations a final per-batch correction. The two stack because they operate at different granularities (chunk vs batch) and different model depths (all layers vs last layer only).
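The per-batch δ optimization described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `forward_hidden()`/`compute_logits()` split is named in this thread, but the model interface, tensor shapes, and scoring wiring here are assumptions; only the hyperparameters (512-dim δ, AdamW, lr=0.003, 5 steps) come from the reported settings.

```python
import torch
import torch.nn.functional as F

def slot_score(model, input_ids, targets, delta_dim=512, lr=0.003, steps=5):
    """Score a batch with SLOT: fit one additive delta vector at the last
    hidden layer, then compute the final loss through the adapted mapping.

    `model.forward_hidden` / `model.compute_logits` mirror the
    forward_logits() split described in the PR (hypothetical interface).
    """
    # Hidden states are computed once and frozen; only delta is optimized.
    with torch.no_grad():
        hidden = model.forward_hidden(input_ids)  # (B, T, delta_dim)

    # A single delta shared across the batch, broadcast over batch and time.
    delta = torch.zeros(delta_dim, device=hidden.device, requires_grad=True)
    opt = torch.optim.AdamW([delta], lr=lr)

    for _ in range(steps):
        logits = model.compute_logits(hidden + delta)  # adapt hidden->logit map
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               targets.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Final scoring pass with the optimized delta.
    with torch.no_grad():
        logits = model.compute_logits(hidden + delta)
        return F.cross_entropy(logits.view(-1, logits.size(-1)),
                               targets.view(-1))
```

Note the design point this makes concrete: the transformer forward pass runs once, and only the cheap `compute_logits()` head is re-evaluated inside the 5-step loop, which is why SLOT's overhead is small relative to TTT's full-model adaptation.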
SLOT Properties
SLOT_ENABLED=0 reproduces the PR #549 baseline (Record: LeakyReLU² + Legal Score-First TTT + Parallel Muon — val_bpb 1.1194, 3-seed mean) exactly.
SLOT Hyperparameters
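The hyperparameters reported in this thread (512-dim δ, AdamW lr=0.003, 5 steps, and the SLOT_ENABLED kill switch) can be collected into a small config sketch. The structure and names below are illustrative assumptions; only the values are from the PR.

```python
import os
from dataclasses import dataclass

@dataclass
class SlotConfig:
    # Field names are hypothetical; values are the ones reported in this PR.
    enabled: bool = True   # SLOT_ENABLED=0 falls back to the PR #549 baseline
    delta_dim: int = 512   # width of the last hidden layer (size of delta)
    lr: float = 0.003      # AdamW learning rate for the delta vector
    steps: int = 5         # optimization steps per scored batch

def slot_config_from_env() -> SlotConfig:
    """Read the kill switch from the environment, as the PR's flag suggests."""
    return SlotConfig(enabled=os.environ.get("SLOT_ENABLED", "1") != "0")
```

Gating the whole technique behind one environment variable is what makes the "reproduces the baseline exactly" claim checkable: with the flag off, the code path is byte-identical to PR #549.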
Hyperparameter Ablation (seed 1337)
Also Tested: CTW — Negative Result
Context Tree Weighting (Willems et al., 1995) was integrated and tested across three progressively improved implementations. All degraded BPB:
v1 (naive n-gram lookup): +0.005 BPB worse, 46 min eval
v2 (proper recursive weighting + entropy gating): not runnable within the time budget
v3 (vectorized entropy gate): still worse, killed early
Root cause: The 11-layer transformer at 1.12 BPB already captures all n-gram patterns a depth-4 Markov model knows. Mixing in a weaker predictor adds noise regardless of implementation quality.
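The redundancy argument can be made concrete with a toy calculation (illustrative numbers, not from the actual runs): if the strong model already matches the data distribution, mixing in any weaker predictor strictly increases cross-entropy, no matter how well the mixing is implemented.

```python
import math

def cross_entropy_bits(p, q):
    """H(p, q) in bits: expected code length of q's code under data dist p."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

p_true = [0.7, 0.2, 0.1]   # toy next-token distribution
strong = p_true             # transformer already captures the pattern
weak = [1/3, 1/3, 1/3]      # n-gram-style predictor with less signal

# Cross-entropy is minimized at q = p (Gibbs' inequality), so every
# nonzero weight on the weak predictor moves the mixture away from optimal.
for w in (0.0, 0.1, 0.3):
    mix = [(1 - w) * s + w * k for s, k in zip(strong, weak)]
    print(f"weight on weak = {w:.1f}: H = {cross_entropy_bits(p_true, mix):.4f} bits")
```

This is the "signal redundancy" failure mode in miniature: CTW's depth-4 statistics are a subset of what the transformer already models, so the mixture weight on CTW can only add noise.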
Also Tested: Stacking Hacks — Negative Results
Base Architecture (PR #549 by @abaybektursun)
Run Command
Credits
@0hq or @valerio-oai
Hey @0hq, I've applied for the Development grant several times but haven't received a response yet. GitHub: AnubhavBharadwaaj. Could you help check the status?