Rat Rod Green SOTA: 1.1129 sliding / 0.4489 ngram #2

Open
newjordan wants to merge 1 commit into main from ratrod-green-sota-1.1129

@newjordan
Owner

Summary

Base model optimization campaign on 8xH100 (600s wallclock). Rat Rod Green achieves 1.1129 BPB sliding window / 0.4489 BPB n-gram eval — our strongest neural model to date.

Config: Parallel Muon (PR#609) + XSA-all-11 + BigramHash 2048 + RoPE 16 + SWA 50 + entropy-adaptive n-gram eval (orders 2-9). Steps: 6882 at 87.20ms/step.
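
This description does not spell out how the entropy-adaptive n-gram eval combines the two predictors. As a rough sketch of the general idea only, one plausible form leans harder on the n-gram estimate when the neural distribution is high-entropy (uncertain). Everything below is hypothetical: `entropy_bits`, `ngram_weight`, `mixed_prob`, the linear weighting rule, and `max_weight=0.9` are illustrative assumptions, not the code in this branch.

```python
import math


def entropy_bits(probs):
    """Shannon entropy of a probability vector, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def ngram_weight(probs, max_weight=0.9):
    """Hypothetical rule: weight on the n-gram model grows linearly with
    the neural model's entropy, normalized by the maximum possible entropy."""
    h_max = math.log2(len(probs))
    if h_max == 0.0:  # single-token vocabulary: nothing to adapt
        return 0.0
    ratio = entropy_bits(probs) / h_max
    return max_weight * max(0.0, min(1.0, ratio))


def mixed_prob(p_neural, p_ngram, target, max_weight=0.9):
    """Interpolated probability of `target` under the assumed mixing rule."""
    lam = ngram_weight(p_neural, max_weight)
    return (1.0 - lam) * p_neural[target] + lam * p_ngram[target]


# A confident neural prediction ignores the n-gram model entirely;
# a uniform (maximally uncertain) one defers to it almost completely.
confident = mixed_prob([1.0, 0.0, 0.0, 0.0], [0.7, 0.1, 0.1, 0.1], target=0)
uncertain = mixed_prob([0.25] * 4, [0.7, 0.1, 0.1, 0.1], target=0)
```

BPB would then be the mean of `-log2(mixed_prob(...))` over tokens, scaled by tokens per byte.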

Ablation log (2026-03-26 → 2026-03-27)

All experiments were A/B tested against v1 (1.1129). Nine levers tested; none has beaten the baseline at 600s, and the single 200s win (WARMDOWN_ITERS=2000) is not yet validated at 600s:

| # | Lever | Result | Delta |
|---|-------|--------|-------|
| 1 | TRIGRAM=1 (v2) | WASH | +0.0003 |
| 2 | LATE_QAT_THRESHOLD=0 (v2) | WASH | ~0 |
| 3 | Synapse v1 / CPU n-gram bridge (v4) | DEAD | +15ms overhead (102ms/step) |
| 4 | Synapse v2 / GPU-native hash bridge (v5) | DEAD | +0.017 sliding, +0.005 ngram |
| 5 | VALUE_RESIDUAL=1 (200s) | WORSE | +0.0012 sliding |
| 6 | WARMDOWN_ITERS=2000 (200s) | WINNER at 200s | −0.0087 sliding (not yet validated at 600s) |
| 7 | SWA_EVERY=100 (200s) | WASH | +0.0005 |
| 8 | Siphon / ensemble-objective loss α=0.50 | DEAD | +0.151 catastrophic |
| 9 | COMPLEMENT_ALPHA=0.5 (v7, 600s) | WORSE | +0.004 sliding |
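
To make the cost of lever 3's per-step overhead concrete, here is a small arithmetic sketch of how many optimizer steps fit in the 600s budget at each step time. It assumes a constant step time and that the whole budget goes to training steps (no compile or warmup time); `steps_in_budget` is a helper invented for this illustration.

```python
def steps_in_budget(wallclock_s: float, ms_per_step: float) -> int:
    """Whole optimizer steps that fit in a fixed wallclock budget."""
    return int(wallclock_s * 1000.0 // ms_per_step)


baseline_steps = steps_in_budget(600, 87.20)  # close to the 6882 steps reported
bridge_steps = steps_in_budget(600, 102.0)    # lever 3 at 102 ms/step
lost_fraction = 1 - bridge_steps / baseline_steps

print(baseline_steps, bridge_steps, f"{lost_fraction:.1%}")
```

Under these assumptions the CPU bridge gives up roughly one step in seven, so it would need a large per-step quality gain just to break even.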

Key findings

  • 87ms/step is near H100 hardware ceiling — gains must come from algorithmic changes
  • Loss modification doesn't work for our architecture — both Siphon (ensemble loss) and complement weighting failed. Model learns best with uniform CE.
  • Warmdown shape matters — 2000 >> 3500 >> 5000 at 200s, pending 600s validation
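
For the warmdown finding, the sketch below shows one common shape: a learning-rate multiplier that holds at 1.0 and then decays linearly to zero over the final `warmdown_iters` steps. This shape is an assumption. The training script may use a different curve or a wallclock-based trigger, and `lr_scale` is a hypothetical name.

```python
def lr_scale(step: int, total_steps: int, warmdown_iters: int) -> float:
    """LR multiplier: 1.0 until the last `warmdown_iters` steps,
    then a linear ramp down to 0.0 at `total_steps` (assumed shape)."""
    steps_left = total_steps - step
    if steps_left >= warmdown_iters:
        return 1.0
    return max(0.0, steps_left / warmdown_iters)


# Approximate step count of a 200s run at 87.20 ms/step.
steps_200s = int(200 * 1000 // 87.20)

# Starting multiplier at step 0 for each warmdown length tested at 200s.
for iters in (2000, 3500, 5000):
    print(iters, round(lr_scale(0, steps_200s, iters), 3))
```

If the schedule is iteration-based like this, a 200s run is shorter than both 3500 and 5000 warmdown steps, so those settings would start below peak LR and never reach it. That would be one reason the 200s ranking may not carry over to 600s.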

Next steps — pressing into novel territory

  • Warmdown schedule shapes (Jitter/Swirl/Cascade — zero ms/step cost)
  • N-gram eval parameter sweep on existing checkpoints (free BPB)
  • Learned mixer head (frontier technique from PR#834/#859)
  • Architecture exploration: gated attention, MTP variants

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>