Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA #731

Open
pentxayc wants to merge 1 commit into openai:main from pentxayc:submission/hedge-mixer-vrl-1.0410

@pentxayc

Summary

  • 1.0400 BPB (seed 42, 2 additional seeds pending)
  • 11L transformer (26.99M params) with Value Residual Learning (VRL), LeakyReLU(0.5)², XSA-4
  • 5-expert Hedge Mixer during eval: neural model + unigram + bigram + trigram (64K hashed) + entropy
  • Hedge algorithm (eta=0.1) with weight updates deferred to chunk boundaries (score-first, so legal)
  • AdamW TTT (lr=0.0005) + Polyak EMA (decay=0.998) + byte-weighted loss + adaptive cosine LR
  • During TTT, freeze the first 9 of 11 blocks; train only the last 2 blocks + norms/scales
  • Int6 mixed quantization + lzma compression
  • Artifact: 15,999,919 bytes (under 16MB limit)
  • Training: 6104 steps in 600s on 8xH100 SXM
  • Eval (TTT + Hedge): 404s / 600s budget
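
For readers unfamiliar with Hedge, here is a minimal sketch of exponential-weights mixing with between-chunk updates. It is not the code in this PR: the class name `HedgeMixer`, the methods `score` and `end_chunk`, mixing in probability space, and using each expert's log loss as the Hedge loss are all assumptions made for illustration. Only `eta=0.1` and the deferred update come from the summary above.

```python
import math

class HedgeMixer:
    """Illustrative Hedge (exponential weights) mixer; names are hypothetical."""

    def __init__(self, num_experts, eta=0.1):
        self.eta = eta
        self.log_w = [0.0] * num_experts    # unnormalized log-weights, uniform start
        self.pending = [0.0] * num_experts  # per-expert loss accumulated in this chunk

    def weights(self):
        # Softmax over log-weights, shifted by the max for numerical stability.
        m = max(self.log_w)
        exps = [math.exp(lw - m) for lw in self.log_w]
        z = sum(exps)
        return [e / z for e in exps]

    def score(self, expert_probs):
        """expert_probs[i] is the probability expert i gave to the true token.

        Returns the mixture's negative log-likelihood in nats. The weights
        are NOT changed here; losses are only accumulated for later.
        """
        w = self.weights()
        p_mix = sum(wi * pi for wi, pi in zip(w, expert_probs))
        for i, pi in enumerate(expert_probs):
            self.pending[i] += -math.log(max(pi, 1e-12))  # assumed loss: log loss
        return -math.log(max(p_mix, 1e-12))

    def end_chunk(self):
        # Deferred Hedge update: applied once, after the whole chunk is scored.
        for i in range(len(self.log_w)):
            self.log_w[i] -= self.eta * self.pending[i]
            self.pending[i] = 0.0
```

Because `score` never touches `log_w`, every token in a chunk is scored with weights that depend only on earlier chunks.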

Legality

All eval-time adaptations are strictly score-first:

  1. Hedge weights for chunk N are computed from chunks 0..N-1 only (the update is deferred until all windows are scored)
  2. N-gram tables updated after chunk scoring completes
  3. Polyak EMA uses fixed decay, no snapshot selection
  4. TTT trains only on already-scored chunks
  5. No validation data during training; no training data during evaluation
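
The score-first ordering in the list above can be illustrated with a toy bigram expert. This is a sketch under stated assumptions, not the PR's evaluation loop: `score_first_eval`, the add-one smoothing, and the unhashed count table are hypothetical stand-ins. The point is only the order of operations: each chunk is fully scored before its tokens enter the table.

```python
import math
from collections import defaultdict

def score_first_eval(chunks, vocab_size):
    """Return mean NLL (nats) of a bigram model adapted score-first."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[prev][tok]
    totals = defaultdict(int)                       # totals[prev]
    total_nll, n_scored = 0.0, 0

    for chunk in chunks:
        pairs = list(zip(chunk, chunk[1:]))

        # Step 1: score the chunk using statistics from earlier chunks only.
        for prev, tok in pairs:
            p = (counts[prev][tok] + 1) / (totals[prev] + vocab_size)
            total_nll += -math.log(p)
            n_scored += 1

        # Step 2: only now fold the already-scored chunk into the table.
        for prev, tok in pairs:
            counts[prev][tok] += 1
            totals[prev] += 1

    return total_nll / max(n_scored, 1)
```

With a single chunk the table is empty during scoring, so every transition gets the uniform probability `1 / vocab_size` regardless of the chunk's content. That property is a quick check that nothing leaks from a chunk into its own score.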

Test plan

  • Seed 42: 1.0400 BPB
  • Seed 1337: pending
  • Seed 2024: pending
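
For reference, bits per byte can be computed from per-token losses as sketched below. The helper `bits_per_byte` is hypothetical, and it assumes the byte count is the UTF-8 length of the evaluated text; the benchmark's own scoring script is authoritative.

```python
import math

def bits_per_byte(token_nlls_nats, text):
    """Convert summed token NLL (in nats) to bits per UTF-8 byte of `text`."""
    total_bits = sum(token_nlls_nats) / math.log(2)  # nats -> bits
    n_bytes = len(text.encode("utf-8"))
    return total_bits / n_bytes
```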

🤖 Generated with Claude Code

5-expert Hedge Mixer (neural + unigram + bigram + trigram + entropy) with
deferred between-chunk weight updates, combined with AdamW TTT + Polyak EMA
+ byte-weighted loss + adaptive cosine LR on an 11L VRL + LeakyReLU² + XSA-4
base. Seed 42 = 1.0400 BPB. Two additional seeds pending.
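
The Polyak EMA with a fixed decay of 0.998 amounts to the update below. This is a minimal sketch: `ema_update` is a hypothetical helper, and plain floats in a dict stand in for the model's parameter tensors.

```python
def ema_update(ema_params, params, decay=0.998):
    """In place: ema <- decay * ema + (1 - decay) * current parameter value."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params
```

Because the decay is a constant, the averaged weights depend only on the sequence of TTT steps; no checkpoint is chosen by looking at scores.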