
Record: [track_10min_16mb] XSA7 + BigramHash + ValueResidual + Legal TTT — val_bpb=1.1227 (#1182)

Open
adityakm24 wants to merge 1 commit into openai:main from adityakm24:submission/run36-1.1227

Conversation


@adityakm24 adityakm24 commented Mar 31, 2026

Summary

  • 11-layer parameter-banking GPT with XSA on 7 layers, BigramHash(2048×96) + TrigramHash(1024×128), ValueResidual, ValueEmbedding at layers 5/9/10, LeakyReLU(0.5)², int6+lzma compression, and legal score-first TTT
  • Best run: val_bpb=1.12265 (legal TTT), val_bpb=1.12468 (sliding window stride=64)
  • 3-seed mean: val_bpb=1.12327 ± 0.00082
  • Artifact: 15,944,685 bytes (under 16,000,000 cap)
  • Training: 600s on 8×H100 SXM (~90.8 ms/step, 6,487 steps)
  • Evaluation: <600s (quant roundtrip ~90s + sliding window ~300s + TTT ~475s)
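The BigramHash component mentioned above can be sketched as follows. This is an illustrative reconstruction, not the submission's code: the table shape (2048×96) matches the summary, but the hash function, mixing constant, and boundary handling are assumptions.

```python
import numpy as np

# Hypothetical sketch of an n-gram hash embedding: each adjacent token pair is
# hashed into a small bucket table, and the looked-up vector is added to that
# position's hidden state. Table shape (2048x96) follows the PR summary; the
# multiplicative hash constant below is illustrative, not the submission's.
BIGRAM_BUCKETS, BIGRAM_DIM = 2048, 96

rng = np.random.default_rng(0)
bigram_table = rng.standard_normal((BIGRAM_BUCKETS, BIGRAM_DIM)).astype(np.float32)

def bigram_hash_embed(tokens: np.ndarray) -> np.ndarray:
    """Look up a hashed embedding for each (prev, cur) token pair.

    tokens: (T,) int array. Position 0 has no predecessor; it is assigned a
    synthetic previous token of 0 here. Returns (T, BIGRAM_DIM) float32.
    """
    prev = np.concatenate(([0], tokens[:-1]))
    # Cheap multiplicative hash of the pair; any well-mixing function works.
    bucket = (prev * 1000003 + tokens) % BIGRAM_BUCKETS
    return bigram_table[bucket]

emb = bigram_hash_embed(np.array([5, 17, 17, 9]))
print(emb.shape)  # (4, 96)
```

The same pattern extends to the TrigramHash(1024×128) table by hashing three consecutive tokens instead of two; repeated n-grams collide into the same bucket by construction, which is what keeps the table small.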

3-Seed Evidence

Seed    Steps    legal_ttt_val_bpb    final_val_bpb (sliding window)
1337    6,487    1.12265              1.12468
2025    6,547    1.12295              1.12514
27182   6,281    1.12421              1.12616
Mean             1.12327              1.12533
Std              0.00082              0.00075

Submission Checklist

  • One new folder under records/track_10min_16mb/
  • Included README.md
  • Included submission.json
  • Included train_gpt.py
  • Included train.log
  • Artifact ≤ 16,000,000 bytes (15,944,685)
  • Training completes in <600s on 8×H100 SXM (600,069 ms)
  • Evaluation completes in <600s on 8×H100 SXM
  • Legal score-first TTT (no two-pass full-rescore leakage)
  • No tokenizer/dataset modifications
  • 3-seed statistical evidence provided
  • No other files modified

Key Techniques

  • Flash Attention 3 (Hopper kernel) for ~90ms/step
  • Parallel Muon optimizer with parameter banking and batched Newton-Schulz
  • Cross-Sequence Attention (XSA) on last 7 layers
  • BigramHash + TrigramHash n-gram hash embeddings
  • Value Residual (ResFormer-style) connections
  • Value Embedding token identity reinjection
  • SWA + EMA + Late QAT for quantization-friendly convergence
  • Legal score-first TTT with SGD (lr=0.002, 4 epochs, all blocks unfrozen)
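The ResFormer-style Value Residual in the list above can be sketched minimally. This is a hedged illustration under the usual ResFormer formulation (later attention layers blend their own value projection with the first layer's); the fixed scalar mixing weight is an assumption, as implementations typically learn a per-layer weight.

```python
import numpy as np

# Minimal sketch of a ResFormer-style value residual: a later layer's value
# projection is blended with the first layer's value projection before
# attention is applied. `lam` is a fixed scalar here for illustration only;
# in practice it is usually a learned per-layer parameter.
def value_residual(v_layer: np.ndarray, v_first: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Blend this layer's values with the first layer's values."""
    return lam * v_layer + (1.0 - lam) * v_first

v1 = np.ones((4, 8), dtype=np.float32)   # value projection from layer 1
v5 = np.zeros((4, 8), dtype=np.float32)  # value projection from a later layer
print(value_residual(v5, v1)[0, 0])  # 0.5
```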

11-layer parameter-banking GPT with XSA on 7 layers, BigramHash(2048),
TrigramHash(1024), ValueResidual, ValueEmbedding, int6+lzma compression,
and legal score-first TTT. 3-seed mean val_bpb=1.12327 on 8xH100 under
600s training + 600s eval budget. Artifact size 15,944,685 bytes.
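The int6+lzma artifact compression can be sketched as a symmetric 6-bit quantization followed by lzma entropy coding. This is an assumed reconstruction: per-tensor scaling and int8 storage are illustrative choices, and packing four 6-bit codes into three bytes (which the real artifact likely does to hit the size cap) is omitted for clarity.

```python
import lzma
import numpy as np

# Hedged sketch of "int6+lzma" weight compression: quantize float weights
# symmetrically to 6-bit integer codes in [-31, 31], store them as int8, and
# lzma-compress the byte stream. Bit-packing four codes into three bytes
# would shrink the artifact further; that step is omitted here.
def quantize_int6(w: np.ndarray):
    scale = float(np.abs(w).max()) / 31.0
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.linspace(-1.0, 1.0, 1024, dtype=np.float32)
q, scale = quantize_int6(w)
blob = lzma.compress(q.tobytes())  # entropy-code the quantized values
w_hat = dequantize(np.frombuffer(lzma.decompress(blob), dtype=np.int8), scale)
print(float(np.abs(w - w_hat).max()) <= scale)  # roundtrip error within one step
```

Quantization-aware training (the "Late QAT" bullet) keeps the network robust to exactly this roundtrip, so the dequantized weights score close to the full-precision ones at eval time.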

Made-with: Cursor