
Record: 11L + Partial XSA + TTT + BatchOpt (val_bpb=1.1354) #290

Open
ibarrajo wants to merge 1 commit into openai:main from ibarrajo:record-xsa-ttt-submission

Conversation

@ibarrajo

Summary

  • val_bpb: 1.1354 (sliding window, stride=64)
  • 15.85 MB artifact (int6 + zstd-22, under 16MB)
  • 8xH100 SXM, 8,945 steps in 600s + 132s eval
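
The stride-64 sliding-window figure can be illustrated with a minimal sketch of how such an evaluation typically partitions the validation tokens: each window re-reads earlier tokens as context but only scores the tokens not yet scored. The function name `sliding_windows` and the context length of 2048 are assumptions for illustration; the PR text does not state the evaluation context length or show the scoring code.

```python
def sliding_windows(n_tokens, seq_len=2048, stride=64):
    """Yield (start, end, score_from) triples for sliding-window evaluation.

    The model sees tokens[start:end]; only tokens[score_from:end] count
    toward the loss, so every token is scored exactly once.
    seq_len=2048 is a hypothetical default, not taken from the PR.
    """
    if n_tokens <= 0:
        return
    # The first window has no earlier context, so all of it is scored.
    end = min(seq_len, n_tokens)
    yield 0, end, 0
    scored = end
    while scored < n_tokens:
        # Advance by `stride` new tokens, keeping as much left context as fits.
        end = min(scored + stride, n_tokens)
        start = max(0, end - seq_len)
        yield start, end, scored
        scored = end


if __name__ == "__main__":
    for start, end, score_from in sliding_windows(2300):
        print(start, end, score_from)
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which is one plausible reason the sliding-window number is lower than the standard roundtrip number.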

Approach

Four improvements stacked on the PR #198 base:

  1. Partial XSA (last 3 layers) — efficient GQA-aware self-attention debiasing (PR #265, "Record: 11L + Efficient Partial XSA (val_bpb: 1.1307)"; arXiv:2603.09078)
  2. TTT (3-epoch full-model SGD, freeze first 2 blocks) — eval-time adaptation (PR #254, "Record: FarnsworthEngine v1 — TTT + 11L Int6 MLP3x, val_bpb=1.1303")
  3. Batch=524K — 22% more gradient updates (finding from PR #236, "Record: 11L Int6 + SmearGate + Batch Optimization (val_bpb=1.1400)")
  4. RoPE base 50K — extended positional encoding (PR #206, "Record: Int6 STE + SmearGate + Seq2048 + OrthoInit + RoPE50K + SWA/100 (mean val_bpb=1.1507)")
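
To make the RoPE item concrete, here is a small sketch of the standard RoPE inverse-frequency formula, `base ** (-2i / head_dim)`, comparing the commonly used base of 10,000 with the 50K base named above. The head dimension of 64 is a hypothetical value chosen for illustration; the PR text does not state it, and this is not the submission's code.

```python
import math


def rope_inv_freq(head_dim, base):
    """Standard RoPE inverse frequencies, one per rotated dimension pair."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, base):
    """Positions the slowest-rotating pair needs to complete one full turn."""
    return 2 * math.pi / rope_inv_freq(head_dim, base)[-1]


HEAD_DIM = 64  # hypothetical; not stated in the PR

if __name__ == "__main__":
    for base in (10_000, 50_000):
        print(base, round(longest_wavelength(HEAD_DIM, base)))
```

A larger base slows the rotation of the low-frequency pairs, so distant positions remain distinguishable over longer sequences.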

Key Metrics

| Metric | Value |
| --- | --- |
| Sliding val_bpb (stride=64) | 1.1354 |
| Standard roundtrip val_bpb | 1.1583 |
| Artifact size | 15,851,371 bytes |
| Training steps | 8,945 |
| TTT time | 50s |
| Eval time | 80s |
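
For readers unfamiliar with the metric, the sketch below shows the usual way a bits-per-byte figure is derived from a summed cross-entropy loss, assuming `val_bpb` means validation bits per byte. The function names are hypothetical and the repository's actual scoring code is not shown in this PR, so treat this as an illustration of the unit conversion only.

```python
import math


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


def bpb_from_mean_loss(mean_loss_nats, n_tokens, n_bytes):
    """Same conversion, starting from a per-token mean loss."""
    return bits_per_byte(mean_loss_nats * n_tokens, n_bytes)


# Gap between the two reported evaluation modes, in bits per byte.
SLIDING_BPB = 1.1354
STANDARD_BPB = 1.1583
gap = STANDARD_BPB - SLIDING_BPB

if __name__ == "__main__":
    print(f"sliding-window gain: {gap:.4f} bpb")
```

Because the denominator is bytes rather than tokens, the figure is comparable across tokenizers.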

Note

Uses the PyTorch SDPA fallback because FA3 (FlashAttention-3) is not in the RunPod image (see #280). With FA3, expect roughly 600 more training steps in the same 600s budget and a slightly better BPB.
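
As a back-of-the-envelope check on that estimate, the arithmetic below converts the reported step count into an average step time and shows what per-step speedup ~600 extra steps would imply. It assumes the full 600s is spent on training steps, which ignores any fixed overhead; the variable names are illustrative.

```python
TRAIN_SECONDS = 600      # training budget reported in the PR
STEPS_SDPA = 8_945       # steps reached with the SDPA fallback
EXTRA_STEPS_FA3 = 600    # the PR's rough estimate, not a measurement

# Average wall-clock time per step under each scenario.
sec_per_step_sdpa = TRAIN_SECONDS / STEPS_SDPA
sec_per_step_fa3 = TRAIN_SECONDS / (STEPS_SDPA + EXTRA_STEPS_FA3)

# Per-step speedup FA3 would need to deliver to fit the extra steps.
speedup = sec_per_step_sdpa / sec_per_step_fa3

if __name__ == "__main__":
    print(f"{sec_per_step_sdpa * 1e3:.1f} ms/step with SDPA")
    print(f"{sec_per_step_fa3 * 1e3:.1f} ms/step implied with FA3")
    print(f"{(speedup - 1) * 100:.1f}% faster per step")
```

Under these assumptions the estimate corresponds to a per-step speedup of a little under 7%.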

Test plan

  • Artifact under 16MB (15.85MB)
  • Trains in 600s on 8xH100
  • Eval completes in <600s (132s total)
  • Post-quant roundtrip verified
  • train_gpt.py runs from records/ folder
  • Train log included
  • Multi-seed validation: not run (budget constrained; single seed only)
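
The "post-quant roundtrip" item can be illustrated with a minimal pure-Python sketch of symmetric 6-bit quantization, bit-packing, and dequantization. The per-tensor scale, the [-31, 31] range, and the four-values-per-three-bytes layout are all assumptions chosen for illustration; the submission's actual int6 scheme and storage layout are not described in this PR, and the zstd compression step is omitted here.

```python
def quantize_int6(values):
    """Symmetric quantization of floats to integers in [-31, 31]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 31 if max_abs > 0 else 1.0
    q = [max(-31, min(31, round(v / scale))) for v in values]
    return q, scale


def pack_int6(q):
    """Pack signed 6-bit integers, four per three bytes (zero-padded)."""
    out = bytearray()
    padded = list(q) + [0] * (-len(q) % 4)
    for i in range(0, len(padded), 4):
        word = 0
        for v in padded[i:i + 4]:
            word = (word << 6) | (v & 0x3F)  # 6-bit two's complement
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack_int6(data, count):
    """Inverse of pack_int6; `count` drops the zero padding."""
    q = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            u = (word >> shift) & 0x3F
            q.append(u - 64 if u >= 32 else u)  # restore the sign
    return q[:count]


def dequantize(q, scale):
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0, 0.9]
    q, scale = quantize_int6(weights)
    restored = dequantize(unpack_int6(pack_int6(q), len(q)), scale)
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

A roundtrip check of this kind confirms two things: packing and unpacking are lossless on the integer codes, and the dequantized values differ from the originals by at most half a quantization step.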

🤖 Generated with Claude Code

Stacks Partial XSA (last 3 layers), TTT (3-epoch SGD), batch=524K,
and RoPE50K on the PR openai#198 base. 8,945 steps on 8xH100 in 600s.
15.85MB artifact (int6+zstd-22).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>