Record: 11L XSA4 + EMA + LoRA TTT + Partial RoPE + dim480 — val_bpb 1.13112 (3-seed) #1127

Open

dentity007 wants to merge 1 commit into openai:main from dentity007:submission/dentity007-dim480-1.1311

Conversation

@dentity007

11L XSA4 + EMA + LoRA TTT + Partial RoPE + GPTQ-lite (dim480)

val_bpb: 1.13112 (3-seed mean, std 0.00051) | artifact ~15.5 MB | 8×H100 SXM (Reykjavík, Iceland)

The PR #462 architecture, compressed to fit the 16 MB artifact cap by reducing MODEL_DIM to 480.

3-seed validation

| Seed | Val BPB | Steps | ms/step | Size (bytes) |
|------|---------|-------|---------|--------------|
| 1337 | 1.13041826 | 7,847 | 76.47 | 15,489,698 |
| 42 | 1.13161931 | 7,958 | 75.40 | 15,462,345 |
| 7 | 1.13133583 | 7,935 | 75.62 | 15,436,056 |
| Mean | 1.13112 (std 0.00051) | | | |
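
Quick check (mine, not from the PR): the reported mean is the plain average, and the std matches the population standard deviation over the three seeds:

```python
import statistics

bpb = [1.13041826, 1.13161931, 1.13133583]
print(f"{statistics.fmean(bpb):.5f}")   # 1.13112
print(f"{statistics.pstdev(bpb):.5f}")  # 0.00051
```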

Key components (minimal sketches for several of these follow the list)

  • 11L U-Net, MODEL_DIM=480, NUM_KV_HEADS=4, MLP_HIDDEN=1536 (3×)
  • Weight EMA, decay=0.9985
  • Partial RoPE (rotary applied to 16 of 64 head dims)
  • Late QAT: int6 with a straight-through estimator (threshold 0.15)
  • Single-pass LoRA test-time training (rank=8, lr=0.01, 1 epoch)
  • XSA on the 4 deepest layers
  • BigramHash (8192 buckets, dim=128) + SmearGate
  • int6 packing + zstd level-22 compression (3.83× ratio)
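
A minimal sketch of the weight EMA. Only decay=0.9985 comes from this PR; the function name and update placement are assumptions:

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(ema_model: nn.Module, model: nn.Module, decay: float = 0.9985) -> None:
    # Exponential moving average over parameters; decay=0.9985 is the PR's value.
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# Usage: keep a frozen copy, update it after each optimizer step,
# and evaluate val_bpb on the EMA copy rather than the live weights.
model = nn.Linear(4, 4)            # stand-in for the real network
ema_model = copy.deepcopy(model)
ema_update(ema_model, model)
```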
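
Partial RoPE as I read it: rotary embeddings on the first 16 of 64 head dims, identity on the rest. The rotate-half pairing and cos/sin layout below are assumptions; only the 16/64 split is from the PR:

```python
import torch

def partial_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor,
                 rot_dims: int = 16) -> torch.Tensor:
    # x: (batch, heads, seq, 64); cos/sin: (seq, rot_dims // 2).
    # Rotate only the first `rot_dims` dims; pass the remaining 48 through unchanged.
    x_rot, x_pass = x[..., :rot_dims], x[..., rot_dims:]
    x1, x2 = x_rot.chunk(2, dim=-1)
    rotated = torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return torch.cat((rotated, x_pass), dim=-1)
```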
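
Int6 fake-quantization with a straight-through estimator is the standard QAT trick; a sketch follows. The symmetric scaling is my choice, and the PR does not explain "threshold 0.15"; reading it as the final fraction of training during which QAT is active is an assumption:

```python
import torch

class Int6STE(torch.autograd.Function):
    # Forward: fake-quantize weights to symmetric int6 (levels in [-31, 31]).
    # Backward: straight-through estimator, gradients pass unchanged.
    @staticmethod
    def forward(ctx, w):
        scale = w.abs().max().clamp_min(1e-8) / 31.0
        return torch.clamp(torch.round(w / scale), -31, 31) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

def maybe_quantize(w: torch.Tensor, step: int, total_steps: int,
                   start_frac: float = 0.85) -> torch.Tensor:
    # "Late" QAT: fake-quantize only in the last 15% of training
    # (one reading of the PR's threshold 0.15 -- an assumption).
    return Int6STE.apply(w) if step >= start_frac * total_steps else w
```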
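
Single-pass LoRA test-time training: freeze the base weights, attach low-rank adapters, and take one pass over the eval stream. Rank=8, lr=0.01, and the single epoch are from the PR; `alpha`, the init, and which layers get adapters are assumptions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base linear plus a trainable low-rank delta: y = Wx + (B A x) * alpha / rank.
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: delta starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Single-pass TTT (assumed shape of the loop): adapters only, lr=0.01, one epoch.
layer = LoRALinear(nn.Linear(480, 480))
opt = torch.optim.SGD([layer.A, layer.B], lr=0.01)
```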
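
BigramHash as I understand it: hash each (previous token, current token) pair into 8192 buckets and look up a 128-dim embedding. Bucket count and dim are from the PR; the hash function is mine, and SmearGate is omitted since the PR does not describe it:

```python
import torch
import torch.nn as nn

class BigramHash(nn.Module):
    # Hashed bigram embedding: 8192 buckets, dim=128 (values from the PR).
    def __init__(self, n_buckets: int = 8192, dim: int = 128):
        super().__init__()
        self.n_buckets = n_buckets
        self.emb = nn.Embedding(n_buckets, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq) int64. Pair each token with its predecessor.
        prev = torch.roll(tokens, shifts=1, dims=1)
        prev[:, 0] = 0                                   # no predecessor at position 0
        h = (prev * 1000003 + tokens) % self.n_buckets   # simple multiplicative hash (my choice)
        return self.emb(h)                               # (batch, seq, dim)
```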
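
For the artifact, int6 packing plus zstd at level 22. A sketch assuming the `zstandard` package and numpy bit-packing; the 3.83× ratio is the PR's measurement, not something this sketch guarantees:

```python
import numpy as np
import zstandard

def pack_int6(q: np.ndarray) -> bytes:
    # q holds int6 values in [-32, 31]; pack 6 bits per value (4 values -> 3 bytes).
    u = (q.astype(np.int64).ravel() + 32).astype(np.uint8)   # shift to [0, 63]
    bits = np.unpackbits(u.reshape(-1, 1), axis=1)[:, 2:]    # keep the low 6 bits
    return np.packbits(bits.ravel()).tobytes()

def compress_artifact(q: np.ndarray) -> bytes:
    return zstandard.ZstdCompressor(level=22).compress(pack_int6(q))
```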

Compliance

  • Training: ≤600 s on 8×H100 SXM
  • Artifact: ~15.5 MB (under the 16,000,000-byte limit)
  • 3-seed verified (seeds 1337, 42, 7)

Note on reproduction

The current runpod/parameter-golf:latest image (PyTorch 2.9.1+cu128) requires a manual FlashAttention-3 install:

```
pip install flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291
```

