
Non-record: 11L SwiGLU + XSA4 + EMA + U-Net + AdamW TTT (pending compute)#291

Open
mohosy wants to merge 2 commits into openai:main from mohosy:submission/mohosy-ema-xsa-ttt

mohosy commented on Mar 21, 2026

Non-record: 11L SwiGLU + XSA4 + EMA + U-Net + AdamW TTT + BigramHash(8192)

val_bpb: pending (awaiting compute credits for 8xH100 verification)

Approach

The stack combines techniques that appear in earlier top-scoring submissions:

| Component | Details |
|---|---|
| SwiGLU FFN | Star-ReLU activation, hidden=1792 |
| U-Net skips | Learned gating, encoder=5, decoder=6 |
| XSA4 | Exclusive Self Attention on the last 4 layers |
| EMA | decay=0.9985, replaces SWA |
| AdamW TTT | lr=0.0005, 10 epochs, legal score-first protocol |
| Partial RoPE | 16 dims only |
| LN Scale | 1/sqrt(layer_idx+1) per block |
| BigramHash | 8192 buckets, 128 dim |
| Quantization | Int6 + GPTQ-lite + zstd-22 |
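EMA here means keeping a shadow copy of the weights that is blended toward the live weights after every optimizer step and used in place of an SWA checkpoint average. A minimal sketch, assuming one update per step and a plain dict-of-lists parameter layout; the name `ema_update` and this layout are hypothetical, not taken from the submission's script:

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(ema: Params, params: Params, decay: float = 0.9985) -> None:
    """Blend the current weights into the shadow copy, in place:
    ema <- decay * ema + (1 - decay) * params."""
    for name, values in params.items():
        shadow = ema[name]
        for i, v in enumerate(values):
            shadow[i] = decay * shadow[i] + (1.0 - decay) * v


# Usage: start the shadow copy from the initial weights, update after each step.
params = {"w": [1.0, -2.0]}
ema = {k: list(v) for k, v in params.items()}
for step in range(1000):
    # ... an optimizer step would modify `params` here ...
    ema_update(ema, params)
```

With decay=0.9985 the average has an effective horizon of roughly 1/(1 - 0.9985) ≈ 667 steps, so recent weights dominate without the checkpoint bookkeeping SWA needs.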
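The "legal score-first protocol" for TTT can be read as: every validation token is scored by weights that have not yet been trained on it. A minimal sketch of that ordering, assuming the validation stream is split into chunks and the optimizer runs `epochs` passes over each chunk right after it is scored (the PR does not say whether the 10 epochs are per chunk or over the whole set). `score_fn` and `adapt_fn` are hypothetical callbacks standing in for the model's loss and its AdamW update:

```python
from typing import Callable, Sequence, Tuple

Chunk = Sequence[int]


def score_first_ttt(
    chunks: Sequence[Chunk],
    score_fn: Callable[[Chunk], Tuple[float, int]],
    adapt_fn: Callable[[Chunk], None],
    epochs: int = 10,
) -> float:
    """Return mean per-token loss, scoring every chunk before adapting on it."""
    total_loss, total_tokens = 0.0, 0
    for chunk in chunks:
        # 1) Score with the current weights; the model has not yet trained on this chunk.
        loss_sum, n_tokens = score_fn(chunk)
        total_loss += loss_sum
        total_tokens += n_tokens
        # 2) Only then adapt on the already-scored chunk (e.g. AdamW steps at
        #    lr=0.0005), so the update can help later chunks but never this one.
        for _ in range(epochs):
            adapt_fn(chunk)
    if total_tokens == 0:
        raise ValueError("no tokens scored")
    return total_loss / total_tokens
```

Because adaptation only ever follows scoring, TTT changes no stored weights and the artifact size is unaffected. The returned mean loss would still need the usual conversion from nats per token to bits per byte to be comparable with val_bpb.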
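Partial RoPE applies the rotary position embedding to only a slice of each attention head's dimensions and leaves the rest position-free. A minimal sketch for a single head vector, assuming the rotated slice is the first 16 dimensions, the common half-split pairing (dimension i paired with i + 8), and base 10000; the submission's choice of slice, pairing, and base may differ:

```python
import math
from typing import List, Sequence


def partial_rope(vec: Sequence[float], pos: int, rope_dims: int = 16,
                 base: float = 10000.0) -> List[float]:
    """Rotate the first `rope_dims` entries of `vec` by position `pos`;
    entries beyond `rope_dims` are returned unchanged."""
    if rope_dims % 2 != 0 or rope_dims > len(vec):
        raise ValueError("rope_dims must be even and <= len(vec)")
    half = rope_dims // 2
    out = list(vec)
    for i in range(half):
        # Standard RoPE frequency for pair i: base^(-2i / rope_dims).
        theta = pos * base ** (-i / half)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out
```

Only the rotated dims carry position, so their contribution to q·k depends on the offset m - n alone, while the remaining dims act as position-independent content channels.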
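BigramHash gives each position a cheap bigram feature by hashing the (previous, current) token pair into a fixed number of buckets and looking up a learned embedding row. A minimal sketch of the bucketing step only, assuming a hypothetical multiplicative-XOR hash and a reserved bucket for position 0; the submission's actual hash function, and how its 128-dim output joins the residual stream, are not given in the PR:

```python
from typing import List, Sequence


def bigram_buckets(tokens: Sequence[int], num_buckets: int = 8192) -> List[int]:
    """Map each position to a bucket id for its (previous, current) token pair.

    Bucket num_buckets - 1 is reserved for position 0, which has no previous
    token; every other pair hashes into [0, num_buckets - 1).
    """
    if not tokens:
        return []
    # Hypothetical mixing constants, chosen only for illustration.
    a, b = 1_000_003, 998_244_353
    ids = [num_buckets - 1]
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        h = (prev * a) ^ (cur * b)
        ids.append(h % (num_buckets - 1))
    return ids
```

Each id would index one row of an 8192 × 128 embedding table, which holds 8192 × 128 = 1,048,576 weights, or 768 KiB if stored at 6 bits per weight. Distinct pairs can collide in the same bucket; the model learns a shared embedding for whatever lands there.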
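The quantization row lists Int6 + GPTQ-lite + zstd-22 without further detail. The sketch below shows only a plain int6 step: symmetric per-row round-to-nearest with one float scale per row and codes in [-31, 31]. It is a simpler stand-in, not GPTQ-lite (whose details the PR does not give), and it omits packing the codes and compressing them with zstd at level 22. The function names are hypothetical:

```python
from typing import List, Sequence, Tuple

QMAX = 31  # symmetric 6-bit range: codes in [-31, 31] (the code -32 is unused)


def quantize_row_int6(row: Sequence[float]) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization of one weight row to int6."""
    max_abs = max((abs(x) for x in row), default=0.0)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Clamp guards against float round-off pushing a code past the range.
    codes = [max(-QMAX, min(QMAX, round(x / scale))) for x in row]
    return codes, scale


def dequantize_row(codes: Sequence[int], scale: float) -> List[float]:
    """Reconstruct approximate weights from int6 codes and the row scale."""
    return [c * scale for c in codes]
```

Per-weight reconstruction error is at most half the row scale, and 6-bit codes need 25% fewer raw bits than int8 before the entropy coder runs.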

Credits

Status

Applied for a compute grant; will update with a verified score once credits arrive.

🤖 Generated with Claude Code

Adds TTT (3-epoch SGD on val data) to jfprincz's openai#287 base (1.1271).
TTT is eval-time only so artifact size stays at ~15.5MB.
Projected score: ~1.122-1.124.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, clean up script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mohosy changed the title from "Non-record: 11L EMA + XSA + TTT + Int6 MLP3x (pending compute)" to "Non-record: 11L EMA + XSA + Int6 MLP3x (pending compute)" on Mar 21, 2026
mohosy changed the title from "Non-record: 11L EMA + XSA + Int6 MLP3x (pending compute)" to "Non-record: 11L SwiGLU + XSA4 + EMA + U-Net + AdamW TTT (pending compute)" on Mar 23, 2026