
Ternary Universal Transformer — 15.6MB, bfloat16, Muon optimizer #216

Add ternary Universal Transformer submission

Open
alons23 wants to merge 1 commit into openai:main from alons23:main

Conversation


@alons23 commented on Mar 20, 2026

A 68M-parameter Universal Transformer with ternary weights {-1, 0, +1}, trained with the Muon optimizer and using QK-Norm, RoPE, FlashAttention-2, and bfloat16. NB=4 blocks × NR=6 recurrences = 24 effective layers. Artifact size: ~15.6MB. A test run reached val_bpb ≈ 0.810 on 1×H100.
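The PR does not spell out how weights are ternarized or stored, so the sketch below is an assumption: the common mean-|w| threshold heuristic for mapping floats to {-1, 0, +1}, plus an illustrative 2-bit packing. The function names and the 0.7 threshold are hypothetical, not the submission's actual code.

```python
def ternarize(w, threshold=0.7):
    """Map float weights to {-1, 0, +1} with a per-tensor scale.

    Assumption: the common mean-|w| threshold heuristic; the PR does
    not specify which ternarization rule the submission uses.
    """
    scale = sum(abs(x) for x in w) / len(w)
    t = threshold * scale
    q = [1 if x > t else -1 if x < -t else 0 for x in w]
    return q, scale


def pack_ternary(q):
    """Pack ternary values at 2 bits each, i.e. 4 weights per byte.

    Illustrative storage format only. At 2 bits/weight, ~68M ternary
    parameters pack into roughly 17 MB, the same ballpark as the
    reported ~15.6MB artifact.
    """
    out = bytearray()
    for i in range(0, len(q), 4):
        b = 0
        for j, v in enumerate(q[i:i + 4]):
            b |= (v + 1) << (2 * j)  # encode -1/0/+1 as 0/1/2
        out.append(b)
    return bytes(out)


q, scale = ternarize([0.9, -0.05, -1.2, 0.3])
# q == [1, 0, -1, 0]; the four values pack into a single byte
packed = pack_ternary(q)
```

Note the per-tensor `scale` must be stored alongside the packed weights so the dequantized values `scale * q` approximate the originals.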

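The NB=4 × NR=6 sharing is the defining Universal Transformer trick: 24 effective layers from only 4 blocks' worth of parameters. A minimal sketch follows; the loop order (recurrence outer, blocks inner) is an assumption, since the PR does not say how the recurrence is nested.

```python
def universal_transformer(x, blocks, n_recurrences=6):
    """Re-apply the same blocks n_recurrences times.

    Yields len(blocks) * n_recurrences effective layers while storing
    parameters for only len(blocks) blocks (NB=4, NR=6 -> 24 here).
    """
    for _ in range(n_recurrences):
        for block in blocks:
            x = block(x)
    return x


# Stand-in "blocks" that just count how often they are applied.
applications = []
blocks = [lambda x, i=i: applications.append(i) or x + 1 for i in range(4)]
out = universal_transformer(0, blocks, n_recurrences=6)
# 4 blocks x 6 recurrences = 24 effective layer applications
```

This is why the parameter count stays small relative to depth: depth scales with NB×NR, but the artifact only has to store the NB distinct blocks.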
