
Ternary Universal Transformer — 15.6MB, bfloat16, Muon optimizer #216

Add ternary Universal Transformer submission

Open
alons23 wants to merge 1 commit into openai:main from alons23:main

Conversation


@alons23 commented on Mar 20, 2026

A 68M-parameter Universal Transformer with ternary weights {-1, 0, +1}, trained with the Muon optimizer and using QK-Norm, RoPE, FlashAttention-2, and bfloat16. NB=4 blocks × NR=6 recurrences = 24 effective layers. Artifact size: ~15.6MB. A test run reached val_bpb ≈ 0.810 on 1×H100.
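The PR does not spell out how weights are ternarized or stored, so the sketch below is an assumption: the common mean-|w| threshold heuristic for mapping floats to {-1, 0, +1}, plus an illustrative 2-bit packing. The function names and the 0.7 threshold are hypothetical, not the submission's actual code.

```python
def ternarize(w, threshold=0.7):
    """Map float weights to {-1, 0, +1} with a per-tensor scale.

    Assumption: the common mean-|w| threshold heuristic; the PR does
    not specify which ternarization rule the submission uses.
    """
    scale = sum(abs(x) for x in w) / len(w)
    t = threshold * scale
    q = [1 if x > t else -1 if x < -t else 0 for x in w]
    return q, scale


def pack_ternary(q):
    """Pack ternary values at 2 bits each, i.e. 4 weights per byte.

    Illustrative storage format only. At 2 bits/weight, ~68M ternary
    parameters pack into roughly 17 MB, the same ballpark as the
    reported ~15.6MB artifact.
    """
    out = bytearray()
    for i in range(0, len(q), 4):
        b = 0
        for j, v in enumerate(q[i:i + 4]):
            b |= (v + 1) << (2 * j)  # encode -1/0/+1 as 0/1/2
        out.append(b)
    return bytes(out)


q, scale = ternarize([0.9, -0.05, -1.2, 0.3])
# q == [1, 0, -1, 0]; the four values pack into a single byte
packed = pack_ternary(q)
```

Note the per-tensor `scale` must be stored alongside the packed weights so the dequantized values `scale * q` approximate the originals.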

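The NB=4 × NR=6 sharing is the defining Universal Transformer trick: 24 effective layers from only 4 blocks' worth of parameters. A minimal sketch follows; the loop order (recurrence outer, blocks inner) is an assumption, since the PR does not say how the recurrence is nested.

```python
def universal_transformer(x, blocks, n_recurrences=6):
    """Re-apply the same blocks n_recurrences times.

    Yields len(blocks) * n_recurrences effective layers while storing
    parameters for only len(blocks) blocks (NB=4, NR=6 -> 24 here).
    """
    for _ in range(n_recurrences):
        for block in blocks:
            x = block(x)
    return x


# Stand-in "blocks" that just count how often they are applied.
applications = []
blocks = [lambda x, i=i: applications.append(i) or x + 1 for i in range(4)]
out = universal_transformer(0, blocks, n_recurrences=6)
# 4 blocks x 6 recurrences = 24 effective layer applications
```

This is why the parameter count stays small relative to depth: depth scales with NB×NR, but the artifact only has to store the NB distinct blocks.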
