Non-record: BitNet b1.58 + depth recurrence + NorMuon (1.7510 BPB, 3.78 MB) #126
Open
Athenox14 wants to merge 1 commit into openai:main from
Conversation
Non-record submission: BitNet b1.58 + Depth Recurrence + NorMuon
val_bpb (ternary roundtrip): 1.7510 | 3.78 MB | Unlimited compute (~3h, 1×RTX 3060)
This submission explores combining BitNet b1.58 ternary quantization with depth recurrence to maximize model capacity within the 16 MB artifact limit.
Key ideas
Ternary packing (2 bits/weight): storing weights as {-1, 0, +1} packed 4-per-byte then zlib-compressed yields a 3.74 MB model — leaving 4× more parameter budget than an equivalent int8+zlib approach, enabling much larger models within the size limit.
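The packing scheme described above can be sketched as follows. This is a minimal illustration of 2-bit ternary packing plus zlib, not the PR's actual code; function names and the byte layout are assumptions.

```python
import zlib
import numpy as np

def pack_ternary(w):
    """Map {-1, 0, +1} -> {0, 1, 2}, pack 4 codes per byte, then zlib-compress."""
    codes = (w.astype(np.int8) + 1).astype(np.uint8)
    pad = (-len(codes)) % 4                      # pad to a multiple of 4 codes
    codes = np.concatenate([codes, np.zeros(pad, dtype=np.uint8)])
    c = codes.reshape(-1, 4)
    packed = (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)
    return zlib.compress(packed.tobytes(), 9), len(w)

def unpack_ternary(blob, n):
    """Invert pack_ternary: decompress, unpack 2-bit fields, map back to {-1, 0, +1}."""
    packed = np.frombuffer(zlib.decompress(blob), dtype=np.uint8)
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)[:n]
    return codes.astype(np.int8) - 1

w = np.random.randint(-1, 2, size=10_000)
blob, n = pack_ternary(w)
assert np.array_equal(unpack_ternary(blob, n), w)   # lossless roundtrip
```

At 2 bits per weight the raw stream is already 4× smaller than int8; zlib then exploits the skewed code distribution (many zeros) for further gains.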
Depth recurrence + U-Net skips: 4 unique transformer blocks run 3× each = 12 effective layers, with learnable skip connections between encoder and decoder halves. A per-block `resid_mix` parameter lets each recurrence pass blend the current hidden state with the original embedding, allowing blocks to specialize by depth despite shared weights.
NorMuon: Muon optimizer with per-neuron row-wise RMS normalization after Newton–Schulz orthogonalization, replacing the uniform scaling heuristic.
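The recurrence with `resid_mix` blending can be sketched as below (U-Net skips omitted for brevity). All names are illustrative, and in practice `resid_mix` would be a learned, sigmoid-squashed parameter rather than a plain constant:

```python
def recurrent_forward(x0, blocks, n_passes=3, resid_mix=None):
    """Run the unique blocks n_passes times (4 blocks x 3 passes = 12 effective layers).

    Before each block, the current hidden state h is blended with the original
    embedding x0 via a per-(pass, block) mixing coefficient m in [0, 1], so the
    same shared weights can behave differently at different effective depths.
    """
    h = x0
    for p in range(n_passes):
        for i, block in enumerate(blocks):
            m = resid_mix[p][i]
            h = block(m * h + (1 - m) * x0)
    return h
```

With `m = 1` the model is a plain weight-tied stack; with `m < 1` later passes are re-anchored to the input embedding.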
Sequence length warmup + YaRN: geometric warmup 128→1024 over 2000 steps with NTK-aware RoPE base scaling to stabilize early training.
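The warmup schedule and base scaling might look like the following sketch. The geometric interpolation matches the 128→1024-over-2000-steps description; the NTK-aware formula is the standard simplified variant (full YaRN additionally interpolates per frequency band), and all names are assumptions:

```python
def seq_len_at(step, start=128, end=1024, warmup_steps=2000):
    """Geometric sequence-length warmup: start -> end over warmup_steps."""
    if step >= warmup_steps:
        return end
    return int(start * (end / start) ** (step / warmup_steps))

def ntk_rope_base(base, train_len, cur_len, head_dim):
    """NTK-aware RoPE base scaling for context lengths beyond train_len."""
    s = max(cur_len / train_len, 1.0)
    return base * s ** (head_dim / (head_dim - 2))
```

Short early sequences keep gradient noise manageable; rescaling the RoPE base as the context grows keeps positional frequencies consistent across the warmup.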
Limitations & next steps
A significant quantization gap exists (pre-quant 1.4866 → post-quant 1.7510, Δ=+0.264 BPB), indicating the QAT does not sufficiently push latent weights toward {-1, 0, +1}.
Follow-up runs add a ternary commitment loss to address this, and scale to ~60M unique parameters (still within the 16 MB budget).
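One plausible form of such a commitment loss is sketched below; the PR names only the idea, not a formula, so the exact penalty (here a mean-squared distance to the nearest scaled ternary value) is an assumption:

```python
import numpy as np

def ternary_commitment_loss(w, scale):
    """Penalize latent weights for straying from their nearest ternary level.

    target is the closest value in {-scale, 0, +scale}; the MSE to it pushes
    latent weights toward points that quantize losslessly, shrinking the gap
    between pre-quant and post-quant BPB.
    """
    target = scale * np.clip(np.round(w / scale), -1, 1)
    return float(np.mean((w - target) ** 2))
```

Added to the training objective with a small coefficient, this term is zero exactly when every weight already sits on a ternary level.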