
Non-record: BitNet b1.58 - 68M ternary params, val_bpb=1.1770, systematic analysis of ternary limitations#367

Open
ksang123 wants to merge 1 commit into openai:main from ksang123:bitnet-68M-systematic-analysis

Conversation

@ksang123

Improves on PR #139 (1.2029 → 1.1770). 68M ternary {-1,0,1} params packed at 1.6 bits/param in 15.88MB via base-3 encoding.

Key findings:

  • The entire standard competition stack (XSA, SmearGate, BigramHash, OrthoInit, WD, EMA/SWA, TTT) either breaks ternary models or provides no benefit
  • XSA and weight decay cause complete training plateaus at val_loss 2.4 — ternary is a fundamentally different optimization regime
  • Near-lossless quantization roundtrip (0.0016 BPB gap) via fp16 scale simulation during training
  • Ternary prefers higher LR (0.04 vs 0.025), no regularization, and longer warmdown — the opposite of int6 best practices
  • Suggests int4 with late QAT as an unexplored middle ground: 50% more params than int6 with near-zero quant gap

Full writeup with negative results table in the README.
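As a sketch of how base-3 packing reaches 1.6 bits/param: 3^5 = 243 ≤ 256, so five trits fit in one byte, i.e. 8/5 = 1.6 bits per parameter. Function names below are illustrative, not taken from the PR's code:

```python
import numpy as np

def pack_ternary(w):
    """Pack ternary weights {-1, 0, 1} into bytes, 5 trits per byte (base-3)."""
    trits = (w.astype(np.int64) + 1).ravel()              # map {-1,0,1} -> {0,1,2}
    pad = (-len(trits)) % 5                               # pad to a multiple of 5
    trits = np.concatenate([trits, np.zeros(pad, dtype=np.int64)])
    groups = trits.reshape(-1, 5)
    powers = 3 ** np.arange(5)                            # base-3 place values
    return (groups @ powers).astype(np.uint8), w.size     # max digit value: 242

def unpack_ternary(packed, n):
    """Invert pack_ternary: recover the first n trits as {-1, 0, 1}."""
    vals = packed.astype(np.int64)[:, None] // (3 ** np.arange(5)) % 3
    return (vals.ravel()[:n] - 1).astype(np.int8)

w = np.array([-1, 0, 1, 1, -1, 0, 1], dtype=np.int8)
packed, n = pack_ternary(w)                               # 7 trits -> 2 bytes
assert np.array_equal(unpack_ternary(packed, n), w)
```

The roundtrip is exact, so the packed file size is determined purely by the parameter count: ceil(68M / 5) bytes plus whatever higher-precision tensors (e.g. scales) ship alongside.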

…ding window)

BitNet b1.58 ternary quantization with full-training STE. 68M params in 15.88MB
via base-3 packing (1.6 bits/param). Near-lossless roundtrip (0.0016 BPB gap).
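A minimal sketch of the kind of full-training STE ternarization described above (per-tensor absmean quantization in the BitNet b1.58 style; the function name and the fp16-scale line are my reading of the PR's "fp16 scale simulation", not its actual code):

```python
import torch

def ternary_quantize_ste(w: torch.Tensor) -> torch.Tensor:
    """Absmean ternarization with a straight-through estimator.

    Forward: scale by mean |w|, round to {-1, 0, 1}, rescale.
    Backward: gradients pass through unchanged (the detach trick).
    """
    scale = w.abs().mean().clamp(min=1e-5)
    scale = scale.half().float()                  # assumed: simulate fp16 scale storage
    w_q = (w / scale).round().clamp_(-1, 1) * scale
    return w + (w_q - w).detach()                 # STE: forward w_q, gradient of identity
```

Because the same ternarized forward path is used throughout training (rather than quantizing only at export), the deployed model sees essentially the weights it was trained with, which is consistent with the near-lossless 0.0016 BPB roundtrip gap reported here.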

Systematic analysis of why the standard competition stack breaks for ternary:
- XSA, weight decay, grad clipping: cause training plateau at 2.4
- SmearGate, BigramHash, OrthoInit: hurt or no effect
- EMA/SWA: fundamentally incompatible
- TTT: no improvement on ternary models

What works: higher LR (0.04), wider MLP, fp16 scale simulation, longer warmdown.

Improves on PR openai#139 (1.2029 → 1.1770).
