Non-record: BitNet b1.58 — 65M ternary params beat 4-hour baseline in 10 minutes (val_bpb=1.2029) #139

ksang123 wants to merge 1 commit into openai:main from
Conversation
…s 4h baseline in 10min via Chinchilla scaling
Pull request overview
Adds a new non-record 16MB submission entry implementing BitNet b1.58 ternary-weight training + base-3 packing to fit ~64.5M params into a ~15.1MB artifact, along with the exact run script, logs, and write-up.
Changes:
- Introduces a new `train_gpt.py` for this record that swaps core linear layers to ternary BitLinear (STE) and adds ternary/base-3 packing export + roundtrip eval.
- Adds reproducibility artifacts: the 8×H100 run script, full training log, and `submission.json` metadata.
- Adds a README write-up describing the scaling rationale and method.
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| records/track_non_record_16mb/2026-03-19_BitNet158/train_gpt.py | New BitNet training script with ternary layers and ternary artifact serialization. |
| records/track_non_record_16mb/2026-03-19_BitNet158/run_8xh100.sh | Reproduction script for the submission run. |
| records/track_non_record_16mb/2026-03-19_BitNet158/train.log | Captured output from the submission run. |
| records/track_non_record_16mb/2026-03-19_BitNet158/submission.json | Reported metrics and artifact sizing metadata. |
| records/track_non_record_16mb/2026-03-19_BitNet158/README.md | Method/results write-up for the submission. |
…ding window) BitNet b1.58 ternary quantization with full-training STE. 68M params in 15.88MB via base-3 packing (1.6 bits/param). Near-lossless roundtrip (0.0016 BPB gap).

Systematic analysis of why the standard competition stack breaks for ternary:
- XSA, weight decay, grad clipping: cause a training plateau at 2.4
- SmearGate, BigramHash, OrthoInit: hurt or have no effect
- EMA/SWA: fundamentally incompatible
- TTT: no improvement on ternary models

What works: higher LR (0.04), wider MLP, fp16 scale simulation, longer warmdown. Improves on PR openai#139 (1.2029 → 1.1770).
@ksang123 I feel like tagging my PR in this, and also leaving a kind-of snarky comment on my own PR saying it was "exactly what I hoped would happen when I submitted 139", is a bit disingenuous — it reads as if my work extends or builds upon yours. My work started as soon as the challenge was announced, and as the creator of the Bitnet Rust library/bitnet-llm, with a lot of private work and research done on Bitnets (and general int8 transformers) and their application on low-power MCUs, it made sense to work on that for this challenge given the constraints. I simply did not want to submit something just to "hold a place in the queue that I could update later" before it was complete; otherwise, within 3h on the day of the challenge I could have been under 1.20 and mentioned how I was first. You can find a proper research document on the circa 250 runs I did here: Results Document (formatted by Claude from my RESULTS.md file). Do not conflate the two submissions.
@CiprianFlorin-Ifrim I didn't mean to imply your work built on mine; I know it didn't. It was extremely good and I'm genuinely glad the ternary approach was pushed this far. Separately, I do wish my submissions had gotten reviewed.
All love — it could be that the title gave the wrong idea? I see people using "Record" for the main leaderboard and "non-record" for "I am uploading something now to have a PR here, but will update it later", which imo adds pressure on the organisers to manage the hundreds of them. Mine used the specific leaderboard name, "notable non record runs". The timing matters too, as it looks like they want the notable non-record leaderboard to be just for weird stuff with unlimited compute.
BitNet b1.58: 64.5M Ternary Parameters in 15.1MB
val_bpb: 1.2029 (post-roundtrip) | 15.11 MB | 8×H100, 10 minutes
The idea
The baseline's 17.1M params are saturated long before the wallclock runs out (T/N ≈
424×). Chinchilla says: fit more parameters. BitNet b1.58 lets me pack 64.5M ternary
{-1, 0, 1} parameters into 15.1MB via base-3 encoding at 1.6 bits/param — 3.8× more
params than the baseline in the same artifact size.
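The 1.6 bits/param figure follows from packing 5 trits per byte: 3^5 = 243 ≤ 256, so one byte holds five ternary values, i.e. 8/5 = 1.6 bits each. Below is a minimal, dependency-free sketch of such a base-3 pack/unpack roundtrip; the function names are illustrative, not the ones in the repo's `train_gpt.py`, which operates on tensors rather than Python lists.

```python
def pack_ternary(trits):
    """Pack a flat list of ternary values {-1, 0, 1} into bytes.

    5 trits per byte, since 3**5 = 243 <= 256 -> 8/5 = 1.6 bits/param.
    """
    vals = [t + 1 for t in trits]          # map {-1, 0, 1} -> {0, 1, 2}
    pad = (-len(vals)) % 5                 # pad to a multiple of 5
    vals += [0] * pad
    out = bytearray()
    for i in range(0, len(vals), 5):
        b = 0
        for j, v in enumerate(vals[i:i + 5]):
            b += v * (3 ** j)              # base-3 place value, max 242
        out.append(b)
    return bytes(out), len(trits)

def unpack_ternary(packed, n):
    """Invert pack_ternary; n is the original trit count."""
    trits = []
    for b in packed:
        for _ in range(5):
            trits.append(b % 3 - 1)        # back to {-1, 0, 1}
            b //= 3
    return trits[:n]
```

Because 243 of the 256 byte values are used, the roundtrip is exact; the 0.0016 BPB gap reported in the PR comes from scale handling, not from the packing itself.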
What's different
- Weights are ternary in the forward pass throughout training (STE backward), so the exported artifact is lossless. No post-training quantization fight.
- At inference the scales match what the model saw during training.
- Base-3 packing stores weights near the trit minimum (log2 3 ≈ 1.58 bits).
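The ternarization BitNet b1.58 describes uses an absmean scale: γ = mean(|W|), then each weight becomes round(w/γ) clamped to {-1, 0, 1}. The repo's `train_gpt.py` presumably does this on PyTorch tensors with a straight-through estimator; the sketch below is a dependency-free illustration with the STE noted as a comment, and the function name is hypothetical.

```python
def ternary_quantize(W, eps=1e-8):
    """BitNet b1.58-style absmean ternarization of a weight matrix.

    scale = mean(|W|); each weight maps to round(w / scale) clamped to
    {-1, 0, 1}. During training the straight-through estimator (STE)
    treats the rounding as identity in the backward pass, e.g. in
    PyTorch: W_used = W + (quantize(W) * scale - W).detach()
    """
    flat = [abs(w) for row in W for w in row]
    scale = sum(flat) / max(len(flat), 1) + eps   # absmean scale gamma
    Wq = [[max(-1, min(1, round(w / scale))) for w in row] for row in W]
    return Wq, scale
```

Since the same quantized weights and scale are used for the forward pass during training and for the exported artifact, there is no separate post-training quantization step to drift away from.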
Results
A 10-minute ternary model beats 4 hours of full-precision training. The full write-up with scaling law analysis is in the README.