
Non-record: MoE exploration + multi-bit quantization analysis#480

Open
imyesung wants to merge 3 commits into openai:main from imyesung:moe-quant-analysis

Conversation


@imyesung imyesung commented Mar 23, 2026

Summary

Non-record submission with two negative results under the 16MB artifact cap:

  • Preliminary MoE negative result: a 2-expert soft-routing MoE (2 × 1.5x MLP) underperforms the dense control throughout the observed training window. I added moe_train_partial.log, the surviving partial 8xH100 SXM log; the RunPod pod died at step 2000, so the MoE conclusion should be interpreted as preliminary rather than a fully converged final result.
  • Leaderboard-relevant multi-bit quantization comparison: the dense control reaches 1.1456 val_bpb, which is within 0.0028 BPB of the March 20, 2026 leaderboard leader (1.1428). On that same trained dense model, int5 MLP costs +0.0068 BPB while int4 MLP costs +0.0655 BPB, making aggressive quantization unattractive for MoE expansion at this scale.

Included evidence

  • README.md with updated explanation and MoE-vs-dense checkpoint table
  • submission.json with updated metadata
  • train.log for the dense control / quantization comparison
  • moe_train_partial.log for the surviving MoE run
  • train_gpt.py
  • quant_comparison.png

Quantization Comparison Results

Config       Attn   MLP    Artifact   val_bpb   vs baseline
attn6_mlp6   int6   int6   15.14 MB   1.1456    baseline
attn6_mlp5   int6   int5   13.39 MB   1.1524    +0.0068
attn6_mlp4   int6   int4   11.51 MB   1.2111    +0.0655
attn5_mlp5   int5   int5   13.05 MB   1.1559    +0.0103
attn5_mlp4   int5   int4   11.29 MB   1.2183    +0.0727
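
For intuition on why the int5-to-int4 step is so much more damaging than int6-to-int5, here is a minimal symmetric per-tensor fake-quantization sketch. This is an illustration only: the PR's actual quantization scheme (rounding mode, any per-channel scaling, attention vs MLP handling) lives in train_gpt.py, and the function name here is made up.

```python
import numpy as np

def quantize_dequantize(w, bits):
    """Hypothetical sketch: symmetric per-tensor round-to-nearest
    quantization to a signed int-`bits` grid, then dequantize.
    Each halving of the grid roughly doubles the rounding error."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 31 for int6, 15 for int5, 7 for int4
    scale = np.abs(w).max() / qmax          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)
for bits in (6, 5, 4):
    err = np.abs(quantize_dequantize(w, bits) - w).mean()
    print(f"int{bits}: mean abs weight error {err:.5f}")
```

The mean reconstruction error roughly doubles per bit removed, consistent with the table's pattern of a small int6-to-int5 BPB cost and a much larger int5-to-int4 cost, though the actual BPB impact also depends on how sensitive each layer is to weight noise.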

MoE Observed Checkpoints

Step   Dense control val_bpb   MoE val_bpb   Delta
500    1.4058                  1.4115        +0.0057
1000   1.3286                  1.3386        +0.0100
1500   1.3024                  1.3163        +0.0139
2000   1.2709                  1.2866        +0.0157
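
To make "2-expert soft-routing MoE" concrete, the layer shape being compared can be sketched as below. All names, dimensions, and the gelu approximation are assumptions for illustration; the real layer is defined in train_gpt.py. Soft routing means every token gets a gate-weighted mix of both experts, with no top-k expert dropping.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SoftMoEMLP:
    """Hypothetical 2-expert soft-routing MLP block.
    Each expert is a gelu MLP with hidden width 1.5x the usual 4*d_model
    (the "2 x 1.5x MLP" configuration described in the summary)."""
    def __init__(self, d_model, n_experts=2, mult=1.5, seed=0):
        rng = np.random.default_rng(seed)
        h = int(4 * d_model * mult)
        self.w_gate = rng.normal(0, 0.02, (d_model, n_experts))
        self.w_in = rng.normal(0, 0.02, (n_experts, d_model, h))
        self.w_out = rng.normal(0, 0.02, (n_experts, h, d_model))

    def __call__(self, x):                       # x: (tokens, d_model)
        gates = softmax(x @ self.w_gate)         # (tokens, n_experts), rows sum to 1
        # Soft routing: run every expert on every token, then mix by gate weight.
        hidden = np.einsum('td,edh->eth', x, self.w_in)
        hidden = 0.5 * hidden * (1 + np.tanh(0.7978845608 * (hidden + 0.044715 * hidden**3)))
        outs = np.einsum('eth,ehd->etd', hidden, self.w_out)
        return np.einsum('te,etd->td', gates, outs)

moe = SoftMoEMLP(d_model=64)
y = moe(np.random.default_rng(1).normal(size=(8, 64)))
print(y.shape)  # (8, 64)
```

Note that because both experts run on every token, this layout pays the full compute and parameter cost of both experts, which is part of why it competes poorly with a dense control under a fixed artifact budget.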

…n analysis

Negative result showing MoE is structurally disadvantaged below 500M params
under the 16MB constraint. A multi-bit quantization comparison (int4/5/6) on
the same trained dense model shows that int4 MLP incurs +0.065 BPB degradation,
closing the MoE parameter-expansion path.
