
Non-record: LLaDA-MDLM Diffusion — val_var_bpb 1.1465 (first diffusion to beat AR baseline)#1100

Closed
agalimova wants to merge 1 commit into openai:main from agalimova:submission/llada-mdlm-diffusion

Conversation

@agalimova

Summary

val_var_bpb: 1.1465 (512 eval steps) | ~33M params | 1x NVIDIA GB10 (Project DIGITS)

First discrete diffusion model to beat the AR baseline (1.2244 BPB) in parameter-golf. Beats the previous best diffusion submission (PR #820, 1.625 BPB) by 0.48 BPB.

Results

| Model | BPB |
|---|---|
| AR SOTA (merged #1) | 1.1194 |
| This PR (MDLM diffusion) | 1.1465 |
| AR baseline | 1.2244 |
| PR #820 (MDLM) | 1.625 |
| PR #905 (prefix diffusion) | 1.859 |

Approach

  • MDLM (Sahoo et al. 2024) masked diffusion with log-linear noise schedule
  • 11L 512d bidirectional transformer with adaLN timestep conditioning
  • Frozen visible-token logits in substitution parameterization
  • Antithetic time sampling, ReLU^2 activation, RoPE
  • Proper discrete absorbing-mask ELBO evaluation (not MC sampling)
  • 6000 steps, AdamW, cosine warmdown
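The core training pieces above (log-linear masking schedule, antithetic time sampling, and the MDLM weighted loss) can be sketched as follows. This is a minimal illustration, not the PR's actual code; tensor shapes, function names, and the `eps` floor placement are assumptions:

```python
import torch
import torch.nn.functional as F

def antithetic_times(batch_size, eps=0.1):
    # Antithetic sampling: pair each t with 1 - t to cut estimator variance.
    half = torch.rand((batch_size + 1) // 2)
    t = torch.cat([half, 1.0 - half])[:batch_size]
    # eps floors the mask rate; the PR found eps=0.1 far better than 0.001.
    return t.clamp(min=eps)

def mask_tokens(x0, t, mask_id):
    # Log-linear schedule: alpha_t = 1 - t, so each token is masked w.p. t.
    mask = torch.rand(x0.shape) < t[:, None]
    xt = torch.where(mask, torch.full_like(x0, mask_id), x0)
    return xt, mask

def mdlm_loss(logits, x0, mask, t):
    # MDLM weighted CE over masked positions only; for the log-linear
    # schedule the ELBO weight -alpha'_t / (1 - alpha_t) reduces to 1/t.
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")
    return ((ce * mask).sum(dim=1) / t).mean()
```

The 1/t weight is why the eps floor matters: without it, rare small-t draws dominate the gradient.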

Key Findings (27 hyperparameter experiments)

  • Masking floor eps=0.1 vastly outperforms eps=0.001: the single biggest improvement for diffusion LMs
  • Wider beats deeper at equal parameter count (8L 640d > 14L 384d)
  • AR tricks that don't transfer: LeakyReLU^2, BigramHash, prefix conditioning
  • Eval method is critical: MC ELBO reported 2.41 BPB where the discrete ELBO gave 1.15 on the same checkpoint
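One plausible reading of the "discrete ELBO" evaluation, sketched under assumptions: instead of Monte Carlo sampling of the time variable, sweep a deterministic grid of noise levels (the 512 eval steps) and average the schedule-weighted masked cross-entropy. `model` and `mask_id` here are placeholder names, not the PR's API:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def grid_elbo_nll(model, x0, mask_id, n_steps=512):
    # Midpoint-rule quadrature over t in (0, 1) replaces MC time sampling;
    # the 1/t factor is the log-linear-schedule ELBO weight from MDLM.
    total = 0.0
    for i in range(n_steps):
        t = torch.tensor((i + 0.5) / n_steps)
        mask = torch.rand(x0.shape) < t
        xt = torch.where(mask, torch.full_like(x0, mask_id), x0)
        logits = model(xt, t.expand(x0.shape[0]))
        ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")
        total += float((ce * mask).sum() / t)
    return total / (n_steps * x0.numel())  # average NLL in nats per token
```

The deterministic grid removes the variance from time sampling, which is one way a 2.41 vs 1.15 BPB gap between evaluators could arise.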

Non-Record Reason

Trained on 1x NVIDIA GB10 (Project DIGITS), not 8xH100 SXM.

Test plan

  • Reproduce on 8xH100 SXM within 10-minute budget
  • Verify discrete ELBO with exact byte counting (currently uses ~4.3 bytes/token approximation)
  • Compare with official evaluation harness
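The exact-byte-counting item above amounts to replacing the fixed ~4.3 bytes/token constant with the real UTF-8 byte count of the eval text. A sketch (the function name is hypothetical):

```python
import math

def bpb_exact(total_nll_nats, texts):
    # bits-per-byte = total NLL in bits / total UTF-8 bytes, using the
    # actual byte count rather than a fixed bytes-per-token approximation.
    total_bytes = sum(len(t.encode("utf-8")) for t in texts)
    return total_nll_nats / (math.log(2) * total_bytes)
```

Note that multi-byte UTF-8 characters make the exact count differ from character counts, which is exactly where a per-token approximation drifts.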

🤖 Generated with Claude Code

First discrete diffusion model to beat the AR baseline (1.22) in
parameter-golf. MDLM training with log-linear noise, adaLN timestep
conditioning, frozen visible-token logits, and discrete absorbing-mask
ELBO evaluation. Three rounds of hyperparameter sweeps (27 experiments)
identified key techniques for diffusion LMs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
