Submission/2026 03 28 masked diffusion #1053

Open
ikermoel wants to merge 3 commits into openai:main from
ikermoel:submission/2026-03-28-masked-diffusion
@ikermoel
Method: Discrete Masked Diffusion Language Model (MDLM)

Mean val_bpb: 1.3600 (3 seeds: 1337, 42, 7)

Seed 1337 achieved 1.3606 BPB in 7344 steps at 81.71 ms/step. Seed 42 achieved 1.3772 BPB in 7189 steps at 83.47 ms/step. Seed 7 achieved 1.3423 BPB in ~7300 steps at ~83 ms/step. The mean across all three seeds is 1.3600 BPB.
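As a quick sanity check on the numbers above, the short script below recomputes the mean and the implied wall-clock time per run. It uses only the figures quoted in this description; seed 7 is left out of the timing check because its step count and step time are approximate. The variable names are illustrative and do not come from the training script.

```python
from statistics import mean

# Validation BPB per seed, as reported above.
bpb_by_seed = {1337: 1.3606, 42: 1.3772, 7: 1.3423}
mean_bpb = mean(bpb_by_seed.values())

# (steps, ms per step) for the two seeds with exact figures.
runs = {1337: (7344, 81.71), 42: (7189, 83.47)}
wallclock_s = {seed: steps * ms / 1000 for seed, (steps, ms) in runs.items()}

print(f"mean val_bpb = {mean_bpb:.4f}")
for seed, secs in wallclock_s.items():
    print(f"seed {seed}: {secs:.1f} s of training")
```

Both timed runs come out at roughly 600 s, which matches the MAX_WALLCLOCK_SECONDS=600 budget in the reproduction settings.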

Artifact: ~12.9MB

Key Idea

True discrete diffusion LM with bidirectional attention:

  • Bidirectional attention during training: each [MASK] token attends to all other tokens
  • Masked token prediction: cross-entropy (CE) loss on masked positions only, with the mask rate sampled from Uniform[0.15, 0.85]
  • Pseudo-log-likelihood eval: 8 forward passes at a 50% mask rate, so each token is predicted from bidirectional context
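The three points above can be sketched in plain Python. This is a minimal illustration, not the code in train_diffusion.py: `log_probs_fn` is a hypothetical stand-in for the bidirectional model, `MASK`, `sample_mask`, and `uniform_model` are made-up helpers, and the eval assumes each of the 8 passes draws an independent random mask and that the result is converted to bits per byte from the token and byte counts. The actual masking scheme and BPB conversion in the submission may differ.

```python
import math
import random

MASK = -1  # hypothetical id standing in for the [MASK] token


def sample_mask(n, rate, rng):
    """Choose which of n positions to mask; always mask at least one."""
    masked = [rng.random() < rate for _ in range(n)]
    if not any(masked):
        masked[rng.randrange(n)] = True
    return masked


def masked_nll(tokens, masked, log_probs_fn):
    """Return (summed -log p over masked positions, number of masked positions).

    log_probs_fn takes the corrupted sequence and returns, per position,
    a dict mapping token id -> log-probability (assumed interface).
    """
    corrupted = [MASK if m else t for t, m in zip(tokens, masked)]
    log_probs = log_probs_fn(corrupted)
    total = sum(-log_probs[i][t]
                for i, (t, m) in enumerate(zip(tokens, masked)) if m)
    return total, sum(masked)


def train_loss(tokens, log_probs_fn, rng):
    """Mask rate ~ Uniform[0.15, 0.85]; mean CE over masked positions only."""
    rate = rng.uniform(0.15, 0.85)
    masked = sample_mask(len(tokens), rate, rng)
    nll, count = masked_nll(tokens, masked, log_probs_fn)
    return nll / count


def pseudo_bpb(tokens, n_bytes, log_probs_fn, rng, passes=8, rate=0.5):
    """Pseudo-log-likelihood in bits per byte over several masked passes."""
    nll, count = 0.0, 0
    for _ in range(passes):
        masked = sample_mask(len(tokens), rate, rng)
        pass_nll, pass_count = masked_nll(tokens, masked, log_probs_fn)
        nll += pass_nll
        count += pass_count
    nats_per_token = nll / count
    return nats_per_token * len(tokens) / (n_bytes * math.log(2))


def uniform_model(vocab_size):
    """Toy model that ignores context and predicts every token uniformly."""
    row = {t: -math.log(vocab_size) for t in range(vocab_size)}
    return lambda corrupted: [row for _ in corrupted]


rng = random.Random(0)
tokens = [rng.randrange(256) for _ in range(16)]
model = uniform_model(256)
loss = train_loss(tokens, model, rng)
bpb = pseudo_bpb(tokens, n_bytes=16, log_probs_fn=model, rng=rng)
```

With the toy uniform model over 256 symbols and one byte per token, the training loss is ln 256 nats and the eval comes out at about 8 bits per byte, which is a convenient check that the bookkeeping is right before swapping in a real model.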

Reproduction:
export USE_BIDIRECTIONAL_TRAIN=1 USE_MASKED_EVAL=1 USE_MASK_LOSS_ONLY=1 MIX_GPT_PROB=0.0 USE_TTT_EVAL=0 WARMDOWN_ITERS=20000 MAX_WALLCLOCK_SECONDS=600 SEED=1337
torchrun --standalone --nproc_per_node=8 train_diffusion.py
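
To repeat the run for all three reported seeds, a small launcher along these lines could be used. It is a sketch under the assumption that train_diffusion.py reads every setting from the environment exactly as in the export line above; `build_env` and `run_all_seeds` are hypothetical helper names, not part of the submission.

```python
import os
import subprocess

# Settings copied from the export line above; only SEED changes per run.
BASE_SETTINGS = {
    "USE_BIDIRECTIONAL_TRAIN": "1",
    "USE_MASKED_EVAL": "1",
    "USE_MASK_LOSS_ONLY": "1",
    "MIX_GPT_PROB": "0.0",
    "USE_TTT_EVAL": "0",
    "WARMDOWN_ITERS": "20000",
    "MAX_WALLCLOCK_SECONDS": "600",
}
COMMAND = ["torchrun", "--standalone", "--nproc_per_node=8", "train_diffusion.py"]
SEEDS = [1337, 42, 7]


def build_env(seed, base_env=None):
    """Return a copy of the environment with the run settings and seed applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(BASE_SETTINGS)
    env["SEED"] = str(seed)
    return env


def run_all_seeds():
    """Launch one training run per seed, stopping at the first failure."""
    for seed in SEEDS:
        subprocess.run(COMMAND, env=build_env(seed), check=True)
```

Calling `run_all_seeds()` on an 8-GPU node would then run the seeds one after another, each within the 600-second wall-clock budget.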

Built with help from ChatGPT (OpenAI) and Claude (Anthropic). I'm a high school student — this is my second ML competition submission and first time using this much compute in a project.
