
Non-record: MC Dropout ensembling is negative for small LMs#1021

Open
abaybektursun wants to merge 1 commit into openai:main from abaybektursun:nonrecord/mc-dropout-ensemble-negative

Conversation

@abaybektursun
Contributor

Summary

  • MC Dropout (train with dropout, average K=16 softmax distributions at eval) does not improve BPB at 17M parameters
  • Tested dropout=0.30 (+0.005 BPB) and dropout=0.05 (+0.002 BPB) — deterministic single pass is strictly better
  • Sub-networks lack diversity at this scale; the ensemble just adds noise
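For reference, the MC Dropout eval described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual eval script; it assumes a PyTorch model whose stochastic layers are standard `nn.Dropout` modules, and the function name is hypothetical:

```python
import torch
import torch.nn as nn


def mc_dropout_eval(model: nn.Module, x: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Average k stochastic softmax passes with dropout left active (MC Dropout)."""
    model.eval()
    # Re-enable only the dropout modules so each forward pass samples a sub-network
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(k)]
        )
    # Each pass is a valid distribution, so the mean over passes still sums to 1
    return probs.mean(dim=0)
```

The PR's finding is that this averaged distribution gives a worse BPB than a single deterministic pass (all dropout disabled) at this model scale.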

Results

| dropout | Baseline BPB | MC K=16 BPB | Delta   |
|---------|--------------|-------------|---------|
| 0.30    | 1.3708       | 1.3756      | +0.0049 |
| 0.05    | 1.3250       | 1.3269      | +0.0019 |
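For context, BPB (bits per byte) is the standard conversion of mean cross-entropy from nats per token to bits per byte of raw text. A minimal sketch of that conversion (hypothetical helper, not from this PR's eval code):

```python
import math


def bits_per_byte(nll_nats_per_token: float, tokens_per_byte: float) -> float:
    """Convert mean NLL in nats/token to bits/byte.

    nll_nats_per_token: mean cross-entropy loss in nats.
    tokens_per_byte: dataset token count divided by its byte count.
    """
    return (nll_nats_per_token / math.log(2)) * tokens_per_byte
```

Under this definition, the deltas in the table are differences of the same quantity measured with and without MC averaging, so they are directly comparable per dropout setting.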

Test plan

  • Trained and evaluated dropout=0.30 on 1xH100
  • Trained and evaluated dropout=0.05 on 1xH200
  • Verified probability sums = 1.0 (assertion in eval script)
  • Confirmed eval reproduces training's baseline BPB (torch.compile required)
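The probability-sum assertion mentioned in the test plan can be sketched as (hypothetical function and tensor names; assumes averaged softmax outputs with the vocab on the last axis):

```python
import torch


def check_probs_sum_to_one(probs: torch.Tensor, atol: float = 1e-4) -> None:
    """Sanity check: every averaged distribution must sum to 1."""
    sums = probs.sum(dim=-1)
    assert torch.allclose(sums, torch.ones_like(sums), atol=atol), sums
```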

🤖 Generated with Claude Code

MC Dropout (train with dropout, average K=16 softmax distributions
at eval) does not improve BPB at 17M parameters. Tested dropout=0.30
(+0.005 BPB) and dropout=0.05 (+0.002 BPB). The deterministic single
pass is strictly better — sub-networks lack diversity at this scale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
