Excited about this new work where we dig into the role of token order in masked diffusion models (MDMs)!
MDMs train on some horribly hard tasks, but careful planning at inference can sidestep the hardest ones, dramatically improving over vanilla MDM sampling (e.g. 7% -> 90% accuracy on Sudoku) 1/