juchengshen/osdt

OSDT: One-Shot Dynamic Thresholding for Diffusion Language Models

⭐ OSDT teaches masked diffusion LMs to decode adaptively with one-shot, data-aware calibration, improving the accuracy–throughput trade-off on GPQA, GSM8K, and HumanEval.

arXiv

Jucheng (Jack) Shen¹, Yeonju Ro²

¹ Rice University
² The University of Texas at Austin

OSDT is a training-free, dataset-aware decoding scheme for masked diffusion language models (e.g., LLaDA-8B). It calibrates confidence thresholds on a single sequence and reuses them across the dataset, improving the accuracy–throughput trade-off on GPQA, GSM8K, and HumanEval by dynamically adapting thresholds at block or step-block granularity.
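The calibrate-once, reuse-everywhere idea can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the use of the per-block mean as the metric, and the rule `min(cap, mean - slack)` are all assumptions made for illustration. See osdt/README.md and the paper for the actual modes, metrics, cap κ, and slack ε.

```python
from statistics import mean

def calibrate_block_thresholds(conf_trace, cap=0.9, slack=0.05):
    """Derive one threshold per block from a SINGLE calibration sequence.

    conf_trace[b] holds the token confidences observed while decoding block b.
    Assumed rule (illustrative only): threshold = min(cap, mean - slack), floored at 0.
    """
    thresholds = []
    for block_confs in conf_trace:
        tau = min(cap, mean(block_confs) - slack)
        thresholds.append(max(0.0, tau))
    return thresholds

def select_positions(confidences, masked, tau):
    """Choose which masked positions to unmask in one decoding step.

    Every masked position with confidence >= tau is accepted; if none
    qualifies, the single most confident position is taken so that
    decoding always makes progress.
    """
    chosen = [i for i in masked if confidences[i] >= tau]
    if not chosen and masked:
        chosen = [max(masked, key=lambda i: confidences[i])]
    return chosen

# Hypothetical confidence trace from one calibration sequence (2 blocks).
trace = [[0.98, 0.99, 1.00], [0.60, 0.70, 0.80]]
thresholds = calibrate_block_thresholds(trace, cap=0.9, slack=0.05)

# Reuse the block-1 threshold on a different (hypothetical) sequence.
step_conf = [0.20, 0.70, 0.66, 0.50]
accepted = select_positions(step_conf, masked=[0, 1, 2, 3], tau=thresholds[1])
```

In this sketch the cap keeps a very confident calibration block from demanding near-certain tokens, while the slack loosens the threshold so that more tokens can be accepted per step.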


Comparative results

Comparative results (Table 1)


For usage details, see osdt/README.md (core OSDT), confidence_analysis/README.md (confidence tools), and fast_dllm/README.md (baselines).


Confidence patterns

Step–block confidence (Fig. 1)

Cosine similarity across questions (Fig. 2)
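As a rough illustration of the quantity named in Fig. 2, the sketch below computes the cosine similarity between two confidence traces. It assumes each question's step–block confidences have been flattened into an equal-length vector; the trace values and the function name are hypothetical and do not come from the paper or from the confidence_analysis/ tools.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length confidence vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for an all-zero trace
    return dot / (norm_u * norm_v)

# Hypothetical flattened step-block confidence traces for two questions.
trace_q1 = [0.42, 0.71, 0.93, 0.55, 0.80, 0.97]
trace_q2 = [0.40, 0.69, 0.95, 0.52, 0.83, 0.96]
similarity = cosine_similarity(trace_q1, trace_q2)
```

A value close to 1 would indicate that the two questions follow a similar confidence pattern, which is the kind of regularity that makes thresholds calibrated on one sequence plausible to reuse on others.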


Installation

Recommended: Python 3.10 or 3.11 (PyTorch/flash-attn support is best on these).

python3.11 -m venv .venv
source .venv/bin/activate  # macOS/Linux
pip install --upgrade pip

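# 1) Install PyTorch matching the CUDA version reported by `nvidia-smi`
#    Pick ONE of the following (or the CPU-only wheel).
#    Confirm that the chosen index actually hosts a 2.7.1 wheel for that CUDA
#    version; not every CUDA index carries every torch release.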
# CUDA 12.6:
# pip install torch==2.7.1+cu126 --index-url https://download.pytorch.org/whl/cu126
# CUDA 12.4:
# pip install torch==2.7.1+cu124 --index-url https://download.pytorch.org/whl/cu124
# CUDA 12.1:
# pip install torch==2.7.1+cu121 --index-url https://download.pytorch.org/whl/cu121
# CPU only (no NVIDIA GPU):
# pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cpu

# 2) Install flash-attn (prebuilt for Torch 2.7.* / CUDA 12.*)
pip install --no-build-isolation "flash-attn==2.8.1"

# 3) Install remaining dependencies
pip install -r requirements.txt

Where is the main implementation?

The primary OSDT source lives in osdt/. See osdt/README.md for:

  • Detailed usage for GPQA, GSM8K, HumanEval
  • Recommended defaults from the paper (dynamic mode, metric, cap κ, slack ε)
  • Utility scripts (hyperparameter sweep, one-shot threshold calculator)

Reproducing results

Our repo ships custom evaluation scripts in each sub-directory (including Fast-dLLM baselines). The results reported in the paper are computed with these scripts.

However, the standard and most widely adopted approach in the community is to evaluate diffusion LMs using lm-eval.

To reproduce with lm-eval, please follow the setup in the official Fast-dLLM repo. In particular:

  1. git clone https://github.com/NVlabs/Fast-dLLM.git and cd Fast-dLLM/llada/

  2. Follow their evaluation setup, and make sure you use only the parallel generation evaluation method.

  3. Run eval_gsm8k.sh and eval_humaneval.sh. For GPQA, refer to GPQA’s lm-eval page, use the main task with zero_shot, and save the command as eval_gpqa_main.sh. These runs produce the Fast-dLLM baselines.

  4. Replace the imported generation function in eval_llada.py with OSDT, and update the arguments at place 1, place 2, and place 3 to the OSDT-specific ones.

  5. Re-run eval_gsm8k.sh, eval_humaneval.sh, and eval_gpqa_main.sh (created in step 3) using the OSDT commands in osdt/README.md.

If lm-eval produces results that differ from our internal scripts, or if you observe anything unexpected or interesting during reproduction, please feel free to open an issue — we would be happy to investigate.


Citation

If you find this repository useful, please consider citing:

@misc{shen2025staticcutoffsoneshotdynamic,
      title={Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models}, 
      author={Jucheng Shen and Yeonju Ro},
      year={2025},
      eprint={2511.02077},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.02077}, 
}
