🚀 DMax: Aggressive Parallel Decoding for dLLMs


DMax is a new dLLM paradigm achieving aggressive parallel decoding while preserving generation quality.

(Demo video: dmax_demo.mov)

DMax: Aggressive Parallel Decoding for dLLMs
Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang
xML Lab, National University of Singapore


⭐ Updates

  • [April 10, 2026]: Our arXiv paper is available now.
  • [April 10, 2026]: Code, models, and datasets are released.

💪 Highlights

  • Aggressive Decoding Parallelism: Achieves 6.0 tokens per forward pass (TPF) on math and reasoning tasks and 6.6 TPF on code tasks while preserving accuracy.
  • Self-Revising dLLM: Extends a pretrained MDLM into a UDLM with an intrinsic ability to revise its own erroneous predictions during decoding.
  • Soft Parallel Decoding: Uses interpolation between mask and token embeddings to propagate confidence priors from previous steps.

Superior Parallelism-Accuracy Trade-off, Increased TPF with Maintained Accuracy.
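The Soft Parallel Decoding highlight above can be sketched numerically. The toy function below is purely illustrative (its name, signature, and the plain-list embeddings are our assumptions, not the repository's API): low confidence keeps a position near the mask embedding, while high confidence moves it toward the predicted token embedding.

```python
def soft_embedding(mask_emb, token_emb, confidence):
    """Linearly interpolate each embedding dimension between the mask
    embedding and the predicted token embedding, weighted by the previous
    step's confidence (0 = pure mask, 1 = fully committed token).
    Illustrative sketch only, not the repository's implementation."""
    return [(1.0 - confidence) * m + confidence * t
            for m, t in zip(mask_emb, token_emb)]

# toy 4-dimensional embeddings
mask_vec = [0.0, 0.0, 0.0, 0.0]
token_vec = [1.0, 2.0, 3.0, 4.0]

print(soft_embedding(mask_vec, token_vec, 0.0))   # low confidence: stays at the mask
print(soft_embedding(mask_vec, token_vec, 0.75))  # mostly the token embedding
```

In the actual method this interpolation happens in the model's embedding space, letting confidence priors from earlier steps propagate into the next forward pass instead of being discarded by a hard mask-or-token choice.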


💡 Introduction

We present DMax, a new paradigm for efficient dLLMs. It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs, which decode through a binary mask-to-token transition, DMax reformulates decoding as progressive self-refinement from mask embeddings to token embeddings. At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further introduce Soft Parallel Decoding. Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax.
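To make the training idea concrete, here is a toy sketch of the corruption mix that On-Policy Uniform Training applies: some positions are masked (the standard MDLM objective), and some are replaced by the model's own earlier predictions so it learns to revise erroneous outputs (UDLM-style). Every name, rate, and the MASK id below is an illustrative assumption; see the paper for the actual objective.

```python
MASK = -1  # illustrative mask token id

def corrupt_on_policy(clean_tokens, model_preds, mask_rate, noise_rate, rng):
    """Build a corrupted training input mixing two corruption types:
      * masked positions (mask-to-token recovery), and
      * positions replaced by the model's own predictions
        (on-policy noise the model must learn to revise).
    Illustrative sketch only, not the repository's code."""
    corrupted = []
    for gold, pred in zip(clean_tokens, model_preds):
        r = rng.random()
        if r < mask_rate:
            corrupted.append(MASK)          # mask corruption
        elif r < mask_rate + noise_rate:
            corrupted.append(pred)          # on-policy token corruption
        else:
            corrupted.append(gold)          # keep the clean token
    return corrupted

import random
print(corrupt_on_policy([11, 12, 13, 14], [99, 98, 97, 96],
                        mask_rate=0.5, noise_rate=0.25,
                        rng=random.Random(0)))
```

The training target at every position is the clean token, so a single model learns both to fill masks and to overwrite its own mistakes.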


Overview of the On-Policy Uniform Training.

💻 Model and Datasets

| Model | Description | Source Model | Link |
|---|---|---|---|
| 🤖 DMax-Math-16B | Highly parallel dLLM for math and reasoning. | LLaDA-2.0-mini | Hugging Face |
| 🤖 DMax-Coder-16B | Highly parallel dLLM for code generation. | LLaDA-2.0-mini | Hugging Face |
| 🤖 DMax-16B | Highly parallel general-purpose dLLM. | LLaDA-2.0-mini | Coming soon |

| Dataset | Description | Link |
|---|---|---|
| 📊 DMax-Math-Training-Data | Trajectories on math problems generated by LLaDA-2.0-mini | Hugging Face |
| 📊 DMax-Code-Training-Data | Trajectories on code problems generated by LLaDA-2.0-mini | Hugging Face |

🚀 Quick Start

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Zigeng/DMax-Math-16B", trust_remote_code=True, device_map="cuda:0"
)
model = model.to(torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("Zigeng/DMax-Math-16B", trust_remote_code=True)

prompt = "A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?" + "\nLet's think step by step\n"

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to(model.device)  # move inputs to the same device as the model

nfe, generated_tokens = model.generate_spd(
    inputs=input_ids,
    gen_length=2048,
    block_length=32,
    threshold=0.0,
)

generated_answer = tokenizer.decode(
    generated_tokens[0],
    skip_special_tokens=True,
)

print(generated_answer)
print("nfe:", nfe, "token length:", len(generated_tokens[0]))
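The `nfe` value returned by `generate_spd` is the number of forward passes, so the decoding parallelism (TPF, tokens per forward pass) quoted in the highlights can be recovered with a trivial helper (the function name is ours):

```python
def tokens_per_forward(num_tokens, nfe):
    """Decoding parallelism: generated tokens divided by the number of
    forward passes (NFE). Helper name is ours, not the repository's."""
    return num_tokens / nfe

# e.g. 1200 generated tokens in 200 forward passes
print(tokens_per_forward(1200, 200))  # → 6.0
```

A purely sequential decoder has TPF 1.0; values around 6 mean roughly six tokens are committed per model call.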

🔧 Installation

  1. Clone the DMax repository:
git clone https://github.com/czg1225/DMax.git --recursive
cd DMax
  2. Install the dFactory environment for training:
cd dFactory
conda create -n dFactory python==3.11
conda activate dFactory
pip install -e VeOmni/
  3. Install the dInfer environment for efficient evaluation:
cd dInfer
conda create -n dInfer python==3.11
conda activate dInfer
pip install .
pip install sglang==0.5.3.post1
pip install vllm==0.10.2

🔥 Training

Our training scripts are based on the dFactory repository.

cd dFactory

1. Download and Merge Model Weights

The training scripts require model weights in a "merged-expert" format for optimal performance. Before starting, you must download the standard weights and convert them.

Download the original model: Follow the helper script to download the weights from the Hugging Face Hub.

# Choose a destination for the original model files
python scripts/download_hf_model.py \
  --repo_id inclusionAI/LLaDA2.0-mini \
  --local_dir /path/to/separate_expert_model

Convert to the merged format: Run the following script to create the merged checkpoint required for training.

# Use the path from the previous step as the source
python scripts/moe_convertor.py \
  --input-path /path/to/separate_expert_model \
  --output-path /path/to/save/merged_model \
  --mode merge

2. Prepare Training Data

Before training, the dataset must be converted into the conversational format expected by our training pipeline. The script below transforms the original "question" and "answer" fields into a "messages" field. Run the following command to perform the conversion.

#prepare the math and reasoning training data
python scripts/build_dataset_oput.py --dataset_path Zigeng/DMax-LLaDA-2.0-Mini-Math-Trajectories
# or prepare the code training data
python scripts/build_dataset_oput.py --dataset_path Zigeng/DMax-LLaDA-2.0-Mini-Code-Trajectories
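Conceptually, the conversion performed by build_dataset_oput.py maps each record's "question" and "answer" fields into a chat-style "messages" field. The sketch below shows that mapping for a single record; the helper name is ours, and the repository's script may emit additional fields:

```python
def to_messages(example):
    """Convert a {"question", "answer"} record into the chat-style
    {"messages": [...]} format expected by the SFT pipeline.
    Conceptual sketch of build_dataset_oput.py; exact output may differ."""
    return {
        "messages": [
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": example["answer"]},
        ]
    }

record = {"question": "What is 2 + 3?", "answer": "2 + 3 = 5. The answer is 5."}
print(to_messages(record))
```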

3. Modify Training Configs

Edit configs/sft/llada2_mini_bd_oput.yaml:

model:
  model_path: "/path/to/save/merged_model"
data:
  train_path: "/your/data/path"
train:
  output_dir: "/your/output/path"

4. Run Training

Once all preparation steps are finished, you can launch the fine-tuning process with the following command.
The default configuration uses distributed training across 8 GPUs.

PYTHONPATH=$(pwd)/VeOmni:$PYTHONPATH sh train.sh tasks/train_llada2_bd_oput.py configs/sft/llada2_mini_bd_oput.yaml

5. Interact with the Trained Model

To interact with a trained model, complete the following two steps:

Step 1: Convert the Checkpoint

First, convert the checkpoint from the merged format used during training back to the standard Mixture-of-Experts (MoE) format.

Note: the --input-path should point to the saved Hugging Face checkpoint, not the root output directory specified during training. The checkpoint is typically located in a subdirectory such as: TRAIN_OUTPUT_DIR/checkpoints/global_step_XXX/hf_ckpt/

Run the following command to perform the conversion:

python scripts/moe_convertor.py \
  --input-path /path/to/merged_model \
  --output-path /path/to/save/separate_expert_model \
  --mode split

Step 2: Copy the Modeling File

After the conversion, a final manual step is required. You must copy the DMax model's architecture files (modeling_llada2_moe.py and configuration_llada2_moe.py) into the newly created separate_expert_model directory. These files must come from the directory of your locally saved DMax model. The training and conversion processes only update the model weights, not the architecture files, which is why the DMax versions are needed.

cp /path/to/local_saved_DMax_model/modeling_llada2_moe.py /path/to/save/separate_expert_model/
cp /path/to/local_saved_DMax_model/configuration_llada2_moe.py /path/to/save/separate_expert_model/

With the model converted and the modeling file in place, you are now ready to chat!


⚡ Evaluation

Our evaluation scripts are based on the dInfer repository.

cd dInfer/evaluations

Download the DMax model: Follow the helper script to download the weights from the Hugging Face Hub.

# Choose a destination for the original model files
python download_hf_model.py \
  --repo_id Zigeng/DMax-Math-16B \
  --local_dir /path/to/local_saved_model

1. Evaluation on Math & Reasoning Benchmarks

We provide evaluation scripts for several math and reasoning benchmarks. Run the following command to launch the evaluation. You may modify the inference settings in eval_llada_dmax_math.sh as needed. Before running the script, please set model_path to the path of your locally saved model.

The current evaluation suite supports four benchmarks:

  • ✅ GSM8K
  • ✅ MATH500
  • ✅ Minerva_Algebra
  • ✅ ASDIV

bash eval_llada_dmax_math.sh

After generation, run the following scripts to extract answers from the generated responses and evaluate accuracy against the ground-truth labels.

python val_gsm8k.py       # postprocess and calculate accuracy on GSM8K
python val_math.py        # postprocess and calculate accuracy on MATH500
python val_algebra.py     # postprocess and calculate accuracy on Minerva_Algebra
python val_asdiv.py       # postprocess and calculate accuracy on ASDIV
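A common post-processing step for GSM8K-style grading is pulling the final numeric value out of the generated response. The snippet below is only an illustration of that idea (the function is ours; the repository's val_*.py scripts implement their own extraction and matching logic):

```python
import re

def extract_last_number(text):
    """Return the last numeric value in a generated response as a string,
    or None if no number is present. Illustrative only; not the logic
    used by the repository's val_*.py scripts."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

response = "Blue fiber: 2 bolts. White fiber: 1 bolt. Total = 2 + 1 = 3."
print(extract_last_number(response))  # → "3"
```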

2. Evaluation on Code Benchmarks

We also provide evaluation scripts for code generation benchmarks. Run the following command to start the evaluation. You may modify the inference settings in eval_llada_dmax_code.sh as needed. Before running the script, please set model_path to the path of your locally saved model.

The current evaluation suite supports the following four benchmarks:

  • ✅ HumanEval_Instruct
  • ✅ MBPP_Instruct
  • ✅ HumanEval_Instruct_Plus
  • ✅ MBPP_Instruct_Plus

bash eval_llada_dmax_code.sh

πŸ” Decoding Process Visualization

We provide a script for visualizing the full decoding process. Run demo.py to generate an HTML file named dllm_demo.html, then open this file in Chrome to view the decoding visualization.

python demo.py



β˜€οΈ Acknowledgement

Our code builds on dFactory and dInfer, and we acknowledge these great works for laying the groundwork that made our approach possible.


📚 Citation

If our research assists your work, please give us a star ⭐ or cite us using:

@misc{chen2026dmaxaggressiveparalleldecoding,
      title={DMax: Aggressive Parallel Decoding for dLLMs}, 
      author={Zigeng Chen and Gongfan Fang and Xinyin Ma and Ruonan Yu and Xinchao Wang},
      year={2026},
      eprint={2604.08302},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.08302}, 
}