ICML 2026 | Official Implementation
Jiaqi Tang†, Jianmin Chen†, Youyang Zhai†, Wei Wei‡, Runtao Liu, Mengjie Zhao, Xiangyu Wu, Qingfa Xiao, Qifeng Chen
† Equal contribution · ✉ Corresponding author · ‡ Co-corresponding author
TL;DR: Robust-U1 is a unified MLLM that self-recovers corrupted visual content and reasons over it, enabling robust visual understanding under real-world image degradations.
- 2026-06-11 🔥 We release the code, pretrained models, and the online demo of Robust-U1!
- 2026-05-07 Robust-U1 is accepted to ICML 2026!
Motivation · Installation · Models · Demo · Training · Evaluation · Citation · Contact
Existing approaches to robust visual understanding face two key limitations:
- 🚩 Black-Box Alignment: Feature-alignment methods lack interpretability and fail to explicitly model the corruption process.
- 🚩 Text-Only Compensation: Text-based reasoning cannot recover lost pixel-level visual details for faithful visual understanding.
This motivates a key question: Can MLLMs recover corrupted visual content by themselves?
1. Clone the repository
git clone https://github.com/jqtangust/Robust-U1.git
cd Robust-U1
2. Create the environment
conda create -n Robust-U1 python=3.10
conda activate Robust-U1
pip install -r requirements.txt
pip install -e .

| Model | Link | Description |
|---|---|---|
| BAGEL-7B-MoT | ByteDance-Seed/BAGEL-7B-MoT | Base model used as the initial weights for training. |
| Robust-U1 | Jiaqi-hkust/Robust-U1 | Final model for visual self-recovery and multimodal reasoning. |
| Robust-U1-SFT | Jiaqi-hkust/Robust-U1-SFT | Stage-I supervised fine-tuned checkpoint. |
| Robust-U1-RL | Jiaqi-hkust/Robust-U1-RL | Stage-II reinforcement-learning checkpoint. |
Online demo: try Robust-U1 directly on Hugging Face Spaces.
Run the command-line demo with a local model path and an output directory for recovered images:
export MODEL_PATH="/path/to/Robust-U1"
export OUTPUT_DIR="./outputs"
python demo.py \
--model-path "$MODEL_PATH" \
--output-dir "$OUTPUT_DIR"
Set the model path and start the local Gradio demo (available at http://localhost:7860 by default):
export MODEL_PATH="/path/to/Robust-U1"
python app.py --model-path "$MODEL_PATH"

Robust-U1 is trained with a three-stage pipeline:
| Stage | Goal | Framework |
|---|---|---|
| I. Visual Self-Recovery | Recover clean images from corrupted inputs (SFT) | MathCanvas |
| II. Visual Quality Alignment | Align recovery with pixel-level fidelity & semantics (RL) | Flow-GRPO |
| III. Multimodal Reasoning | Reason over corrupted & recovered images | MathCanvas |
We use MathCanvas for both supervised fine-tuning and multimodal reasoning training. Stage I adapts the base unified MLLM to recover clean images from corrupted inputs, while Stage III trains the model to reason over both corrupted and recovered images.
- Prepare the MathCanvas training framework:
  git clone https://github.com/shiwk24/MathCanvas.git
  cd MathCanvas/BAGEL-Canvas
- Download the base model BAGEL-7B-MoT.
- Prepare the training data:
  - For Stage I, prepare paired corrupted-clean image data for visual self-recovery.
  - For Stage III, prepare reasoning data with corrupted images, recovered images, questions, and reasoning-chain annotations.
- Modify the dataset paths in data/dataset_info.py and configure the corresponding training scripts with your local paths.
- Run Stage-I supervised fine-tuning to obtain the SFT checkpoint:
  bash scripts/train/stage1.sh
- After Stage-II reinforcement learning, run Stage-III multimodal reasoning training:
  bash scripts/train/stage2.sh
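Stage I needs every corrupted image matched to its clean counterpart. The following is a minimal sketch of one way to build and check such pairs. It assumes, hypothetically, that a corrupted file and its clean version share the same file name under `corrupted/` and `clean/` directories. The function name `pair_by_filename` and the output keys are illustrative and are not the format that MathCanvas requires.

```python
from pathlib import PurePosixPath


def pair_by_filename(corrupted_paths, clean_paths):
    """Match corrupted and clean images that share the same file name.

    Hypothetical helper: the directory layout and the returned keys are
    assumptions made for illustration, not part of MathCanvas.
    """
    clean_by_name = {PurePosixPath(p).name: p for p in clean_paths}
    pairs, unmatched = [], []
    for path in sorted(corrupted_paths):
        name = PurePosixPath(path).name
        if name in clean_by_name:
            pairs.append({"corrupted": path, "clean": clean_by_name[name]})
        else:
            # Keep track of corrupted images that have no clean reference.
            unmatched.append(path)
    return pairs, unmatched


if __name__ == "__main__":
    pairs, unmatched = pair_by_filename(
        ["corrupted/000001.png", "corrupted/000002.png"],
        ["clean/000001.png"],
    )
    print(pairs)
    print("missing clean reference:", unmatched)
```

Reporting unmatched files separately makes it easy to spot incomplete pairs before they reach the training scripts.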
We use Flow-GRPO to further align the recovery model with pixel-level structural fidelity and semantic consistency. The Robust-U1 rewards are packaged in rewards/ and can be registered directly in Flow-GRPO.
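To make the reward wiring more concrete, here is a simplified sketch of a name-to-function reward registry in which reference-based rewards receive the clean target images and the final reward is a weighted sum, in the style of a `config.reward_fn` dictionary. Everything in it is an assumption for illustration: `pixel_fidelity`, `mean_intensity`, `register_rewards`, and `weighted_reward` are hypothetical names, the scores are toy stand-ins, and none of this reproduces the actual code in rewards/ or in Flow-GRPO.

```python
# Hypothetical names for illustration only; these are NOT the functions
# shipped in rewards/ or defined by Flow-GRPO.
REFERENCE_REWARD_NAMES = {"pixel_fidelity"}


def pixel_fidelity(images, ref_images):
    """Reference-based score: 1 - mean squared error, for pixels in [0, 1]."""
    scores = []
    for image, ref in zip(images, ref_images):
        mse = sum((a - b) ** 2 for a, b in zip(image, ref)) / len(ref)
        scores.append(1.0 - mse)
    return scores


def mean_intensity(images):
    """Toy no-reference score, used only to show the second dispatch branch."""
    return [sum(image) / len(image) for image in images]


def register_rewards(score_functions):
    """Add the illustrative rewards to a name -> callable registry."""
    score_functions["pixel_fidelity"] = pixel_fidelity
    score_functions["mean_intensity"] = mean_intensity
    return score_functions


def weighted_reward(score_functions, weights, images, ref_images):
    """Combine per-image scores as a weighted sum over the configured rewards."""
    totals = [0.0] * len(images)
    for name, weight in weights.items():
        if name in REFERENCE_REWARD_NAMES:
            # Reference-based rewards compare against the clean targets.
            scores = score_functions[name](images, ref_images)
        else:
            scores = score_functions[name](images)
        totals = [t + weight * s for t, s in zip(totals, scores)]
    return totals


if __name__ == "__main__":
    fns = register_rewards({})
    restored = [[0.0, 1.0], [0.5, 0.5]]  # flattened toy images
    clean = [[0.0, 1.0], [0.0, 1.0]]
    weights = {"pixel_fidelity": 1.0, "mean_intensity": 0.2}
    print(weighted_reward(fns, weights, restored, clean))
```

Keeping the set of reference-based reward names separate from the registry lets the dispatcher decide per reward whether the clean images are needed.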
- Prepare Flow-GRPO and expose Robust-U1 rewards:
  git clone https://github.com/yifan123/flow_grpo.git
  cd flow_grpo
- Register the Robust-U1 reward adapter in flow_grpo/rewards.py:
  from rewards import FLOW_GRPO_REFERENCE_REWARD_NAMES, register_flow_grpo_rewards

  # after Flow-GRPO builds score_functions
  register_flow_grpo_rewards(score_functions)

  # reference-based rewards use clean target images
  elif score_name in FLOW_GRPO_REFERENCE_REWARD_NAMES:
      scores, rewards = score_fns[score_name](images, ref_images)
- Prepare restoration data with corrupted images and clean references. Each JSONL record should contain:
  {"prompt": "Please restore this corrupted image to its clean version.", "image": "corrupted/000001.png", "target_image": "clean/000001.png"}
- Configure config/grpo.py:
  config.dataset = "/path/to/dataset/restoration"
  config.pretrained.model = "/path/to/Robust-U1-SFT"
  config.reward_fn = {
      "restoration": 1.0,
      "tinyclip": 0.2,
  }
- Run reinforcement learning:
  bash scripts/multi_node/bagel/main.sh 0
  The launcher should point to the restoration config, for example:
  accelerate launch --config_file scripts/accelerate_configs/fsdp.yaml \
      --num_processes 8 \
      scripts/train_bagel.py \
      --config config/grpo.py:restoration_bagel
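Before launching a long run, it can help to check that every line of the restoration JSONL carries the three fields listed above (`prompt`, `image`, `target_image`). The sketch below is one possible check using only the standard library. The helper name `validate_records` is hypothetical, and the check covers field presence only, not whether the image files exist.

```python
import json

REQUIRED_KEYS = ("prompt", "image", "target_image")


def validate_records(lines):
    """Return (records, errors) for an iterable of JSONL lines."""
    records, errors = [], []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {line_no}: expected a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
        else:
            records.append(record)
    return records, errors


if __name__ == "__main__":
    sample = [
        '{"prompt": "Please restore this corrupted image to its clean version.", '
        '"image": "corrupted/000001.png", "target_image": "clean/000001.png"}',
        '{"prompt": "Please restore this corrupted image to its clean version.", '
        '"image": "corrupted/000002.png"}',
    ]
    records, errors = validate_records(sample)
    print(len(records), "valid record(s)")
    print(errors)
```

To validate a file on disk, pass an open file handle, for example `validate_records(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder file name.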
We use VLMEvalKit for anti-degradation evaluation.
- Clone the VLMEvalKit repository and install dependencies:
  git clone https://github.com/open-compass/VLMEvalKit.git
  cd VLMEvalKit
  pip install -e .
- Prepare the evaluation datasets according to VLMEvalKit requirements.
- Image Degradation Pipeline: generate corrupted images for robustness evaluation.
  Navigate to the degradation pipeline directory and process the images:
  cd add_degradation
  python generate_pipeline_open_source.py --input_dir <input_dir> --output_base_dir <output_base_dir> --dataset_name <dataset_name> --verbose
  The script generates three output directories with different degradation intensities for each image.
- Configure the model path and evaluation settings in the VLMEvalKit configuration file.
- Run the evaluation command:
  python run.py --model <your_model_name_or_path> --data <dataset_name>
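To illustrate what producing one image at three degradation intensities can look like, here is a toy sketch that adds Gaussian noise of increasing strength to a flattened grayscale image. It is only a stand-in: the level names, the noise strengths in `NOISE_STD`, and the choice of Gaussian noise are assumptions, and the actual degradation types and settings are defined by generate_pipeline_open_source.py.

```python
import random

# Hypothetical intensity levels; the real pipeline's degradation types and
# strengths are defined in generate_pipeline_open_source.py.
NOISE_STD = {"low": 0.05, "mid": 0.15, "high": 0.30}


def add_gaussian_noise(pixels, std, seed=0):
    """Add zero-mean Gaussian noise to pixels in [0, 1] and clip the result."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, std))) for p in pixels]


def degrade_at_all_levels(pixels, seed=0):
    """Return one degraded copy of the image per intensity level."""
    return {
        level: add_gaussian_noise(pixels, std, seed)
        for level, std in NOISE_STD.items()
    }


def mean_abs_error(a, b):
    """Average absolute pixel difference between two images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    image = [0.2, 0.4, 0.6, 0.8]  # flattened toy image
    for level, degraded in degrade_at_all_levels(image).items():
        print(level, round(mean_abs_error(image, degraded), 4))
```

Using the same seed for every level keeps the noise pattern fixed and changes only its strength, so the three outputs differ in intensity alone.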
We also evaluate on R-Bench to assess model performance under real-world corruptions.
- Clone the R-Bench repository:
  git clone https://github.com/Q-Future/R-Bench.git
- Evaluate using VLMEvalKit with the R-Bench dataset:
  cd VLMEvalKit
  python run.py --data R-Bench-Dis --model <your_model_name_or_path> --verbose
- For full dataset evaluation, follow the R-Bench pipeline as described in the R-Bench repository.
If you find this repository useful, please consider citing our paper:
@inproceedings{tang2026robustu1,
title={Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?},
author={Tang, Jiaqi and Chen, Jianmin and Zhai, Youyang and Wei, Wei and Liu, Runtao and Zhao, Mengjie and Wu, Xiangyu and Xiao, Qingfa and Chen, Qifeng},
booktitle={Proceedings of the 43rd International Conference on Machine Learning (ICML)},
year={2026},
}

For questions about the paper or code, feel free to open a GitHub issue or reach out:
- Jiaqi Tang: jtang092@connect.ust.hk
We thank the authors of BAGEL, MathCanvas, and Flow-GRPO for their excellent open-source contributions.

