Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

ICML 2026 | Official Implementation

Jiaqi Tang★, Jianmin Chen★, Youyang Zhai★, Wei Wei‡, Runtao Liu, Mengjie Zhao, Xiangyu Wu, Qingfa Xiao, Qifeng Chen

★ Equal contribution · † Corresponding author · ‡ Co-corresponding author

TL;DR — Robust-U1 is a unified MLLM that self-recovers corrupted visual content and reasons over it, enabling robust visual understanding under real-world image degradations.

📰 News

2026-06-11 🔥 We release the code, pretrained models, and the online demo of Robust-U1!
2026-05-07 🎉 Robust-U1 is accepted to ICML 2026!

📑 Table of Contents

🔭 Motivation · 📦 Installation · 🤖 Models · 💻 Demo · 🧠 Training · 📊 Evaluation · ⭐ Citation · 📬 Contact

🔭 Motivation

Existing approaches to robust visual understanding face two key limitations:

🚩 Black-Box Alignment — Feature-alignment methods lack interpretability and fail to explicitly model the corruption process.
🚩 Text-Only Compensation — Text-based reasoning cannot recover lost pixel-level visual details for faithful visual understanding.

This motivates a key question: Can MLLMs recover corrupted visual content by themselves?

📦 Installation

1. Clone the repository

git clone https://github.com/jqtangust/Robust-U1.git
cd Robust-U1

2. Create the environment

conda create -n Robust-U1 python=3.10
conda activate Robust-U1
pip install -r requirements.txt
pip install -e .

🤖 Models

Model	Link	Description
BAGEL-7B-MoT	ByteDance-Seed/BAGEL-7B-MoT	Base model used as the initial weights for training.
Robust-U1	Jiaqi-hkust/Robust-U1	Final model for visual self-recovery and multimodal reasoning.
Robust-U1-SFT	Jiaqi-hkust/Robust-U1-SFT	Stage-I supervised fine-tuned checkpoint.
Robust-U1-RL	Jiaqi-hkust/Robust-U1-RL	Stage-II reinforcement-learning checkpoint.
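
The checkpoints above are identified by their Hugging Face repository IDs. A minimal sketch for fetching one locally, assuming the `huggingface_hub` package is installed and that these IDs resolve on the Hub as listed. The `download_model` helper and the `./checkpoints` directory are hypothetical names, not part of this repository:

```python
from pathlib import Path

# Repository IDs copied from the table above.
MODEL_REPOS = {
    "BAGEL-7B-MoT": "ByteDance-Seed/BAGEL-7B-MoT",
    "Robust-U1": "Jiaqi-hkust/Robust-U1",
    "Robust-U1-SFT": "Jiaqi-hkust/Robust-U1-SFT",
    "Robust-U1-RL": "Jiaqi-hkust/Robust-U1-RL",
}


def download_model(name: str, root: str = "./checkpoints") -> str:
    """Download one checkpoint and return its local directory (hypothetical helper)."""
    # Imported lazily so the mapping stays usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = Path(root) / name
    return snapshot_download(repo_id=MODEL_REPOS[name], local_dir=str(local_dir))
```

Calling `download_model("Robust-U1")` returns a local path that can be exported as `MODEL_PATH` for the demo commands.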

💻 Demo

🌐 Online demo — try Robust-U1 directly on Hugging Face Spaces.

🖥️ CLI

Run the command-line demo with a local model path and an output directory for recovered images:

export MODEL_PATH="/path/to/Robust-U1"
export OUTPUT_DIR="./outputs"

python demo.py \
  --model-path "$MODEL_PATH" \
  --output-dir "$OUTPUT_DIR"

🪟 GUI

Set the model path and start the local Gradio demo (available at http://localhost:7860 by default):

export MODEL_PATH="/path/to/Robust-U1"
python app.py --model-path "$MODEL_PATH"

🧠 Training

Robust-U1 is trained with a three-stage pipeline:

Stage	Goal	Framework
I. Visual Self-Recovery	Recover clean images from corrupted inputs (SFT)	MathCanvas
II. Visual Quality Alignment	Align recovery with pixel-level fidelity & semantics (RL)	Flow-GRPO
III. Multimodal Reasoning	Reason over corrupted & recovered images	MathCanvas

🎓 Stage I & III — Self-Recovery & Reasoning

We use MathCanvas for both supervised fine-tuning and multimodal reasoning training. Stage I adapts the base unified MLLM to recover clean images from corrupted inputs, while Stage III trains the model to reason over both corrupted and recovered images.

Prepare the MathCanvas training framework:

git clone https://github.com/shiwk24/MathCanvas.git
cd MathCanvas/BAGEL-Canvas

Download the base model BAGEL-7B-MoT.
Prepare the training data:
- For Stage I, prepare paired corrupted-clean image data for visual self-recovery.
- For Stage III, prepare reasoning data with corrupted images, recovered images, questions, and reasoning-chain annotations.
Modify the dataset paths in data/dataset_info.py and configure the corresponding training scripts with your local paths.
Run Stage-I supervised fine-tuning to obtain the SFT checkpoint:
```
bash scripts/train/stage1.sh
```
After Stage-II reinforcement learning, run Stage-III multimodal reasoning training:
```
bash scripts/train/stage2.sh
```

🎓 Stage II — Visual Quality Alignment (RL)

We use Flow-GRPO to further align the recovery model with pixel-level structural fidelity and semantic consistency. The Robust-U1 rewards are packaged in rewards/ and can be registered directly in Flow-GRPO.
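
To make "pixel-level fidelity" concrete, here is an illustrative reference-based score: peak signal-to-noise ratio (PSNR) between a restored image and its clean reference. This is a stand-in for explanation only and is not the reward shipped in rewards/. Images are assumed to be flat sequences of 8-bit values, and the function name `psnr` is chosen here for illustration:

```python
import math
from typing import Sequence


def psnr(restored: Sequence[float], reference: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(restored) != len(reference) or not reference:
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(restored, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [0, 64, 128, 255]
restored = [0, 60, 130, 250]
print(f"PSNR: {psnr(restored, clean):.2f} dB")
```

A higher value means the restored image is closer to the clean reference, which is the property a reference-based restoration reward needs.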

Prepare Flow-GRPO and expose Robust-U1 rewards:

git clone https://github.com/yifan123/flow_grpo.git
cd flow_grpo

Register the Robust-U1 reward adapter in flow_grpo/rewards.py:

from rewards import FLOW_GRPO_REFERENCE_REWARD_NAMES, register_flow_grpo_rewards

# after Flow-GRPO builds score_functions
register_flow_grpo_rewards(score_functions)

# reference-based rewards use clean target images
# (fragment: add this branch to the existing if/elif dispatch over score_name)
elif score_name in FLOW_GRPO_REFERENCE_REWARD_NAMES:
    scores, rewards = score_fns[score_name](images, ref_images)

Prepare restoration data with corrupted images and clean references. Each JSONL record should contain:

{"prompt": "Please restore this corrupted image to its clean version.", "image": "corrupted/000001.png", "target_image": "clean/000001.png"}
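
A minimal sketch for producing such a file, assuming corrupted and clean images share file names under `corrupted/` and `clean/` as in the record above. The `build_records` and `write_jsonl` helpers and the output file name are hypothetical; check the Flow-GRPO dataset loader for the file name it expects:

```python
import json
import tempfile
from pathlib import Path

PROMPT = "Please restore this corrupted image to its clean version."


def build_records(file_names):
    """Pair each corrupted image with the clean image of the same name."""
    return [
        {"prompt": PROMPT, "image": f"corrupted/{name}", "target_image": f"clean/{name}"}
        for name in sorted(file_names)
    ]


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Example with two hypothetical file names, written to a temporary location.
records = build_records(["000002.png", "000001.png"])
out_path = Path(tempfile.gettempdir()) / "restoration_example.jsonl"
write_jsonl(records, out_path)
```

Sorting the file names keeps the record order stable across runs, which makes the dataset easier to diff and debug.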

Configure config/grpo.py:

config.dataset = "/path/to/dataset/restoration"
config.pretrained.model = "/path/to/Robust-U1-SFT"
config.reward_fn = {
    "restoration": 1.0,
    "tinyclip": 0.2,
}
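
The `config.reward_fn` dictionary maps each reward name to a weight. A sketch of how such weights might combine per-image scores, assuming a plain weighted sum; verify this against Flow-GRPO's own aggregation before relying on it. The `combine_rewards` helper is hypothetical and the score values are made up for illustration:

```python
def combine_rewards(per_reward_scores, weights):
    """Weighted sum of per-image scores across reward functions."""
    names = list(weights)
    num_images = len(per_reward_scores[names[0]])
    totals = [0.0] * num_images
    for name in names:
        scores = per_reward_scores[name]
        if len(scores) != num_images:
            raise ValueError(f"reward '{name}' returned {len(scores)} scores, expected {num_images}")
        for i, score in enumerate(scores):
            totals[i] += weights[name] * score
    return totals


weights = {"restoration": 1.0, "tinyclip": 0.2}
scores = {"restoration": [0.5, 1.0], "tinyclip": [0.5, 0.0]}
print(combine_rewards(scores, weights))
```

Under this assumption, the restoration reward dominates and the `tinyclip` term acts as a smaller semantic-consistency correction.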

Run reinforcement learning:

bash scripts/multi_node/bagel/main.sh 0

The launcher should point to the restoration config, for example:

accelerate launch --config_file scripts/accelerate_configs/fsdp.yaml \
  --num_processes 8 \
  scripts/train_bagel.py \
  --config config/grpo.py:restoration_bagel

📊 Evaluation

We use VLMEvalKit for anti-degradation evaluation.

Clone the VLMEvalKit repository and install dependencies:

git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .

Prepare the evaluation datasets according to VLMEvalKit requirements.
Image Degradation Pipeline — generate corrupted images for robustness evaluation.

Navigate to the degradation pipeline directory and process images:
```
cd add_degradation
python generate_pipeline_open_source.py --input_dir <input_dir> --output_base_dir <output_base_dir> --dataset_name <dataset_name> --verbose
```
The script will generate three output directories with different degradation intensities for each image.
Configure the model path and evaluation settings in the VLMEvalKit configuration file.

Run the evaluation command:

python run.py --model <your_model_name_or_path> --data <dataset_name>
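
The degradation step above produces outputs at three intensities. As an illustration of that idea only, the sketch below adds Gaussian noise to a grayscale image at three noise levels. The corruption type, the level names, and the sigma values are assumptions and are not taken from generate_pipeline_open_source.py:

```python
import random

# Hypothetical intensity levels: standard deviation of the added noise.
INTENSITIES = {"low": 5.0, "mid": 15.0, "high": 30.0}


def add_gaussian_noise(pixels, sigma, seed=0):
    """Return a noisy copy of 8-bit pixel values, clipped to [0, 255]."""
    rng = random.Random(seed)
    return [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in pixels]


def degrade_all_levels(pixels):
    """Produce one degraded copy per intensity level."""
    return {level: add_gaussian_noise(pixels, sigma) for level, sigma in INTENSITIES.items()}


def mean_abs_error(a, b):
    """Average absolute per-pixel difference between two images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


image = [128] * 16
for level, noisy in degrade_all_levels(image).items():
    print(level, round(mean_abs_error(noisy, image), 2))
```

Fixing the seed makes each level reproducible, so evaluation runs on the same degraded inputs every time.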

🔬 R-Bench Evaluation

We use R-Bench to assess model performance under real-world corruptions.

Clone the R-Bench repository:

git clone https://github.com/Q-Future/R-Bench.git

Evaluate using VLMEvalKit with the R-Bench dataset:

cd VLMEvalKit
python run.py --data R-Bench-Dis --model <your_model_name_or_path> --verbose

For full dataset evaluation, follow the R-Bench pipeline as described in the R-Bench repository.

⭐ Citation

If you find this repository useful, please consider citing our paper:

@inproceedings{tang2026robustu1,
      title={Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?},
      author={Tang, Jiaqi and Chen, Jianmin and Zhai, Youyang and Wei, Wei and Liu, Runtao and Zhao, Mengjie and Wu, Xiangyu and Xiao, Qingfa and Chen, Qifeng},
      booktitle={Proceedings of the 43rd International Conference on Machine Learning (ICML)},
      year={2026},
}

📬 Contact

For questions about the paper or code, feel free to open a GitHub issue or reach out:

Jiaqi Tang — jtang092@connect.ust.hk

🤝 Acknowledgements

We thank the authors of BAGEL, MathCanvas, and Flow-GRPO for their excellent open-source contributions.
