# RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
RefineAnything targets region-specific image refinement: given an input image and a user-specified region (e.g., scribble mask or bounding box), it restores fine-grained details—text, logos, thin structures—while keeping all non-edited pixels unchanged. It supports both reference-based and reference-free refinement.
- 2026-04-14 — Community ComfyUI integration by @smthemex: ComfyUI_RefineAnything. Thanks for the great work!
- 2026-04-14 — Local Gradio demo (`app.py`) is available for interactive testing.
- 2026-04-12 — Hugging Face Space demo is live: https://huggingface.co/spaces/limuloo1999/RefineAnything.
- 2026-04-09 — Checkpoint released on Hugging Face: https://huggingface.co/limuloo1999/RefineAnything.
- 2026-04-09 — Inference scripts released.
- 2026-04-08 — Documentation skeleton added; code release coming this month (inference scripts, environment, and checkpoints will be linked here).
- TBD — Checkpoints and training/evaluation resources will be announced once finalized.
- Region-accurate refinement — Explicit region cues (scribbles or boxes) steer edits to the target area.
- Reference-based and reference-free — Optional reference image for guided local detail recovery.
- Strict background preservation — Edits stay inside the target region; training emphasizes seamless boundaries.
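"Strict background preservation" amounts to compositing the model's output back into the original image through the binary mask, so every pixel outside the region is bit-identical to the input. A minimal, model-free sketch of that invariant in NumPy (this is an illustration of the concept, not the repository's implementation):

```python
import numpy as np

def composite(original: np.ndarray, refined: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep every pixel outside the mask identical to the input.

    original, refined: HxWx3 uint8 images; mask: HxW, nonzero = region to refine.
    """
    m = (mask > 0)[..., None]          # HxWx1 boolean, broadcast over channels
    return np.where(m, refined, original)
```

Because the composite is a hard select through the mask, background preservation is exact regardless of what the model produces inside the region.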
```shell
pip install -r requirement.txt
```

Only three things are required to run RefineAnything:
| Argument | Description |
|---|---|
| `--input` | Source image |
| `--mask` | Binary mask (white = region to refine) |
| `--prompt` | What to refine |
| `--ref` | (optional) Reference image for guided refinement |
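`--mask` expects a binary image where white marks the region to refine. If you only have a bounding box, generating such a mask is straightforward; a sketch in NumPy (the box coordinates are hypothetical):

```python
import numpy as np

def bbox_to_mask(height: int, width: int, box: tuple[int, int, int, int]) -> np.ndarray:
    """Black HxW mask with a white rectangle. box = (x0, y0, x1, y1), end-exclusive."""
    mask = np.zeros((height, width), dtype=np.uint8)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = 255   # white = region to refine
    return mask

# Example: mark a 100x80 box on a 512x512 canvas
mask = bbox_to_mask(512, 512, (200, 160, 300, 240))
```

Save the array with any image writer (e.g. Pillow's `Image.fromarray(mask).save("mask.png")`) and pass the file to `--mask`.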
Refine a blurry logo on a pillow using a reference image.
```shell
python scripts/fast_inference.py \
    --input src/input1.png \
    --mask src/mask1.png \
    --prompt "Refine the LOGO." \
    --ref src/ref1.png \
    --output output/demo1.png
```

| Input | Reference | Prompt |
|---|---|---|
| ![]() | ![]() | "Refine the LOGO." |
| **Output** | | |
| ![]() | | |
Refine blurry Chinese text on a building sign — no reference image needed.
```shell
python scripts/fast_inference.py \
    --input src/input2.png \
    --mask src/mask2.png \
    --prompt "refine the text '鼎好商城'" \
    --output output/demo2.png
```

| Input | Prompt |
|---|---|
| ![]() | "refine the text '鼎好商城'" |
| **Output** | |
| ![]() | |
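Both demos invoke the same script, so batch refinement is just a loop over `(input, mask, prompt, output, ref)` tuples, with `--ref` omitted for reference-free jobs. A small hypothetical wrapper that assembles the command line (paths mirror the demos above):

```python
import subprocess
import sys

def build_cmd(input_path, mask_path, prompt, output_path, ref_path=None):
    """Assemble a fast_inference.py invocation; --ref is only added when given."""
    cmd = [sys.executable, "scripts/fast_inference.py",
           "--input", input_path, "--mask", mask_path,
           "--prompt", prompt, "--output", output_path]
    if ref_path is not None:
        cmd += ["--ref", ref_path]
    return cmd

jobs = [
    ("src/input1.png", "src/mask1.png", "Refine the LOGO.", "output/demo1.png", "src/ref1.png"),
    ("src/input2.png", "src/mask2.png", "refine the text '鼎好商城'", "output/demo2.png", None),
]
# To run: for job in jobs: subprocess.run(build_cmd(*job), check=True)
```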
We also provide a Gradio-based web UI for interactive testing. You can brush regions, upload reference images, and adjust all inference parameters in the browser.
```shell
python app.py
```

Then open http://localhost:7860 in your browser. The app will automatically download the base model (Qwen/Qwen-Image-Edit-2511) and the RefineAnything LoRA from Hugging Face on first launch.
You can specify a custom base model path via the `MODEL_DIR` environment variable:

```shell
MODEL_DIR=/path/to/local/Qwen-Image-Edit-2511 python app.py
```

Features of the Gradio demo:
- Brush-to-select: paint directly on the source image to define the refinement region.
- Optional reference image: upload a second image and optionally brush to crop a specific reference area.
- Focus crop: automatically crops and zooms into the edit region for higher detail fidelity, then composites back seamlessly.
- Lightning LoRA: one-click toggle for faster inference with fewer steps.
- Before / After slider: instantly compare input and output.
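The "focus crop" step can be understood as: take the mask's bounding box with some margin, refine that crop at higher effective resolution, then composite it back through the mask. A model-free sketch of the crop-and-paste geometry (the margin value and function names are illustrative assumptions, not the demo's actual code):

```python
import numpy as np

def focus_bbox(mask: np.ndarray, margin: int = 32) -> tuple[int, int, int, int]:
    """Bounding box (x0, y0, x1, y1) of the nonzero mask region, padded by margin."""
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    return (max(xs.min() - margin, 0), max(ys.min() - margin, 0),
            min(xs.max() + 1 + margin, w), min(ys.max() + 1 + margin, h))

def paste_back(image: np.ndarray, refined_crop: np.ndarray, mask: np.ndarray,
               box: tuple[int, int, int, int]) -> np.ndarray:
    """Composite the refined crop into the full image, only where the mask is set."""
    x0, y0, x1, y1 = box
    out = image.copy()
    region_mask = (mask[y0:y1, x0:x1] > 0)[..., None]   # crop-local mask
    out[y0:y1, x0:x1] = np.where(region_mask, refined_crop, out[y0:y1, x0:x1])
    return out
```

Restricting the paste to masked pixels inside the crop is what keeps the composite seamless: the padded context around the region is used for refinement but never written back.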
If you use this repository, please cite:
```bibtex
@article{zhou2026refineanything,
  title={RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details},
  author={Zhou, Dewei and Li, You and Yang, Zongxin and Yang, Yi},
  journal={arXiv preprint arXiv:2604.06870},
  year={2026}
}
```

RefineAnything builds on ideas and components from the broader diffusion and multimodal ecosystem (including Qwen2.5-VL, Qwen-Image, and latent diffusion with VAE + MMDiT). Base model weights and API terms are subject to their respective licenses—verify compliance before redistributing checkpoints or derived weights.
Repository code license: TBD (e.g., Apache-2.0 or MIT)—set LICENSE when you open-source the implementation.







