Skip to content

limuloo/RefineAnything

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RefineAnything

Multimodal Region-Specific Refinement for Perfect Local Details

RefineAnything targets region-specific image refinement: given an input image and a user-specified region (e.g., scribble mask or bounding box), it restores fine-grained details—text, logos, thin structures—while keeping all non-edited pixels unchanged. It supports both reference-based and reference-free refinement.

Teaser

News


Highlights

  • Region-accurate refinement — Explicit region cues (scribbles or boxes) steer edits to the target area.
  • Reference-based and reference-free — Optional reference image for guided local detail recovery.
  • Strict background preservation — Edits stay inside the target region; training emphasizes seamless boundaries.

Comparisons

Reference-free qualitative comparisons

Reference-based qualitative comparisons


Installation

pip install -r requirement.txt

Quick Start

Only three things are required to run RefineAnything:

Argument Description
--input Source image
--mask Binary mask (white = region to refine)
--prompt What to refine
--ref (optional) Reference image for guided refinement

Demo 1 — Reference-based Logo Refinement

Refine a blurry logo on a pillow using a reference image.

python scripts/fast_inference.py \
    --input  src/input1.png \
    --mask   src/mask1.png \
    --prompt "Refine the LOGO." \
    --ref    src/ref1.png \
    --output output/demo1.png
Input Reference Prompt
"Refine the LOGO."
Output

Demo 2 — Reference-free Text Refinement

Refine blurry Chinese text on a building sign — no reference image needed.

python scripts/fast_inference.py \
    --input  src/input2.png \
    --mask   src/mask2.png \
    --prompt "refine the text '鼎好商城'" \
    --output output/demo2.png
Input Prompt
"refine the text '鼎好商城'"
Output

Local Gradio Demo

We also provide a Gradio-based web UI for interactive testing. You can brush regions, upload reference images, and adjust all inference parameters in the browser.

python app.py

Then open http://localhost:7860 in your browser. The app will automatically download the base model (Qwen/Qwen-Image-Edit-2511) and the RefineAnything LoRA from Hugging Face on first launch.

You can specify a custom base model path via the MODEL_DIR environment variable:

MODEL_DIR=/path/to/local/Qwen-Image-Edit-2511 python app.py

Features of the Gradio demo:

  • Brush-to-select: paint directly on the source image to define the refinement region.
  • Optional reference image: upload a second image and optionally brush to crop a specific reference area.
  • Focus crop: automatically crops and zooms into the edit region for higher detail fidelity, then composites back seamlessly.
  • Lightning LoRA: one-click toggle for faster inference with fewer steps.
  • Before / After slider: instantly compare input and output.

Citation

If you use this repository, please cite:

@article{zhou2026refineanything,
  title={RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details},
  author={Zhou, Dewei and Li, You and Yang, Zongxin and Yang, Yi},
  journal={arXiv preprint arXiv:2604.06870},
  year={2026}
}

Acknowledgements and License

RefineAnything builds on ideas and components from the broader diffusion and multimodal ecosystem (including Qwen2.5-VL, Qwen-Image, and latent diffusion with VAE + MMDiT). Base model weights and API terms are subject to their respective licenses—verify compliance before redistributing checkpoints or derived weights.

Repository code license: TBD (e.g., Apache-2.0 or MIT)—set LICENSE when you open-source the implementation.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages