WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
Hui Zhang1,2, Juntao Liu1, Zongkai Liu1,3, Liqiang Niu1, Fandong Meng1, Zuxuan Wu2, and Yu-Gang Jiang2
1WeChat AI, Tencent, 2Fudan University, 3Sun Yat-sen University
WeEdit is a systematic framework for text-centric image editing, addressing the challenges of modifying, translating, and rearranging textual elements embedded within images.
WeEdit Dataset 🗂️: A large-scale dataset of 330K text-centric editing pairs constructed via a novel HTML-based automatic pipeline, covering 7 editing operations and 15 languages.
WeEdit Benchmark 📊: Standardized bilingual (Chinese-English) and multilingual (15 languages) benchmarks with 2,000 test cases each, covering 8 editing operations (Add, Replace, Delete, Rearrange, Translate, Change Style, Combined, and Reasoning) for comprehensive evaluation.
Glyph-Guided SFT ✏️: A supervised fine-tuning stage that injects rendered glyph images as explicit spatial priors, enabling precise text placement and character-level fidelity.
Multi-Objective RL 🎯: A reinforcement learning stage with separate reward models targeting instruction adherence, text clarity, background preservation, and relative quality.
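The multi-objective stage above combines several reward signals into one training signal. A minimal sketch of such a combination, assuming a simple weighted sum (the reward names mirror the four objectives listed; the weights, function name, and linear aggregation are illustrative assumptions, not the released training code):

```python
# Hypothetical sketch: combining per-objective rewards into a single RL
# training signal. Equal weighting is an assumption for illustration.

def combine_rewards(rewards: dict, weights: dict = None) -> float:
    """Weighted sum of per-objective rewards, each assumed to lie in [0, 1]."""
    if weights is None:
        # Default to equal weighting across objectives (assumption).
        weights = {k: 1.0 / len(rewards) for k in rewards}
    return sum(weights[k] * rewards[k] for k in rewards)

sample = {
    "instruction_adherence": 0.9,
    "text_clarity": 0.8,
    "background_preservation": 0.95,
    "relative_quality": 0.7,
}
print(round(combine_rewards(sample), 4))  # 0.8375
```

In practice the weights would be tuned so that no single objective (e.g. background preservation) dominates the policy update.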
Our WeEdit dataset contains 330K high-quality text-centric image editing pairs constructed through two complementary pipelines:
- Structured Data (~170K): A novel HTML-based pipeline converts source images to HTML, extracts and edits text content via a VLM, and renders both source and target images through a headless browser, yielding pixel-perfect editing pairs.
- Unstructured Data (~160K): An automated edit-verify-and-retry pipeline operates directly at the image level for images with complex layouts, diverse typography, and text tightly entangled with complex visual backgrounds.
The dataset covers 7 editing operation types (Add, Replace, Delete, Rearrange, Translate, Change Style, Combined) and 15 languages (English, Chinese, Hindi, Spanish, French, Arabic, Portuguese, Bengali, Russian, German, Korean, Japanese, Thai, Indonesian, Vietnamese).
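The core idea of the HTML-based pipeline is that an edit applied to the HTML source yields perfectly aligned before/after images once both versions are rendered. A minimal sketch of the pairing step for a Replace operation, assuming a plain text substitution on the HTML string (the element markup and helper name are illustrative; headless-browser rendering to pixels is omitted):

```python
# Hypothetical sketch of HTML-based pair construction: swap the edited text
# span in the source HTML to obtain the target HTML. Rendering both strings
# with a headless browser would yield a pixel-aligned editing pair.

SOURCE_HTML = '<div id="headline" style="font-size:48px">SUMMER SALE</div>'

def make_edit_pair(source_html: str, old_text: str, new_text: str):
    """Return (source_html, target_html) for a Replace operation."""
    assert old_text in source_html, "edit target must exist in the source"
    return source_html, source_html.replace(old_text, new_text)

src, tgt = make_edit_pair(SOURCE_HTML, "SUMMER SALE", "WINTER SALE")
print(tgt)  # <div id="headline" style="font-size:48px">WINTER SALE</div>
```

Because everything outside the edited span is byte-identical, the rendered backgrounds match exactly, which is what makes the pairs "pixel-perfect".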
Our comprehensive benchmark evaluates text-centric image editing capabilities across multiple dimensions:
- Bilingual Benchmark: 2,000 test cases covering Chinese and English
- Multilingual Benchmark: 2,000 test cases spanning 15 languages
- 8 Task Categories: Add, Replace, Delete, Rearrange, Translate, Change Style, Combined, and Reasoning
- 3 Evaluation Dimensions: Instruction Adherence (IA), Text Clarity (TC), and Background Preservation (BP)
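One simple way to summarize the three dimensions into a single per-sample score is an unweighted mean; this aggregation is an assumption for illustration, not necessarily how the benchmark reports results:

```python
# Sketch (assumption): per-sample overall score as the unweighted mean of the
# three 0-9 dimension scores (IA, TC, BP).

def overall(ia: int, tc: int, bp: int) -> float:
    for s in (ia, tc, bp):
        assert 0 <= s <= 9, "scores are on a 0-9 scale"
    return (ia + tc + bp) / 3

print(overall(8, 7, 9))  # 8.0
```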
To evaluate a model's text-centric image editing capabilities on our benchmark:
1. Generate edited images and save them to a results directory with a `generated_imgs/` subfolder. Each image should be named `{img_id}_{instruction_type}.png`, where `img_id` and `instruction_type` come from the corresponding benchmark item.
2. Implement your own Gemini-3-Pro API call in `evaluation/evaluation_benchmark.py` by filling in the `call_gemini()` function.
3. Run the evaluation script.

Evaluate on the Bilingual Benchmark:

```shell
python evaluation/evaluation_benchmark.py \
    --results_dir <path_to_results> \
    --benchmark_file benchmark/Bilingual_benchmark.jsonl
```

Evaluate on the Multilingual Benchmark:

```shell
python evaluation/evaluation_benchmark.py \
    --results_dir <path_to_results> \
    --benchmark_file benchmark/Multilingual_benchmark.jsonl
```

The evaluation uses Gemini-3-Pro as an impartial VLM judge to score edited images on Instruction Adherence, Text Clarity, and Background Preservation, each on a 0-9 scale.
WeEdit achieves the best performance among open-source models on both benchmarks, surpassing most proprietary models and ranking second only to Gemini-3-Pro-Image.
If you find our work useful for your research and applications, please cite it using this BibTeX:
@article{zhang2026weedit,
title={WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing},
author={Zhang, Hui and Liu, Juntao and Liu, Zongkai and Niu, Liqiang and Meng, Fandong and Wu, Zuxuan and Jiang, Yu-Gang},
journal={arXiv preprint arXiv:2603.11593},
year={2026}
}