MoCha: End-to-End Video Character Replacement without Structural Guidance

🔥🔥🔥 Check out our full paper!

Teaser videos: teaser1.mp4, teaser2.mp4

🔥 Updates

  • [2026.02.21]: Paper accepted by CVPR 2026!
  • [2026.01.14]: Paper released on arXiv!
  • [2025.10.22]: Special thanks to @kijai for adding MoCha to the custom ComfyUI node WanVideoWrapper!
  • [2025.10.21]: Try our work with the ComfyUI workflow!
  • [2025.10.21]: Released the inference code.
  • [2025.10.20]: Released the project page.

📝 Abstract

Controllable replacement of a video character with a user-provided one remains a challenging problem due to the lack of qualified paired video data. Prior works have predominantly adopted a reconstruction-based paradigm reliant on per-frame masks and explicit structural guidance (e.g., pose, depth). This reliance, however, renders them fragile in complex scenarios involving occlusions, rare poses, character-object interactions, or complex illumination, often resulting in visual artifacts and temporal discontinuities. In this paper, we propose MoCha, a novel framework that bypasses these limitations: it requires only a single first-frame mask and re-renders the character by unifying different conditions into a single token stream. Further, MoCha adopts a condition-aware RoPE to support multiple reference images and variable-length video generation. To overcome the data bottleneck, we construct a comprehensive data synthesis pipeline to collect qualified paired training videos. Extensive experiments show that our method substantially outperforms existing state-of-the-art approaches.

☕ Getting Started with MoCha

Inference

Step 1: Clone this repository

git clone https://github.com/Orange-3DV-Team/MoCha.git
cd MoCha

Step 2: Set up the environment

# 1. Create conda environment
conda create -n MoCha python=3.10

# 2. Activate the environment
conda activate MoCha

# 3. Install pip dependencies
pip install -r requirements.txt

Step 3: Download the pretrained checkpoints

  1. Download the pre-trained Wan2.1 models from Hugging Face
  2. Download the pre-trained MoCha checkpoint

Please download both from Hugging Face and place them in ./checkpoints.

Step 4: Test the example videos

python inference_mocha.py

Test your own video

To start your own character replacement with MoCha, the following three inputs are required:

  • Source Video: The original video containing the character to be replaced.
  • Designation Mask for the First Frame: A mask marking the source character to be replaced in the first frame of the Source Video.
  • Reference Images: Images of the new character, with a clean background. We recommend providing at least one high-quality, front-facing facial close-up.
Demo video: start_MoCha.mp4

Then organize your test data following the structure of ./data/test_data.csv.

  • source_video: Path to Source Video.
  • source_mask: Path to Designation Mask.
  • reference_1: Path to first Reference Image.
  • reference_2: Path to the second Reference Image. This image should be a high-quality, front-facing facial close-up. (You can even crop a zoomed-in face from your first reference image!) If you cannot provide this image, leave it as None.
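As a sketch of the CSV layout described above, the file can be generated with Python's csv module. All file paths below are placeholders for illustration, not files shipped with the repository:

```python
# Sketch: generate a test_data.csv with the four columns described above.
# Every file path here is a placeholder; replace them with your own data.
import csv
import os

rows = [
    {
        "source_video": "data/my_video.mp4",
        "source_mask": "data/my_video_first_frame_mask.png",
        "reference_1": "data/new_character.png",
        # A front-facing facial close-up; use "None" if unavailable.
        "reference_2": "data/new_character_face.png",
    },
]

os.makedirs("data", exist_ok=True)
with open("data/test_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["source_video", "source_mask", "reference_1", "reference_2"]
    )
    writer.writeheader()
    writer.writerows(rows)
```

Each additional row describes one more test video, so a batch of clips can go into a single CSV.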

Finally, test your videos by:

python inference_mocha.py --data_path path/to/your/data.csv
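A common failure mode is a typo in one of the CSV paths. The small helper below is our own addition (not part of the repo's code) that checks every referenced file before you launch inference:

```python
# Sanity-check helper: list every file referenced in the CSV that is missing
# on disk. "None" entries (e.g. an omitted reference_2) are skipped.
import csv
import os

COLUMNS = ("source_video", "source_mask", "reference_1", "reference_2")

def missing_files(csv_path):
    """Return (row_index, column, path) tuples for paths that do not exist."""
    missing = []
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            for col in COLUMNS:
                path = (row.get(col) or "").strip()
                if path and path != "None" and not os.path.exists(path):
                    missing.append((i, col, path))
    return missing
```

Call it as `missing_files("data/test_data.csv")`; an empty list means every path resolves.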

💭 Communication & Feedback

Have more ideas? Scan the QR code to join the WeChat group for in-depth discussion!

🌟 Citation

Please leave us a star 🌟 and cite our repo if you find our work helpful.

@article{orange2025mocha,
  title={MoCha: End-to-End Video Character Replacement without Structural Guidance},
  author={Xu, Zhengbo and Ma, Jie and Wang, Ziheng and Peng, Zhan and Liang, Jun and Li, Jing},
  journal={arXiv preprint arXiv:2601.08587},
  year={2026},
  url={https://github.com/Orange-3DV-Team/MoCha}
}
