Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model

Abstract: In ophthalmic surgery, developing an AI system capable of interpreting surgical videos and predicting subsequent operations requires numerous ophthalmic surgical videos with high-quality annotations, which are difficult to collect because of privacy concerns and the labor involved. Text-guided video generation (T2V) is a promising way to overcome this issue by generating ophthalmic surgical videos from surgeon instructions. In this paper, we present Ophora, a pioneering model that can generate ophthalmic surgical videos following natural language instructions. To construct Ophora, we first propose a Comprehensive Data Curation pipeline that converts narrative ophthalmic surgical videos into Ophora-160K, a large-scale, high-quality dataset comprising over 160K video-instruction pairs. Then, we propose a Progressive Video-Instruction Tuning scheme that transfers rich spatial-temporal knowledge from a T2V model pre-trained on natural video-text datasets to privacy-preserved ophthalmic surgical video generation based on Ophora-160K. Video quality evaluation through quantitative analysis and ophthalmologist feedback demonstrates that Ophora can generate realistic and reliable ophthalmic surgical videos from surgeon instructions. We also validate that Ophora can support the downstream task of ophthalmic surgical workflow understanding.

Introduction

This repository is for our work submitted to MICCAI25, titled "Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model".

We have released the training and inference code for Ophora, along with the model checkpoint and the dataset.

Framework

Synthesized Videos

Anterior Chamber Flushing: Simulated procedure demonstrating anterior chamber irrigation during ophthalmic surgery.
Capsule Polishing: Synthesized video showing delicate capsule polishing with micro-instruments.
Hydrodissection: Text-guided video generation of the hydrodissection phase during cataract surgery.
Lens Implantation: Generated video illustrating lens implantation following cataract extraction.

Ophora & Ophora-160K

Model Checkpoint

We provide model checkpoints for Ophora at the Ophora repository.

Dataset

The curated large-scale dataset Ophora-160K can be accessed at Ophora-160K datasets.

Prepare Environment

Training and inference with Ophora require an environment compatible with the CogVideoX-2b model.
Please refer to its official page for installation instructions and dependencies: CogVideoX-2b on Hugging Face.
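
Before running any of the scripts, it can help to confirm that the expected packages are importable. The following is a minimal sketch, not part of this repository: the package list is an assumption about what a CogVideoX-2b setup typically needs, and the official CogVideoX-2b page remains the authoritative source for dependencies and versions.

```python
from importlib.util import find_spec

# Assumed package list (hypothetical); check the CogVideoX-2b page for the real one.
ASSUMED_PACKAGES = ["torch", "diffusers", "transformers", "accelerate"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

The check only reports whether each package can be located; it does not verify versions or GPU availability.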

To prepare the dataset for model training:

bash prepare_dataset.sh

Train

Transfer Pre-Training

bash TPT.sh

Privacy-Preserving Fine-tuning

bash P2FT.sh
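
The preparation and training scripts above can also be chained from a small driver. This is a sketch under assumptions: the script names come from this README, but the order (dataset preparation, then Transfer Pre-Training, then Privacy-Preserving Fine-tuning) is inferred from the section order, and `run_stages` is a hypothetical helper, not part of the repository.

```python
import subprocess

# Script names are taken from this README; the order is an assumption.
STAGES = ["prepare_dataset.sh", "TPT.sh", "P2FT.sh"]


def run_stages(stages, runner=subprocess.run):
    """Run each script with bash in order, stopping at the first failure."""
    completed = []
    for script in stages:
        # With subprocess.run, check=True raises CalledProcessError on a non-zero exit.
        runner(["bash", script], check=True)
        completed.append(script)
    return completed
```

Calling `run_stages(STAGES)` from the repository root would run the three scripts in sequence. The `runner` parameter exists only so the sequence can be dry-run with a stand-in function.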

Inference

We provide phase captions written by professional ophthalmologists based on the phase labels in the Cataract-1K dataset.
You can use the Cataract-1K-phase_prompts.csv file for inference.

bash sample.sh
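
To inspect the phase prompts before sampling, the CSV can be loaded into a dictionary. This is a minimal sketch: the column names `phase` and `caption` are hypothetical placeholders, since the layout of Cataract-1K-phase_prompts.csv is not described here, so adjust them to match the file's actual header.

```python
import csv
import io


def load_phase_prompts(csv_text, phase_col="phase", prompt_col="caption"):
    """Map each phase label to its caption.

    The default column names are assumptions, not the file's documented schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[phase_col].strip(): row[prompt_col].strip() for row in reader}
```

For example, after reading the file's contents into a string with `open("Cataract-1K-phase_prompts.csv", newline="", encoding="utf-8")`, passing that string to `load_phase_prompts` gives a lookup from phase label to prompt text.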

Citation

@article{li2025ophora,
  title={Ophora: A large-scale data-driven text-guided ophthalmic surgical video generation model},
  author={Li, Wei and Hu, Ming and Wang, Guoan and Liu, Lihao and Zhou, Kaijin and Ning, Junzhi and Guo, Xin and Ge, Zongyuan and Gu, Lixu and He, Junjun},
  journal={arXiv preprint arXiv:2505.07449},
  year={2025}
}
