Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model

Abstract: In ophthalmic surgery, developing an AI system capable of interpreting surgical videos and predicting subsequent operations requires numerous ophthalmic surgical videos with high-quality annotations, which are difficult to collect due to privacy concerns and labor costs. Text-guided video generation (T2V) emerges as a promising solution to this problem by generating ophthalmic surgical videos based on surgeon instructions. In this paper, we present Ophora, a pioneering model that can generate ophthalmic surgical videos following natural language instructions. To construct Ophora, we first propose a Comprehensive Data Curation pipeline to convert narrative ophthalmic surgical videos into a large-scale, high-quality dataset comprising over 160K video-instruction pairs, Ophora-160K. Then, we propose a Progressive Video-Instruction Tuning scheme to transfer rich spatial-temporal knowledge from a T2V model pre-trained on natural video-text datasets for privacy-preserving ophthalmic surgical video generation based on Ophora-160K. Experiments on video quality evaluation via quantitative analysis and ophthalmologist feedback demonstrate that Ophora can generate realistic and reliable ophthalmic surgical videos based on surgeon instructions. We also validate Ophora's ability to support the downstream task of ophthalmic surgical workflow understanding.

Introduction

This repository is for our work submitted to MICCAI25, titled "Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model".

We have released the training and inference code for Ophora, along with the model checkpoint and the dataset.

Synthesized Videos


Anterior Chamber Flushing: Simulated procedure demonstrating anterior chamber irrigation during ophthalmic surgery.	Capsule Polishing: Synthesized video showing delicate capsule polishing with micro-instruments.


Hydrodissection: Text-guided video generation of hydrodissection phase during cataract surgery.	Lens Implantation: Generated video illustrating lens implantation following cataract extraction.

Ophora & Ophora-160K

Model Checkpoint

We provide model checkpoints for Ophora at the Ophora repository.

Dataset

The curated large-scale dataset Ophora-160K can be accessed at Ophora-160K datasets.

Prepare Environment

Training and inference with Ophora require an environment compatible with the CogVideoX-2b model.
Please refer to its official page for installation instructions and dependencies: CogVideoX-2b on Hugging Face.
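
Before launching any of the scripts below, it can help to confirm that the environment has the expected packages. The following is a minimal sketch, not part of the repository: the package list is an assumption based on what CogVideoX-2b setups commonly install, and the official page remains the authoritative source for dependencies and versions.

```python
import importlib.util

# Assumption: packages commonly installed for CogVideoX-2b. This list is
# illustrative only; consult the model's official page for the real one.
ASSUMED_PACKAGES = ["torch", "diffusers", "transformers", "accelerate"]


def missing_packages(names=ASSUMED_PACKAGES):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

The check only tests whether each package can be located; it does not verify versions or GPU support.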

To prepare the dataset for model training

bash prepare_dataset.sh

Train

Transfer Pre-Training

bash TPT.sh

Privacy-Preserving Fine-tuning

bash P2FT.sh
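
The three scripts above appear to be meant to run in sequence: dataset preparation, then Transfer Pre-Training, then Privacy-Preserving Fine-tuning. The sketch below chains them and stops at the first failure. The helper `run_stages` is hypothetical and not part of the repository, and the stage order is an assumption taken from the order of the sections in this README.

```python
import subprocess
from pathlib import Path

# Assumption: stage order follows the order of the README sections.
STAGES = ["prepare_dataset.sh", "TPT.sh", "P2FT.sh"]


def run_stages(repo_dir=".", stages=STAGES, dry_run=False):
    """Run each stage script in order and return the list of commands.

    With dry_run=True nothing is executed and no files are checked; the
    planned commands are returned so the order can be inspected.
    """
    repo = Path(repo_dir)
    commands = [["bash", script] for script in stages]
    if dry_run:
        return commands
    # Verify every script exists before starting, so a long training stage
    # is not launched when a later stage is known to be missing.
    for script in stages:
        if not (repo / script).is_file():
            raise FileNotFoundError(f"missing stage script: {repo / script}")
    for cmd in commands:
        # check=True raises CalledProcessError if a stage exits non-zero.
        subprocess.run(cmd, cwd=repo, check=True)
    return commands
```

Calling `run_stages(dry_run=True)` from the repository root lists the planned commands without starting any training.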

Inference

We provide phase captions written by professional ophthalmologists based on the phase labels in the Cataract-1K dataset.
You can use the Cataract-1K-phase_prompts.csv file for inference.

bash sample.sh
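
To inspect the phase prompts before sampling, the CSV file can be read with the Python standard library. This is a minimal sketch under stated assumptions: the column names `phase` and `prompt` are hypothetical, since this README does not document the file layout, and the sample rows are placeholders rather than real captions. Check the header row of Cataract-1K-phase_prompts.csv and adjust the arguments to match.

```python
import csv
import io


def load_prompts(source, phase_col="phase", prompt_col="prompt"):
    """Read (phase, prompt) pairs from an open CSV text stream.

    The default column names are assumptions; pass the real header names
    if they differ. Rows with an empty prompt are skipped.
    """
    pairs = []
    for row in csv.DictReader(source):
        phase = (row.get(phase_col) or "").strip()
        prompt = (row.get(prompt_col) or "").strip()
        if prompt:
            pairs.append((phase, prompt))
    return pairs


# Placeholder rows for illustration only; not taken from the real file.
SAMPLE = (
    "phase,prompt\n"
    "Hydrodissection,Example caption A\n"
    "Lens Implantation,Example caption B\n"
)

if __name__ == "__main__":
    for phase, prompt in load_prompts(io.StringIO(SAMPLE)):
        print(f"{phase}: {prompt}")
```

For the real file, open it with `open("Cataract-1K-phase_prompts.csv", newline="", encoding="utf-8")` and pass the file object to `load_prompts`.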

Citation

@article{li2025ophora,
  title={Ophora: A large-scale data-driven text-guided ophthalmic surgical video generation model},
  author={Li, Wei and Hu, Ming and Wang, Guoan and Liu, Lihao and Zhou, Kaijin and Ning, Junzhi and Guo, Xin and Ge, Zongyuan and Gu, Lixu and He, Junjun},
  journal={arXiv preprint arXiv:2505.07449},
  year={2025}
}
