EgoX: Egocentric Video Generation from a Single Exocentric Video

Hugging Face Paper Β· arXiv Β· Project Page Β· Hugging Face

Taewoong Kang*, Kinam Kim*, Dohyeon Kim*, Minho Park, Junha Hyung, and Jaegul Choo

DAVIAN Robotics, KAIST AI, SNU
arXiv 2025. (* indicates equal contribution)

🎬 Teaser Video

teaser.mp4

πŸ“‹ TODO

πŸ”Ή This Week

  • Release inference code
  • Release model weights
  • Release data preprocessing code (for inference)

πŸ”Ή By End of December

  • Release training code
  • Release data preprocessing code (for train)

πŸ”Ή Ongoing

  • Release user-friendly interface

πŸ› οΈ Environment Setup

System Requirements

  • GPU: < 80GB (for inference) < 140GB (for train)
  • CUDA: 12.1 or higher
  • Python: 3.10
  • PyTorch: Compatible with CUDA 12.1

Installation

Create a conda environment and install dependencies:

# Create conda environment
conda create -n egox python=3.10 -y
conda activate egox

# Install PyTorch with CUDA 12.1
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install other dependencies
pip install -r requirements.txt
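
After installation, a quick sanity check (not part of the repository) can confirm that PyTorch sees your GPU and that the CUDA build matches:

# check_env.py -- optional sanity check, not part of the EgoX codebase
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA build     : {torch.version.cuda}")        # expected: 12.1
print(f"CUDA available : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU            : {props.name}, {props.total_memory / 1024**3:.0f} GB")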

πŸ“₯ Model Weights Download

πŸ’Ύ Wan2.1-I2V-14B Pretrained Model

Download the Wan2.1-I2V-14B model and save it to the checkpoints/pretrained_model/ folder.

pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='Wan-AI/Wan2.1-I2V-14B-480P-Diffusers', local_dir='./checkpoints/pretrained_model/Wan2.1-I2V-14B-480P-Diffusers')"

πŸ’Ύ EgoX Model Weights Download

Download the trained EgoX LoRA weights using one of the following methods:

Option 1: Hugging Face

pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='DAVIAN-Robotics/EgoX', local_dir='./checkpoints/EgoX', allow_patterns='*.safetensors')"

Option 2: Google Drive

  • Download from Google Drive and save to the checkpoints/EgoX/ folder.

πŸš€ Inference

Quick Start with Example Data

For quick testing, the codebase includes example data in the example/ directory. You can run inference immediately:

# For in-the-wild example
bash scripts/infer_itw.sh

# For Ego4D example
bash scripts/infer_ego4d.sh

Edit the GPU ID and seed in the script if needed. Results will be saved to ./results/.

Custom Data Inference

To run inference with your own data, prepare the following file structure:

your_dataset/              # Your custom dataset folder
β”œβ”€β”€ meta.json              # Meta information for each video
β”œβ”€β”€ videos/                # Videos directory
β”‚   └── take_name/
β”‚       β”œβ”€β”€ ego_Prior.mp4
β”‚       β”œβ”€β”€ exo.mp4
β”‚       └── ...
└── depth_maps/            # Depth maps directory
    └── take_name/
        β”œβ”€β”€ frame_000.npy
        └── ...
meta.json - Meta information for each video

A JSON file containing the exocentric video path, egocentric prior video path, prompt, and camera intrinsic and extrinsic parameters for each video. The top-level test_datasets array contains one entry per video.

Example:

{
    "test_datasets": [
        {
            "exo_path": "./example/in_the_wild/videos/joker/exo.mp4",
            "ego_prior_path": "./example/in_the_wild/videos/joker/ego_Prior.mp4",
            "prompt": "[Exo view]\n**Scene Overview:**\nThe scene is set on a str...\n\n[Ego view]\n**Scene Overview:**\nFrom the inferred first-person perspective, the environment appears chaotic and filled with sm...",
            "camera_intrinsics": [
                [634.47327, 0.0, 392.0],
                [0.0, 634.4733, 224.0],
                [0.0, 0.0, 1.0]
            ],
            "camera_extrinsics": [
                [1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]
            ],
            "ego_intrinsics": [
                [150.0, 0.0, 255.5],
                [0.0, 150.0, 255.5],
                [0.0, 0.0, 1.0]
            ],
            "ego_extrinsics": [
                [[0.6263, 0.7788, -0.0336, 0.3432],
                 [-0.0557, 0.0018, -0.9984, 2.3936],
                 [-0.7776, 0.6272, 0.0445, 0.1299]],
                ...
            ]
        },
        ...
    ]
}
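
The field names below follow the example above; the shape checks (3x3 intrinsics, per-frame 3x4 ego extrinsics) are assumptions read off that example, so treat this as a rough sanity-check sketch rather than an official validator:

# validate_meta.py -- rough sketch; field names and shapes assumed from the example above
import json
import os

with open("./your_dataset/meta.json") as f:
    meta = json.load(f)

for entry in meta["test_datasets"]:
    assert os.path.exists(entry["exo_path"]), entry["exo_path"]
    assert os.path.exists(entry["ego_prior_path"]), entry["ego_prior_path"]
    assert entry["prompt"], "empty prompt"

    # intrinsics are 3x3 for both views (assumed from the example)
    for key in ("camera_intrinsics", "ego_intrinsics"):
        K = entry[key]
        assert len(K) == 3 and all(len(row) == 3 for row in K), key

    # ego extrinsics are a list of per-frame 3x4 matrices (assumed from the example)
    for E in entry["ego_extrinsics"]:
        assert len(E) == 3 and all(len(row) == 4 for row in E), "ego_extrinsics"

print(f"checked {len(meta['test_datasets'])} entries")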

To prepare your own dataset, follow the instructions here.

Constraints

Since EgoX is trained on the Ego-Exo4D dataset, where exocentric camera poses are fixed, you must provide exocentric videos with fixed camera poses as input during inference. The model is also trained at 448x448 (ego) and 448x784 (exo) resolutions with 49 frames, so please preprocess your videos to match.
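
As a rough starting point, the sketch below resizes a clip with OpenCV and keeps 49 evenly spaced frames. The file names, the output frame rate, and the interpretation of 448x784 as height x width are assumptions, so adapt them to your data and to the repository's preprocessing code:

# preprocess_exo.py -- rough sketch, not part of the EgoX codebase
# Assumes OpenCV (pip install opencv-python) and interprets 448x784 as height x width.
import cv2
import numpy as np

SRC, DST = "exo_raw.mp4", "exo.mp4"      # hypothetical input/output names
NUM_FRAMES, HEIGHT, WIDTH = 49, 448, 784

cap = cv2.VideoCapture(SRC)
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(cv2.resize(frame, (WIDTH, HEIGHT)))
cap.release()

# Keep 49 evenly spaced frames and write them out at an assumed frame rate
idx = np.linspace(0, len(frames) - 1, NUM_FRAMES).astype(int)
out = cv2.VideoWriter(DST, cv2.VideoWriter_fourcc(*"mp4v"), 16, (WIDTH, HEIGHT))
for i in idx:
    out.write(frames[i])
out.release()
print(f"wrote {NUM_FRAMES} frames at {WIDTH}x{HEIGHT} to {DST}")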

Custom dataset init structure

Before running the steps below, create a custom dataset folder with the following structure:

your_dataset/              # Your custom dataset folder
└── videos/                # Videos directory
    └── take_name/
        └── exo.mp4
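
Before running meta_init.py, a quick check like the following (purely illustrative) confirms that every take folder contains an exo.mp4:

# check_layout.py -- illustrative only; verifies each take folder has an exo.mp4
import os

root = "./your_dataset/videos"
for take in sorted(os.listdir(root)):
    exo = os.path.join(root, take, "exo.mp4")
    print(f"{take}: {'ok' if os.path.exists(exo) else 'MISSING exo.mp4'}")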

Then, create a meta.json file with meta_init.py:

python meta_init.py --folder_path ./your_dataset --output_json ./your_dataset/meta.json --overwrite

This produces the following structure:

your_dataset/              # Your custom dataset folder
β”œβ”€β”€ meta.json              # Meta information for each video
└── videos/                # Videos directory
    └── take_name/
        └── exo.mp4

Then, use caption.py to generate a caption for each video:

python caption.py --json_file ./your_dataset/meta.json --output_json ./your_dataset/meta.json --overwrite

Make sure your API key is properly set in caption.py.

Finally, follow the instructions here to obtain depth maps, camera intrinsics, and ego camera extrinsics for each video.

your_dataset/              # Your custom dataset folder
β”œβ”€β”€ meta.json              # Meta information for each video
β”œβ”€β”€ videos/                # Videos directory
β”‚   └── take_name/
β”‚       β”œβ”€β”€ ego_Prior.mp4
β”‚       β”œβ”€β”€ exo.mp4
β”‚       └── ...
└── depth_maps/            # Depth maps directory
    └── take_name/
        β”œβ”€β”€ frame_000.npy
        └── ...
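
Once the depth maps are in place, a quick look at one of them (purely illustrative) can confirm they load as expected:

# inspect_depth.py -- illustrative only
import numpy as np

depth = np.load("./your_dataset/depth_maps/take_name/frame_000.npy")
print(depth.shape, depth.dtype, depth.min(), depth.max())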

Then, modify scripts/infer_itw.sh (or create a new script) to point to your data paths:

python3 infer.py \
    --meta_data_file ./example/your_dataset/meta.json \
    --model_path ./checkpoints/pretrained_model/Wan2.1-I2V-14B-480P-Diffusers \
    --lora_path ./checkpoints/EgoX/pytorch_lora_weights.safetensors \
    --lora_rank 256 \
    --out ./results \
    --seed 42 \
    --use_GGA \
    --cos_sim_scaling_factor 3.0 \
    --in_the_wild

🌟 Star History

Star History Chart

πŸ™ Acknowledgements

This project is built upon the following works:

πŸ“ Citation

If you use this dataset or code in your research, please cite our paper:

@misc{kang2025egoxegocentricvideogeneration,
      title={EgoX: Egocentric Video Generation from a Single Exocentric Video}, 
      author={Taewoong Kang and Kinam Kim and Dohyeon Kim and Minho Park and Junha Hyung and Jaegul Choo},
      year={2025},
      eprint={2512.08269},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.08269}, 
}
