
Multi-Backend Video Generator

A Python application that converts images to videos using multiple AI providers: OpenAI's Sora-2, Azure AI Foundry Sora, Google's Veo-3, and RunwayML's Gen-4 models.

Features

  • 🎨 Multiple AI Backends - Choose between OpenAI Sora-2, Azure Sora, Google Veo-3, or RunwayML
  • 🖼️ Flexible Image Input - Single files, multiple images, wildcard patterns, or text-only
  • 🔗 Seamless Stitching - Multi-clip video generation with automatic frame transitions (Veo 3.1, RunwayML Veo)
  • 🔄 Automatic Retries - Exponential backoff when APIs are at capacity (sketched after this list)
  • 📝 Comprehensive Logging - DEBUG-level logging to logs/video_gen.log
  • 🏗️ Modular Architecture - Clean, maintainable, and extensible codebase
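
The retry behavior is a standard exponential-backoff loop. Here is a minimal Python sketch of the idea (illustrative only, not the project's actual code; CapacityError and the attempt limits are assumptions):

import random
import time

class CapacityError(Exception):
    """Raised when the API reports it is at capacity (hypothetical)."""

def call_with_backoff(request, max_attempts=5, base_delay=2.0):
    """Retry a callable with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except CapacityError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error.
            # Delay doubles each attempt (2s, 4s, 8s, ...) plus jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))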

Quick Start

# Install dependencies
pip install -r requirements.txt

# Set your API key
export OPENAI_API_KEY="your-key"      # For OpenAI Sora
export RUNWAY_API_KEY="your-key"      # For RunwayML
# For Google Veo - see authentication guide

# Generate a video
./image2video.py "A peaceful sunset over mountains"

# With images
./image2video.py -i "photo.jpg" "Animate this scene"

# Choose a provider
./image2video.py --provider runway "Your prompt"

Documentation

📚 Complete Documentation - Full documentation index

Quick Links

  • Provider Guides
  • Advanced Topics

Supported Backends

Backend       Models                       Pricing          Multi-Image   Stitching
OpenAI Sora   sora-2, sora-2-pro           Variable         ✅            —
Azure Sora    sora-2, sora-2-pro           $0.10/sec        ✅            —
Google Veo    veo-3.0, veo-3.1             $0.15-0.40/sec   ✅            ✅ (Veo 3.1)
RunwayML      gen4, gen4_turbo, google.x   Variable         Single only   ✅ (Veo)

See Provider Comparison for detailed feature matrix.

Usage Examples

Basic Text-to-Video

./image2video.py "A serene lake at dawn with mist rising"

Image-to-Video

./image2video.py -i "landscape.jpg" "Time-lapse of this scene at sunset"

Multiple Images

./image2video.py -i "img1.jpg,img2.jpg,img3.jpg" "Tour of these locations"

Wildcard Patterns

./image2video.py -i "photos/*.jpg" "Create a walkthrough video"

Provider Selection

# Use Google Veo
./image2video.py --provider google --model veo-3.1-fast-generate-preview "Your prompt"

# Use RunwayML
./image2video.py --provider runway --model gen4 "Your prompt"

# Use Azure Sora
./image2video.py --provider azure "Your prompt"

Seamless Multi-Clip Stitching (Veo 3.1)

./image2video.py --provider google --model veo-3.1-fast-generate-preview --stitch \
  -i reference_images/*.jpg \
  -p "Camera pans across the foyer" \
     "Dolly forward into the living room" \
     "Pan right to show the kitchen"

💡 Tip: Control which images are used for each clip - see Image Grouping Guide
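
Mechanically, this kind of seamless stitching chains clips by carrying the last frame of one clip forward as the reference image for the next, then joining the results. A rough sketch of those two steps using ffmpeg (an illustration, not the project's actual implementation; the function names are made up):

import subprocess

def last_frame(clip, out_image):
    """Extract (roughly) the final frame of a clip with ffmpeg."""
    # -sseof -1 starts decoding one second before the end; -update 1
    # keeps overwriting out_image, so the last decoded frame survives.
    subprocess.run(
        ["ffmpeg", "-y", "-sseof", "-1", "-i", clip,
         "-update", "1", "-q:v", "2", out_image],
        check=True,
    )
    return out_image

def concat_clips(clips, output, list_file="clips.txt"):
    """Join clips losslessly with ffmpeg's concat demuxer."""
    with open(list_file, "w") as f:
        f.writelines(f"file '{c}'\n" for c in clips)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_file, "-c", "copy", output],
        check=True,
    )

Feeding the extracted frame in as the opening reference for the next clip is what makes the cut invisible; the concat step then only joins clips whose boundary frames already match.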

Installation

Requirements

  • Python 3.8 or higher
  • ffmpeg (for video processing)

Setup

# Clone or download the repository
git clone <repository-url>
cd image_to_video

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your API keys

See Installation Guide for detailed instructions.

Authentication

Each provider requires different authentication:

Tip: A fully commented template of all required and optional variables is provided in .env.example. Copy it to .env and edit values as needed.
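
A minimal sketch of what that file might contain, using only the variables documented in this README (all values are placeholders; set only the providers you use):

# OpenAI Sora
OPENAI_API_KEY=your-openai-key

# Azure Sora
AZURE_OPENAI_API_KEY=your-azure-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/

# Google Veo
GOOGLE_CLOUD_PROJECT=your-project-id

# RunwayML
RUNWAY_API_KEY=your-runway-key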

OpenAI Sora

export OPENAI_API_KEY="your-api-key"

Azure Sora

export AZURE_OPENAI_API_KEY="your-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"

Google Veo

# Browser OAuth (easiest)
./image2video.py --provider google --google-login

# Or manual with gcloud
gcloud auth application-default login
export GOOGLE_API_KEY="$(gcloud auth application-default print-access-token)"
export GOOGLE_CLOUD_PROJECT="your-project-id"

RunwayML

export RUNWAY_API_KEY="your-api-key"

See Authentication Guide for complete details.

Project Structure

image_to_video/
├── image2video.py              # Main CLI entry point
├── image2video_mono.py         # Legacy monolithic script (Sora-2 only)
├── requirements.txt            # Python dependencies
├── README.md                   # This file
│
├── docs/                       # Documentation
│   ├── README.md              # Documentation index
│   ├── quick-start.md         # Quick start guide
│   ├── user-guide.md          # Complete user guide
│   ├── providers/              # Backend-specific docs
│   ├── advanced/              # Advanced topics
│   ├── technical/             # Technical documentation
│   └── reference/             # Reference materials
│
└── video_gen/                  # Core package
    ├── config.py              # Configuration management
    ├── file_handler.py        # File operations
    ├── arg_parser.py          # Argument parsing
    ├── video_generator.py     # Main orchestration
    ├── logger.py              # Logging infrastructure
    └── providers/             # Provider implementations
        ├── openai_provider/   # OpenAI Sora
        ├── azure_provider/    # Azure Sora
        ├── google_provider/   # Google Veo
        └── runway_provider/   # RunwayML Gen-4 & Veo

Development

Running Tests

# Activate virtual environment
source venv/bin/activate

# Run the full unittest suite
python -m unittest discover -s tests -p "test_*.py" -v

# Test specific provider
./image2video.py --provider openai "Test prompt"

Contributing

We welcome contributions! See the Development Guide for:

  • Setting up a development environment
  • Code style and conventions
  • Testing guidelines
  • Submitting pull requests

Troubleshooting

Common issues and solutions:

"No images provided" error

# Use -i flag before image paths
./image2video.py -i "images/*.jpg" "Your prompt"

API key not found

# Verify environment variables are set
echo $OPENAI_API_KEY
echo $RUNWAY_API_KEY

Google Veo authentication issues

# Use browser OAuth (easiest method)
./image2video.py --provider google --google-login

See Troubleshooting Guide for complete solutions.

Architecture

The application uses a modular, provider-based architecture (sketched in code after this list):

  • Providers - Backend-specific implementations (OpenAI, Azure, Google, RunwayML)
  • Clients - Separate client classes per model family within each provider
  • Configuration - Per-provider config classes with validation
  • Orchestration - Central dispatcher routes requests to appropriate providers
  • Logging - Centralized logging infrastructure with DEBUG-level detail
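
In outline, the pattern looks like the following Python sketch. The class names are illustrative, not the actual classes inside video_gen:

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ProviderConfig:
    """Per-provider configuration with validation (illustrative)."""
    api_key: str
    model: str

    def validate(self):
        if not self.api_key:
            raise ValueError("missing API key")

class Provider(ABC):
    """Common interface each backend implements."""

    @abstractmethod
    def generate(self, prompt, images):
        """Generate a video and return the output file path."""

class Dispatcher:
    """Central dispatcher: routes each request to the chosen provider."""

    def __init__(self, providers):
        self.providers = providers  # e.g. {"openai": ..., "runway": ...}

    def generate(self, name, prompt, images=()):
        return self.providers[name].generate(prompt, list(images))

Under this pattern, adding a new backend amounts to implementing the provider interface and registering it with the dispatcher.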

See Architecture Guide for detailed design documentation.

License

This project is provided as-is for educational and research purposes.
