Visual Generation API Examples

This directory contains example scripts that demonstrate how to use the TensorRT-LLM Visual Generation API endpoints for image and video generation.

Overview

These examples show how to interact with the visual generation server using both the OpenAI Python SDK and standard HTTP requests. The API provides endpoints for:

Image Generation: Text-to-image generation (T2I)
Video Generation:
- Text-to-video generation (T2V) - generate videos from text prompts only
- Text+Image-to-video generation (TI2V) - generate videos from text + reference image
- Both synchronous and asynchronous modes supported
- Multipart/form-data support for file uploads
Video Management: Retrieving and deleting generated videos

Prerequisites

Before running these examples, ensure you have:

Install modules: Install optional dependency:

Optional: For better video compression (H.264/MP4), install ffmpeg:
```
# Ubuntu/Debian
apt-get install ffmpeg
```
If ffmpeg is not available, the server will use a pure Python encoder that outputs MJPEG/AVI format. See FFmpeg download page for installation instructions on other platforms.

Server Running: The TensorRT-LLM visual generation server must be running

trtllm-serve <path to your model> --extra_visual_gen_options <path to config yaml>

e.g.

trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan21.yml
trtllm-serve $LLM_MODEL_DIR/Wan2.2-T2V-A14B-Diffusers --extra_visual_gen_options ./configs/wan22.yml
trtllm-serve $LLM_MODEL_DIR/FLUX.1-dev --extra_visual_gen_options ./configs/flux1.yml
trtllm-serve $LLM_MODEL_DIR/FLUX.2-dev --extra_visual_gen_options ./configs/flux2.yml
trtllm-serve $LLM_MODEL_DIR/LTX-2/ --extra_visual_gen_options ./configs/ltx2.yml
trtllm-serve $LLM_MODEL_DIR/Qwen-Image --extra_visual_gen_options ./configs/qwen_image.yml

# Run the server in the background:
trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan21.yml > /tmp/serve.log 2>&1 &

# Check that the server has started:
tail -f /tmp/serve.log

For LTX-2, you need to provide a proper text_encoder_path in ./configs/ltx2.yml.

Examples

Currently supported and tested models:

WAN T2V/I2V for video generation (t2v, ti2v, delete_video)
FLUX.1 for image generation (t2i)
FLUX.2 for image generation (t2i)
LTX-2 for video generation with audio (t2v, ti2v)
Qwen-Image for image generation (t2i)

1. Synchronous Image Generation (`sync_image_gen.py`)

Demonstrates synchronous text-to-image generation using the OpenAI SDK. Supports FLUX.1 and FLUX.2.

Features:

Generates images from text prompts
Supports configurable model and image size
Returns base64-encoded images or URLs
Saves generated images to disk

Usage:

# FLUX.2 (default)
python sync_image_gen.py

# FLUX.1
python sync_image_gen.py --model flux1

# Custom server and prompt
python sync_image_gen.py --base-url http://your-server:8000/v1 --prompt "A sunset"

API Endpoint: POST /v1/images/generations

Output: Saves generated image to output_generation.png (or numbered files for multiple images)
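
For readers who want to see the request without the SDK, here is a minimal standard-library sketch of the same call. The helper names (`build_image_request`, `decode_images`) are hypothetical and not part of `sync_image_gen.py`. Two things are assumptions: that the response follows the OpenAI Images shape (`{"data": [{"b64_json": ...}]}`), and that the API key is sent as a Bearer token.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default base URL used in this README


def build_image_request(prompt, size="512x512", n=1, base_url=BASE_URL):
    """Build (but do not send) a POST /v1/images/generations request."""
    body = {"prompt": prompt, "size": size, "n": n, "response_format": "b64_json"}
    return urllib.request.Request(
        f"{base_url}/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the key is passed as a Bearer token, as the OpenAI SDK does.
            "Authorization": "Bearer tensorrt_llm",
        },
        method="POST",
    )


def decode_images(response_json):
    """Return raw image bytes per item (assumes an OpenAI-style response shape)."""
    return [base64.b64decode(item["b64_json"]) for item in response_json["data"]]
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=300)`, parse the body with `json.load`, and write each element of `decode_images(...)` to a file.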

2. Synchronous Video Generation with T2V and TI2V Modes (`sync_video_gen.py`)

Demonstrates synchronous video generation using direct HTTP requests. Waits for completion and returns the video file directly.

Features:

T2V Mode: Generate videos from text prompts only
TI2V Mode: Generate videos from text + reference image (multipart/form-data)
Waits for video generation to complete before returning
Returns video file directly in response
Command-line interface for easy testing

Usage:

# Text-to-Video (T2V) - No reference image
python sync_video_gen.py --mode t2v \
    --prompt "A cute cat playing with a ball in the park" \
    --duration 4.0 --fps 24 --size 256x256

# Text+Image-to-Video (TI2V) - With reference image
# Note: a longer duration or a larger size leads to a much longer wait
python sync_video_gen.py --mode ti2v \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 4.0 --fps 24 --size 512x512

# Custom parameters
python sync_video_gen.py --mode t2v \
    --prompt "A serene sunset over the ocean" \
    --duration 5.0 --fps 30 --size 512x512 \
    --output my_video.mp4

# LTX-2: Text-to-Video (generates video with audio)
python sync_video_gen.py --mode t2v \
    --model ltx2 \
    --prompt "A cute cat playing with a ball in the park" \
    --duration 5.0 --fps 24 --size 1280x720

# LTX-2: Image-to-Video
python sync_video_gen.py --mode ti2v \
    --model ltx2 \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 5.0 --fps 24 --size 1280x720

Command-Line Arguments:

--mode - Generation mode: t2v or ti2v (default: t2v)
--prompt - Text prompt for video generation (required)
--image - Path to reference image (required for ti2v mode)
--base-url - API server URL (default: http://localhost:8000/v1)
--model - Model name (default: wan). Use ltx2 for LTX-2.
--duration - Video duration in seconds (default: 4.0)
--fps - Frames per second (default: 24)
--size - Video resolution in WxH format (default: 256x256)
--output - Output video file path (default: output_sync.mp4)

API Endpoint: POST /v1/videos/generations

API Details:

T2V sends a JSON body (Content-Type: application/json)
TI2V sends a multipart body with a file upload (Content-Type: multipart/form-data)

Output: Saves generated video to specified output file
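
The two request encodings listed under API Details can be sketched as one helper that picks the body format depending on whether a reference image is present. This is an illustration only: `encode_video_request` is a hypothetical name, and the multipart body is assembled by hand so the example needs nothing beyond the standard library. The `input_reference` field name is the one used in the curl examples in this README.

```python
import json
import uuid


def encode_video_request(fields, image_bytes=None, image_name="reference.jpg"):
    """Return (content_type, body_bytes) for a video generation request.

    Without an image the fields are sent as JSON (T2V); with an image they are
    sent as multipart/form-data with the file under `input_reference` (TI2V).
    """
    if image_bytes is None:
        return "application/json", json.dumps(fields).encode("utf-8")

    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # One plain form field per key; values are sent as text.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    # The reference image goes in as a file part.
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="input_reference"; '
        f'filename="{image_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + image_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

In practice a library such as `requests` builds the multipart body for you via its `files=` argument; the manual version just shows what ends up on the wire.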

3. Async Video Generation with T2V and TI2V Modes (`async_video_gen.py`)

Demonstrates asynchronous video generation in both Text-to-Video (T2V) and Text+Image-to-Video (TI2V) modes.

Features:

T2V Mode: Generate videos from text prompts only (JSON request)
TI2V Mode: Generate videos from text + reference image (multipart/form-data with file upload)
Command-line interface for easy testing
Automatic mode detection
Comprehensive parameter control

Usage:

# Text-to-Video (T2V) - No reference image
python async_video_gen.py --mode t2v \
    --prompt "A cool cat on a motorcycle in the night" \
    --duration 4.0 --fps 24 --size 256x256

# Text+Image-to-Video (TI2V) - With reference image
python async_video_gen.py --mode ti2v \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 4.0 --fps 24 --size 512x512

# Custom parameters
python async_video_gen.py --mode t2v \
    --prompt "A serene sunset over the ocean" \
    --duration 5.0 --fps 30 --size 512x512 \
    --output my_video.mp4

# LTX-2: Async Text-to-Video (generates video with audio)
python async_video_gen.py --mode t2v \
    --model ltx2 \
    --prompt "A cool cat on a motorcycle in the night" \
    --duration 5.0 --fps 24 --size 1280x720

# LTX-2: Async Image-to-Video
python async_video_gen.py --mode ti2v \
    --model ltx2 \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 5.0 --fps 24 --size 1280x720

Command-Line Arguments:

--mode - Generation mode: t2v or ti2v (default: t2v)
--prompt - Text prompt for video generation (required)
--image - Path to reference image (required for ti2v mode)
--base-url - API server URL (default: http://localhost:8000/v1)
--model - Model name (default: wan). Use ltx2 for LTX-2.
--duration - Video duration in seconds (default: 4.0)
--fps - Frames per second (default: 24)
--size - Video resolution in WxH format (default: 256x256)
--output - Output video file path (default: output_async.mp4)

API Details:

T2V sends a JSON body (Content-Type: application/json)
TI2V sends a multipart body with a file upload (Content-Type: multipart/form-data)

Output: Saves generated video to specified output file
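
The asynchronous flow amounts to creating a job, polling its status, and downloading the result once it is ready. Below is a minimal sketch of the polling step, not the logic of `async_video_gen.py` itself. `fetch_status` is a hypothetical callable standing in for `GET /v1/videos/{video_id}`, and the status strings `"completed"` and `"failed"` are assumptions modeled on the OpenAI Videos API; check the values your server actually returns.

```python
import time


def wait_for_video(fetch_status, video_id, timeout=300.0, interval=2.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the job finishes, fails, or the timeout elapses.

    fetch_status(video_id) is a hypothetical callable that performs
    GET /v1/videos/{video_id} and returns the parsed JSON body.
    The status strings below are assumptions, not confirmed server values.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status(video_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"video {video_id} failed: {job.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"video {video_id} not ready after {timeout}s")
        sleep(interval)
```

The 300-second default mirrors the timeout listed under API Configuration. Injecting `sleep` and `clock` keeps the loop testable without real waiting.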

4. Video Deletion (`delete_video.py`)

Demonstrates the complete lifecycle of video generation and deletion.

Features:

Creates a test video generation job
Waits for completion
Deletes the generated video
Verifies deletion by attempting to retrieve the deleted video
Tests error handling for non-existent videos

Usage:

# Use default localhost server
python delete_video.py

# Specify custom server URL
python delete_video.py http://your-server:8000/v1

API Endpoints:

POST /v1/videos - Create video job
GET /v1/videos/{video_id} - Check video status
DELETE /v1/videos/{video_id} - Delete video

Test Flow:

Create video generation job
Wait for completion
Delete the video
Verify that retrieving the deleted video raises NotFoundError
Test deletion of non-existent video
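
The delete and verify steps of this flow can be sketched independently of any HTTP client. In the sketch below, `request(method, path)` is a hypothetical transport returning `(status_code, json_body)`. It assumes that the server answers HTTP 404 for a missing video, which is the status the OpenAI SDK raises as `NotFoundError`.

```python
def delete_and_verify(request, video_id):
    """Delete a video, then confirm that it can no longer be retrieved.

    request(method, path) is a hypothetical transport returning
    (status_code, json_body). A 404 response is assumed for missing videos.
    """
    status, _ = request("DELETE", f"/v1/videos/{video_id}")
    if status == 404:
        return "not_found"  # the video did not exist in the first place
    if status >= 400:
        raise RuntimeError(f"DELETE failed with HTTP {status}")

    # A successful delete must make the follow-up GET fail with 404.
    status, _ = request("GET", f"/v1/videos/{video_id}")
    if status != 404:
        raise RuntimeError(f"video {video_id} still retrievable (HTTP {status})")
    return "deleted"
```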

API Configuration

All examples use the following default configuration:

Base URL: http://localhost:8000/v1
API Key: "tensorrt_llm" (authentication token)
Timeout: 300 seconds for async operations

You can customize these by:

Passing the base URL as a command-line argument
Modifying the default parameters in each script's function

Common Parameters

Image Generation

prompt: Text description (required)
n: Number of images to generate
size: Image dimensions in WxH format (e.g., "512x512", "1024x1024"). Alternatively, send width and height as a structured pair (both are required if either is sent)
seed: Random seed; null / omitted means the engine draws a fresh seed
num_inference_steps, guidance_scale, max_sequence_length, negative_prompt: per-request denoise controls (override pipeline defaults when sent)
extra_params: model-specific overflow as a JSON object (see "Model-Specific extra_params" below). Unknown keys are rejected by the executor.
response_format: "b64_json" or "url"
format: Generation content encoding. Image encoders: "png", "webp", "jpeg". Tensor formats: "safetensors", "pt".
OpenAI-compatible fields that are accepted but have no effect on the engine: model, quality, style, user. Sending quality or style logs a server-side WARNING; sending model logs a warning on mismatch. None of these change generation behavior.

Video Generation

prompt: Text description (required)
size / width / height: same convention as image
seconds: Duration in seconds (engine multiplies by frame_rate to derive num_frames when the latter is absent)
frame_rate (canonical) or fps (alias): frames per second
num_frames: when set, wins over the seconds * frame_rate derivation
seed, num_inference_steps, guidance_scale, max_sequence_length, negative_prompt: per-request denoise controls
input_reference: Reference image (TI2V mode); accepted as base64-encoded string in JSON or as a file in multipart form-data
extra_params: model-specific overflow (see below)
response_format: "b64_json" or "url"
format: Generation content encoding. Video encoders: "mp4", "avi", "auto". Tensor formats: "safetensors", "pt" (carries video + audio + scalar metadata in one payload for LTX-2).
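
As a worked example of the size and frame-count rules above, the sketch below parses a WxH string and applies the documented precedence: `num_frames` wins when set, otherwise `seconds * frame_rate`. The function names are hypothetical. Rounding to the nearest integer is an assumption; the engine may round differently or snap to a model-specific frame count.

```python
def parse_size(size):
    """Split a "WxH" string into integer (width, height)."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def resolve_num_frames(seconds=None, frame_rate=None, fps=None, num_frames=None):
    """Apply the documented precedence: num_frames wins, else seconds * frame_rate.

    Rounding to the nearest integer is an assumption about the engine.
    """
    if num_frames is not None:
        return num_frames
    rate = frame_rate if frame_rate is not None else fps  # fps is an alias
    if seconds is None or rate is None:
        return None  # not derivable here; pipeline defaults would apply
    return round(seconds * rate)
```

For instance, `resolve_num_frames(seconds=4.0, fps=24)` gives 96, while passing `num_frames=81` alongside the same values gives 81.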

Tensor-format consumer contract

When format="safetensors" or format="pt", the payload bundles every populated media tensor (image / video / audio) and the scalar metadata (frame_rate, audio_sample_rate) into one file.

pt: torch.load(buf, weights_only=True) returns a dict with the tensor keys and the scalars as native Python values.
safetensors: safetensors.torch.load(bytes) returns a dict with the tensor keys and each scalar as a 0-d tensor under the same key — call .item() to unbox (e.g. loaded["frame_rate"].item()). The same scalars are also written to the safetensors file header as strings; safe_open(path, framework="pt").metadata() exposes them in that form for consumers that prefer header access.
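
For consumers that only need the scalars, the safetensors header can be read without loading any tensors or importing torch. A safetensors file starts with an 8-byte little-endian header length followed by a JSON header, whose optional `__metadata__` entry is a string-to-string map. The sketch below reads that map. The helper name is hypothetical, and the exact string formatting of each value (for example "24" versus "24.0") is not specified here, so parse defensively.

```python
import json
import struct


def read_safetensors_metadata(blob):
    """Return the string-valued __metadata__ map from a safetensors payload.

    Layout: 8-byte little-endian header length, then a UTF-8 JSON header.
    """
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    return header.get("__metadata__", {})
```

Given the response bytes in `payload`, `float(read_safetensors_metadata(payload)["frame_rate"])` would recover the frame rate, assuming the key is present in the header.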

Unknown-field policy

The visual-gen endpoints reject unknown top-level fields with HTTP 422 (extra="forbid"). Anything model-specific belongs inside extra_params. Sending output_format, top-level guidance_rescale, or — for video — top-level n returns 422 with the offending field named in the error body.
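
One way to stay on the right side of this policy is to relocate non-standard keys into `extra_params` on the client before sending. The sketch below is illustrative: `KNOWN_VIDEO_FIELDS` is assembled from the fields named in this README and may not match the server schema exactly, and `split_extra_params` is a hypothetical helper. Keys moved into `extra_params` are still validated against the loaded model, so this only fixes where a field is placed, not whether it is supported.

```python
# Top-level video fields named in this README; the server's real schema may differ.
KNOWN_VIDEO_FIELDS = {
    "model", "prompt", "size", "width", "height", "seconds", "frame_rate", "fps",
    "num_frames", "seed", "num_inference_steps", "guidance_scale",
    "max_sequence_length", "negative_prompt", "input_reference",
    "extra_params", "response_format", "format",
}


def split_extra_params(fields, known=KNOWN_VIDEO_FIELDS):
    """Move keys that are not known top-level fields into extra_params."""
    payload = {k: v for k, v in fields.items() if k in known}
    extras = dict(payload.get("extra_params") or {})
    extras.update({k: v for k, v in fields.items() if k not in known})
    if extras:
        payload["extra_params"] = extras
    return payload
```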

Model-specific `extra_params`

Use the Python API to discover accepted keys for a loaded pipeline:

generator = VisualGen(model="...")
print(generator.extra_param_specs)   # {key: ExtraParamSchema(type=..., range=..., default=..., description=...)}

Examples:

LTX-2: stg_scale, stg_blocks, modality_scale, guidance_rescale, output_type, ...
Wan 2.2 A14B: guidance_scale_2, boundary_ratio
Wan 2.1 / Flux: no model-specific extra_params declared

Note: LTX-2 generates video with audio. The ltx2.yml config must include text_encoder_path pointing to a Gemma3 model (e.g., google/gemma-3-12b-it).

Quick Reference - curl Examples

Text-to-Video (JSON)

curl -X POST "http://localhost:8000/v1/videos" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A cool cat on a motorcycle",
    "seconds": 4.0,
    "fps": 24,
    "size": "256x256"
  }'

Text-to-Video with LTX-2 (JSON, generates video with audio)

curl -X POST "http://localhost:8000/v1/videos" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ltx2",
    "prompt": "A cool cat on a motorcycle",
    "seconds": 5.0,
    "fps": 24,
    "size": "1280x720"
  }'

Text+Image-to-Video (Multipart with File Upload)

curl -X POST "http://localhost:8000/v1/videos" \
  -F "prompt=She turns around and smiles" \
  -F "input_reference=@./media/woman_skyline_original_720p.jpeg" \
  -F "seconds=4.0" \
  -F "fps=24" \
  -F "size=256x256" \
  -F "guidance_scale=5.0"

Check Video Status

curl -X GET "http://localhost:8000/v1/videos/{video_id}"

Download Video

# The server returns either MP4 (with ffmpeg) or AVI (without ffmpeg)
# Check the Content-Type header to determine the format
curl -X GET "http://localhost:8000/v1/videos/{video_id}/content" -o output.mp4

# Or use -J -O to let curl use the server-provided filename
curl -X GET "http://localhost:8000/v1/videos/{video_id}/content" -J -O

Delete Video

curl -X DELETE "http://localhost:8000/v1/videos/{video_id}"

API Endpoints Summary

| Endpoint | Method | Mode | Content-Type | Purpose |
|---|---|---|---|---|
| `/v1/videos` | POST | Async | JSON or Multipart | Create video job (T2V/TI2V) |
| `/v1/videos/generations` | POST | Sync | JSON or Multipart | Generate video sync (T2V/TI2V) |
| `/v1/videos/{id}` | GET | - | - | Get video status/metadata |
| `/v1/videos/{id}/content` | GET | - | - | Download video file |
| `/v1/videos/{id}` | DELETE | - | - | Delete video |
| `/v1/videos` | GET | - | - | List all videos |
| `/v1/images/generations` | POST | - | JSON | Generate images (T2I) |

Note: Both /v1/videos (async) and /v1/videos/generations (sync) support:

JSON: Standard text-to-video (T2V)
Multipart/Form-Data: Text+image-to-video (TI2V) with file upload

Error Handling

All examples include comprehensive error handling:

Connection errors (server not running)
API errors (invalid parameters, model not found)
Timeout errors (generation taking too long)
Resource errors (video not found for deletion)

Errors are displayed with full stack traces for debugging.
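
To make the four categories concrete, here is a small sketch that classifies exceptions raised by the standard-library `urllib` client. This is a stand-in for illustration; the example scripts use the OpenAI SDK and their own HTTP code, whose exception types differ. The category labels and the function name are hypothetical.

```python
import urllib.error


def classify_error(exc):
    """Map a urllib exception to one of the categories listed above."""
    # HTTPError must be checked first because it subclasses URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return "resource" if exc.code == 404 else "api"
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, urllib.error.URLError):
        # urllib wraps low-level socket failures in URLError.reason.
        if isinstance(exc.reason, TimeoutError):
            return "timeout"
        return "connection"
    return "unknown"
```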

Output Files

Generated files are saved to the current working directory:

output_generation.png - Synchronous image generation (sync_image_gen.py)
output_sync.mp4 or output_sync.avi - Synchronous video generation (sync_video_gen.py)
output_async.mp4 or output_async.avi - Asynchronous video generation (async_video_gen.py)

Note: You can customize output filenames using the --output parameter in all scripts.

Video Encoding

The server supports two video encoding modes:

| Encoder | Format | Requirements | Features |
|---|---|---|---|
| FFmpeg (H.264) | MP4 | ffmpeg installed | Better compression, audio support |
| Pure Python (MJPEG) | AVI | None (built-in) | No external dependencies |

The server automatically selects the best available encoder. The example scripts detect the actual format from the server response and adjust the output filename extension accordingly.
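
The extension adjustment described above could look roughly like the following. The MIME types in the table (`video/mp4`, `video/x-msvideo`) are assumptions about what the server sends in its Content-Type header, and `adjust_output_path` is a hypothetical helper rather than the code used by the example scripts.

```python
import os

# Assumed MIME types; check the Content-Type your server actually returns.
EXTENSION_BY_CONTENT_TYPE = {"video/mp4": ".mp4", "video/x-msvideo": ".avi"}


def adjust_output_path(path, content_type):
    """Swap the extension of `path` to match the response Content-Type."""
    mime = content_type.split(";", 1)[0].strip().lower()
    ext = EXTENSION_BY_CONTENT_TYPE.get(mime)
    if ext is None:
        return path  # unknown type: keep the caller's filename
    root, _ = os.path.splitext(path)
    return root + ext
```

With this mapping, a requested `output_sync.mp4` becomes `output_sync.avi` when the server falls back to the pure Python encoder.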
