Built around the latest public Wan video docs from Alibaba Cloud Model Studio: text-to-video, first-frame image-to-video, first-and-last-frame control, reference-to-video, and multi-shot narrative.
The closest official public Wan video references currently describe four core paths: text-to-video, first-frame image-to-video, first-and-last-frame control, and reference-to-video.

First-frame image-to-video with prompt-based motion and camera guidance.
Built for every creator, from influencers to product teams.
Turn your image library into daily content without hours of editing.
The official public Wan video docs emphasize control, cinematic output, and clear production constraints rather than complex editing.
These are the capabilities we could verify from official Alibaba Cloud Model Studio pages.
Generate a video from a single sentence with cinematic-quality visuals.
Use one input image as the first frame and generate a complete shot from a prompt.
Guide the transition between two states using provided first and last frame images.
Generate a performance video from reference images or videos, reusing the subject's appearance and, optionally, voice cues.
Current Wan 2.6 text-to-video and image-to-video docs explicitly mention multi-shot narrative capability.
Official pages list 720P and 1080P options, 30 fps MP4 output, and region-specific duration limits.
When creators search for Wan 2.7 video, the closest official public references we could verify are Alibaba Cloud's Wan video pages. Those pages currently document the Wan 2.6 video line and its core workflows.
Official Wan video docs describe text-to-video as generating videos from a single sentence with rich styles and cinematic-quality visuals.
Use one image as the starting frame and drive the shot with a prompt. This is the clearest official path for turning a still into motion.
Official Wan guidance also includes a workflow that uses both the starting and ending images to control the transition between two moments.
The official reference-to-video description says it can reuse a character's appearance from an input video or image and can also reference the voice timbre from the input video.
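One way to picture how the four paths map to what you have on hand is the short Python sketch below. It is only an illustration: `choose_mode`, its parameter names, and the order in which inputs take precedence are assumptions made for this example, not part of any official Wan or Alibaba Cloud API.

```python
from typing import Optional


def choose_mode(
    prompt: str,
    first_frame: Optional[str] = None,
    last_frame: Optional[str] = None,
    reference: Optional[str] = None,
) -> str:
    """Pick one of the four documented Wan video paths from the inputs supplied.

    All names here are hypothetical. The precedence (reference first, then
    frames, then plain text) is an assumption made for illustration.
    """
    if reference is not None:
        # A reference image or clip is available: reuse its appearance.
        return "reference-to-video"
    if first_frame is not None and last_frame is not None:
        # Both endpoints are known: control the transition between them.
        return "first-and-last-frame control"
    if first_frame is not None:
        # Only a starting image: animate it from the prompt.
        return "first-frame image-to-video"
    if last_frame is not None:
        raise ValueError("a last frame needs a first frame to transition from")
    if not prompt.strip():
        raise ValueError("text-to-video needs a non-empty prompt")
    return "text-to-video"
```

For example, `choose_mode("slow dolly-in on a red sneaker", first_frame="shoe.png")` selects the first-frame image-to-video path, while a prompt alone selects text-to-video.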
Based on Alibaba Cloud Model Studio's public guidance, Wan video generation usually follows four steps:
Start with text-to-video, first-frame image-to-video, first-and-last-frame control, or reference-to-video depending on your input and target shot.
Use text for motion and camera intent, add first or last frames where needed, and include audio or reference clips when the model supports them.
Official pages list 720P/1080P and durations from 2 to 15 seconds for current Wan 2.6 video models, with region-specific differences.
The public API references treat video generation as a long-running task, so task creation and status polling are part of the normal workflow.
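The create-then-poll pattern in the last step can be sketched in a few lines of Python. This is a generic illustration of polling a long-running task, not the actual Model Studio API: `create_task` and `get_status` are hypothetical callables you would supply, and the status names (`PENDING`, `RUNNING`, `SUCCEEDED`, `FAILED`), the polling interval, and the timeout are placeholder assumptions.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal status names; check the real API reference for actual values.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_video(
    create_task: Callable[[], str],
    get_status: Callable[[str], Dict[str, str]],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Create a generation task, then poll until it reaches a terminal state.

    create_task: hypothetical callable that submits the job and returns a task id.
    get_status:  hypothetical callable that returns a dict with a "status" key.
    """
    task_id = create_task()
    deadline = time.monotonic() + timeout
    while True:
        result = get_status(task_id)
        if result["status"] in TERMINAL_STATES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} s")
        sleep(poll_interval)


if __name__ == "__main__":
    # Stand-in client that reports progress over three polls.
    replies = iter(
        [{"status": "PENDING"}, {"status": "RUNNING"}, {"status": "SUCCEEDED"}]
    )
    final = wait_for_video(lambda: "task-123", lambda _id: next(replies), poll_interval=0.0)
    print(final["status"])
```

Passing the two callables in, rather than hard-coding HTTP requests, keeps the sketch independent of any particular endpoint and makes it easy to exercise with a stand-in client like the one above.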
Have another question? Contact us by email.
Can't find what you're looking for? Contact our customer support team.
Start from the official public Wan video feature set, then choose the plan that fits your usage.