Segments Guide

Overview

Segments let you sync different audio clips to different time ranges of a single video in one API call, which makes multi-speaker lipsync possible. Using segments, you can:

  • Lipsync different audio clips to different parts of your video
  • Use a specific portion of an audio input to lipsync a segment with precise timing
  • Combine audio and text-to-speech inputs across multiple segments in a single generation

Basic Concepts

To use the segments feature, provide a top-level segments array. Each item defines a time range in the video and carries its own audio configuration.

Segment

Each segment item takes the following properties:

startTime
double, required

Segment start time in seconds

endTime
double, required

Segment end time in seconds

audioInput
SegmentAudioInput, required

Audio configuration with refId and optional cropping

audioInput

Each segment requires exactly one audioInput, which takes the following properties:

refId
string, required

Reference ID of the audio/text-to-speech input to use for this segment

startTime
double

Optional start time (in seconds) to crop the referenced audio. When specified, endTime must also be provided.

endTime
double

Optional end time (in seconds) to crop the referenced audio. When specified, startTime must also be provided.

The specified audioInput will be used to lipsync the video segment between startTime and endTime.
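The properties above can be assembled into a segments payload with a small helper. This is a minimal sketch, not part of the SDK: build_segment is a hypothetical function, and it assumes the camelCase dictionary form used in the examples in this guide. Passing the crop range as a single pair makes it impossible to supply only one of the two crop bounds.

```python
from typing import Optional, Tuple


def build_segment(
    start_time: float,
    end_time: float,
    ref_id: str,
    crop: Optional[Tuple[float, float]] = None,
) -> dict:
    """Build one item for the top-level segments array (hypothetical helper)."""
    audio_input: dict = {"refId": ref_id}
    if crop is not None:
        # The crop is a (start, end) pair, so both bounds are always present together.
        crop_start, crop_end = crop
        audio_input["startTime"] = crop_start
        audio_input["endTime"] = crop_end
    return {"startTime": start_time, "endTime": end_time, "audioInput": audio_input}


segments = [
    build_segment(2, 5, "audio_1"),               # use the referenced audio as-is
    build_segment(6, 8, "audio_1", crop=(6, 8)),  # use only seconds 6-8 of the audio
]
```

The resulting list can be passed as the segments argument in the examples that follow.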

API Usage Examples

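from sync import Sync
# GenerationSegment and SegmentAudioInput are assumed to be exported from
# sync.common alongside Audio and Video; adjust the import if your SDK differs.
from sync.common import Audio, GenerationSegment, SegmentAudioInput, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
    ],
    model="lipsync-2",
    segments=[
        # Lipsync seconds 2-5 of the video with the input tagged "audio_1"
        GenerationSegment(
            start_time=2,
            end_time=5,
            audio_input=SegmentAudioInput(ref_id="audio_1"),
        ),
    ],
)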

Multiple Segments with Single Audio Input

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
    ],
    segments=[
        # Both segments crop a different range out of the same audio input
        {
            "startTime": 2,
            "endTime": 5,
            "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5},
        },
        {
            "startTime": 6,
            "endTime": 8,
            "audioInput": {"refId": "audio_1", "startTime": 6, "endTime": 8},
        },
    ],
    model="lipsync-2",
)

Multiple Segments with Multiple Audio Inputs

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_2"),
    ],
    segments=[
        # Each segment points at a different audio input through its refId
        {
            "startTime": 2,
            "endTime": 5,
            "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5},
        },
        {
            "startTime": 6,
            "endTime": 8,
            "audioInput": {"refId": "audio_2", "startTime": 6, "endTime": 8},
        },
    ],
    model="lipsync-2",
)

Best Practices

Planning Your Segments

  1. Map your timeline: Identify video segments and corresponding audio needs
  2. Prepare audio files: Ensure audio quality and appropriate duration
  3. Test segment boundaries: Verify smooth transitions between segments
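When mapping a timeline, it can help to list where planned segments leave gaps or overlap before sending the request. The sketch below is a hypothetical local helper, not an SDK function, and it assumes the camelCase dictionary form of segments. This guide does not state how the API treats gaps or overlaps, so the helper only reports them and does not judge validity.

```python
def find_gaps_and_overlaps(segments):
    """Return (gaps, overlaps) as lists of (start, end) pairs in seconds."""
    ordered = sorted(segments, key=lambda s: (s["startTime"], s["endTime"]))
    gaps, overlaps = [], []
    if not ordered:
        return gaps, overlaps

    covered_end = ordered[0]["endTime"]  # latest end time seen so far
    for seg in ordered[1:]:
        if seg["startTime"] > covered_end:
            gaps.append((covered_end, seg["startTime"]))
        elif seg["startTime"] < covered_end:
            overlaps.append((seg["startTime"], min(covered_end, seg["endTime"])))
        covered_end = max(covered_end, seg["endTime"])
    return gaps, overlaps


plan = [
    {"startTime": 2, "endTime": 5, "audioInput": {"refId": "audio_1"}},
    {"startTime": 6, "endTime": 8, "audioInput": {"refId": "audio_2"}},
    {"startTime": 7, "endTime": 9, "audioInput": {"refId": "audio_1"}},
]
gaps, overlaps = find_gaps_and_overlaps(plan)
print(gaps)      # [(5, 6)]
print(overlaps)  # [(7, 8)]
```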

Audio Preparation

  • Use consistent audio quality across all segments and the video’s audio.
  • For best results, ensure proper timing alignment with video segments. If segment duration and corresponding audio duration don’t match, rely on sync_mode to determine how to handle the mismatch.
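To spot timing mismatches ahead of time, the segment length can be compared with the length of the audio it will use. The following is a sketch under stated assumptions: duration_mismatch is a hypothetical helper, segments use the camelCase dictionary form, and the full audio duration (when no crop is given) is something you measure yourself and pass in.

```python
def duration_mismatch(segment, audio_duration=None):
    """Return audio length minus segment length in seconds.

    A positive result means the audio is longer than the video segment.
    """
    segment_length = segment["endTime"] - segment["startTime"]
    audio_input = segment["audioInput"]
    if "startTime" in audio_input and "endTime" in audio_input:
        # A crop range is present, so the cropped span is what gets used.
        audio_length = audio_input["endTime"] - audio_input["startTime"]
    elif audio_duration is not None:
        audio_length = audio_duration
    else:
        raise ValueError("need a crop range or the full audio duration")
    return audio_length - segment_length


cropped = {
    "startTime": 0,
    "endTime": 10,
    "audioInput": {"refId": "audio1", "startTime": 5, "endTime": 15},
}
uncropped = {"startTime": 0, "endTime": 10, "audioInput": {"refId": "audio1"}}

print(duration_mismatch(cropped))                          # 0
print(duration_mismatch(uncropped, audio_duration=12.5))   # 2.5
```

A non-zero result is where sync_mode comes into play.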

Troubleshooting

Common Errors

Provide a top-level segments array when using multiple audio or text inputs.

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

# ❌ This will fail: multiple audio inputs without segments
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio1.wav"),
        Audio(url="audio2.wav"),
    ],
    model="lipsync-2",
)

# ✅ This will work: each audio input has a ref_id and a segment that uses it
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio1.wav", ref_id="a1"),
        Audio(url="audio2.wav", ref_id="a2"),
    ],
    segments=[
        {"startTime": 0, "endTime": 10, "audioInput": {"refId": "a1"}},
        {"startTime": 10, "endTime": 20, "audioInput": {"refId": "a2"}},
    ],
    model="lipsync-2",
)

Ensure all audio inputs have valid url or assetId values and that referenced refId values exist in your audio or text inputs.

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

# ❌ Segment points at a refId that no input defines
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1"),
    ],
    segments=[
        {"startTime": 0, "endTime": 10, "audioInput": {"refId": "missing"}},  # Wrong refId
    ],
    model="lipsync-2",
)

# ✅ Correct refId reference
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1"),
    ],
    segments=[
        {"startTime": 0, "endTime": 10, "audioInput": {"refId": "audio1"}},  # Correct refId
    ],
    model="lipsync-2",
)

An error occurs when a segment’s audioInput is missing a refId or the refId is empty. Each segment must reference a valid audio or text input through its refId.

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

# ❌ Missing refId in segment
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1"),
    ],
    segments=[
        {
            "startTime": 0,
            "endTime": 10,
            "audioInput": {},  # Missing refId
        }
    ],
    model="lipsync-2",
)

# ✅ Include refId in segment
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1"),
    ],
    segments=[
        {
            "startTime": 0,
            "endTime": 10,
            "audioInput": {"refId": "audio1"},  # Valid refId
        }
    ],
    model="lipsync-2",
)

An error also occurs when a segment references a refId that doesn’t exist in your audio or text inputs. Ensure every referenced refId exactly matches one defined in your inputs.

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

# ❌ Segment references unknown refId
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1"),  # refId is "audio1"
    ],
    segments=[
        {
            "startTime": 0,
            "endTime": 10,
            "audioInput": {"refId": "nonexistent"},  # References unknown refId
        }
    ],
    model="lipsync-2",
)

# ✅ Segment references existing refId
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1"),  # refId is "audio1"
    ],
    segments=[
        {
            "startTime": 0,
            "endTime": 10,
            "audioInput": {"refId": "audio1"},  # References existing refId
        }
    ],
    model="lipsync-2",
)
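The refId errors above can be caught locally before a request is sent. The sketch below is a hypothetical pre-flight check, not an SDK feature. It assumes you keep a list of the ref_id values you assigned to your inputs and that segments use the camelCase dictionary form.

```python
def find_ref_id_problems(input_ref_ids, segments):
    """Return (segment_index, message) pairs for missing or unknown refIds."""
    known = set(input_ref_ids)
    problems = []
    for index, segment in enumerate(segments):
        ref_id = segment.get("audioInput", {}).get("refId")
        if not ref_id:
            # Covers both an absent refId and an empty string.
            problems.append((index, "missing refId"))
        elif ref_id not in known:
            problems.append((index, f"unknown refId {ref_id!r}"))
    return problems


segments = [
    {"startTime": 0, "endTime": 10, "audioInput": {"refId": "audio1"}},
    {"startTime": 10, "endTime": 20, "audioInput": {"refId": "nonexistent"}},
    {"startTime": 20, "endTime": 30, "audioInput": {}},
]
print(find_ref_id_problems(["audio1"], segments))
# [(1, "unknown refId 'nonexistent'"), (2, 'missing refId')]
```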

Each segment’s startTime must be less than or equal to its endTime. Zero-length segments (where startTime equals endTime) are allowed for use cases like zero-duration crop points.

# ❌ Invalid: startTime greater than endTime
segments = [
    {
        "startTime": 10,
        "endTime": 5,  # endTime must be >= startTime
        "audioInput": {"refId": "audio1"},
    }
]

# ✅ Valid: startTime less than endTime
segments = [
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {"refId": "audio1"},
    }
]

# ✅ Valid: startTime equals endTime (zero-length segment)
segments = [
    {
        "startTime": 5,
        "endTime": 5,
        "audioInput": {"refId": "audio1"},
    }
]

When specifying segment boundaries using frames instead of seconds, startFrame must be strictly less than endFrame. Unlike time-based segments which allow equal start and end times, frame-based segments require at least one frame of difference.

# ❌ Invalid: startFrame greater than endFrame
segments = [
    {
        "startFrame": 100,
        "endFrame": 0,  # endFrame must be > startFrame
        "audioInput": {"refId": "audio1"},
    }
]

# ❌ Invalid: startFrame equals endFrame (not allowed for frames)
segments = [
    {
        "startFrame": 50,
        "endFrame": 50,  # endFrame must be > startFrame
        "audioInput": {"refId": "audio1"},
    }
]

# ✅ Valid: startFrame less than endFrame
segments = [
    {
        "startFrame": 0,
        "endFrame": 100,
        "audioInput": {"refId": "audio1"},
    }
]
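The two boundary rules differ only in whether equality is allowed, which a small local check makes explicit. This is a sketch, not SDK behavior: check_bounds is a hypothetical helper, and it assumes a segment carries either the frame pair or the time pair in camelCase dictionary form.

```python
def check_bounds(segment):
    """Return an error message for invalid bounds, or None if they pass."""
    if "startFrame" in segment:
        # Frame-based: strict inequality, at least one frame of difference.
        if segment["startFrame"] >= segment["endFrame"]:
            return "startFrame must be < endFrame"
    elif segment["startTime"] > segment["endTime"]:
        # Time-based: equality is allowed (zero-length segment).
        return "startTime must be <= endTime"
    return None


print(check_bounds({"startTime": 5, "endTime": 5}))       # None
print(check_bounds({"startFrame": 50, "endFrame": 50}))   # startFrame must be < endFrame
```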

When cropping audio within a segment, both startTime and endTime must be provided, and startTime must be less than or equal to endTime.

# ❌ Incomplete crop range
segments = [
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {
            "refId": "audio1",
            "startTime": 5,  # Missing endTime
        },
    }
]

# ❌ Invalid: startTime greater than endTime
segments = [
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {
            "refId": "audio1",
            "startTime": 15,
            "endTime": 5,  # endTime must be >= startTime
        },
    }
]

# ✅ Valid: complete crop range with startTime <= endTime
segments = [
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {
            "refId": "audio1",
            "startTime": 5,
            "endTime": 15,
        },
    }
]

# ✅ Valid: zero-duration crop point (startTime equals endTime)
segments = [
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {
            "refId": "audio1",
            "startTime": 5,
            "endTime": 5,
        },
    }
]
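The crop rules can likewise be checked on an audioInput dictionary before sending a request. This is a minimal sketch with a hypothetical helper name, check_crop, and it assumes the camelCase dictionary form shown above.

```python
def check_crop(audio_input):
    """Return an error message for an invalid crop range, or None if it passes."""
    has_start = "startTime" in audio_input
    has_end = "endTime" in audio_input
    if has_start != has_end:
        return "startTime and endTime must be provided together"
    if has_start and audio_input["startTime"] > audio_input["endTime"]:
        return "crop startTime must be <= endTime"
    return None  # no crop at all, or a complete and ordered crop range


print(check_crop({"refId": "audio1", "startTime": 5}))
# startTime and endTime must be provided together
print(check_crop({"refId": "audio1", "startTime": 5, "endTime": 5}))
# None
```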

Ensure you have at least one audio input or text input with a valid refId when using segments.

from sync import Sync
from sync.common import TTS, Video

sync = Sync()

# ❌ This will fail: no audio or text inputs
response = sync.generations.create(
    input=[
        Video(url="https://example.com/video.mp4"),
    ],
    segments=[
        {"startTime": 0, "endTime": 10, "audioInput": {"refId": "missing"}},
    ],
    model="lipsync-2",
)

# ✅ This will work: includes a text input with a ref_id
response = sync.generations.create(
    input=[
        Video(url="https://example.com/video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "EXAVITQu4vr4xnSDxMaL",
                "script": "Hello world",
            },
            ref_id="text1",
        ),
    ],
    segments=[
        {"startTime": 0, "endTime": 10, "audioInput": {"refId": "text1"}},
    ],
    model="lipsync-2",
)

  • Lipsync Model: learn about supported models for single and multi-segment generations
  • Video Dubbing API Guide: use segments with dubbing workflows for multi-speaker video dubbing