Segments Guide

Overview

Segments let you sync different audio clips to different time ranges within a single video in one API call. This enables multi-speaker lip sync by letting you assign different audio inputs to different parts of your video. Using segments, you can:

Lip sync different audio clips to different parts of your video
Use a specific portion of an audio input to lip sync a segment with precise timing
Combine audio and text-to-speech inputs to lip sync multiple segments in a single generation

Basic Concepts

To use the segments feature, provide a top-level segments array. Each item defines a video time range (a segment) with its own audio configuration.

Segment

Each segment item takes the following properties:

startTime (double, required)
Segment start time in seconds

endTime (double, required)
Segment end time in seconds

audioInput (SegmentAudioInput, required)
Audio configuration with refId and optional cropping

audioInput

Each segment requires exactly one audioInput. audioInput takes the following properties:

refId (string, required)
Reference ID of the audio or text-to-speech input to use for this segment

startTime (double, optional)
Start time (in seconds) to crop the referenced audio. When specified, endTime must also be provided

endTime (double, optional)
End time (in seconds) to crop the referenced audio. When specified, startTime must also be provided

The specified audioInput will be used to lipsync the video segment between startTime and endTime.
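The property lists above can be mirrored in a small helper that assembles one segment as a plain dictionary. This is a minimal sketch, not part of the SDK: `make_segment` is a hypothetical name, and the camelCase keys are assumed to match the property names listed above.

```python
from typing import Optional, Tuple


def make_segment(start_time: float, end_time: float, ref_id: str,
                 crop: Optional[Tuple[float, float]] = None) -> dict:
    """Build one segment dict using the property names documented above.

    Hypothetical helper for illustration only; it is not part of the SDK.
    """
    audio_input = {"refId": ref_id}
    if crop is not None:
        # The crop is optional, but startTime and endTime must be given together.
        audio_input["startTime"], audio_input["endTime"] = crop
    return {"startTime": start_time, "endTime": end_time, "audioInput": audio_input}


segments = [
    make_segment(2, 5, "audio_1"),               # uses the referenced audio uncropped
    make_segment(6, 8, "audio_1", crop=(6, 8)),  # uses only seconds 6 to 8 of the audio
]
```

Taking the crop as a single tuple makes it impossible to supply one crop bound without the other.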

API Usage Examples

Single Segment with Single Audio

from sync import Sync
from sync.common import Audio, Video
# GenerationSegment and SegmentAudioInput must also be imported from the SDK
# (check the SDK reference for their import path).

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
    ],
    model="lipsync-2",
    # Lip sync only seconds 2 to 5 of the video, using audio_1
    segments=[
        GenerationSegment(
            start_time=2,
            end_time=5,
            audio_input=SegmentAudioInput(ref_id="audio_1"),
        ),
    ],
)

Multiple Segments with Single Audio

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
    ],
    # Both segments reference audio_1, each cropped to its own time range
    segments=[
        {
            "startTime": 2,
            "endTime": 5,
            "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5},
        },
        {
            "startTime": 6,
            "endTime": 8,
            "audioInput": {"refId": "audio_1", "startTime": 6, "endTime": 8},
        },
    ],
    model="lipsync-2",
)

Multiple Segments with Multiple Audio

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_2"),
    ],
    # Each segment references a different audio input by refId
    segments=[
        {
            "startTime": 2,
            "endTime": 5,
            "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5},
        },
        {
            "startTime": 6,
            "endTime": 8,
            "audioInput": {"refId": "audio_2", "startTime": 6, "endTime": 8},
        },
    ],
    model="lipsync-2",
)

Best Practices

Planning Your Segments

Map your timeline: Identify video segments and corresponding audio needs
Prepare audio files: Ensure audio quality and appropriate duration
Test segment boundaries: Verify smooth transitions between segments
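When mapping a timeline, it can help to list the gaps and overlaps between consecutive segments before submitting a request. The sketch below is only a planning aid: `boundary_report` is a hypothetical helper, it assumes segments are plain dicts with the camelCase keys described earlier, and it makes no claim about whether the API accepts or rejects gaps or overlaps.

```python
def boundary_report(segments):
    """Describe gaps and overlaps between consecutive segments, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s["startTime"])
    notes = []
    for prev, nxt in zip(ordered, ordered[1:]):
        delta = nxt["startTime"] - prev["endTime"]
        if delta > 0:
            notes.append(f"gap of {delta}s between {prev['endTime']}s and {nxt['startTime']}s")
        elif delta < 0:
            notes.append(f"overlap of {-delta}s starting at {nxt['startTime']}s")
    return notes


plan = [
    {"startTime": 2, "endTime": 5, "audioInput": {"refId": "audio_1"}},
    {"startTime": 6, "endTime": 8, "audioInput": {"refId": "audio_2"}},
]
print(boundary_report(plan))  # ['gap of 1s between 5s and 6s']
```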

Audio Preparation

Use consistent audio quality across all segments and the video’s audio.
For best results, ensure proper timing alignment with video segments. If segment duration and corresponding audio duration don’t match, rely on sync_mode to determine how to handle the mismatch.
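To see in advance where sync_mode would come into play, you can compare each segment's duration with the duration of its audio crop. This is a rough sketch under stated assumptions: `duration_mismatch` is a hypothetical helper, segments are plain dicts with the camelCase keys described earlier, and only segments with an explicit crop can be checked because the full audio length is not known from the segment itself.

```python
def duration_mismatch(segment):
    """Return audio crop duration minus segment duration, in seconds.

    Returns None when the segment has no crop range, because the audio
    length cannot be derived from the segment alone.
    """
    audio = segment["audioInput"]
    if "startTime" not in audio or "endTime" not in audio:
        return None
    video_len = segment["endTime"] - segment["startTime"]
    audio_len = audio["endTime"] - audio["startTime"]
    return audio_len - video_len


matched = {"startTime": 0, "endTime": 10,
           "audioInput": {"refId": "audio1", "startTime": 5, "endTime": 15}}
short_audio = {"startTime": 0, "endTime": 10,
               "audioInput": {"refId": "audio1", "startTime": 5, "endTime": 12}}

print(duration_mismatch(matched))      # 0
print(duration_mismatch(short_audio))  # -3
```

A negative result means the cropped audio is shorter than the video segment; a positive result means it is longer.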

Troubleshooting

Common Errors

"Multiple audio inputs are only allowed when using multi-segments"

Provide a top-level segments array when using multiple audio or text inputs.

# ❌ This will fail
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio1.wav"),  # Multiple audio without segments
        Audio(url="audio2.wav")
    ],
    model="lipsync-2"
)

# ✅ This will work
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio1.wav", ref_id="a1"),
        Audio(url="audio2.wav", ref_id="a2")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "a1"}},
        {"start_time": 10, "end_time": 20, "audio_input": {"refId": "a2"}}
    ],
    model="lipsync-2"
)

"Unable to resolve audio input URL"

Ensure all audio inputs have valid url or assetId values and that referenced refId values exist in your audio or text inputs.

# ❌ Missing refId reference
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "missing"}}  # Wrong refId
    ],
    model="lipsync-2"
)

# ✅ Correct refId reference
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "audio1"}}  # Correct refId
    ],
    model="lipsync-2"
)

"Segment at index X is missing a valid audioInput.refId"

This error occurs when a segment’s audio_input is missing a refId or the refId is empty. Each segment must reference a valid audio or text input through its refId.

# ❌ Missing refId in segment
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {}  # Missing refId
        }
    ],
    model="lipsync-2"
)

# ✅ Include refId in segment
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "audio1"}  # Valid refId
        }
    ],
    model="lipsync-2"
)

"Segment at index X references unknown refId"

This error occurs when a segment references a refId that doesn’t exist in your audio or text inputs. Ensure all referenced refId values match exactly with those defined in your inputs.

# ❌ Segment references unknown refId
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")  # refId is "audio1"
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "nonexistent"}  # References unknown refId
        }
    ],
    model="lipsync-2"
)

# ✅ Segment references existing refId
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")  # refId is "audio1"
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "audio1"}  # References existing refId
        }
    ],
    model="lipsync-2"
)
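
Both refId errors (a missing refId and an unknown refId) can be caught locally before sending a request. The following is a minimal sketch rather than SDK functionality: `unknown_ref_ids` is a hypothetical helper, and it assumes segments are plain dicts that use the camelCase audioInput and refId keys.

```python
def unknown_ref_ids(input_ref_ids, segments):
    """Return (index, refId) pairs for segments whose refId is missing or undefined."""
    known = set(input_ref_ids)
    problems = []
    for index, segment in enumerate(segments):
        ref_id = segment.get("audioInput", {}).get("refId")
        # An absent or empty refId and a refId with no matching input are both reported.
        if not ref_id or ref_id not in known:
            problems.append((index, ref_id))
    return problems


segments = [
    {"startTime": 0, "endTime": 10, "audioInput": {"refId": "audio1"}},
    {"startTime": 10, "endTime": 20, "audioInput": {"refId": "nonexistent"}},
    {"startTime": 20, "endTime": 30, "audioInput": {}},
]
print(unknown_ref_ids(["audio1"], segments))  # [(1, 'nonexistent'), (2, None)]
```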

"Invalid segment time range: startTime must be <= endTime"

Each segment’s startTime must be less than or equal to its endTime. Zero-length segments (where startTime equals endTime) are allowed for use cases like zero-duration crop points.

# ❌ Invalid: startTime greater than endTime
segments=[
    {
        "startTime": 10,
        "endTime": 5,  # endTime must be >= startTime
        "audioInput": {"refId": "audio1"}
    }
]

# ✅ Valid: startTime less than endTime
segments=[
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {"refId": "audio1"}
    }
]

# ✅ Valid: startTime equals endTime (zero-length segment)
segments=[
    {
        "startTime": 5,
        "endTime": 5,
        "audioInput": {"refId": "audio1"}
    }
]

"Invalid segment frame range: startFrame must be < endFrame"

When specifying segment boundaries using frames instead of seconds, startFrame must be strictly less than endFrame. Unlike time-based segments which allow equal start and end times, frame-based segments require at least one frame of difference.

# ❌ Invalid: startFrame greater than endFrame
segments=[
    {
        "startFrame": 100,
        "endFrame": 0,  # endFrame must be > startFrame
        "audioInput": {"refId": "audio1"}
    }
]

# ❌ Invalid: startFrame equals endFrame (not allowed for frames)
segments=[
    {
        "startFrame": 50,
        "endFrame": 50,  # endFrame must be > startFrame
        "audioInput": {"refId": "audio1"}
    }
]

# ✅ Valid: startFrame less than endFrame
segments=[
    {
        "startFrame": 0,
        "endFrame": 100,
        "audioInput": {"refId": "audio1"}
    }
]
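
The difference between the time-based rule (start may equal end) and the frame-based rule (start must be strictly less than end) is easy to miss, so a local check can be useful. This is a minimal sketch: `range_errors` is a hypothetical helper, and it assumes each segment dict carries either startTime and endTime or startFrame and endFrame.

```python
def range_errors(segments):
    """Collect (index, message) pairs for segments with an invalid range.

    Time-based segments allow startTime == endTime; frame-based segments
    require startFrame to be strictly less than endFrame.
    """
    errors = []
    for index, seg in enumerate(segments):
        if "startFrame" in seg:
            if not seg["startFrame"] < seg["endFrame"]:
                errors.append((index, "startFrame must be < endFrame"))
        elif seg["startTime"] > seg["endTime"]:
            errors.append((index, "startTime must be <= endTime"))
    return errors


candidates = [
    {"startTime": 10, "endTime": 5},     # invalid: start after end
    {"startTime": 5, "endTime": 5},      # valid: zero-length time segment
    {"startFrame": 50, "endFrame": 50},  # invalid: frames need start < end
    {"startFrame": 0, "endFrame": 100},  # valid
]
print(range_errors(candidates))
# [(0, 'startTime must be <= endTime'), (2, 'startFrame must be < endFrame')]
```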

"Invalid audioInput crop range"

When cropping audio within a segment, both startTime and endTime must be provided, and startTime must be less than or equal to endTime.

# ❌ Incomplete crop range
segments=[
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {
            "refId": "audio1",
            "startTime": 5  # Missing endTime
        }
    }
]

# ❌ Invalid: startTime greater than endTime
segments=[
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {
            "refId": "audio1",
            "startTime": 15,
            "endTime": 5  # endTime must be >= startTime
        }
    }
]

# ✅ Valid: complete crop range with startTime <= endTime
segments=[
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {
            "refId": "audio1",
            "startTime": 5,
            "endTime": 15
        }
    }
]

# ✅ Valid: zero-duration crop point (startTime equals endTime)
segments=[
    {
        "startTime": 0,
        "endTime": 10,
        "audioInput": {
            "refId": "audio1",
            "startTime": 5,
            "endTime": 5
        }
    }
]
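
The two crop rules (both bounds or neither, and startTime no greater than endTime) can be expressed as a small local check. This is a sketch only: `crop_error` is a hypothetical helper, its messages are illustrative rather than the API's actual error text, and it assumes audioInput is a plain dict with the camelCase keys described earlier.

```python
def crop_error(audio_input):
    """Return a description of the crop problem, or None if the crop is valid."""
    has_start = "startTime" in audio_input
    has_end = "endTime" in audio_input
    if has_start != has_end:
        return "startTime and endTime must be provided together"
    if has_start and audio_input["startTime"] > audio_input["endTime"]:
        return "startTime must be <= endTime"
    return None


print(crop_error({"refId": "audio1", "startTime": 5}))                 # incomplete range
print(crop_error({"refId": "audio1", "startTime": 15, "endTime": 5}))  # reversed range
print(crop_error({"refId": "audio1", "startTime": 5, "endTime": 5}))   # None (zero-duration crop)
```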

"When using multi-segments, please provide at least one audio or text input"

Ensure you have at least one audio input or text input with a valid refId when using segments.

# ❌ This will fail - no audio or text inputs
response = sync.generations.create(
    input=[
        Video(url="https://example.com/video.mp4")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "missing"}}
    ],
    model="lipsync-2"
)

# ✅ This will work - includes text input
response = sync.generations.create(
    input=[
        Video(url="https://example.com/video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "EXAVITQu4vr4xnSDxMaL",
                "script": "Hello world"
            },
            ref_id="text1"
        )
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "text1"}}
    ],
    model="lipsync-2"
)

Related Resources

Lipsync Model: learn about supported models for single and multi-segment generations
Video Dubbing API Guide: use segments with dubbing workflows for multi-speaker video dubbing
