@remotion/elevenlabs: Add package for converting ElevenLabs output to captions#6930

Merged
JonnyBurger merged 16 commits into remotion-dev:main from tiwariaayu:#6927
Mar 31, 2026

Conversation

@tiwariaayu
Contributor

@tiwariaayu tiwariaayu commented Mar 30, 2026

Summary

  • Adds @remotion/elevenlabs package for converting ElevenLabs Speech to Text API output to Caption objects
  • elevenLabsTranscriptToCaptions() — converts word-level transcript output to captions, analogous to @remotion/openai-whisper
  • API must be called with timestamps_granularity: "word" — function throws a helpful error with docs link if the response is missing word timing
  • Docs include a full fetch() example and an API reference documenting the argument and return types
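
The conversion described in the summary can be sketched roughly as follows. This is a minimal illustration and not the package's actual implementation: the `TranscriptSketch` shape, the `toCaptionsSketch` name, and the choice to prepend spacing tokens to the following word are all assumptions made for the example. The `Caption` type is redeclared locally so that the block is self-contained.

```typescript
// Hypothetical input shape: field names are assumptions for this sketch,
// not types exported by @remotion/elevenlabs.
type TranscriptWordSketch = {
  text: string;
  start: number; // seconds
  end: number; // seconds
  type: "word" | "spacing" | "audio_event";
};

type TranscriptSketch = {
  words?: TranscriptWordSketch[];
};

// Local copy of a caption shape so the example runs on its own.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

const toCaptionsSketch = (transcript: TranscriptSketch): Caption[] => {
  // Without word-level timing there is nothing to convert, so fail loudly.
  if (!Array.isArray(transcript.words)) {
    throw new Error(
      'Expected a "words" array. Was word-level timing requested from the API?',
    );
  }

  const captions: Caption[] = [];
  let pendingSpace = "";

  for (const word of transcript.words) {
    if (word.type === "spacing") {
      // Assumption: whitespace is attached to the front of the next word.
      pendingSpace = word.text;
      continue;
    }

    if (word.type !== "word") {
      // Non-speech events are skipped in this sketch.
      continue;
    }

    captions.push({
      text: pendingSpace + word.text,
      // Seconds to milliseconds, rounded to avoid floating point noise.
      startMs: Math.round(word.start * 1000),
      endMs: Math.round(word.end * 1000),
      timestampMs: null, // left unset in this sketch
      confidence: null, // left unset in this sketch
    });
    pendingSpace = "";
  }

  return captions;
};
```

An empty `words` array yields an empty result, and an input without `words` throws, which mirrors the two edge cases listed in the test plan.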

Test plan

  • Full snapshot test with real ElevenLabs transcript (324 words, "Gimme Gimme Gimme")
  • Invalid input throws error with helpful message
  • Empty words array returns empty captions
  • All 3 tests pass

Closes #6927

🤖 Generated with Claude Code

@pullfrog
Contributor

pullfrog bot commented Mar 30, 2026

Reviewed PR #6930. Submitted 4 inline comments covering: test runner mismatch (vitest should be bun:test), missing array-length validation in the conversion function, broken import sort order in api.tsx, and a misleading comment referencing OpenAI Whisper.

Task list (7/7 completed)
  • Read the full diff to understand the scope of changes
  • Review core package code (types, implementation, exports)
  • Review tests
  • Review package configuration (package.json, tsconfig, eslint)
  • Review documentation changes
  • Review integration with existing packages (CLI, create-video, docs)
  • Self-critique and submit review
Pullfrog  | [View workflow run](https://github.com/remotion-dev/remotion/actions/runs/23750015059/job/69189094498) | Triggered by [Pullfrog](https://pullfrog.com) | Using `Big Pickle` (free) | [𝕏](https://x.com/pullfrogai)

@vercel
Contributor

vercel bot commented Mar 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| bugs | Ready | Preview, Comment | Mar 31, 2026 0:45am |
| remotion | Ready | Preview, Comment | Mar 31, 2026 0:45am |

Contributor

@pullfrog pullfrog bot left a comment

The package structure, docs integration, and conversion logic are solid overall. A few issues need attention — primarily the test runner mismatch (vitest vs bun:test) and a missing array-length validation that could produce NaN timestamps silently.
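
The NaN concern can be illustrated with a small sketch. It assumes the flagged code walked parallel per-character arrays, as an alignment-style payload would provide. The diff itself is not shown here, so the `AlignmentSketch` type and both function names are hypothetical.

```typescript
// Hypothetical shape with three parallel arrays; not taken from the PR diff.
type AlignmentSketch = {
  characters: string[];
  startTimesSeconds: number[];
  endTimesSeconds: number[];
};

// Unguarded version: if startTimesSeconds is shorter than characters,
// the missing entry is undefined at runtime and undefined * 1000 is NaN.
const unsafeStartMs = (alignment: AlignmentSketch): number[] =>
  alignment.characters.map(
    (_, i) => (alignment.startTimesSeconds[i] as number) * 1000,
  );

// Guard: reject mismatched lengths before any arithmetic happens.
const assertSameLength = (alignment: AlignmentSketch): void => {
  const expected = alignment.characters.length;
  const starts = alignment.startTimesSeconds.length;
  const ends = alignment.endTimesSeconds.length;

  if (starts !== expected || ends !== expected) {
    throw new Error(
      `Alignment arrays differ in length: ${expected} characters, ` +
        `${starts} start times, ${ends} end times`,
    );
  }
};
```

Calling the guard first turns a silent NaN timestamp into an immediate, descriptive error.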

The original implementation only handled the TTS alignment format.
This adds support for the ElevenLabs Speech to Text API output,
which is the actual analogue to `@remotion/openai-whisper`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JonnyBurger JonnyBurger changed the title [elevenlabs]: Add package for converting ElevenLabs output to captions @remotion/elevenlabs: Add package for converting ElevenLabs output to captions Mar 31, 2026
This was based on a misunderstanding of the issue — the request was
for Speech to Text support (like @remotion/openai-whisper), not
Text to Speech alignment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The inner slider circle used box-content which made its total rendered
size 24x24 (20px content + 4px borders), filling the entire track with
no visible gap. Removing it uses the default border-box from Tailwind's
base reset, making the circle 20x20 total with symmetric 4px gaps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JonnyBurger JonnyBurger merged commit 700a574 into remotion-dev:main Mar 31, 2026
16 of 17 checks passed

Development

Successfully merging this pull request may close these issues.

Package for converting ElevenLabs output to @remotion/captions

2 participants