[Inference] Together: add feature-extraction, text-to-speech, automatic-speech-recognition by nbroad1881 · Pull Request #2130 · huggingface/huggingface.js

nbroad1881 · 2026-04-28T19:37:00Z

Summary

The Together provider currently supports conversational, text-generation, and text-to-image. Per the Together docs, Together also serves audio (TTS + STT) and embedding models, so this PR adds three new task helpers in packages/inference/src/providers/together.ts:

TogetherFeatureExtractionTask → POST /v1/embeddings (OpenAI-compatible: { input, model }, returns data[].embedding)
TogetherTextToSpeechTask → POST /v1/audio/speech ({ input, model, voice, response_format, ... }, returns binary audio as a Blob)
TogetherAutomaticSpeechRecognitionTask → POST /v1/audio/transcriptions
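The two JSON-bodied endpoints above can be sketched as follows. This is a minimal illustration of the request and response shapes named in the list, not the code added in together.ts; the function and interface names are made up for the example.

```typescript
// Sketch only: names below are illustrative, not the helpers from together.ts.

interface EmbeddingsResponse {
  data: Array<{ embedding: number[] }>;
}

/** Build the JSON body for POST /v1/embeddings ({ input, model }). */
function makeEmbeddingsBody(model: string, inputs: string | string[]): { model: string; input: string | string[] } {
  return { model, input: inputs };
}

/** Pull the vectors out of an OpenAI-style embeddings response (data[].embedding). */
function parseEmbeddings(response: unknown): number[][] {
  const data = (response as Partial<EmbeddingsResponse> | null)?.data;
  if (!Array.isArray(data)) {
    throw new Error("Unexpected embeddings response: missing data[]");
  }
  return data.map((item) => item.embedding);
}

/** Build the JSON body for POST /v1/audio/speech; the response is binary audio. */
function makeSpeechBody(
  model: string,
  text: string,
  voice: string,
  responseFormat: string = "wav"
): { model: string; input: string; voice: string; response_format: string } {
  return { model, input: text, voice, response_format: responseFormat };
}
```

Validating `data` before mapping keeps a malformed provider response from surfacing as an opaque `TypeError` further down the call stack.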

ASR is the only one that doesn't follow the existing JSON-body pattern: Together (and OpenAI's Whisper-compatible API) requires multipart/form-data. The new helper overrides makeBody to construct a real FormData (audio as file, all other args as form fields) and overrides prepareHeaders to leave Content-Type unset so fetch populates the multipart boundary itself. verbose_json segments are mapped into the existing AutomaticSpeechRecognitionOutput.chunks shape.
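A rough sketch of the multipart approach described above, assuming Whisper-style `verbose_json` segments carry `start`, `end`, and `text`, and that a chunk is a `{ text, timestamp }` pair. The helper names are hypothetical and do not mirror the actual `makeBody` / `prepareHeaders` overrides.

```typescript
// Illustrative names; only the endpoint contract comes from the PR description.

interface VerboseSegment {
  start: number;
  end: number;
  text: string;
}

// Assumed chunk shape: text plus a [start, end] timestamp pair.
interface AsrChunk {
  text: string;
  timestamp: [number, number];
}

/** Put the audio under `file` and every other defined argument in its own form field. */
function makeTranscriptionForm(
  audio: Blob,
  fields: Record<string, string | number | boolean | undefined>
): FormData {
  const form = new FormData();
  form.append("file", audio, "audio");
  for (const [key, value] of Object.entries(fields)) {
    if (value !== undefined) {
      form.append(key, String(value));
    }
  }
  return form;
}

/**
 * Send auth only. Content-Type is deliberately absent so that fetch can set
 * `multipart/form-data` together with the boundary it generates.
 */
function makeTranscriptionHeaders(apiKey: string): Record<string, string> {
  return { Authorization: `Bearer ${apiKey}` };
}

/** Map verbose_json segments to chunks; a missing segment list yields no chunks. */
function segmentsToChunks(segments: VerboseSegment[] = []): AsrChunk[] {
  return segments.map((s): AsrChunk => ({ text: s.text, timestamp: [s.start, s.end] }));
}
```

Setting `Content-Type: multipart/form-data` by hand would omit the boundary parameter, and the server would then be unable to split the parts.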

The three new helpers are wired into getProviderHelper.ts under together: { ... }.

Drive-by fix

While testing the ASR path I hit a pre-existing bug in utils/request.ts::bodyToJson — it crashed with Cannot read properties of null (reading 'accessToken') whenever the body wasn't a Blob/ArrayBuffer/string, which any FormData body would now trigger during error reporting. Fixed in the same PR (also handles the case where the parsed body is a non-object like a JSON string).
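The failure mode and one possible guard can be illustrated like this. It is a simplified stand-in written for this note, not the patched `bodyToJson`; the function name and the placeholder strings are assumptions.

```typescript
// Simplified stand-in for the error-reporting helper; not the code from utils/request.ts.

/** Turn a request body into something safe to attach to an error report. */
function bodyToJsonSafe(body: unknown): unknown {
  if (body instanceof FormData) {
    return "[FormData]";
  }
  if (body instanceof Blob || body instanceof ArrayBuffer) {
    return "[Blob or ArrayBuffer]";
  }
  if (typeof body !== "string") {
    // Nothing parseable: return null instead of dereferencing it later.
    return null;
  }
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return body; // not JSON, report the raw string
  }
  // Only objects can carry an accessToken; JSON strings, numbers and null pass through.
  if (data !== null && typeof data === "object" && "accessToken" in data) {
    return { ...data, accessToken: "[REDACTED]" };
  }
  return data;
}
```

The key point is that the `accessToken` check runs only after the value is known to be a non-null object, which covers both the `FormData` case and a parsed body that is a bare JSON string.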

Note on HF model mapping

These new tasks work today when callers pass a direct Together API key (the SDK then bypasses the HF router and POSTs straight to api.together.xyz). For them to also work with HF tokens routed through router.huggingface.co/together, Together needs to register at least one model per task in the partner mapping at https://huggingface.co/api/partners/together/models — which currently only lists conversational and text-to-image models. The header comment in together.ts already describes that workflow.
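The routing split described above might be pictured as below. This is a sketch under the assumption that HF tokens are recognised by their `hf_` prefix; the function names are hypothetical and the SDK's real resolution logic may differ.

```typescript
// Hypothetical helpers illustrating direct-key versus HF-token routing.

const TOGETHER_BASE_URL = "https://api.together.xyz";
const HF_ROUTER_BASE_URL = "https://router.huggingface.co/together";

/** Assumption: a token starting with `hf_` is a Hugging Face token. */
function isHfToken(accessToken: string): boolean {
  return accessToken.startsWith("hf_");
}

/** Pick the base URL for a request, then append the provider route. */
function resolveTogetherUrl(accessToken: string, route: string): string {
  const base = isHfToken(accessToken) ? HF_ROUTER_BASE_URL : TOGETHER_BASE_URL;
  return `${base}${route}`;
}
```

With a direct Together key the request never touches the router, which is why the new tasks work before the partner mapping lists any models for them.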

Test plan

pnpm --filter @huggingface/inference run check (tsc) — passes
pnpm --filter @huggingface/inference run lint:check (eslint) — passes
Live request against api.together.xyz with a direct Together API key:
- Embeddings: intfloat/multilingual-e5-large-instruct → 2 vectors, dim=1024, ~313 ms
- TTS: hexgrad/Kokoro-82M, voice af_alloy, response_format=wav → 152 KB audio/wav blob, ~303 ms
- ASR: openai/whisper-large-v3 on packages/inference/test/sample2.wav → "He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and can discover in it but little of rocky Ithaca.", ~1008 ms
Offline mock-fetch verification of request URL/headers/body shape for all three tasks
Once Together registers models for these tasks in the HF partner mapping, end-to-end via hf_… tokens through router.huggingface.co/together should also succeed (no further SDK changes needed)
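The offline mock-fetch check mentioned in the test plan could look roughly like this. Both the recording helper and the function under test are invented for the illustration; the PR's actual verification script is not shown in this thread.

```typescript
// Hypothetical test helper and function under test; neither is taken from the PR.

interface RecordedCall {
  url: string;
  method: string | undefined;
  headers: Record<string, string>;
  body: unknown;
}

interface MockInit {
  method?: string;
  headers?: Record<string, string>;
  body?: unknown;
}

type FetchLike = (url: string, init?: MockInit) => Promise<{ ok: boolean; json: () => Promise<unknown> }>;

/** Returns a fetch stand-in that records every call and replies with `responseBody`. */
function createMockFetch(responseBody: unknown): { mockFetch: FetchLike; calls: RecordedCall[] } {
  const calls: RecordedCall[] = [];
  const mockFetch: FetchLike = async (url, init = {}) => {
    calls.push({ url, method: init.method, headers: init.headers ?? {}, body: init.body });
    return { ok: true, json: async () => responseBody };
  };
  return { mockFetch, calls };
}

/** Hypothetical function under test: posts an embeddings request through the injected fetch. */
async function requestEmbeddings(
  fetchFn: FetchLike,
  apiKey: string,
  model: string,
  input: string[]
): Promise<unknown> {
  const response = await fetchFn("https://api.together.xyz/v1/embeddings", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model, input }),
  });
  return response.json();
}
```

Injecting the fetch function keeps the check fully offline: the assertions inspect `calls` for the URL, headers, and body shape without any network access.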

Made with Cursor

Note

Medium Risk
Adds multiple new Together task helpers (including multipart FormData uploads and async polling/download flows), which may affect request formatting, error handling, and runtime behavior for new modalities.

Overview
Extends the Together provider to support additional tasks beyond chat/text/image generation: feature extraction (embeddings), text-to-speech, and automatic speech recognition, and wires them into getProviderHelper.

Also adds Together-specific implementations for image-to-image (Blob inputs converted to data URLs with model-specific input field selection) and async video generation for text-to-video and image-to-video via job polling and subsequent output download.
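The job-polling flow for video generation can be sketched as follows. The status strings, the `Job` shape, and the function names are placeholders chosen for this example rather than Together's documented values. The sketch treats a missing status as pending instead of terminal, the behaviour a later commit in this PR names.

```typescript
// Placeholder status names and job shape; check the provider docs for the real ones.

type JobState = "pending" | "completed" | "failed";

interface Job {
  status?: string;
  outputUrl?: string;
}

/** Anything that is not clearly finished, including a missing status, counts as pending. */
function classifyJob(status: string | undefined): JobState {
  if (status === "completed") return "completed";
  if (status === "failed" || status === "cancelled") return "failed";
  return "pending";
}

/** Call `fetchJob` until the job completes, fails, or the attempt budget runs out. */
async function pollJob(
  fetchJob: () => Promise<Job>,
  maxAttempts: number = 60,
  delayMs: number = 1000
): Promise<Job> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob();
    const state = classifyJob(job.status);
    if (state === "completed") return job;
    if (state === "failed") {
      throw new Error(`Job ended with status "${job.status}"`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("Job did not finish before the polling limit");
}
```

Once `pollJob` resolves, the caller would download the generated file from the job's output location, which is the "subsequent output download" step mentioned above.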

Improves robustness of request error reporting by updating bodyToJson to safely handle FormData bodies and avoid null/object-shape assumptions, and tweaks Together payload normalization (e.g., mapping num_inference_steps to steps) and response details for text generation.
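The payload normalization mentioned above amounts to a key rename. A minimal sketch, with a hypothetical function name and assuming the renamed value should override any `steps` already present:

```typescript
// Hypothetical helper; the real normalization lives in together.ts and may differ.

/** Rename `num_inference_steps` to `steps`, leaving all other parameters untouched. */
function normalizeImageParams(params: Record<string, unknown>): Record<string, unknown> {
  const { num_inference_steps, ...rest } = params;
  if (num_inference_steps === undefined) {
    return rest;
  }
  return { ...rest, steps: num_inference_steps };
}
```

Returning a new object rather than mutating `params` keeps the caller's arguments intact if they are reused across providers.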

Reviewed by Cursor Bugbot for commit 882446e. Bugbot is set up for automated code reviews on this repo.


hanouticelina

made a first pass, thanks! looks good overall

nbroad1881 · 2026-05-06T00:02:56Z

@hanouticelina, I have resolved your comments.

hanouticelina

@nbroad1881 thanks! looks good to me! I've tested text-to-speech and asr and it works as expected. /v1/embeddings returns service unavailable on your side, is it expected?
also we need to allow the new routes (v1/audio/speech, v1/audio/transcriptions, v1/embeddings) server-side first, as soon as it's done and https://api.together.ai/v1/embeddings is available again, I will merge the PR!

nbroad1881 · 2026-05-14T17:37:08Z

> @nbroad1881 thanks! looks good to me! I've tested text-to-speech and asr and it works as expected. /v1/embeddings returns service unavailable on your side, is it expected? also we need to allow the new routes (v1/audio/speech, v1/audio/transcriptions, v1/embeddings) server-side first, as soon as it's done and https://api.together.ai/v1/embeddings is available again, I will merge the PR!

@hanouticelina, my tests show it working. I used intfloat/multilingual-e5-large-instruct (embeddings), openai/whisper-large-v3 (ASR), and canopylabs/orpheus-3b-0.1-ft (TTS).

cursor

Cursor Bugbot has reviewed your changes using default mode and found 2 potential issues.

Reviewed by Cursor Bugbot for commit a4c51b0.

hanouticelina

Thank you! We'll merge it as soon as the corresponding server-side change is deployed to allow the new routes through router.huggingface.co

nbroad1881 requested review from SBrandeis, hanouticelina and julien-c as code owners April 28, 2026 19:37

nbroad1881 mentioned this pull request Apr 28, 2026

Update Together provider support status huggingface/hub-docs#2165

Open

Merge branch 'main' into together-add-embeddings-tts-asr

faa742a

hanouticelina reviewed May 4, 2026

address comments

2b2881a

hanouticelina reviewed May 12, 2026

hanouticelina mentioned this pull request May 12, 2026

[Inference] Add embeddings, TTS, ASR, image-to-image and video tasks for Together huggingface/huggingface_hub#4164

Merged

update

9545e7d

cursor Bot reviewed May 14, 2026

Comment thread packages/inference/src/providers/together.ts

nbroad1881 added 2 commits May 14, 2026 10:47

fix image_url check

035af71

fix order

919bc6e

cursor Bot reviewed May 14, 2026

Comment thread packages/inference/src/providers/together.ts

check for bad generations

0247505

cursor Bot reviewed May 14, 2026

Comment thread packages/inference/src/providers/together.ts

provide correct task

a4c51b0

cursor Bot reviewed May 18, 2026

Comment thread packages/inference/src/providers/together.ts

Comment thread packages/inference/src/providers/together.ts

nbroad1881 and others added 4 commits May 18, 2026 15:25

address url, headers

882446e

Merge branch 'main' into together-add-embeddings-tts-asr

9614d3e

fixes

2e38464

Merge branch 'main' into together-add-embeddings-tts-asr

4c3e2b2

hanouticelina approved these changes May 19, 2026

hanouticelina added 2 commits May 19, 2026 15:48

Treat missing initial job status as pending, not terminal

23d9b0d

Merge branch 'together-add-embeddings-tts-asr' of github.com:nbroad1881/huggingface.js into together-add-embeddings-tts-asr

449a6aa

hanouticelina merged commit 556088f into huggingface:main May 20, 2026
5 checks passed

hanouticelina merged 14 commits into huggingface:main from nbroad1881:together-add-embeddings-tts-asr
