
[bot] OpenAI: Streaming audio transcription (audio.transcriptions.create(stream=True)) not instrumented #374

@braintrust-bot

Description


Summary

The OpenAI audio.transcriptions.create(stream=True) streaming mode is not instrumented. When called with stream=True and models like gpt-4o-transcribe or gpt-4o-mini-transcribe, the OpenAI SDK returns an SSE stream of transcript.text.delta events followed by a transcript.text.done event. The current TranscriptionWrapper routes through BaseWrapper.create(), which has no streaming support — it calls the API, immediately tries to process the response as a single object, and closes the span. No transcript delta events are accumulated, no time_to_first_token is measured, and the caller may receive a broken or prematurely consumed stream.
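For illustration, here is a minimal simulation of the event sequence described above. The event type strings come from this issue; the `delta` and `text` field names are assumptions inferred from the event names, not verified against the SDK:

```python
# Simulated shape of the SSE events yielded by
# audio.transcriptions.create(stream=True). Field names are assumptions.
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    delta: str = ""
    text: str = ""

stream = [
    Event(type="transcript.text.delta", delta="Hello"),
    Event(type="transcript.text.delta", delta=" world"),
    Event(type="transcript.text.done", text="Hello world"),
]

# Instrumentation would need to accumulate the deltas across the stream:
parts = [e.delta for e in stream if e.type == "transcript.text.delta"]
final_text = "".join(parts)
assert final_text == "Hello world"
```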

By contrast, ChatCompletionWrapper.create() and ResponseWrapper.create() both have explicit if stream: branches with dedicated stream proxy classes (_TracedStream / _AsyncTracedStream) that yield chunks to the caller while accumulating them for span logging. The transcription path lacks this entirely.

Non-streaming transcription (stream=False, the default) works correctly — #174 added the TranscriptionWrapper and it handles that case fine.

What is missing

| OpenAI Audio Transcription Mode | Instrumented? |
| --- | --- |
| audio.transcriptions.create() (non-streaming) | Yes |
| audio.transcriptions.create(stream=True) (streaming) | No |
| audio.translations.create() | Yes |
| audio.speech.create() | Yes |

How the gap manifests

_make_base_wrapper_callback (line 329 of tracing.py) checks kwargs.get("stream", False) but only uses it to decide whether to skip with_raw_response. It still routes to BaseWrapper.create() (line 346), which:

  1. Calls the API — gets back an SSE streaming iterator
  2. Calls _try_to_dict(raw_response) on the iterator — produces garbage or an empty dict
  3. Calls self.process_output(log_response, span) — logs meaningless output
  4. Closes the span immediately — no stream consumption tracked
  5. Returns the (possibly partially consumed) iterator to the caller

What should happen

The TranscriptionWrapper (or a new streaming-aware subclass) should:

  • Detect stream=True
  • Return a traced stream proxy (like _TracedStream) that yields transcript.text.delta events to the caller
  • Accumulate delta text across chunks
  • Measure time_to_first_token on the first delta
  • Log the final accumulated transcript text as span output when the stream is exhausted
  • Close the span only after stream consumption

This matches the existing patterns in ChatCompletionWrapper.create() (line 408) and ResponseWrapper.create() (line 890).

Upstream API details

OpenAI's streaming transcription is available in openai==2.33.0 (the pinned latest in this repo).

Braintrust docs status

not_found — The OpenAI integration page does not mention streaming audio transcription.

Upstream sources

Local files inspected

  • py/src/braintrust/integrations/openai/tracing.py:
    • _make_base_wrapper_callback (line 329) — routes stream=True through BaseWrapper.create() with no streaming handling
    • BaseWrapper.create() (line 1210) — no if stream: branch, processes response as single object
    • TranscriptionWrapper (line 1411) — inherits from BaseWrapper, no streaming override
    • ChatCompletionWrapper.create() (line 408) — has proper if stream: branch with _TracedStream (the pattern to follow)
    • ResponseWrapper.create() (line 890) — has proper if stream: branch (another pattern to follow)
  • py/src/braintrust/integrations/openai/patchers.py — _WrapTranscriptions patcher targets Transcriptions.create and AsyncTranscriptions.create
  • py/src/braintrust/integrations/openai/test_openai.py — no test cases for audio.transcriptions.create(stream=True)

Relationship to existing issues
