Summary
The OpenAI audio.transcriptions.create(stream=True) streaming mode is not instrumented. When called with stream=True and models like gpt-4o-transcribe or gpt-4o-mini-transcribe, the OpenAI SDK returns an SSE stream of transcript.text.delta events followed by a transcript.text.done event. The current TranscriptionWrapper routes through BaseWrapper.create(), which has no streaming support — it calls the API, immediately tries to process the response as a single object, and closes the span. No transcript delta events are accumulated, no time_to_first_token is measured, and the caller may receive a broken or prematurely consumed stream.
By contrast, ChatCompletionWrapper.create() and ResponseWrapper.create() both have explicit if stream: branches with dedicated stream proxy classes (_TracedStream / _AsyncTracedStream) that yield chunks to the caller while accumulating them for span logging. The transcription path lacks this entirely.
Non-streaming transcription (stream=False, the default) works correctly — #174 added the TranscriptionWrapper and it handles that case fine.
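For reference, a streamed transcription as the caller sees it looks roughly like the following sketch. The event classes here are simplified stand-ins for the SDK's streaming transcription event types, not the real openai classes:

```python
from dataclasses import dataclass

# Simplified stand-ins for the SDK's streaming transcription events.
@dataclass
class TranscriptTextDelta:
    type: str
    delta: str

@dataclass
class TranscriptTextDone:
    type: str
    text: str

def fake_transcription_stream():
    # Shape of what audio.transcriptions.create(stream=True) yields:
    # a series of transcript.text.delta events, then transcript.text.done.
    yield TranscriptTextDelta(type="transcript.text.delta", delta="Hello, ")
    yield TranscriptTextDelta(type="transcript.text.delta", delta="world.")
    yield TranscriptTextDone(type="transcript.text.done", text="Hello, world.")

parts = []
for event in fake_transcription_stream():
    if event.type == "transcript.text.delta":
        parts.append(event.delta)
    elif event.type == "transcript.text.done":
        final_text = event.text

# The concatenated deltas match the final transcript.
assert "".join(parts) == final_text == "Hello, world."
```

Any instrumentation has to sit between this iterator and the caller without disturbing the event sequence.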
What is missing
| OpenAI Audio Transcription Mode | Instrumented? |
| --- | --- |
| audio.transcriptions.create() (non-streaming) | Yes |
| audio.transcriptions.create(stream=True) (streaming) | No |
| audio.translations.create() | Yes |
| audio.speech.create() | Yes |
How the gap manifests
_make_base_wrapper_callback (line 329 of tracing.py) checks kwargs.get("stream", False) but only uses it to decide whether to skip with_raw_response. It still routes to BaseWrapper.create() (line 346), which:
- Calls the API — gets back an SSE streaming iterator
- Calls _try_to_dict(raw_response) on the iterator — produces garbage or an empty dict
- Calls self.process_output(log_response, span) — logs meaningless output
- Closes the span immediately — no stream consumption tracked
- Returns the (possibly partially consumed) iterator to the caller
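The premature-consumption hazard in the last step can be shown with a plain generator. Here serialize_for_span is a hypothetical stand-in for a naive "convert the response to a dict" step, not the actual Braintrust _try_to_dict helper:

```python
def sse_stream():
    # Stand-in for the SSE iterator the OpenAI SDK returns for stream=True.
    for delta in ["Hel", "lo"]:
        yield {"type": "transcript.text.delta", "delta": delta}

def serialize_for_span(obj):
    # Hypothetical stand-in: iterating the stream here drains it
    # before the caller ever sees a single event.
    return {"chunks": list(obj)}

stream = sse_stream()
span_output = serialize_for_span(stream)   # wrapper eagerly consumes the stream
remaining = list(stream)                   # caller now gets nothing

assert len(span_output["chunks"]) == 2
assert remaining == []                     # the stream was already exhausted
```

This is why the streaming branches in the other wrappers return a proxy that forwards events lazily instead of serializing the raw response up front.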
What should happen
The TranscriptionWrapper (or a new streaming-aware subclass) should:
- Detect stream=True
- Return a traced stream proxy (like _TracedStream) that yields transcript.text.delta events to the caller
- Accumulate delta text across chunks
- Measure time_to_first_token on the first delta
- Log the final accumulated transcript text as span output when the stream is exhausted
- Close the span only after stream consumption
This matches the existing patterns in ChatCompletionWrapper.create() (line 408) and ResponseWrapper.create() (line 890).
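A minimal sketch of that proxy, assuming a class name _TracedTranscriptionStream and a span object with log()/end() methods (both hypothetical — the real Braintrust span API and _TracedStream internals will differ):

```python
import time

class _TracedTranscriptionStream:
    """Sketch: yield events to the caller while accumulating them for the span."""

    def __init__(self, stream, span):
        self._stream = stream
        self._span = span
        self._start = time.time()
        self._first_token_time = None
        self._parts = []

    def __iter__(self):
        try:
            for event in self._stream:
                etype = getattr(event, "type", None) or event.get("type")
                if etype == "transcript.text.delta":
                    if self._first_token_time is None:
                        # time_to_first_token measured on the first delta
                        self._first_token_time = time.time() - self._start
                    self._parts.append(
                        getattr(event, "delta", None) or event.get("delta")
                    )
                yield event  # caller sees every event unchanged
        finally:
            # Log the accumulated transcript and close the span only after
            # the stream has been fully consumed (or abandoned).
            self._span.log(
                output="".join(self._parts),
                metrics={"time_to_first_token": self._first_token_time},
            )
            self._span.end()

# Exercise the sketch with a fake span and a fake event stream.
class FakeSpan:
    def __init__(self):
        self.logged, self.ended = None, False
    def log(self, **kwargs):
        self.logged = kwargs
    def end(self):
        self.ended = True

events = [
    {"type": "transcript.text.delta", "delta": "Hi "},
    {"type": "transcript.text.delta", "delta": "there"},
    {"type": "transcript.text.done", "text": "Hi there"},
]
span = FakeSpan()
seen = list(_TracedTranscriptionStream(iter(events), span))

assert len(seen) == 3                       # caller received every event
assert span.logged["output"] == "Hi there"  # accumulated transcript logged
assert span.ended                           # span closed after consumption
```

An async twin (mirroring _AsyncTracedStream) would do the same with `async for` in `__aiter__`.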
Upstream API details
OpenAI's streaming transcription is available in openai==2.33.0 (the pinned latest in this repo):
- Models: gpt-4o-transcribe, gpt-4o-mini-transcribe
- Enabled with stream=True on audio.transcriptions.create()
- Emits transcript.text.delta and transcript.text.done events
Braintrust docs status
not_found — The OpenAI integration page does not mention streaming audio transcription.
Upstream sources
- client.audio.transcriptions.create(stream=True) returns an SSE stream
Local files inspected
- py/src/braintrust/integrations/openai/tracing.py:
  - _make_base_wrapper_callback (line 329) — routes stream=True through BaseWrapper.create() with no streaming handling
  - BaseWrapper.create() (line 1210) — no if stream: branch, processes the response as a single object
  - TranscriptionWrapper (line 1411) — inherits from BaseWrapper, no streaming override
  - ChatCompletionWrapper.create() (line 408) — has a proper if stream: branch with _TracedStream (the pattern to follow)
  - ResponseWrapper.create() (line 890) — has a proper if stream: branch (another pattern to follow)
- py/src/braintrust/integrations/openai/patchers.py — _WrapTranscriptions patcher targets Transcriptions.create and AsyncTranscriptions.create
- py/src/braintrust/integrations/openai/test_openai.py — no test cases for audio.transcriptions.create(stream=True)
Relationship to existing issues
- #174 (closed) added the non-streaming TranscriptionWrapper — this issue covers the missing streaming variant
- #201 (closed) fixed streaming audio delta chunk accumulation in chat completions (GPT-4o audio modality) — this issue is the same class of gap, but for the transcription API