fix(audio): Fix TTS streaming Accept handling, external storage, and persistence memory usage #1792
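- DoStream forces the SSE Accept header, so streaming chat/completions requests are no longer overridden to application/json
- /audio/speech stays non-streaming when stream_format is omitted, restoring the external audio storage path
- Empty-response detection recognizes Object="[DONE]" and a bare speech.audio.done, and triggers a retry
- The persistence buffer and the LivePreview buffer summarize binary audio chunks instead of caching the full audio bytes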
Greptile Summary

This PR fixes five regressions introduced by #1784's OpenAI Audio API integration: broken SSE Accept-header negotiation, default-streaming bypassing external audio storage, empty binary TTS streams not triggering retries, bare
Confidence Score: 5/5

Safe to merge. All five bug fixes have targeted, well-isolated changes and the new binary streaming path is covered end-to-end by integration tests.

Each regression is addressed with the minimal necessary change: the Accept-header fix, the default-streaming removal, the binary decoder registration, the memory summarisation at every persistence boundary, and the empty-response sentinel recognition. The test suite covers all four empty-response scenarios and the full binary streaming round-trip. No correctness issues were found during review.

No files require special attention. The most complex new code is binaryChunkDecoder and SummarizeBinaryChunk, both of which are fully unit-tested.

Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
Client -->|POST /audio/speech| CreateSpeech
CreateSpeech -->|stream_format absent| NonStream[Non-streaming path\nChatCompletionWithRequest]
CreateSpeech -->|stream_format=sse| SSEPath[SSE path\nWriteSSEStream]
CreateSpeech -->|stream_format=audio| BinaryPath[Binary path\nWriteBinaryStream]
NonStream -->|Do HTTP| ProviderHTTP[Provider HTTP response\nbinary audio body]
ProviderHTTP -->|UpdateRequestCompletedWithAudio| ExternalStorage[(External Storage)]
SSEPath -->|DoStream Accept: text/event-stream| ProviderSSE[Provider SSE stream]
ProviderSSE -->|SSE decoder| SSEDecoder[speech.audio.delta / done events]
SSEDecoder -->|transformSpeechStreamChunk| LLMResponseSSE[llm.Response SpeechStreamEvent]
BinaryPath -->|DoStream Accept: */*| ProviderBinary[Provider chunked audio]
ProviderBinary -->|binaryChunkDecoder| BinaryDecoder[audio/mpeg chunks + binary.done]
BinaryDecoder -->|transformSpeechBinaryChunk| LLMResponseBin[llm.Response SpeechAudioChunk / DONE]
LLMResponseSSE --> CheckEmpty{checkEmptyResponse}
LLMResponseBin --> CheckEmpty
CheckEmpty -->|has content| InboundTransform[inbound.TransformStream]
CheckEmpty -->|DONE with no content| Retry[ErrEmptyResponse - retry]
InboundTransform -->|SpeechAudioChunk| StreamEventBin[StreamEvent Type=audio/mpeg Data=bytes]
InboundTransform -->|SpeechStreamEvent| StreamEventSSE[StreamEvent Type=speech.audio.delta]
InboundTransform -->|binary DONE| StreamEventDone[StreamEvent Type=binary.done]
StreamEventBin -->|InboundPersistentStream| SummarizePersist[SummarizeBinaryChunk\nSize only stored]
SummarizePersist --> DB[(DB chunks\naudio_bytes count)]
StreamEventBin -->|WriteBinaryStream| ClientResponse[Binary audio response]
StreamEventDone -->|WriteBinaryStream skip| ClientResponse
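
The first branch in the flowchart, where CreateSpeech picks a path from stream_format, can be sketched roughly as follows. This is a minimal illustration, not the project's code: routeSpeech, speechPath, and the error for unknown values are hypothetical names, and how the real handler treats an unrecognized stream_format is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// speechPath names the three branches shown in the flowchart.
// Hypothetical type, introduced only for this sketch.
type speechPath string

const (
	pathNonStream speechPath = "non-stream" // ChatCompletionWithRequest, external storage
	pathSSE       speechPath = "sse"        // WriteSSEStream
	pathBinary    speechPath = "binary"     // WriteBinaryStream
)

var errUnsupportedStreamFormat = errors.New("unsupported stream_format")

// routeSpeech maps the optional stream_format field to a path.
// An absent value stays on the non-streaming path.
func routeSpeech(streamFormat string) (speechPath, error) {
	switch streamFormat {
	case "":
		return pathNonStream, nil
	case "sse":
		return pathSSE, nil
	case "audio":
		return pathBinary, nil
	default:
		// Assumption: unknown values are rejected rather than silently defaulted.
		return "", fmt.Errorf("%w: %q", errUnsupportedStreamFormat, streamFormat)
	}
}

func main() {
	for _, f := range []string{"", "sse", "audio"} {
		p, err := routeSpeech(f)
		fmt.Printf("stream_format=%q -> %s (err=%v)\n", f, p, err)
	}
}
```

Keeping the empty case on the non-streaming path is what lets the full audio body reach external storage, as the ProviderHTTP branch of the flowchart shows.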
Reviews (2): Last reviewed commit: "fix: address TTS streaming review issues"
* fix(audio): Fix TTS streaming Accept handling, external storage, and persistence memory usage
* fix(lint): add a blank line after the embedded field in chat_test
* fix: address TTS streaming review issues
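Background

#1784 introduced the OpenAI Audio API. It made /audio/speech default to the stream_format=audio streaming path and relaxed the Accept handling in DoStream, which caused several regressions. This PR fixes them.

Fixes

1. DoStream no longer negotiated SSE correctly

llm/httpclient/client.go had been changed to set text/event-stream only when Accept was empty. The OpenAI chat outbound sets Accept: application/json by default, so ordinary streaming chat requests declared JSON rather than SSE to the upstream.

Fix: force Accept to text/event-stream when it is empty or equal to application/json. Outbounds that use */* for TTS binary streaming, or that already set text/event-stream explicitly, are left unchanged.

2. /audio/speech defaulted to streaming and bypassed external audio storage

llm/transformer/openai/audio_inbound.go previously forced Stream=true and stream_format=audio when stream_format was omitted. That bypassed the external audio storage path in UpdateRequestCompletedWithAudio and left only the audio_bytes summary, so the full audio could not be downloaded.

Fix: an omitted stream_format stays non-streaming (Stream=false); clients must pass sse or audio explicitly to enter the streaming pipeline. The parameter is optional in the official OpenAI API.

3. Empty binary TTS streams were treated as completed

transformSpeechBinaryChunk converts binary.done into a newly constructed *llm.Response{Object:"[DONE]"}, but checkEmptyResponse only recognized the pointer comparison event == llm.DoneResponse. As a result, a 200 response with an empty body did not trigger the empty-response retry, and a completed request with audio_bytes:0 was saved.

Fix: empty-response detection also recognizes event.Object == "[DONE]"; hasResponseContent is updated to match.

4. TTS SSE streams with only done and no delta did not trigger a retry

In hasResponseContent, the check SpeechStreamEvent.Type != "" treated a bare speech.audio.done as valid content.

Fix: only AudioBase64 != "" counts as content; a bare speech.audio.done no longer does.

5. Streamed audio was retained in full in the persistence buffers

Both the main persistence stream and the LivePreview buffer appended the raw audio/mpeg chunks to their buffers, so a long audio stream accumulated several full copies of the audio bytes before the request finished.

Fix:
- httpclient.StreamEvent gains a Size field
- SummarizeBinaryChunk(): summarizes a binary audio chunk (clears Data, keeps Type and the byte count)
- InboundPersistentStream.Current, OutboundPersistentStream.Current, liveRequestStream.Next, and liveRequestExecutionStream.Next all summarize before writing to the persistence/preview buffer; downstream consumers still receive the original event
- aggregateSpeechStreamChunks and marshalStreamEventForStorage fall back to chunk.Size when counting bytes

Tests

- Manually verified that all three paths generate audio correctly: tts-1 non-streaming, and gpt-4o-mini-tts with sse and with audio
- go test ./... passes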
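
For reference, the Accept rule from fix 1 can be sketched as a small pure function. This is an illustration under assumptions, not the code in llm/httpclient/client.go: streamAccept and applyStreamAccept are hypothetical names, and the comparison is an exact string match with no handling of parameters such as charset.

```go
package main

import (
	"fmt"
	"net/http"
)

const (
	acceptJSON = "application/json"
	acceptSSE  = "text/event-stream"
)

// streamAccept returns the Accept value a streaming request should send.
// Empty and application/json are rewritten to SSE; anything else
// (for example */* for binary TTS) is kept as-is.
func streamAccept(current string) string {
	if current == "" || current == acceptJSON {
		return acceptSSE
	}
	return current
}

// applyStreamAccept rewrites the Accept header in place.
func applyStreamAccept(h http.Header) {
	h.Set("Accept", streamAccept(h.Get("Accept")))
}

func main() {
	for _, accept := range []string{"", "application/json", "*/*", "text/event-stream"} {
		h := http.Header{}
		if accept != "" {
			h.Set("Accept", accept)
		}
		applyStreamAccept(h)
		fmt.Printf("%q -> %q\n", accept, h.Get("Accept"))
	}
}
```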
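
The empty-response rules from fixes 3 and 4 can be illustrated with simplified stand-in types. Everything below is a sketch: Response, SpeechStreamEvent, isDone, hasContent, and checkEmpty are hypothetical reductions of the real llm types and of checkEmptyResponse / hasResponseContent, and scanning a slice instead of a live stream is an assumption made to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the real llm types (assumption).
type SpeechStreamEvent struct {
	Type        string
	AudioBase64 string
}

type Response struct {
	Object string
	Speech *SpeechStreamEvent
	Audio  []byte // binary audio chunk payload
}

// DoneResponse mimics a shared sentinel such as llm.DoneResponse.
var DoneResponse = &Response{Object: "[DONE]"}

var ErrEmptyResponse = errors.New("empty response")

// isDone accepts both the sentinel pointer and any freshly built
// response whose Object is "[DONE]" (fix 3).
func isDone(r *Response) bool {
	return r != nil && (r == DoneResponse || r.Object == "[DONE]")
}

// hasContent counts only real audio payloads; a bare
// speech.audio.done event has no AudioBase64 and is not content (fix 4).
func hasContent(r *Response) bool {
	if r == nil || isDone(r) {
		return false
	}
	if len(r.Audio) > 0 {
		return true
	}
	return r.Speech != nil && r.Speech.AudioBase64 != ""
}

// checkEmpty returns ErrEmptyResponse when DONE arrives before any content.
func checkEmpty(events []*Response) error {
	seen := false
	for _, e := range events {
		if hasContent(e) {
			seen = true
		}
		if isDone(e) && !seen {
			return ErrEmptyResponse
		}
	}
	return nil
}

func main() {
	onlyDone := []*Response{
		{Speech: &SpeechStreamEvent{Type: "speech.audio.done"}},
		{Object: "[DONE]"},
	}
	fmt.Println(checkEmpty(onlyDone))
}
```

A caller would treat ErrEmptyResponse as retryable, which matches the "DONE with no content" branch in the flowchart.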
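
The summarization step from fix 5 might look like the sketch below. The StreamEvent struct mirrors only the three fields mentioned above (Type, Data, Size); summarizeBinaryChunk and chunkBytes are hypothetical lower-case stand-ins, and detecting a binary audio chunk by an "audio/" type prefix is an assumption rather than the project's actual check.

```go
package main

import (
	"fmt"
	"strings"
)

// StreamEvent mirrors only the fields discussed in the PR (assumption).
type StreamEvent struct {
	Type string
	Data []byte
	Size int
}

// summarizeBinaryChunk returns a copy without the payload for binary
// audio chunks, recording the byte count in Size. The input event is
// not modified, so downstream consumers still see the raw bytes.
func summarizeBinaryChunk(ev *StreamEvent) *StreamEvent {
	if ev == nil || !strings.HasPrefix(ev.Type, "audio/") {
		return ev
	}
	size := len(ev.Data)
	if size == 0 {
		size = ev.Size // already summarized: keep the recorded count
	}
	return &StreamEvent{Type: ev.Type, Size: size}
}

// chunkBytes counts payload bytes, falling back to Size for summaries.
func chunkBytes(ev *StreamEvent) int {
	if len(ev.Data) > 0 {
		return len(ev.Data)
	}
	return ev.Size
}

func main() {
	raw := &StreamEvent{Type: "audio/mpeg", Data: make([]byte, 4096)}
	stored := summarizeBinaryChunk(raw)
	fmt.Printf("stored: type=%s data=%d size=%d\n", stored.Type, len(stored.Data), stored.Size)
	fmt.Printf("raw still has %d bytes\n", len(raw.Data))
}
```

Returning a copy rather than mutating the event is one way to keep the buffer small while the live response path continues to write the original bytes to the client.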