fix(audio): fix TTS streaming Accept negotiation, external storage, and persistence memory usage by EmccK · Pull Request #1792 · looplj/axonhub

EmccK · 2026-06-06T14:24:57Z

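Background

#1784 introduced the OpenAI Audio API. It made /audio/speech default to the stream_format=audio streaming path and loosened how DoStream sets the Accept header, which caused several regressions. This PR fixes them.

Fixes

1. `DoStream` no longer negotiates SSE correctly

llm/httpclient/client.go had been changed to write text/event-stream only when Accept is empty, but the OpenAI chat outbound sets Accept: application/json by default, so ordinary streaming chat requests declared JSON rather than SSE to the upstream.

Fix: force Accept to text/event-stream when it is empty or equal to application/json. Outbounds that send */* for binary TTS streams, or that already set text/event-stream explicitly, are left unchanged.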
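The negotiation rule above can be sketched as a small pure function. This is a minimal illustration, not the code in llm/httpclient/client.go: the name `streamAccept` is hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption.

```go
package main

import "strings"

// streamAccept is a hypothetical helper (not from the PR) that returns the
// Accept value a streaming request should send upstream.
// Empty or application/json is replaced with text/event-stream; any other
// explicit value (for example */* for binary TTS) is returned unchanged.
func streamAccept(current string) string {
	// Assumption: compare case-insensitively and ignore surrounding spaces.
	switch strings.ToLower(strings.TrimSpace(current)) {
	case "", "application/json":
		return "text/event-stream"
	default:
		return current
	}
}
```

Under this sketch, a chat outbound that defaults to `application/json` would be upgraded to SSE, while a TTS outbound that sets `*/*` would keep its header.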

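2. `/audio/speech` streams by default, bypassing external audio storage

When stream_format was omitted, llm/transformer/openai/audio_inbound.go forced Stream=true and stream_format=audio. That bypassed the external audio storage path in UpdateRequestCompletedWithAudio and left only an audio_bytes summary, so the full audio could not be downloaded.

Fix: an omitted stream_format stays non-streaming (Stream=false); a client must explicitly pass sse or audio to enter the streaming pipeline. The parameter is optional in the official OpenAI API anyway.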
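As a rough sketch of the defaulting rule described above, the decision can be reduced to a switch on the raw stream_format value. The function name `resolveSpeechStream` is hypothetical, and rejecting unknown values with an error is an assumption; the PR text does not say how unrecognized values are handled.

```go
package main

import "fmt"

// resolveSpeechStream is a hypothetical helper (not from the PR) that decides
// whether a /audio/speech request should enter the streaming pipeline.
func resolveSpeechStream(streamFormat string) (bool, error) {
	switch streamFormat {
	case "":
		// Omitted: stay non-streaming so the full audio body can be stored.
		return false, nil
	case "sse", "audio":
		// Only an explicit value opts in to streaming.
		return true, nil
	default:
		// Assumption: unknown values are rejected rather than ignored.
		return false, fmt.Errorf("unsupported stream_format %q", streamFormat)
	}
}
```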

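3. Empty binary TTS streams are mistaken for completed

transformSpeechBinaryChunk converts binary.done into a newly constructed *llm.Response{Object:"[DONE]"}, but checkEmptyResponse only recognizes the pointer comparison event == llm.DoneResponse. As a result, a 200 with an empty body does not trigger the empty-response retry, and a completed request with audio_bytes:0 ends up being saved.

Fix: empty-response detection also recognizes event.Object == "[DONE]"; hasResponseContent is updated to match.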
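The difference between the pointer check and the field check can be shown with stand-in types. `Response` and `DoneResponse` below are simplified stand-ins for llm.Response and llm.DoneResponse, and `isDone` is a hypothetical name; that the shared sentinel also carries Object "[DONE]" is an assumption.

```go
package main

// Response is a simplified stand-in for llm.Response (assumption: only the
// Object field matters for this check).
type Response struct {
	Object string
}

// DoneResponse is a stand-in for the shared llm.DoneResponse sentinel.
var DoneResponse = &Response{Object: "[DONE]"}

// isDone is a hypothetical helper: it accepts both the shared sentinel
// pointer and any freshly constructed response whose Object is "[DONE]".
func isDone(event *Response) bool {
	if event == nil {
		return false
	}
	return event == DoneResponse || event.Object == "[DONE]"
}
```

A pointer-only comparison would return false for `&Response{Object: "[DONE]"}`, which is the case the binary TTS path produces.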

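4. TTS SSE streams with only done and no delta also fail to trigger a retry

In hasResponseContent, the SpeechStreamEvent.Type != "" check treated a bare speech.audio.done as valid content.

Fix: only AudioBase64 != "" counts as content; a bare speech.audio.done no longer does.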
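A minimal sketch of the tightened content check, using a stand-in struct with only the two fields named above. The helper name `speechEventHasContent` and the struct layout are assumptions made for illustration.

```go
package main

// SpeechStreamEvent is a simplified stand-in for the real event type
// (assumption: only Type and AudioBase64 are relevant here).
type SpeechStreamEvent struct {
	Type        string
	AudioBase64 string
}

// speechEventHasContent is a hypothetical helper: an event counts as content
// only when it carries audio, so a bare speech.audio.done does not.
func speechEventHasContent(ev *SpeechStreamEvent) bool {
	return ev != nil && ev.AudioBase64 != ""
}
```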

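5. Streamed audio is retained in full in the persistence buffers

Both the main persistence stream and the LivePreview buffer appended the raw audio/mpeg chunks to their buffers, so a long audio stream accumulated several copies of the complete audio bytes until the request ended.

Fix:

- Add a Size field to httpclient.StreamEvent
- Add SummarizeBinaryChunk(), which summarizes a binary audio chunk (clears Data, keeps Type and the byte count)
- InboundPersistentStream.Current, OutboundPersistentStream.Current, liveRequestStream.Next, and liveRequestExecutionStream.Next all summarize the chunk before writing it to the persistence/preview buffer; downstream consumers still receive the original event
- aggregateSpeechStreamChunks and marshalStreamEventForStorage fall back to chunk.Size when counting bytes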
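The summarization step might look roughly like the following. The struct is a simplified stand-in for httpclient.StreamEvent; the field types, the "audio/" prefix test for detecting binary audio, and the `chunkBytes` helper are assumptions, not the PR's actual implementation.

```go
package main

import "strings"

// StreamEvent is a simplified stand-in for httpclient.StreamEvent.
// Assumption: Size is an int holding the byte count of elided Data.
type StreamEvent struct {
	Type string
	Data []byte
	Size int
}

// IsBinaryAudioChunk reports whether the event carries raw audio bytes.
// Assumption: binary audio is identified by an "audio/" MIME type prefix.
func (e *StreamEvent) IsBinaryAudioChunk() bool {
	return e != nil && strings.HasPrefix(e.Type, "audio/")
}

// SummarizeBinaryChunk returns a copy with Data dropped and Size recorded,
// leaving the original event untouched for downstream consumers.
// Non-audio events are returned as-is.
func SummarizeBinaryChunk(e *StreamEvent) *StreamEvent {
	if !e.IsBinaryAudioChunk() {
		return e
	}
	size := len(e.Data)
	if size == 0 {
		size = e.Size // already summarized: keep the recorded count
	}
	return &StreamEvent{Type: e.Type, Size: size}
}

// chunkBytes is a hypothetical aggregator helper: prefer len(Data), and
// fall back to Size when Data has been elided.
func chunkBytes(e *StreamEvent) int {
	if len(e.Data) > 0 {
		return len(e.Data)
	}
	return e.Size
}
```

The point of returning a new value instead of mutating in place is that the buffer can hold the summary while the caller still forwards the full chunk to the client.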

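Testing

- Added/updated unit tests covering all five fixes (empty binary stream, empty SSE done, Accept negotiation, external storage path, Size fallback)
- Manually verified locally that all three paths generate audio correctly: tts-1 non-streaming, and gpt-4o-mini-tts with sse and with audio
- go test ./... passes

- DoStream forces the SSE Accept header so that chat/completions streaming is not overridden to application/json
- /audio/speech stays non-streaming when stream_format is omitted, restoring the external audio storage path
- Empty-response detection recognizes Object="[DONE]" and a bare speech.audio.done, triggering a retry
- The persistence buffers and the LivePreview buffer summarize binary audio chunks to avoid caching the full audio bytes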

greptile-apps · 2026-06-06T14:39:44Z

Greptile Summary

This PR fixes five regressions introduced by #1784's OpenAI Audio API integration: broken SSE Accept-header negotiation, default-streaming bypassing external audio storage, empty binary TTS streams not triggering retries, bare speech.audio.done events counted as content, and full audio payloads accumulating in persistence buffers.

- Accept header fix (DoStream): forces text/event-stream when the outbound Accept is empty or application/json, preserving explicit */* for binary TTS.
- Default streaming removed (audio_inbound): omitting stream_format now stays non-streaming so UpdateRequestCompletedWithAudio can persist the full audio file; only explicit "sse" or "audio" engages streaming.
- Memory fix (SummarizeBinaryChunk): persistence/live-preview buffers store a size-only summary instead of the raw audio bytes, with chunk.Size used as fallback in aggregators when Data has been elided.
- Empty-response detection (checkEmptyResponse + hasResponseContent): recognises freshly-constructed {Object:"[DONE]"} sentinels and restricts SpeechStreamEvent content check to AudioBase64 != "".

Confidence Score: 5/5

Safe to merge. All five bug fixes have targeted, well-isolated changes and the new binary streaming path is covered end-to-end by integration tests.

Each regression is addressed with the minimal necessary change: the Accept-header fix, the default-streaming removal, the binary decoder registration, the memory summarisation at every persistence boundary, and the empty-response sentinel recognition. The test suite covers all four empty-response scenarios and the full binary streaming round-trip. No correctness issues were found during review.

No files require special attention. The most complex new code is binaryChunkDecoder and SummarizeBinaryChunk, both of which are fully unit-tested.

Important Files Changed

| Filename | Overview |
| --- | --- |
| llm/httpclient/model.go | Adds Size field, IsBinaryAudioChunk() method, and SummarizeBinaryChunk() to StreamEvent; introduces BinaryStreamDoneEventType constant. Implementation is clean and well-tested. |
| llm/httpclient/decoder.go | Adds binaryChunkDecoder for non-SSE audio streams, with correct EOF/error handling, atomic close, and buffer reuse. Registers decoders for all common audio MIME types. |
| llm/httpclient/client.go | Fixes Accept header negotiation in DoStream: forces text/event-stream when Accept is empty or application/json, preserving explicit values like */* for binary TTS. Adds MIME parameter stripping for content-type decoder lookup. |
| llm/pipeline/stream.go | Extends empty-response check to recognize freshly-constructed {Object:"[DONE]"} terminators in addition to the shared llm.DoneResponse sentinel, fixing TTS binary stream empty-retry detection. |
| llm/pipeline/empty_response.go | hasResponseContent now recognizes SpeechAudioChunk and properly rejects bare speech.audio.done events (no audio) as non-content. Also handles {Object:"[DONE]"} response objects. |
| llm/transformer/openai/audio_inbound.go | Default stream_format now stays non-streaming; only explicit "sse" or "audio" engages the streaming pipeline. TransformStream correctly routes binary vs SSE done events and aggregateSpeechStreamChunks uses chunk.Size as fallback for summarized audio chunks. |
| llm/transformer/openai/audio_outbound.go | Adds transformSpeechBinaryChunk and speechStreamChunkTransformFor to handle raw binary provider streams alongside SSE. Binary done sentinel correctly emits an {Object:"[DONE]"} response with RequestType/APIFormat set for downstream routing. |
| internal/server/orchestrator/inbound.go | InboundPersistentStream.Current() now appends summarized binary chunks to responseChunks while returning the original full-data event to consumers. BinaryStreamDoneEventType added to isTerminalStreamEvent. |
| internal/server/orchestrator/outbound.go | OutboundPersistentStream.Current() summarizes binary chunks for persistence. isCompletedAggregated now uses meta.Completed flag, enabling TTS aggregation completion without requiring completion tokens. |
| internal/server/biz/request.go | Adds marshalStreamEventForStorage and shouldSkipStoredStreamChunk to unify chunk serialization, correctly using chunk.Size fallback when binary audio Data has been elided by summarization. |
| internal/server/api/chat.go | ChatCompletion refactored to delegate to new ChatCompletionWithRequest. Adds WriteBinaryStream for raw chunked audio responses and streamErrorStatus for mapping errors to HTTP status codes. |
| internal/server/api/openai.go | CreateSpeech now routes to WriteBinaryStream for stream_format=audio and to the default SSE/non-streaming path otherwise, based on shouldUseBinarySpeechStream routing function. |
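The "MIME parameter stripping for content-type decoder lookup" mentioned for llm/httpclient/client.go can be illustrated with the standard library. This is a sketch of the general technique only; the function name `decoderKey` and the fallback behaviour for unparseable values are assumptions, not the PR's code.

```go
package main

import (
	"mime"
	"strings"
)

// decoderKey is a hypothetical helper that reduces a Content-Type header to
// its bare media type so it can be used as a decoder registry key.
func decoderKey(contentType string) string {
	// mime.ParseMediaType lowercases the media type and drops parameters.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Assumption: on a malformed header, fall back to cutting at ';'.
		base, _, _ := strings.Cut(contentType, ";")
		return strings.ToLower(strings.TrimSpace(base))
	}
	return mediaType
}
```

With this, a response declaring `text/event-stream; charset=utf-8` and one declaring plain `text/event-stream` would resolve to the same key.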

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Client -->|POST /audio/speech| CreateSpeech

    CreateSpeech -->|stream_format absent| NonStream[Non-streaming path\nChatCompletionWithRequest]
    CreateSpeech -->|stream_format=sse| SSEPath[SSE path\nWriteSSEStream]
    CreateSpeech -->|stream_format=audio| BinaryPath[Binary path\nWriteBinaryStream]

    NonStream -->|Do HTTP| ProviderHTTP[Provider HTTP response\nbinary audio body]
    ProviderHTTP -->|UpdateRequestCompletedWithAudio| ExternalStorage[(External Storage)]

    SSEPath -->|DoStream Accept: text/event-stream| ProviderSSE[Provider SSE stream]
    ProviderSSE -->|SSE decoder| SSEDecoder[speech.audio.delta / done events]
    SSEDecoder -->|transformSpeechStreamChunk| LLMResponseSSE[llm.Response SpeechStreamEvent]

    BinaryPath -->|DoStream Accept: */*| ProviderBinary[Provider chunked audio]
    ProviderBinary -->|binaryChunkDecoder| BinaryDecoder[audio/mpeg chunks + binary.done]
    BinaryDecoder -->|transformSpeechBinaryChunk| LLMResponseBin[llm.Response SpeechAudioChunk / DONE]

    LLMResponseSSE --> CheckEmpty{checkEmptyResponse}
    LLMResponseBin --> CheckEmpty

    CheckEmpty -->|has content| InboundTransform[inbound.TransformStream]
    CheckEmpty -->|DONE with no content| Retry[ErrEmptyResponse - retry]

    InboundTransform -->|SpeechAudioChunk| StreamEventBin[StreamEvent Type=audio/mpeg Data=bytes]
    InboundTransform -->|SpeechStreamEvent| StreamEventSSE[StreamEvent Type=speech.audio.delta]
    InboundTransform -->|binary DONE| StreamEventDone[StreamEvent Type=binary.done]

    StreamEventBin -->|InboundPersistentStream| SummarizePersist[SummarizeBinaryChunk\nSize only stored]
    SummarizePersist --> DB[(DB chunks\naudio_bytes count)]
    StreamEventBin -->|WriteBinaryStream| ClientResponse[Binary audio response]

    StreamEventDone -->|WriteBinaryStream skip| ClientResponse

Reviews (2): Last reviewed commit: "fix: address TTS streaming review feedback"

EmccK added 2 commits June 6, 2026 22:24

fix(lint): add a blank line after the embedded field in chat_test

a313449

greptile-apps Bot reviewed Jun 6, 2026

Comment thread llm/httpclient/decoder.go

Comment thread llm/httpclient/decoder.go Outdated

Comment thread internal/server/api/openai.go

fix: address TTS streaming review feedback

c9efcd8

looplj merged commit 7891a8d into looplj:unstable Jun 7, 2026
4 checks passed

looplj merged 3 commits into looplj:unstable from EmccK:feat/openai-audio-next
