Summary
The ToolCallTextFilter in OpenAiCompatibleChatClient.GetStreamingResponseAsync() suppresses SSE updates via continue when it detects <tool_call XML in streaming text content. Once the filter activates, ALL subsequent text updates are silently dropped, preventing LlmResponseDeltaReceived messages from reaching the session actor. The ProcessingWatchdog never receives Refresh() calls during suppression, causing it to fire StreamIdleTimeout (default 120s) even though the GPU is actively generating tokens at full speed.
Root Cause
In OpenAiCompatibleChatClient.cs (lines 97-98), when suppressThisUpdate is true:
if (suppressThisUpdate)
    continue; // skips yield return, so the caller never sees this SSE event

yield return update;
The ToolCallTextFilter (line 717) triggers on <tool_call and is permanently latched (_suppressionActive = true). Once triggered, every subsequent text SSE event is consumed by ReadLineAsync() but never yielded to the SessionLlmInvoker streaming loop. This means:
- SessionLlmInvoker stops dispatching LlmResponseDeltaReceived messages
- The actor's ProcessingWatchdog.Refresh() is never called
- After 120s of no refresh, the watchdog fires and cancels the HTTP request
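The starvation described above can be reproduced in isolation. The following is a minimal sketch, not the actual Netclaw code: LatchingFilterSketch and StarvationSketch are hypothetical names, the filter checks each chunk independently (the real ToolCallTextFilter may also handle markers split across chunks), and a synchronous iterator stands in for the SSE read loop.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ToolCallTextFilter. Only the latch behavior
// described in this issue is modeled, not the real implementation.
public sealed class LatchingFilterSketch
{
    private bool _suppressionActive;

    // Returns true when the chunk should be hidden from the caller.
    public bool ShouldSuppress(string chunk)
    {
        if (!_suppressionActive && chunk.Contains("<tool_call", StringComparison.Ordinal))
            _suppressionActive = true; // latched: never reset for the rest of the stream

        return _suppressionActive;
    }
}

public static class StarvationSketch
{
    // Mirrors the `continue` pattern: suppressed chunks are consumed from
    // the source but never yielded, so the consumer has nothing to react to.
    public static IEnumerable<string> Stream(IEnumerable<string> chunks)
    {
        var filter = new LatchingFilterSketch();
        foreach (var chunk in chunks)
        {
            if (filter.ShouldSuppress(chunk))
                continue; // no yield, so no delta message and no watchdog refresh

            yield return chunk;
        }
    }
}
```

Feeding this five chunks where the second contains `<tool_call` yields only the first chunk. From the consumer's point of view the stream has gone silent, which is indistinguishable from a stalled connection.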
Impact
This affects any model that falls back to text-based tool calling (e.g., <tool_call> XML format) after structured tool_calls fail, and any scenario where the filter activates early in a long generation. The GPU continues generating tokens at full speed while Netclaw kills the request, wasting GPU compute.
Observed in production with Gemma 4 26B-A4B via llama-server. The model generated 8,688 output tokens over ~180 seconds before being cancelled. The error surfaced as "The LLM response stream stopped unexpectedly" even though the stream was actively producing tokens the entire time.
Fix
Two changes are needed:
- OpenAiCompatibleChatClient: When suppressing text, yield a content-free keepalive ChatResponseUpdate instead of skipping the yield with continue
- SessionLlmInvoker: When an update yields no dispatchable content, send a keepalive LlmResponseDeltaReceived with empty TextContent to refresh the watchdog
The actor's delta handler (LlmSessionActor.cs:483-484) already calls _watchdog.Refresh() unconditionally before the content-type switch, so an empty TextContent will refresh the timer without emitting any visible output.
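A rough sketch of how the two changes could fit together is shown below. It is an illustration under assumptions, not the proposed patch: UpdateSketch, WatchdogSketch, and KeepaliveSketch are hypothetical simplified types, an empty string stands in for a content-free ChatResponseUpdate, and Task.Yield() stands in for awaiting the next SSE line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical simplified update type; the real ChatResponseUpdate carries more.
public sealed record UpdateSketch(string Text);

// Hypothetical watchdog that only counts refreshes instead of running a timer.
public sealed class WatchdogSketch
{
    public int RefreshCount { get; private set; }
    public void Refresh() => RefreshCount++;
}

public static class KeepaliveSketch
{
    // Client side: yield a content-free update instead of skipping the yield.
    public static async IAsyncEnumerable<UpdateSketch> Stream(IEnumerable<string> chunks)
    {
        var suppressing = false;
        foreach (var chunk in chunks)
        {
            await Task.Yield(); // placeholder for awaiting the next SSE line

            if (!suppressing && chunk.Contains("<tool_call", StringComparison.Ordinal))
                suppressing = true; // same permanent latch as before

            yield return suppressing
                ? new UpdateSketch(string.Empty) // keepalive: no text leaks through
                : new UpdateSketch(chunk);
        }
    }

    // Consumer side: every update refreshes the watchdog, but only
    // non-empty text is appended to the visible output.
    public static async Task<(int Refreshes, string Visible)> Consume(IEnumerable<string> chunks)
    {
        var watchdog = new WatchdogSketch();
        var visible = new StringBuilder();

        await foreach (var update in Stream(chunks))
        {
            watchdog.Refresh(); // unconditional, before looking at the content
            if (update.Text.Length > 0)
                visible.Append(update.Text);
        }

        return (watchdog.RefreshCount, visible.ToString());
    }
}
```

With this shape the suppressed text still never reaches the user, but the refresh count tracks the number of SSE events rather than the number of visible chunks, so the idle timeout only fires when the server really stops sending.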
Files
src/Netclaw.Providers/SelfHosted/OpenAiCompatibleChatClient.cs — suppression logic (lines 97-98)
src/Netclaw.Actors/Sessions/Pipelines/SessionLlmInvoker.cs — streaming loop (lines 69-128)
src/Netclaw.Actors/Sessions/LlmSessionActor.cs — delta handler (lines 480-504)
src/Netclaw.Actors/Sessions/Handlers/ProcessingWatchdog.cs — timer refresh