ToolCallTextFilter suppresses streaming keepalives, causing false StreamIdleTimeout #717

@Aaronontheweb

Summary

The ToolCallTextFilter in OpenAiCompatibleChatClient.GetStreamingResponseAsync() suppresses SSE updates via continue once it detects <tool_call XML in streaming text content. After the filter activates, ALL subsequent text updates are silently dropped, so no LlmResponseDeltaReceived messages reach the session actor. The ProcessingWatchdog therefore receives no Refresh() calls during suppression and fires StreamIdleTimeout (default 120s) even though the GPU is actively generating tokens at full speed.

Root Cause

In OpenAiCompatibleChatClient.cs (lines 97-98), when suppressThisUpdate is true the loop skips the yield entirely:

if (suppressThisUpdate)
    continue;   // skips yield return — caller never sees this SSE event
yield return update;

The ToolCallTextFilter (line 717) triggers on <tool_call and is permanently latched (_suppressionActive = true). Once triggered, every subsequent text SSE event is consumed by ReadLineAsync() but never yielded to the SessionLlmInvoker streaming loop. This means:

  1. SessionLlmInvoker stops dispatching LlmResponseDeltaReceived messages
  2. The actor's ProcessingWatchdog.Refresh() is never called
  3. After 120s of no refresh, the watchdog fires and cancels the HTTP request
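
To make the failure mode concrete, here is a minimal sketch of the latch-plus-continue pattern. It is not the Netclaw code: LatchingFilter, StreamAsync, and CountYieldedAsync are hypothetical stand-ins, and the filter checks each chunk in isolation, which is simpler than whatever the real ToolCallTextFilter does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for ToolCallTextFilter: latches once "<tool_call" is seen.
sealed class LatchingFilter
{
    private bool _suppressionActive;

    public bool ShouldSuppress(string text)
    {
        if (!_suppressionActive && text.Contains("<tool_call", StringComparison.Ordinal))
            _suppressionActive = true;
        return _suppressionActive; // never resets once latched
    }
}

static class Demo
{
    // Mirrors the `continue` pattern from the issue: suppressed chunks are never yielded.
    public static async IAsyncEnumerable<string> StreamAsync(IEnumerable<string> chunks)
    {
        var filter = new LatchingFilter();
        foreach (var chunk in chunks)
        {
            await Task.Yield(); // stand-in for reading one SSE event
            if (filter.ShouldSuppress(chunk))
                continue;       // the caller never sees this event
            yield return chunk;
        }
    }

    // Counts how many updates the consumer observes. Each one would be
    // a chance to refresh the watchdog in the real pipeline.
    public static async Task<int> CountYieldedAsync(IEnumerable<string> chunks)
    {
        var count = 0;
        await foreach (var chunk in StreamAsync(chunks))
            count++;
        return count;
    }

    public static async Task Main()
    {
        var chunks = new[] { "Let me check. ", "<tool_call>", "{\"name\":\"search\"}", "</tool_call>" };
        Console.WriteLine(await CountYieldedAsync(chunks)); // prints 1
    }
}
```

Four events are read from the source but only one reaches the consumer, so from the consumer's side the stream goes silent as soon as the filter latches.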

Impact

This affects any model that falls back to text-based tool calling (e.g., <tool_call> XML format) after structured tool_calls fail, and any scenario where the filter activates early in a long generation. Netclaw kills the request while the GPU is still generating tokens at full speed, so the compute already spent on the response is wasted.

Observed in production with Gemma 4 26B-A4B via llama-server. The model generated 8,688 output tokens over ~180 seconds before being cancelled. The error surfaced as "The LLM response stream stopped unexpectedly" even though the stream was actively producing tokens the entire time.

Fix

Two changes needed:

  1. OpenAiCompatibleChatClient: When suppressing text, yield a content-free keepalive ChatResponseUpdate instead of skipping the yield with continue
  2. SessionLlmInvoker: When an update yields no dispatchable content, send a keepalive LlmResponseDeltaReceived with empty TextContent to refresh the watchdog
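
The two changes above could look roughly like the following sketch. It is an illustration under assumptions, not a patch: the Update record is a hypothetical stand-in for ChatResponseUpdate (empty Text marks a keepalive), and the refresh counter stands in for the watchdog refresh that the actor performs on each delta.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical stand-in for ChatResponseUpdate; empty Text marks a content-free keepalive.
sealed record Update(string Text);

static class KeepaliveDemo
{
    public static async IAsyncEnumerable<Update> StreamAsync(IEnumerable<string> chunks)
    {
        var suppressionActive = false; // latched, as described in the issue
        foreach (var chunk in chunks)
        {
            await Task.Yield(); // stand-in for reading one SSE event
            if (chunk.Contains("<tool_call", StringComparison.Ordinal))
                suppressionActive = true;

            // Change 1: yield a keepalive instead of skipping the yield.
            yield return suppressionActive ? new Update(string.Empty) : new Update(chunk);
        }
    }

    // Change 2: every update, including keepalives, is forwarded as a delta.
    public static async Task<(string Visible, int Refreshes)> ConsumeAsync(IEnumerable<string> chunks)
    {
        var visible = new StringBuilder();
        var refreshes = 0;
        await foreach (var update in StreamAsync(chunks))
        {
            refreshes++;                 // stands in for the watchdog refresh in the actor
            visible.Append(update.Text); // an empty keepalive adds no visible output
        }
        return (visible.ToString(), refreshes);
    }

    public static async Task Main()
    {
        var chunks = new[] { "Let me check. ", "<tool_call>", "{\"name\":\"search\"}", "</tool_call>" };
        var (visible, refreshes) = await ConsumeAsync(chunks);
        Console.WriteLine($"visible=\"{visible}\" refreshes={refreshes}");
        // prints: visible="Let me check. " refreshes=4
    }
}
```

The suppressed tool-call text still never reaches the user, but the consumer now sees one update per SSE event and can keep the idle timer alive.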

The actor's delta handler (LlmSessionActor.cs:483-484) already calls _watchdog.Refresh() unconditionally before the content-type switch, so an empty TextContent will refresh the timer without emitting any visible output.
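
A rough simulation of why refreshing before the content switch is enough. IdleWatchdog, HandleDelta, and Simulate are hypothetical names, not the real ProcessingWatchdog API. The 10-second delta interval and the 60-second latch point are invented for illustration; only the 120-second timeout comes from this issue.

```csharp
using System;

// Hypothetical idle watchdog: expires when no Refresh arrives within the timeout.
sealed class IdleWatchdog
{
    private readonly TimeSpan _timeout;
    private DateTimeOffset _lastRefresh;

    public IdleWatchdog(TimeSpan timeout, DateTimeOffset now)
    {
        _timeout = timeout;
        _lastRefresh = now;
    }

    public void Refresh(DateTimeOffset now) => _lastRefresh = now;

    public bool IsExpired(DateTimeOffset now) => now - _lastRefresh >= _timeout;
}

static class WatchdogDemo
{
    // Refresh first, then look at the content, so an empty delta still resets the timer.
    static string HandleDelta(IdleWatchdog watchdog, string textContent, DateTimeOffset now)
    {
        watchdog.Refresh(now);
        return textContent; // empty for keepalives, so nothing visible is emitted
    }

    // Returns the second at which the watchdog fires, or -1 if it never fires within 300s.
    public static int Simulate(bool keepalives)
    {
        var start = DateTimeOffset.UnixEpoch;
        var watchdog = new IdleWatchdog(TimeSpan.FromSeconds(120), start);
        for (var t = 10; t <= 300; t += 10)
        {
            var now = start.AddSeconds(t);
            var latched = t > 60; // assumed point where the filter activates
            if (!latched)
                HandleDelta(watchdog, "text", now);
            else if (keepalives)
                HandleDelta(watchdog, string.Empty, now);
            if (watchdog.IsExpired(now))
                return t;
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(Simulate(keepalives: false)); // prints 180
        Console.WriteLine(Simulate(keepalives: true));  // prints -1
    }
}
```

Without keepalives the simulated watchdog fires 120 seconds after the last delivered delta. With keepalives it never fires, and no extra text is produced.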

Files

  • src/Netclaw.Providers/SelfHosted/OpenAiCompatibleChatClient.cs — suppression logic (lines 97-98)
  • src/Netclaw.Actors/Sessions/Pipelines/SessionLlmInvoker.cs — streaming loop (lines 69-128)
  • src/Netclaw.Actors/Sessions/LlmSessionActor.cs — delta handler (lines 480-504)
  • src/Netclaw.Actors/Sessions/Handlers/ProcessingWatchdog.cs — timer refresh

Labels: bug (Something isn't working)
