ToolCallTextFilter suppresses streaming keepalives, causing false StreamIdleTimeout #717

## Summary

The `ToolCallTextFilter` in `OpenAiCompatibleChatClient.GetStreamingResponseAsync()` suppresses SSE updates via `continue` once it detects `<tool_call` XML in streaming text content. After the filter activates, **all subsequent text updates are silently dropped**, so no `LlmResponseDeltaReceived` messages reach the session actor. The `ProcessingWatchdog` therefore receives no `Refresh()` calls during suppression and fires `StreamIdleTimeout` (default 120s) even though the GPU is still generating tokens at full speed.

## Root Cause

In `OpenAiCompatibleChatClient.cs` (lines 97-98), the streaming loop skips the `yield return` whenever `suppressThisUpdate` is true:

```csharp
if (suppressThisUpdate)
    continue;   // skips yield return — caller never sees this SSE event
yield return update;
```

The `ToolCallTextFilter` (line 717) triggers on `<tool_call` and is permanently latched (`_suppressionActive = true`). Once triggered, every subsequent text SSE event is consumed by `ReadLineAsync()` but never yielded to the `SessionLlmInvoker` streaming loop. This means:

1. `SessionLlmInvoker` stops dispatching `LlmResponseDeltaReceived` messages
2. The actor's `ProcessingWatchdog.Refresh()` is never called
3. After 120s without a refresh, the watchdog fires and cancels the HTTP request
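
The starvation described above can be reproduced in isolation. The following is a minimal sketch, not the actual Netclaw code: `FakeSseStream` and `LatchedFilter` are hypothetical stand-ins, plain strings replace the real update type, and a counter stands in for `ProcessingWatchdog.Refresh()`. It assumes the filter latches on the first chunk containing `<tool_call`, as the issue describes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the SSE text stream (not the real Netclaw types).
async IAsyncEnumerable<string> FakeSseStream()
{
    var chunks = new[] { "Hello ", "<tool_call>", "{\"name\":", "\"search\"}", "</tool_call>" };
    foreach (var chunk in chunks)
    {
        await Task.Yield(); // simulate an asynchronous network read
        yield return chunk;
    }
}

// Mirrors the reported behaviour: once "<tool_call" is seen the latch never
// resets, and every later chunk is skipped with `continue`.
async IAsyncEnumerable<string> LatchedFilter(IAsyncEnumerable<string> source)
{
    var suppressionActive = false;
    await foreach (var chunk in source)
    {
        if (chunk.Contains("<tool_call"))
            suppressionActive = true;
        if (suppressionActive)
            continue; // the consumer never observes this chunk
        yield return chunk;
    }
}

// Count what the producer emits versus what the consumer gets to see.
var produced = 0;
await foreach (var chunk in FakeSseStream())
    produced++;

var refreshCount = 0;
await foreach (var chunk in LatchedFilter(FakeSseStream()))
    refreshCount++; // stands in for ProcessingWatchdog.Refresh()

Console.WriteLine($"chunks produced: {produced}, watchdog refreshes: {refreshCount}");
```

In this sketch the producer emits five chunks but the consumer is woken only once, so a real idle timer would keep counting down for the rest of the generation.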

## Impact

This affects any model that falls back to text-based tool calling (e.g., `<tool_call>` XML format) after structured `tool_calls` fail, and any scenario where the filter activates early in a long generation. The GPU continues generating tokens at full speed while Netclaw kills the request, wasting GPU compute.

Observed in production with Gemma 4 26B-A4B via llama-server. The model generated 8,688 output tokens over ~180 seconds before being cancelled. The error surfaced as "The LLM response stream stopped unexpectedly" even though the stream was actively producing tokens the entire time.

## Fix

Two changes are needed:

1. **`OpenAiCompatibleChatClient`**: When suppressing text, yield a content-free keepalive `ChatResponseUpdate` instead of `continue`-ing past the yield
2. **`SessionLlmInvoker`**: When an update yields no dispatchable content, send a keepalive `LlmResponseDeltaReceived` with empty `TextContent` to refresh the watchdog

The actor's delta handler (`LlmSessionActor.cs:483-484`) already calls `_watchdog.Refresh()` unconditionally before the content-type switch, so an empty `TextContent` will refresh the timer without emitting any visible output.
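
A rough sketch of the proposed shape, under the same simplifying assumptions as the issue text: strings stand in for `ChatResponseUpdate`, an empty string plays the role of the content-free keepalive, and `SampleSseStream` and `KeepaliveFilter` are hypothetical names. The consumer loop refreshes before inspecting content, which is how the issue describes the actor's delta handler.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical stand-in for the SSE text stream (not the real Netclaw types).
async IAsyncEnumerable<string> SampleSseStream()
{
    var chunks = new[] { "Hello ", "<tool_call>", "{\"name\":", "\"search\"}", "</tool_call>" };
    foreach (var chunk in chunks)
    {
        await Task.Yield(); // simulate an asynchronous network read
        yield return chunk;
    }
}

// Instead of `continue`, yield an empty keepalive for every suppressed chunk
// so the consumer is still woken once per SSE event.
async IAsyncEnumerable<string> KeepaliveFilter(IAsyncEnumerable<string> source)
{
    var suppressionActive = false;
    await foreach (var chunk in source)
    {
        if (chunk.Contains("<tool_call"))
            suppressionActive = true;
        yield return suppressionActive ? string.Empty : chunk;
    }
}

var keepaliveRefreshes = 0;
var visibleText = new StringBuilder();
await foreach (var update in KeepaliveFilter(SampleSseStream()))
{
    keepaliveRefreshes++;          // refresh unconditionally, before looking at content
    if (update.Length > 0)
        visibleText.Append(update); // only non-empty text becomes visible output
}

Console.WriteLine($"refreshes: {keepaliveRefreshes}, visible text: \"{visibleText}\"");
```

Here the consumer is refreshed for all five events while the visible text stays `Hello `, so the tool-call markup remains hidden without starving the idle timer.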

## Files

- `src/Netclaw.Providers/SelfHosted/OpenAiCompatibleChatClient.cs` — suppression logic (lines 97-98)
- `src/Netclaw.Actors/Sessions/Pipelines/SessionLlmInvoker.cs` — streaming loop (lines 69-128)
- `src/Netclaw.Actors/Sessions/LlmSessionActor.cs` — delta handler (lines 480-504)
- `src/Netclaw.Actors/Sessions/Handlers/ProcessingWatchdog.cs` — timer refresh
