Skip to content

fix(pipeline): prevent channels from getting stuck in processing status#1530

Merged
looplj merged 2 commits into
looplj:unstablefrom
djdembeck:fix/pipeline-stuck-processing-status
Apr 29, 2026
Merged

fix(pipeline): prevent channels from getting stuck in processing status#1530
looplj merged 2 commits into
looplj:unstablefrom
djdembeck:fix/pipeline-stuck-processing-status

Conversation

@djdembeck

@djdembeck djdembeck commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Summary

Requests can get permanently stuck in "processing" status, causing the system to failover to a different channel without giving the first one a chance to complete.

Introduced by #1503 (per-channel concurrency queue + admission control).

Bug Behavior

  • Requests appear stuck in "processing" status and never transition to completed/failed
  • System immediately failovers to a different channel
  • User sees: channel A stuck saying processing, request handled by channel B
  • Concurrency of 15 should not trigger this, but hard-mode queueing (QueueSize > 0) can when the queue is at capacity

Root Cause

When applyRawRequestMiddlewares fails partway through (e.g. withChannelLimiter returns ChannelQueueError after persistRequestExecution has already run), the pipeline returns the error without calling applyRawErrorResponseMiddlewares. This means persistRequestExecution.OnOutboundRawError is never invoked, leaving the request execution stuck in "processing" forever.

Before #1503, no middleware after persistRequestExecution could fail in OnOutboundRawRequest, so this was a latent bug. The new withChannelLimiter middleware (positioned after persistRequestExecution) can return ChannelQueueError, which now triggers it.

The same latent bug existed in notStream and stream for non-executor error paths (e.g. TransformResponse failure, empty response detection).

Key Changes

  • pipeline.go: Call applyRawErrorResponseMiddlewares when applyRawRequestMiddlewares fails
  • non_streaming.go: Add cleanup calls on all non-executor error paths (raw response, transform, LLM response, empty response, inbound transform, inbound raw response)
  • stream.go: Add cleanup calls on all non-executor error paths + save original stream references before wrapping to prevent nil-pointer panics when closing on error. Also fix stream leak in checkEmptyResponse when llmStream.Err() returns non-nil.
  • middleware_test.go: Regression test verifying already-executed middlewares receive OnOutboundRawError when a later middleware fails

Risks

All OnOutboundRawError handlers are idempotent (nil checks, sync.Once guards), so calling them after a later middleware fails is safe.

Known limitation (P2, pre-existing)

If applyRawStreamMiddlewares iterates multiple wrapping middlewares and a later one fails after an earlier one has already wrapped the stream, only the innermost (pre-wrap) stream is closed. Intermediate wrapped streams from successfully-run middlewares are not cleaned up. This is rare (requires ≥2 raw stream middlewares where one fails mid-chain) and pre-existing — not introduced by this PR. A proper fix would require tracking intermediate streams in the middleware application loop, which is a separate refactor.

…stuck processing status

When applyRawRequestMiddlewares fails partway through (e.g. withChannelLimiter
returns ChannelQueueError after persistRequestExecution has run), the pipeline
returned the error without calling applyRawErrorResponseMiddlewares. This left
request executions stuck in "processing" forever because
persistRequestExecution.OnOutboundRawError was never invoked.

The same latent bug existed in notStream and stream for non-executor error
paths. In stream, additional care was needed to save original stream references
before middleware wrapping, since middleware functions return nil on error and
calling Close() on nil would panic.

Add applyRawErrorResponseMiddlewares calls on all error paths:
- pipeline.go: applyRawRequestMiddlewares failure
- non_streaming.go: raw response, transform, LLM response, empty response,
  inbound transform, and inbound raw response middlewares failures
- stream.go: raw stream, transform stream, LLM stream, empty response,
  inbound transform, and inbound raw stream middlewares failures

All OnOutboundRawError handlers are idempotent (nil checks, sync.Once guards)
so calling them after a later middleware fails is safe.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances error handling and resource management within the LLM pipeline by ensuring that error response middlewares are consistently invoked across all failure paths and that streams are properly closed to prevent leaks. A potential resource leak was identified in the streaming pipeline where a stream could remain open if checkEmptyResponse returns an error, as the reference to the stream is overwritten before it can be closed; a suggestion was provided to track and close the stream in this scenario.

Comment thread llm/pipeline/stream.go
@greptile-apps

greptile-apps Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a bug where requests could get permanently stuck in "processing" status when applyRawRequestMiddlewares failed after persistRequestExecution had already run — introduced by the per-channel concurrency queue in #1503. The fix propagates applyRawErrorResponseMiddlewares to all non-executor error paths in pipeline.go, non_streaming.go, and stream.go, ensuring middleware cleanup (e.g., marking executions as failed) happens on every failure branch.

Confidence Score: 5/5

Safe to merge; the only finding is a redundant double-close in the checkEmptyResponse error path that is benign with current stream implementations.

The core fix is correct and well-targeted. All remaining findings are P2: the double-close in stream.go is redundant (not harmful with standard Go streams) and the test's unconditional OnOutboundRawError notification for un-executed middlewares is intentional and consistent with the idempotency guarantee stated in the PR description.

llm/pipeline/stream.go — specifically the checkEmptyResponse error path where rawLlmStream.Close() is called after checkEmptyResponse has already closed the stream internally.

Important Files Changed

Filename Overview
llm/pipeline/pipeline.go Adds applyRawErrorResponseMiddlewares call when applyRawRequestMiddlewares fails, fixing the root cause where stuck-processing requests were not cleaned up on admission-control rejection.
llm/pipeline/non_streaming.go Adds applyRawErrorResponseMiddlewares on all six non-executor error paths (raw response, TransformResponse, LLM response middlewares, empty response, inbound TransformResponse, inbound raw response), ensuring middleware cleanup on every failure branch.
llm/pipeline/stream.go Adds stream close + applyRawErrorResponseMiddlewares on all non-executor error paths and saves pre-wrap stream references to avoid nil-pointer panics; however the checkEmptyResponse error path closes the same stream twice since checkEmptyResponse already closes internally.
llm/pipeline/middleware_test.go Adds regression test TestMiddleware_RawRequest_Error_CleanupMiddlewares verifying that already-executed middlewares receive OnOutboundRawError when a later middleware fails, and that even un-executed middlewares receive the callback.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant PR as processRequest
    participant MW as applyRawRequestMiddlewares
    participant NS as notStream / stream
    participant EM as applyRawErrorResponseMiddlewares

    C->>PR: Process(ctx, request)
    PR->>MW: applyRawRequestMiddlewares(ctx, req)
    alt Middleware fails (e.g. ChannelQueueError)
        MW-->>PR: err
        PR->>EM: applyRawErrorResponseMiddlewares(ctx, err) NEW
        PR-->>C: error
    else Middleware succeeds
        MW-->>PR: httpReq
        PR->>NS: notStream / stream
        alt Executor fails
            NS->>EM: applyRawErrorResponseMiddlewares (pre-existing)
            NS-->>PR: error
        else Post-executor step fails (transform, middlewares, empty response)
            NS->>EM: applyRawErrorResponseMiddlewares(ctx, err) NEW
            NS-->>PR: error
        else All steps succeed
            NS-->>PR: result
            PR-->>C: result
        end
    end
Loading

Reviews (2): Last reviewed commit: "fix(pipeline): close stream on checkEmpt..." | Re-trigger Greptile

Comment thread llm/pipeline/stream.go
Comment thread llm/pipeline/middleware_test.go Outdated
checkEmptyResponse did not close the llmStream when llmStream.Err()
returned a non-nil error, and the caller in stream() lost the reference
to the original stream. Save the stream before calling checkEmptyResponse
and close it on error, matching the save-before-wrap pattern used
elsewhere in stream().

Also add assertion message for unexecuted-middleware notification
invariant in the regression test.
@looplj looplj merged commit 99d5b0d into looplj:unstable Apr 29, 2026
4 checks passed
@djdembeck djdembeck deleted the fix/pipeline-stuck-processing-status branch April 29, 2026 01:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants