Auto-mode classifier times out too easily; loosen stage timeouts and disable thinking in all stages #4676

@qqqys

Summary

In AUTO approval mode, the two-stage LLM classifier (packages/core/src/permissions/classifier.ts) fails closed on any timeout: a timed-out judge call returns shouldBlock=true, unavailable=true, and the action is blocked as an "infrastructure failure". The current stage timeouts are aggressive and easy to trip on slow networks or long transcripts, so legitimate actions are spuriously blocked.

Current behavior

packages/core/src/permissions/classifier.ts:37-39

export const STAGE1_TIMEOUT_MS = 3_000;   // fast stage
export const STAGE2_TIMEOUT_MS = 10_000;  // thinking/review stage

Both stages wrap the side query with AbortSignal.timeout(...). On timeout the request aborts and failClosed() / the stage-2 catch path returns an unavailable block. There is:

  • no per-fetch timeout separate from the overall stage budget,
  • no stall watchdog / retry headroom beyond runSideQuery's maxAttempts: 2,
  • a single fixed budget regardless of transcript size.

3s / 10s is tight when the classifier call includes a large transcript or the network is slow; the user then sees the action blocked with "classifier unavailable" even though nothing is actually wrong with the action.
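For readers unfamiliar with the code path, the fail-closed behavior described above can be sketched roughly as follows. This is not the actual contents of classifier.ts: the `Verdict` and `JudgeCall` types and the `runStage` helper are hypothetical stand-ins, and only `AbortSignal.timeout`, `failClosed`, and the `shouldBlock` / `unavailable` fields come from the description.

```typescript
// Hypothetical verdict shape; field names follow the issue text.
interface Verdict {
  shouldBlock: boolean;
  unavailable: boolean;
  reason: string;
}

// Hypothetical stand-in for the side query. It must reject when `signal` aborts.
type JudgeCall = (signal: AbortSignal) => Promise<Verdict>;

// Any infrastructure failure becomes a block, regardless of the action itself.
const failClosed = (reason: string): Verdict => ({
  shouldBlock: true,
  unavailable: true,
  reason,
});

// One fixed budget per stage: a timeout aborts the call and blocks the action.
async function runStage(call: JudgeCall, timeoutMs: number): Promise<Verdict> {
  try {
    return await call(AbortSignal.timeout(timeoutMs));
  } catch (err) {
    return failClosed(`classifier unavailable: ${String(err)}`);
  }
}
```

With this shape, a call that would have returned "allow" a moment after the deadline is indistinguishable from a real outage.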

Request 1 — loosen the stage timeouts

Please consider raising the stage budgets (and/or adding a separate per-fetch timeout + a non-aborting stall log) so that a slow-but-healthy classifier call is not treated as a hard block. Comparable auto-mode classifiers in the same design lineage use far more generous budgets (tens of seconds for the fast stage, ~2 min for the review stage, plus a per-fetch timeout and retries). The exact numbers can be tuned, but 3s/10s appears to be the source of avoidable false blocks.
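One possible shape for the "per-fetch timeout + non-aborting stall log" idea is sketched below. Everything here is an assumption for illustration: `withBudget`, `BudgetOptions`, and `Attempt` are hypothetical names, the sketch does not use runSideQuery, and the numbers a caller passes in would still need tuning.

```typescript
// Hypothetical stand-in for one classifier request. It must reject when `signal` aborts.
type Attempt<T> = (signal: AbortSignal) => Promise<T>;

interface BudgetOptions {
  stageTimeoutMs: number; // overall budget for the stage
  fetchTimeoutMs: number; // budget for a single request
  stallLogMs: number;     // report (but do not abort) after this long
  maxAttempts: number;
  onStall?: (attempt: number, elapsedMs: number) => void;
}

async function withBudget<T>(attempt: Attempt<T>, opts: BudgetOptions): Promise<T> {
  const deadline = Date.now() + opts.stageTimeoutMs;
  let lastError: unknown = new Error("no attempts made");
  for (let n = 1; n <= opts.maxAttempts; n++) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) break; // stage budget exhausted
    const controller = new AbortController();
    const started = Date.now();
    // Per-fetch timeout, capped by what is left of the stage budget.
    const abortTimer = setTimeout(
      () => controller.abort(new Error("fetch timeout")),
      Math.min(opts.fetchTimeoutMs, remaining),
    );
    // Stall watchdog: reports slowness but leaves the request running.
    const stallTimer = setTimeout(
      () => opts.onStall?.(n, Date.now() - started),
      opts.stallLogMs,
    );
    try {
      return await attempt(controller.signal);
    } catch (err) {
      lastError = err; // retry while attempts and budget remain
    } finally {
      clearTimeout(abortTimer);
      clearTimeout(stallTimer);
    }
  }
  throw lastError;
}
```

The caller would still fail closed if `withBudget` throws, but only after every attempt inside the stage budget has been used.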

Request 2 — disable thinking in ALL stages

packages/core/src/permissions/classifier.ts:222 currently enables thoughts in stage 2:

config: {
  temperature: 0,
  maxOutputTokens: 4096,
  thinkingConfig: { includeThoughts: true }, // stage 2
},

Stage 1 already sets includeThoughts: false. For a latency-sensitive permission gate, thinking should be disabled in every stage. Enabling it on stage 2 makes the review path slower and more expensive, which directly worsens the timeout problem above. The model can still write its reasoning into the thinking field of the structured output without a reasoning budget being allocated.
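A minimal sketch of what this request amounts to, assuming the judge config keeps the shape quoted above. The `JudgeConfig` and `JudgeVerdict` types, the `JUDGE_CONFIG` constant, and the `parseVerdict` helper are hypothetical and defined locally for illustration; only `temperature`, `maxOutputTokens`, and `thinkingConfig.includeThoughts` come from the issue. Whether `includeThoughts: false` by itself stops the backend from spending reasoning tokens may depend on the model provider, so that is worth verifying.

```typescript
// Local, illustrative type; the real one comes from the project's model client.
interface JudgeConfig {
  temperature: number;
  maxOutputTokens: number;
  thinkingConfig: { includeThoughts: boolean };
}

// One config shared by stage 1 and stage 2, with thinking off in both.
const JUDGE_CONFIG: JudgeConfig = {
  temperature: 0,
  maxOutputTokens: 4096,
  thinkingConfig: { includeThoughts: false },
};

// Hypothetical structured output: the rationale travels in an ordinary
// `thinking` string field instead of a separate reasoning channel.
interface JudgeVerdict {
  thinking: string;
  shouldBlock: boolean;
}

function parseVerdict(raw: string): JudgeVerdict {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("verdict is not an object");
  }
  const { thinking, shouldBlock } = data as Record<string, unknown>;
  if (typeof thinking !== "string" || typeof shouldBlock !== "boolean") {
    throw new Error("verdict is missing required fields");
  }
  return { thinking, shouldBlock };
}
```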

Suggested direction

  1. Raise STAGE1_TIMEOUT_MS / STAGE2_TIMEOUT_MS to more forgiving values, and optionally add an independent per-request fetch timeout + retry/stall handling so a transient slow call doesn't fail closed.
  2. Set thinkingConfig: { includeThoughts: false } in stage 2 as well, so thinking is off across the board.
  3. (Optional) Make the timeouts configurable or scale them with transcript size, since the budget needs differ a lot between a tiny tool call and a large transcript.
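Item 3 could look something like the sketch below: a base budget plus a per-size increment, capped at a maximum. The `TimeoutPolicy` type, the `stageTimeoutMs` function, and measuring the transcript in characters are all assumptions made for illustration, not existing code in the repository.

```typescript
// Hypothetical policy; all values are supplied by the caller or by user config.
interface TimeoutPolicy {
  baseMs: number;        // budget for an empty transcript
  perKiloCharMs: number; // extra budget per 1000 transcript characters
  maxMs: number;         // hard ceiling so a huge transcript cannot wait forever
}

function stageTimeoutMs(transcriptChars: number, policy: TimeoutPolicy): number {
  const kiloChars = Math.ceil(Math.max(0, transcriptChars) / 1000);
  const scaled = policy.baseMs + kiloChars * policy.perKiloCharMs;
  return Math.min(policy.maxMs, scaled);
}
```

A tiny tool call then keeps a short budget while a large transcript gets more headroom, and the ceiling bounds how long the user can be left waiting on the gate.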

Impact

Reduces spurious "Auto mode classifier unavailable; action blocked for safety" blocks that interrupt otherwise-valid AUTO-mode sessions, and makes the judge cheaper/faster by not allocating a thinking budget on the review stage.

Labels: category/core, category/performance, priority/P2, scope/latency, type/bug, welcome-pr
