Auto-mode classifier times out too easily; loosen stage timeouts and disable thinking in all stages

### Summary

In AUTO approval mode, the two-stage LLM classifier (`packages/core/src/permissions/classifier.ts`) fails **closed** on any timeout: a timed-out judge call returns `shouldBlock=true, unavailable=true`, and the action is blocked as an "infrastructure failure". The current stage timeouts are aggressive and easy to trip on slow networks or long transcripts, so legitimate actions get blocked spuriously.

### Current behavior

`packages/core/src/permissions/classifier.ts:37-39`

```ts
export const STAGE1_TIMEOUT_MS = 3_000;   // fast stage
export const STAGE2_TIMEOUT_MS = 10_000;  // thinking/review stage
```

Both stages wrap the side query with `AbortSignal.timeout(...)`. On timeout the request aborts, and `failClosed()` or the stage-2 catch path returns an `unavailable` block. There is currently:
- no per-fetch timeout separate from the overall stage budget,
- no stall watchdog / retry headroom beyond `runSideQuery`'s `maxAttempts: 2`,
- a single fixed budget regardless of transcript size.

3s / 10s is tight when the classifier call includes a large transcript or the network is slow; the user then sees the action blocked with "classifier unavailable" even though nothing is actually wrong with the action.
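To make the failure mode concrete, here is a simplified, self-contained model of the behavior described above. It is not the code in `classifier.ts`: `JudgeVerdict`, `judgeWithTimeout` and the `judge` callback are hypothetical names, and the real path goes through `runSideQuery`. It only shows how a single fixed `AbortSignal.timeout(...)` budget, combined with mapping every error to a block, turns a slow-but-healthy verdict into an `unavailable` block.

```ts
// Simplified model of the fail-closed path, NOT the real classifier.ts code.
// JudgeVerdict, judgeWithTimeout and `judge` are hypothetical names.
interface JudgeVerdict {
  shouldBlock: boolean;
  unavailable: boolean;
}

async function judgeWithTimeout(
  // Resolves to the judge's "should block?" answer; assumed to reject
  // when `signal` aborts, as fetch does.
  judge: (signal: AbortSignal) => Promise<boolean>,
  timeoutMs: number,
): Promise<JudgeVerdict> {
  try {
    // One fixed budget for the whole call, like STAGE1/STAGE2_TIMEOUT_MS.
    const shouldBlock = await judge(AbortSignal.timeout(timeoutMs));
    return { shouldBlock, unavailable: false };
  } catch {
    // Any failure, including the timeout abort, becomes a block.
    return { shouldBlock: true, unavailable: true };
  }
}
```

For example, a judge that needs 50 ms and would allow the action is reported as `{ shouldBlock: true, unavailable: true }` under a 10 ms budget, and as `{ shouldBlock: false, unavailable: false }` under a 500 ms budget.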

### Request 1 — loosen the stage timeouts

Please consider raising the stage budgets (and/or adding a separate per-fetch timeout + a non-aborting stall log) so that a slow-but-healthy classifier call is not treated as a hard block. Comparable auto-mode classifiers in the same design lineage use far more generous budgets (tens of seconds for the fast stage, ~2 min for the review stage, plus a per-fetch timeout and retries). The exact numbers can be tuned, but 3s/10s appears to be the source of avoidable false blocks.
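One possible shape for a per-fetch timeout plus a non-aborting stall log, sketched under assumptions: `runWithBudget` and `StageBudget` are hypothetical names, the `call` callback is assumed to reject when its `AbortSignal` aborts (as `fetch` does), and retries are assumed to happen here rather than inside `runSideQuery`. Each attempt gets its own abort timer capped by what remains of the overall stage budget, while the stall timer only logs.

```ts
// Sketch only: StageBudget and runWithBudget are hypothetical names, and the
// `call` callback is assumed to reject when its signal aborts (as fetch does).
interface StageBudget {
  stageTimeoutMs: number;    // overall budget for the whole stage
  perFetchTimeoutMs: number; // budget for a single attempt
  stallLogMs: number;        // log a warning after this long, but keep waiting
  maxAttempts: number;       // retries happen here, not inside runSideQuery
}

async function runWithBudget<T>(
  call: (signal: AbortSignal) => Promise<T>,
  budget: StageBudget,
  log: (msg: string) => void = console.warn,
): Promise<T> {
  const deadline = Date.now() + budget.stageTimeoutMs;
  let lastError: unknown = new Error("stage budget exhausted before first attempt");
  for (let attempt = 1; attempt <= budget.maxAttempts; attempt++) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) break; // stage budget used up: stop retrying
    const controller = new AbortController();
    // Per-attempt abort, never longer than what is left of the stage budget.
    const abortTimer = setTimeout(
      () => controller.abort(new Error(`attempt ${attempt} timed out`)),
      Math.min(budget.perFetchTimeoutMs, remaining),
    );
    // Stall watchdog: only logs, so a slow-but-healthy call can still finish.
    const stallTimer = setTimeout(
      () => log(`classifier attempt ${attempt} still running after ${budget.stallLogMs} ms`),
      budget.stallLogMs,
    );
    try {
      return await call(controller.signal);
    } catch (err) {
      lastError = err; // remember the failure and retry if budget remains
    } finally {
      clearTimeout(abortTimer);
      clearTimeout(stallTimer);
    }
  }
  throw lastError;
}
```

The design choice is that the stall watchdog never aborts: a slow call that is still making progress can finish, and only the per-fetch timer or the overall stage deadline can turn it into a failure (which the caller would still map to a fail-closed block).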

### Request 2 — disable thinking in ALL stages

`packages/core/src/permissions/classifier.ts:222` currently enables thoughts in stage 2:

```ts
config: {
  temperature: 0,
  maxOutputTokens: 4096,
  thinkingConfig: { includeThoughts: true }, // stage 2
},
```

Stage 1 already sets `includeThoughts: false`. For a latency-sensitive permission gate, **thinking should be disabled in every stage** — enabling it on stage 2 makes the review path slower and more expensive, which directly worsens the timeout problem above. The model can still write its reasoning into the `thinking` field of the structured output without a reasoning budget being allocated.
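A minimal sketch of a shared thinking setting that both stages could import, so stage 2 cannot drift back to thinking on its own. The constant names are hypothetical. It also assumes the `@google/genai` `ThinkingConfig` semantics, where `includeThoughts` only controls whether thought summaries are returned and `thinkingBudget: 0` is what disables reasoning on models that allow it; if this codebase maps `thinkingConfig` differently, `includeThoughts: false` alone may be the relevant switch.

```ts
// Sketch only: JUDGE_THINKING_CONFIG and STAGE2_CONFIG are hypothetical names.
// Assumes @google/genai ThinkingConfig semantics: includeThoughts controls
// whether thought summaries are returned; thinkingBudget: 0 disables reasoning
// on models that allow it.
const JUDGE_THINKING_CONFIG = {
  includeThoughts: false,
  thinkingBudget: 0,
} as const;

// Stage 2 keeps its current temperature and output limit; only the
// thinking settings change, and stage 1 would import the same constant.
const STAGE2_CONFIG = {
  temperature: 0,
  maxOutputTokens: 4096,
  thinkingConfig: JUDGE_THINKING_CONFIG,
};
```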

### Suggested direction

1. Raise `STAGE1_TIMEOUT_MS` / `STAGE2_TIMEOUT_MS` to more forgiving values, and optionally add an independent per-request fetch timeout + retry/stall handling so a transient slow call doesn't fail closed.
2. Set `thinkingConfig: { includeThoughts: false }` in stage 2 as well, so thinking is off across the board.
3. (Optional) Make the timeouts configurable or scale them with transcript size, since the budget needs differ a lot between a tiny tool call and a large transcript.
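For item 3, a minimal sketch of scaling a stage budget with transcript size. `scaledTimeoutMs` and its options are hypothetical, transcript length in characters stands in for a token count, and the numbers in the usage note are illustrative rather than proposed defaults.

```ts
// Sketch only: scaledTimeoutMs and ScaledBudgetOptions are hypothetical.
// Character count is used as a cheap proxy for prompt size in tokens.
interface ScaledBudgetOptions {
  baseMs: number;        // budget for a tiny request
  perKiloCharMs: number; // extra time per started 1,000 transcript characters
  maxMs: number;         // hard ceiling so a huge transcript cannot wait forever
}

function scaledTimeoutMs(transcriptChars: number, opts: ScaledBudgetOptions): number {
  const kiloChars = Math.ceil(Math.max(0, transcriptChars) / 1000);
  return Math.min(opts.baseMs + kiloChars * opts.perKiloCharMs, opts.maxMs);
}
```

With `baseMs: 30_000`, `perKiloCharMs: 200` and `maxMs: 120_000`, an empty transcript gets 30 s, a 50,000-character transcript gets 40 s, and a 1,000,000-character transcript is capped at 120 s.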

### Impact

Reduces spurious "Auto mode classifier unavailable; action blocked for safety" blocks that interrupt otherwise-valid AUTO-mode sessions, and makes the judge cheaper/faster by not allocating a thinking budget on the review stage.

Auto-mode classifier times out too easily; loosen stage timeouts and disable thinking in all stages #4676