Auto-memory recall blocks every user turn for 5s before timing out #3759

@tanzhenxin

What happened?

Every user turn is delayed by close to 5 seconds before the main model sees the prompt, because the auto-memory recall selector is awaited on the request path and consistently times out.

The selector is fired off early as a non-blocking promise so that other prep work (history compaction, IDE context gathering, etc.) can run concurrently. Just before the main request is sent, the code awaits the recall promise. When the recall does not finish within the prep window — which is the situation in this issue — the main query is delayed by roughly 5s − prepDuration. On a typical turn where prep takes a few hundred milliseconds, that is close to a full 5-second per-turn slowdown. The user perceives every prompt as starting late.
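The arithmetic in the paragraph above can be written as a small sketch. The function and parameter names are hypothetical and do not come from the qwen-code source; only the 5-second deadline and the "a few hundred milliseconds" of prep are taken from this report.

```typescript
// Hypothetical helper, for illustration only (not the qwen-code implementation).
// Returns how long the main request is held back by awaiting the recall promise.
function addedDelayMs(recallMs: number, deadlineMs: number, prepMs: number): number {
  // The recall promise settles when the selector finishes or when its
  // deadline fires, whichever comes first.
  const recallSettlesAt = Math.min(recallMs, deadlineMs);
  // Prep work runs concurrently, so only the part of the recall that
  // outlasts prep actually blocks the main request.
  return Math.max(0, recallSettlesAt - prepMs);
}

// Selector never finishes, 5 s deadline, 300 ms of prep (assumed figure).
console.log(addedDelayMs(Infinity, 5000, 300)); // 4700

// Selector finishes inside the prep window: no added delay.
console.log(addedDelayMs(200, 5000, 300)); // 0
```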

The underlying error is an AbortError ("This operation was aborted") raised by the side query's own 5-second deadline:

"error": {
  "message": "This operation was aborted",
  "stack": "AbortError: This operation was aborted
    ...
    at AbortSignal.<anonymous> (file:///.../@qwen-code/qwen-code/cli.js:155551:61)
    ...
    at Timeout._onTimeout (node:internal/abort_controller:139:7)
    at listOnTimeout (node:internal/timers:594:17)"
}

The abort itself is swallowed by a .catch — there is no error in the UI — but the await still waits for the timer to fire. So the failure mode is silent and consistent: every turn looks slow, and the recall feature also silently degrades to "no memories surfaced" since the selector never produces a result.
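A minimal sketch of this failure mode, assuming a recall call that only settles when its abort signal fires. slowRecall and runTurn are hypothetical stand-ins, not functions from the qwen-code source, and the deadline is a parameter so the behavior can be reproduced with short timers.

```typescript
// Hypothetical stand-in for the recall side query: it never resolves on its
// own and rejects only when the abort signal fires.
function slowRecall(signal: AbortSignal): Promise<string[]> {
  return new Promise<string[]>((_resolve, reject) => {
    signal.addEventListener("abort", () => reject(signal.reason), { once: true });
  });
}

async function runTurn(
  deadlineMs: number,
  prepMs: number,
): Promise<{ memories: string[]; waitedMs: number }> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), deadlineMs);

  // Fired early as a non-blocking promise. The catch swallows the abort,
  // so nothing surfaces in the UI and the result degrades to "no memories".
  const recall = slowRecall(controller.signal).catch(() => [] as string[]);

  // Simulated prep work (history compaction, IDE context, ...) running concurrently.
  await new Promise((resolve) => setTimeout(resolve, prepMs));

  const start = Date.now();
  const memories = await recall; // still blocks until the deadline timer fires
  clearTimeout(timer);
  return { memories, waitedMs: Date.now() - start };
}
```

Calling runTurn(5000, 300) under these assumptions resolves with an empty memories array after waiting roughly 4.7 seconds at the await, without any error being thrown.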

What did you expect to happen?

The recall selector should complete within its budget on a typical setup so that the main agent is not delayed by the recall path on every turn.

Client information

$ qwen /about
# will provide on request

Login information

API Key (OpenAI-compatible, DeepSeek). Main model: deepseek-v4-pro.

Anything else we need to know?

Root of the problem

The recall selector currently runs against the main session model with a 5-second deadline. On a reasoning-heavy main model like deepseek-v4-pro, that budget is not realistic — thinking tokens alone can exceed it before any structured output is produced — so the deadline fires every turn. The user pays the latency penalty regardless of whether the recall feature would have helped on that turn.

Proposed solution: route this side query through fastModel

The recall selector is a small, schema-constrained, latency-sensitive classification task — exactly the shape of work that fastModel was introduced for. Other background tasks (auto session titles, tool-use summaries, kebab-case rename) already use fastModel; the recall selector appears to have been overlooked.

The proposed change is to have the selector prefer fastModel when the user has configured one (via /model --fast <id> or settings.json) and fall back to the main model when it is unset. Users who already have a fast model configured see the per-turn latency penalty disappear immediately and get reliable recall results. Users who have not configured one keep today's behavior, with no regression, and can opt in by setting fastModel.
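The selection rule could look roughly like the following. The ModelConfig shape and the pickRecallModel name are assumptions made for illustration; only the fastModel setting name and the fallback behavior come from this proposal.

```typescript
// Hypothetical config shape; the real settings object may differ.
interface ModelConfig {
  model: string; // main session model
  fastModel?: string; // optional fast model, e.g. set via /model --fast <id>
}

// Prefer the fast model when one is configured, otherwise keep using the
// main model so that users without a fastModel see no behavior change.
function pickRecallModel(config: ModelConfig): string {
  const fast = config.fastModel?.trim();
  return fast ? fast : config.model;
}

// "some-fast-model" is a placeholder id, not a real model name.
console.log(pickRecallModel({ model: "deepseek-v4-pro", fastModel: "some-fast-model" })); // some-fast-model
console.log(pickRecallModel({ model: "deepseek-v4-pro" })); // deepseek-v4-pro
```

Treating an empty or whitespace-only fastModel as unset is a design choice of this sketch, so that a blank value in settings.json still falls back to the main model.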

This is preferred over simply raising the 5-second deadline: the heavy main model is the wrong tool for this workload, and a longer deadline would only deepen the per-turn latency penalty when the call eventually succeeds.

Labels: status/needs-triage, type/bug