What happened?
Every user turn is delayed by close to 5 seconds before the main model sees the prompt, because the auto-memory recall selector is awaited on the request path and consistently times out.
The selector is fired off early as a non-blocking promise so that other prep work (history compaction, IDE context gathering, etc.) can run concurrently. Just before the main request is sent, the code awaits the recall promise. When the recall does not finish within the prep window — which is the situation in this issue — the main query is delayed by roughly 5s − prepDuration. On a typical turn where prep takes a few hundred milliseconds, that is close to a full 5-second per-turn slowdown. The user perceives every prompt as starting late.
The underlying error is "AbortError: This operation was aborted", raised by the side query's own 5-second deadline:
"error": {
"message": "This operation was aborted",
"stack": "AbortError: This operation was aborted
...
at AbortSignal.<anonymous> (file:///.../@qwen-code/qwen-code/cli.js:155551:61)
...
at Timeout._onTimeout (node:internal/abort_controller:139:7)
at listOnTimeout (node:internal/timers:594:17)"
}
The abort itself is swallowed by a .catch — there is no error in the UI — but the await still waits for the timer to fire. So the failure mode is silent and consistent: every turn looks slow, and the recall feature also silently degrades to "no memories surfaced" since the selector never produces a result.
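The timing described above can be reproduced with a minimal sketch. Every name here (recallSelector, simulateTurn, sleep) is hypothetical and stands in for the real code paths; this is not the qwen-code implementation, only an illustration of "fire early, swallow the abort, await just before the main request" under the assumption that the selector rejects when its deadline fires.

```typescript
// Hypothetical stand-ins; not the actual qwen-code functions.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Simulated recall side query: resolves after modelLatencyMs, or rejects with
// an AbortError-like error if deadlineMs elapses first.
function recallSelector(modelLatencyMs: number, deadlineMs: number): Promise<string[]> {
  return new Promise((resolve, reject) => {
    const work = setTimeout(() => {
      clearTimeout(deadline);
      resolve(["memory-1"]);
    }, modelLatencyMs);
    const deadline = setTimeout(() => {
      clearTimeout(work);
      const err = new Error("This operation was aborted");
      err.name = "AbortError";
      reject(err);
    }, deadlineMs);
  });
}

async function simulateTurn(prepMs: number, modelLatencyMs: number, deadlineMs: number) {
  // Fired early and not awaited yet. The .catch swallows the abort, so the UI
  // never sees an error and recall degrades to "no memories".
  const recall = recallSelector(modelLatencyMs, deadlineMs).catch(() => [] as string[]);

  await sleep(prepMs); // stands in for history compaction, IDE context, etc.
  const prepDone = Date.now();

  // Awaited on the request path: blocks until a result or the deadline.
  const memories = await recall;
  return { memories, extraDelayMs: Date.now() - prepDone };
}
```

With prepMs = 300 and deadlineMs = 5000, a selector that never finishes in time would add roughly 4700 ms before the main request, which matches the 5s − prepDuration estimate.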
What did you expect to happen?
The recall selector should complete within its budget on a typical setup so that the main agent is not delayed by the recall path on every turn.
Client information
$ qwen /about
# will provide on request
Login information
API Key (OpenAI-compatible, DeepSeek). Main model: deepseek-v4-pro.
Anything else we need to know?
Root of the problem
The recall selector currently runs against the main session model with a 5-second deadline. On a reasoning-heavy main model like deepseek-v4-pro, that budget is not realistic — thinking tokens alone can exceed it before any structured output is produced — so the deadline fires every turn. The user pays the latency penalty regardless of whether the recall feature would have helped on that turn.
Proposed solution: route this side query through fastModel
The recall selector is a small, schema-constrained, latency-sensitive classification task — exactly the shape of work that fastModel was introduced for. Other background tasks (auto session titles, tool-use summaries, kebab-case rename) already use fastModel; the recall selector appears to have been overlooked.
The proposed change is to have the selector prefer fastModel when the user has configured one (via /model --fast <id> or settings.json), and fall back to the main model when it is unset. Users who already have a fast model configured see the per-turn latency penalty disappear immediately and get reliable recall results. Users who have not configured one keep today's behavior with no regression, and can opt in by setting fastModel.
This is preferred over simply raising the 5-second deadline: the heavy main model is the wrong tool for this workload, and a longer deadline would only deepen the per-turn latency penalty when the call eventually succeeds.
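The selection rule proposed above (prefer fastModel, fall back to the main model) could look roughly like the sketch below. The ModelConfig shape and the pickRecallModel name are assumptions made for illustration; the real configuration object in qwen-code may be structured differently.

```typescript
// Hypothetical config shape; field names are illustrative assumptions.
interface ModelConfig {
  model: string;      // main session model, e.g. "deepseek-v4-pro"
  fastModel?: string; // optional fast model set by the user
}

// Prefer the fast model for the recall side query when one is configured;
// otherwise keep today's behavior and use the main model.
function pickRecallModel(config: ModelConfig): string {
  const fast = config.fastModel?.trim();
  return fast ? fast : config.model;
}
```

Treating an empty or whitespace-only fastModel as unset is a design choice of this sketch, so a blank settings entry does not route the side query to a nonexistent model.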