Bug Description
In openviking/models/vlm/backends/openai_vlm.py, all four completion methods (get_completion, get_completion_async, get_vision_completion, get_vision_completion_async) accept a thinking: bool = False parameter, but this parameter is never used to set enable_thinking in the API request body.
Models like qwen3.5-plus and qwen3.5-flash (DashScope) default to thinking mode (chain-of-thought reasoning). Since enable_thinking is never disabled, every memory extraction call triggers full CoT reasoning, causing severe timeouts.
Steps to Reproduce
- Configure openviking with qwen3.5-plus or qwen3.5-flash as the VLM model via DashScope
- Integrate with OpenClaw with autoCapture: true
- Send a message through the channel (e.g., Feishu)
- Observe OpenClaw logs after the agent replies
Expected Behavior
When thinking=False (the default), the backend should pass extra_body={"enable_thinking": False} to the API, disabling unnecessary chain-of-thought reasoning for simple memory extraction tasks.
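Concretely, with the OpenAI Python SDK (which supports DashScope's OpenAI-compatible endpoint), the expected request body could be assembled like this. This is a sketch, not the backend's actual code; extra_body is the SDK's documented passthrough for vendor-specific fields, and the message content is illustrative:

```python
thinking = False  # the method's default

kwargs = {
    "model": "qwen3.5-plus",
    "messages": [{"role": "user", "content": "extract memory from ..."}],
}
if not thinking:
    # DashScope-specific flag; the SDK merges extra_body verbatim
    # into the JSON request body.
    kwargs["extra_body"] = {"enable_thinking": False}

# client.chat.completions.create(**kwargs) would then skip CoT reasoning.
print(kwargs["extra_body"])  # → {'enable_thinking': False}
```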
Actual Behavior
auto-capture failed: AbortError: This operation was aborted appears in OpenClaw logs after 15-60 seconds. The root cause is that qwen3.5-plus/flash default to thinking mode, so each /extract API call spends 60+ seconds on CoT reasoning before the client times out.
Minimal Reproducible Example
# Fix: add to each kwargs block in openai_vlm.py before the API call
if not thinking:
    kwargs["extra_body"] = {"enable_thinking": False}
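Since the same two lines would be repeated in all four completion methods, they could also be factored into a small shared helper. A minimal sketch; apply_thinking_flag is a hypothetical name, not from the codebase:

```python
def apply_thinking_flag(kwargs: dict, thinking: bool) -> dict:
    """Disable DashScope chain-of-thought when thinking is False.

    The OpenAI SDK merges extra_body fields into the JSON request
    body, so vendor-specific flags like enable_thinking pass through.
    """
    if not thinking:
        # Merge rather than overwrite, in case extra_body is already set.
        extra = dict(kwargs.get("extra_body", {}))
        extra["enable_thinking"] = False
        kwargs["extra_body"] = extra
    return kwargs

kwargs = {"model": "qwen3.5-plus", "messages": []}
apply_thinking_flag(kwargs, thinking=False)
print(kwargs["extra_body"])  # → {'enable_thinking': False}
```

Merging instead of assigning keeps any extra_body fields a caller may already have set (sampling knobs, vendor options) intact.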
Error Logs
2026-03-24T15:13:27 openviking: auto-capture failed: AbortError: This operation was aborted
2026-03-24T15:30:02 openviking: auto-capture failed: AbortError: This operation was aborted
# Timing: capture-check triggers at T+0, AbortError at T+15s (default) or T+60s (after raising timeoutMs)
OpenViking Version
0.2.9
Python Version
3.11
Operating System
Linux
Model Backend
OpenAI
Additional Context
Measured API latency on DashScope:
- qwen3.5-plus (thinking ON, default): ~9s for a simple "hello", rising to 60s+ for a conversation extract
- qwen3.5-flash (thinking ON, default): ~5s for "hello"
- qwen-turbo (no thinking): ~0.8s
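The latencies above were measured with simple wall-clock timing around each call. A generic sketch of that measurement, using a stand-in workload in place of a real API request:

```python
import time

def timed(fn):
    """Return (seconds_elapsed, result) for a single call to fn."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result

# Stand-in for e.g. lambda: client.chat.completions.create(**kwargs)
elapsed, result = timed(lambda: sum(range(1000)))
print(f"{elapsed:.3f}s")
```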
The thinking parameter already exists in the method signature — it just needs to be wired to extra_body.