fix(gateway,xiaomi): reasoning_content echo-back for MiMo + prevent infinite retry #24784
wanxinhao wants to merge 1 commit into
Partial overlap with #24465 (earliest open PR for Xiaomi MiMo reasoning_content replay). The reasoning_content echo-back portion of this PR (run_agent.py + anthropic_adapter.py) competes with #24465, #24603, and #24737. However, this PR additionally fixes a gateway infinite retry loop on stale reasoning_content 400 errors (gateway/run.py), which is net-new. Related: #24443 (tracking issue), #24401 (cross-provider thinking block strip failure).
When MiMo reasoning_content echo-back returns HTTP 400, gateway/run.py treated it as a transient error and retried indefinitely. This fix classifies reasoning_content-related 400s as non-recoverable errors and auto-resets the stale session instead of looping. Related: NousResearch#24443 (tracking issue), NousResearch#24401
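The control flow described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual gateway/run.py code: `run_with_retry`, `run_turn`, `is_non_recoverable`, `reset_session`, and the `max_attempts` budget are all hypothetical names chosen for the example.

```python
from typing import Callable, Optional


def run_with_retry(
    run_turn: Callable[[], str],
    is_non_recoverable: Callable[[Exception], bool],
    reset_session: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Retry transient failures, but reset and stop on non-recoverable ones.

    All parameters are hypothetical stand-ins for gateway internals.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return run_turn()
        except Exception as err:
            if is_non_recoverable(err):
                # Replaying the same stale history would fail the same way,
                # so drop the session instead of retrying.
                reset_session()
                return "Session was reset after a non-recoverable error."
            last_error = err  # treated as transient: try again
    raise RuntimeError("retries exhausted") from last_error
```

The point of the sketch is that the non-recoverable check runs before the retry decision, and that even transient errors get a bounded number of attempts rather than an open-ended loop.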
Update: Narrowed scope to gateway-only fix

Based on triage feedback from @alt-glitch, I've stripped the reasoning_content echo-back changes from this PR (run_agent.py + agent/anthropic_adapter.py) since they overlap with #24465, #24603, and #24737. This PR now contains only the gateway/run.py fix: preventing infinite retry loops when reasoning_content 400 errors hit stale sessions.

The core reasoning_content replay fix is covered by #24603 (most complete implementation with 101 tests). This PR is complementary: it handles the gateway-level fallout that #24603 doesn't address.

Suggested merge order: #24603 first, then this PR.
2bfa28b to 310e70f
Summary
Xiaomi MiMo's Anthropic-compatible `/anthropic` endpoint requires `reasoning_content` on every assistant message when thinking mode is enabled. Omitting it causes an HTTP 400 error.

This PR adds three fixes:

1. `run_agent.py`: Xiaomi reasoning pad detection

Added `_needs_xiaomi_tool_reasoning()`, matching provider `xiaomi`/`mimo`/`xiaomi-mimo` or a `base_url` containing `xiaomimimo.com`. It is included in `_needs_thinking_reasoning_pad()` so the padding logic applies to MiMo tool-call replays (same pattern as DeepSeek #15250 and Kimi #17400).

2. `agent/anthropic_adapter.py`: Anthropic adapter compatibility

- Added the `_is_xiaomi_anthropic_endpoint()` function
- Added the Xiaomi endpoint to the `_preserve_unsigned_thinking` whitelist so unsigned thinking blocks survive signature stripping on third-party endpoints
- Changed the `reasoning_content` → thinking block insertion to always insert even when `_already_has_thinking` is True, because signed blocks from `reasoning_details` will be stripped by the third-party endpoint code below

3. `gateway/run.py`: Prevent infinite retry loop

Added `is_reasoning_echo_failure` detection matching `"reasoning_content"`, `"thinking mode"`, `"must be passed back"` in error messages. When triggered, the error is classified as non-recoverable and the stale session is auto-reset instead of retried ([…] `compression_exhausted`).

Without this, the gateway enters an infinite retry loop on the same stale session history, which always produces the same 400 error. Observed: a session retried for 2+ hours, hitting the same error every time.
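The two detection rules in the summary can be sketched as below. This is a hedged illustration, not the code in the PR: the function signatures are hypothetical, `is_reasoning_echo_failure` is written here as a function although the summary does not say what form it takes, and case-insensitive "any of" matching on the markers is an assumption. Only the provider names, the `xiaomimimo.com` substring, and the three marker strings come from the description above.

```python
# Strings taken from the PR summary; how they are combined is an assumption.
REASONING_ECHO_MARKERS = ("reasoning_content", "thinking mode", "must be passed back")
XIAOMI_PROVIDERS = {"xiaomi", "mimo", "xiaomi-mimo"}


def needs_xiaomi_tool_reasoning(provider: str, base_url: str) -> bool:
    """Hypothetical stand-in for _needs_xiaomi_tool_reasoning()."""
    return (
        provider.lower() in XIAOMI_PROVIDERS
        or "xiaomimimo.com" in base_url.lower()
    )


def is_reasoning_echo_failure(status_code: int, error_message: str) -> bool:
    """True when a 400 response looks like a reasoning_content echo-back failure.

    Assumption: any single marker is enough, matched case-insensitively.
    """
    if status_code != 400:
        return False
    text = error_message.lower()
    return any(marker in text for marker in REASONING_ECHO_MARKERS)
```

Restricting the check to status 400 keeps rate limits and server errors, which may genuinely be transient, on the normal retry path.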
Testing
Related