fix(conversation_loop): NameError on rate limit — route _pool_may_recover_from_rate_limit through _ra() (#27465)#27468
Conversation
…gh _ra() A bare ``_pool_may_recover_from_rate_limit(...)`` reference in the rate-limit branch of the conversation loop raised ``NameError`` on every 429 from a provider, because the function lives in ``run_agent`` and is not imported at module scope in ``agent/conversation_loop.py``. The crash short-circuited the eager-fallback path so users hitting a provider quota (Ollama Cloud weekly cap, etc.) got the NameError traceback instead of automatic failover to the configured next provider. The fix is one character of intent: route the call through the existing lazy ``_ra()`` accessor that the rest of the module uses for ``run_agent`` symbols (``_ra().handle_function_call``, ``_ra().AIAgent``, ``_ra()._set_interrupt``). This matches the documented "patch-friendly module reference" idiom in the ``_ra()`` docstring at line 76 -- callers that monkeypatch ``run_agent.X`` will see the patched value here too. Fixes NousResearch#27465.
…hape Add a regression suite specifically for NousResearch#27465 covering two layers: 1. Source invariant -- a regex check that no bare ``_pool_may_recover_from_rate_limit(`` call survives in ``agent/conversation_loop.py``, plus a positive assertion that the surviving call goes through ``_ra()._pool_may_recover_from_rate_limit(``. This is what catches a future refactor that drops the prefix again, without needing to drive the full retry loop into the 429 branch. 2. Symbol resolution -- exercise ``conversation_loop._ra()._pool_may_recover_from_rate_limit`` end-to-end against the same MagicMock-based pool shape used by the canonical tests in ``test_provider_fallback.py`` so the contract (None -> False, single-credential -> False, multi-credential available -> True, cooldown -> False, cloudcode-pa:// -> False, keyword args -> bool) can't drift independently. 11 new tests, no production code touched.
|
ddalggak review — superseded conclusion. Blocking first: this is a duplicate implementation of the #27370/#27402/#27465 rate-limit NameError fix, and current Evidence checked:
Recommendation:
|
Closing as fixed-on-mainThanks for the patch — appreciated! Closing as redundant: the same one-line Several contributors converged on the same fix here (#27370, #27465, #27468, #27583, #27686, #27732, #27734, #27735, #27750, #27891, #27903, #28304) — your diagnosis was correct in every case. Sorry for the duplicate-work cleanup; the volume of independent reports made it hard to coordinate. |
What does this PR do?
Fixes the
NameError: name '_pool_may_recover_from_rate_limit' is not definedcrash that hit every 429 from a provider in v0.14.0.Root cause:
agent/conversation_loop.pyat line 2254 called the helper as a bare name:…but the function is defined in
run_agent.pyand is not imported at module scope inconversation_loop.py. The result: instead of falling through to the configured fallback provider, every rate-limit error blew up with theNameErrortraceback. Reproduces deterministically against any provider that returns 429 (e.g. Ollama Cloud weekly quota exhaustion, as reported).The fix: route the call through the existing
_ra()lazy accessor at the top ofconversation_loop.pythat the rest of the module already uses forrun_agentsymbols (_ra().handle_function_call,_ra().AIAgent,_ra()._set_interrupt, etc.). This matches the documented "patch-friendly module reference" idiom in_ra()'s docstring at line 76 — callers thatmonkeypatch.setattr(run_agent, ...)continue to see the patched value here too.One-character intent change, plus a regression-test suite specifically pinned at the two layers that would have caught this before it shipped.
Related Issue
Closes #27465.
Type of Change
Changes Made
Production fix (
agent/conversation_loop.py, +7 / −1):_pool_may_recover_from_rate_limit(...)with_ra()._pool_may_recover_from_rate_limit(...)at the only call site, matching the existing pattern at lines 487/489/3699/3748 (_ra()._set_interrupt,_ra().AIAgent,_ra().handle_function_call).Regression tests (
tests/agent/test_conversation_loop_pool_recover_reference.py, new file, +198):11 tests across two layers.
Source-invariant layer (catches the original bug shape):
test_conversation_loop_source_exists— sanity check.test_no_bare_pool_may_recover_call_in_conversation_loop— regex pins that no unqualified_pool_may_recover_from_rate_limit(call survives in the file. This is the strongest regression guard — it would have failed on the v0.14.0 release commit.test_call_site_uses_lazy_accessor— positive assertion that at least one_ra()._pool_may_recover_from_rate_limit(call exists, so a future refactor that drops the call entirely (or accidentally renames it) is also caught.test_conversation_loop_compiles_cleanly—ast.parsesmoke test.Symbol-resolution layer (catches API drift):
test_ra_accessor_resolves_pool_may_recover—_ra()._pool_may_recover_from_rate_limitis callable and stable across calls (no stale module binding).test_ra_accessor_pool_may_recover_none_pool_returns_falsetest_ra_accessor_pool_may_recover_single_credential_returns_false— pin theentries() <= 1short-circuit ([Feature]: 遇到http 529错误的时候,应该尝试切换成fallback的模型 #11314 contract).test_ra_accessor_pool_may_recover_multi_credential_returns_truetest_ra_accessor_pool_may_recover_cooled_down_returns_falsetest_ra_accessor_pool_may_recover_handles_keyword_args— the call site passesprovider=andbase_url=; pin the kwargs surface so a future rename inrun_agentdoesn't silently break the in-loop call with aTypeError(same retry branch as the originalNameError).test_ra_accessor_pool_may_recover_cloudcode_returns_false— pin thecloudcode-pa://exception path (Prefer fallback for Gemini CloudCode rate limits #13636 contract).The MagicMock-based
_fake_poolfixture mirrors the canonical helper intests/run_agent/test_provider_fallback.py::TestPoolRotationRoomso the pool contract can't drift independently between the two test files.How to Test
_pool_may_recover_from_rate_limitdirect-test suite broke:test_gemini_fast_fallback.py).agent/conversation_loop.pyimports cleanly and the lazy accessor resolves:monkeypatch.setattrin a unit test to raise a 429-style error fromclient.chat.completions.create). Pre-fix the agent crashes withNameError. Post-fix the loop either falls back to the configuredfallback_model(when_pool_may_recover_from_rate_limitreturnsFalse) or rotates within the credential pool (when it returnsTrue).The pre-existing
test_provider_fallback.py::TestFallbackChainAdvancement/TestFallbackChainDedupfailures (16/22) reproduce identically onorigin/mainwithgit stashof these changes — they are unrelated and not introduced by this PR.Checklist
Code
fix(conversation_loop):,test(conversation_loop):)scripts/run_tests.shand all relevant tests pass (11/11 new, no regressions on the related rate-limit test files)Documentation & Housekeeping
cli-config.yaml.exampleif I added/changed config keys — N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — N/AScreenshots / Logs
Before / after the call site: