Skip to content

fix(conversation_loop): NameError on rate limit — route _pool_may_recover_from_rate_limit through _ra() (#27465)#27468

Closed
xxxigm wants to merge 2 commits into
NousResearch:mainfrom
xxxigm:fix/27465-pool-may-recover-nameerror
Closed

fix(conversation_loop): NameError on rate limit — route _pool_may_recover_from_rate_limit through _ra() (#27465)#27468
xxxigm wants to merge 2 commits into
NousResearch:mainfrom
xxxigm:fix/27465-pool-may-recover-nameerror

Conversation

@xxxigm

@xxxigm xxxigm commented May 17, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes the NameError: name '_pool_may_recover_from_rate_limit' is not defined crash that hit every 429 from a provider in v0.14.0.

Root cause: agent/conversation_loop.py at line 2254 called the helper as a bare name:

pool_may_recover = _pool_may_recover_from_rate_limit(
    agent._credential_pool,
    provider=agent.provider,
    base_url=getattr(agent, "base_url", None),
)

…but the function is defined in run_agent.py and is not imported at module scope in conversation_loop.py. The result: instead of falling through to the configured fallback provider, every rate-limit error blew up with the NameError traceback. Reproduces deterministically against any provider that returns 429 (e.g. Ollama Cloud weekly quota exhaustion, as reported).

The fix: route the call through the existing _ra() lazy accessor at the top of conversation_loop.py that the rest of the module already uses for run_agent symbols (_ra().handle_function_call, _ra().AIAgent, _ra()._set_interrupt, etc.). This matches the documented "patch-friendly module reference" idiom in _ra()'s docstring at line 76 — callers that monkeypatch.setattr(run_agent, ...) continue to see the patched value here too.

One-character intent change, plus a regression-test suite specifically pinned at the two layers that would have caught this before it shipped.

Related Issue

Closes #27465.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

Production fix (agent/conversation_loop.py, +7 / −1):

Regression tests (tests/agent/test_conversation_loop_pool_recover_reference.py, new file, +198):

11 tests across two layers.

Source-invariant layer (catches the original bug shape):

  • test_conversation_loop_source_exists — sanity check.
  • test_no_bare_pool_may_recover_call_in_conversation_loop — regex pins that no unqualified _pool_may_recover_from_rate_limit( call survives in the file. This is the strongest regression guard — it would have failed on the v0.14.0 release commit.
  • test_call_site_uses_lazy_accessor — positive assertion that at least one _ra()._pool_may_recover_from_rate_limit( call exists, so a future refactor that drops the call entirely (or accidentally renames it) is also caught.
  • test_conversation_loop_compiles_cleanlyast.parse smoke test.

Symbol-resolution layer (catches API drift):

  • test_ra_accessor_resolves_pool_may_recover_ra()._pool_may_recover_from_rate_limit is callable and stable across calls (no stale module binding).
  • test_ra_accessor_pool_may_recover_none_pool_returns_false
  • test_ra_accessor_pool_may_recover_single_credential_returns_false — pin the entries() <= 1 short-circuit ([Feature]: 遇到http 529错误的时候,应该尝试切换成fallback的模型 #11314 contract).
  • test_ra_accessor_pool_may_recover_multi_credential_returns_true
  • test_ra_accessor_pool_may_recover_cooled_down_returns_false
  • test_ra_accessor_pool_may_recover_handles_keyword_args — the call site passes provider= and base_url=; pin the kwargs surface so a future rename in run_agent doesn't silently break the in-loop call with a TypeError (same retry branch as the original NameError).
  • test_ra_accessor_pool_may_recover_cloudcode_returns_false — pin the cloudcode-pa:// exception path (Prefer fallback for Gemini CloudCode rate limits #13636 contract).

The MagicMock-based _fake_pool fixture mirrors the canonical helper in tests/run_agent/test_provider_fallback.py::TestPoolRotationRoom so the pool contract can't drift independently between the two test files.

How to Test

  1. Set up the venv:
    python3 -m venv .venv && source .venv/bin/activate && pip install -e ".[all,dev]"
    
  2. Run the new regression suite:
    scripts/run_tests.sh tests/agent/test_conversation_loop_pool_recover_reference.py -v
    
    Expected: 11 passed.
  3. Verify nothing in the existing _pool_may_recover_from_rate_limit direct-test suite broke:
    scripts/run_tests.sh tests/agent/test_conversation_loop_pool_recover_reference.py tests/agent/test_gemini_fast_fallback.py
    
    Expected: 18 passed (11 new + 7 pre-existing in test_gemini_fast_fallback.py).
  4. Confirm agent/conversation_loop.py imports cleanly and the lazy accessor resolves:
    python -c "from agent import conversation_loop; \
               print(callable(conversation_loop._ra()._pool_may_recover_from_rate_limit))"
    # -> True
    
  5. End-to-end smoke (manual): point at a provider you can deliberately rate-limit (or use monkeypatch.setattr in a unit test to raise a 429-style error from client.chat.completions.create). Pre-fix the agent crashes with NameError. Post-fix the loop either falls back to the configured fallback_model (when _pool_may_recover_from_rate_limit returns False) or rotates within the credential pool (when it returns True).

The pre-existing test_provider_fallback.py::TestFallbackChainAdvancement / TestFallbackChainDedup failures (16/22) reproduce identically on origin/main with git stash of these changes — they are unrelated and not introduced by this PR.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(conversation_loop):, test(conversation_loop):)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run scripts/run_tests.sh and all relevant tests pass (11/11 new, no regressions on the related rate-limit test files)
  • I've added tests for my changes (11 new test cases pinning both the source shape and the runtime behaviour)
  • I've tested on my platform: macOS 15.2 (Darwin 24.6.0), Python 3.12

Documentation & Housekeeping

  • I've updated relevant documentation — N/A (one-line fix to an existing call site; no public API changed)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — pure Python module-attribute resolution; no platform-specific paths
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

$ scripts/run_tests.sh tests/agent/test_conversation_loop_pool_recover_reference.py
4 workers [11 items]
============================== 11 passed in 1.66s ==============================

$ python -c "from agent import conversation_loop; print(callable(conversation_loop._ra()._pool_may_recover_from_rate_limit))"
True

Before / after the call site:

-                    pool_may_recover = _pool_may_recover_from_rate_limit(
+                    pool_may_recover = _ra()._pool_may_recover_from_rate_limit(
                         agent._credential_pool,
                         provider=agent.provider,
                         base_url=getattr(agent, "base_url", None),
                     )

xxxigm added 2 commits May 17, 2026 21:25
…gh _ra()

A bare ``_pool_may_recover_from_rate_limit(...)`` reference in the
rate-limit branch of the conversation loop raised ``NameError`` on
every 429 from a provider, because the function lives in ``run_agent``
and is not imported at module scope in ``agent/conversation_loop.py``.
The crash short-circuited the eager-fallback path so users hitting a
provider quota (Ollama Cloud weekly cap, etc.) got the NameError
traceback instead of automatic failover to the configured next
provider.

The fix is one character of intent: route the call through the
existing lazy ``_ra()`` accessor that the rest of the module uses for
``run_agent`` symbols (``_ra().handle_function_call``,
``_ra().AIAgent``, ``_ra()._set_interrupt``).  This matches the
documented "patch-friendly module reference" idiom in the ``_ra()``
docstring at line 76 -- callers that monkeypatch ``run_agent.X`` will
see the patched value here too.

Fixes NousResearch#27465.
…hape

Add a regression suite specifically for NousResearch#27465 covering two layers:

1. Source invariant -- a regex check that no bare
   ``_pool_may_recover_from_rate_limit(`` call survives in
   ``agent/conversation_loop.py``, plus a positive assertion that the
   surviving call goes through ``_ra()._pool_may_recover_from_rate_limit(``.
   This is what catches a future refactor that drops the prefix again,
   without needing to drive the full retry loop into the 429 branch.

2. Symbol resolution -- exercise ``conversation_loop._ra()._pool_may_recover_from_rate_limit``
   end-to-end against the same MagicMock-based pool shape used by the
   canonical tests in ``test_provider_fallback.py`` so the contract
   (None -> False, single-credential -> False, multi-credential
   available -> True, cooldown -> False, cloudcode-pa:// -> False,
   keyword args -> bool) can't drift independently.

11 new tests, no production code touched.
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder duplicate This issue or pull request already exists labels May 17, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #27359 — identical one-line fix for NameError on _pool_may_recover_from_rate_limit() in conversation_loop.py:2254. #27359 and #27374 already address this. Canonical issue: #27370.

@JeremyDev87

Copy link
Copy Markdown

ddalggak review — superseded conclusion.

Blocking first: this is a duplicate implementation of the #27370/#27402/#27465 rate-limit NameError fix, and current test is failing.

Evidence checked:

Recommendation:

@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Closing as fixed-on-main

Thanks for the patch — appreciated! Closing as redundant: the same one-line _ra()._pool_may_recover_from_rate_limit(...) wrap landed via PR #28345 (salvage of @AceWattGit's PR #28159), merged 2026-05-19, before this PR could be reviewed. Same code change, just a different cherry-pick.

Several contributors converged on the same fix here (#27370, #27465, #27468, #27583, #27686, #27732, #27734, #27735, #27750, #27891, #27903, #28304) — your diagnosis was correct in every case. Sorry for the duplicate-work cleanup; the volume of independent reports made it hard to coordinate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder duplicate This issue or pull request already exists P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: NameError '_pool_may_recover_from_rate_limit' in conversation_loop.py on rate limit (v0.14.0)

4 participants