fix(conversation_loop): NameError on rate limit — route _pool_may_recover_from_rate_limit through _ra() (#27465) by xxxigm · Pull Request #27468 · NousResearch/hermes-agent

xxxigm · 2026-05-17T14:28:14Z

What does this PR do?

Fixes the NameError: name '_pool_may_recover_from_rate_limit' is not defined crash that hit every 429 from a provider in v0.14.0.

Root cause: agent/conversation_loop.py at line 2254 called the helper as a bare name:

pool_may_recover = _pool_may_recover_from_rate_limit(
    agent._credential_pool,
    provider=agent.provider,
    base_url=getattr(agent, "base_url", None),
)

…but the function is defined in run_agent.py and is not imported at module scope in conversation_loop.py. The result: instead of falling through to the configured fallback provider, every rate-limit error blew up with the NameError traceback. Reproduces deterministically against any provider that returns 429 (e.g. Ollama Cloud weekly quota exhaustion, as reported).

The fix: route the call through the existing _ra() lazy accessor at the top of conversation_loop.py that the rest of the module already uses for run_agent symbols (_ra().handle_function_call, _ra().AIAgent, _ra()._set_interrupt, etc.). This matches the documented "patch-friendly module reference" idiom in _ra()'s docstring at line 76 — callers that monkeypatch.setattr(run_agent, ...) continue to see the patched value here too.

One-character intent change, plus a regression-test suite specifically pinned at the two layers that would have caught this before it shipped.

Related Issue

Closes #27465.

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

Production fix (agent/conversation_loop.py, +7 / −1):

Replace _pool_may_recover_from_rate_limit(...) with _ra()._pool_may_recover_from_rate_limit(...) at the only call site, matching the existing pattern at lines 487/489/3699/3748 (_ra()._set_interrupt, _ra().AIAgent, _ra().handle_function_call).
Add a code comment cross-referencing bug: NameError '_pool_may_recover_from_rate_limit' in conversation_loop.py on rate limit (v0.14.0) #27465 so the same regression isn't reintroduced by a future refactor.

Regression tests (tests/agent/test_conversation_loop_pool_recover_reference.py, new file, +198):

11 tests across two layers.

Source-invariant layer (catches the original bug shape):

test_conversation_loop_source_exists — sanity check.
test_no_bare_pool_may_recover_call_in_conversation_loop — regex pins that no unqualified _pool_may_recover_from_rate_limit( call survives in the file. This is the strongest regression guard — it would have failed on the v0.14.0 release commit.
test_call_site_uses_lazy_accessor — positive assertion that at least one _ra()._pool_may_recover_from_rate_limit( call exists, so a future refactor that drops the call entirely (or accidentally renames it) is also caught.
test_conversation_loop_compiles_cleanly — ast.parse smoke test.

Symbol-resolution layer (catches API drift):

test_ra_accessor_resolves_pool_may_recover — _ra()._pool_may_recover_from_rate_limit is callable and stable across calls (no stale module binding).
test_ra_accessor_pool_may_recover_none_pool_returns_false
test_ra_accessor_pool_may_recover_single_credential_returns_false — pin the entries() <= 1 short-circuit ([Feature]: 遇到http 529错误的时候，应该尝试切换成fallback的模型 #11314 contract).
test_ra_accessor_pool_may_recover_multi_credential_returns_true
test_ra_accessor_pool_may_recover_cooled_down_returns_false
test_ra_accessor_pool_may_recover_handles_keyword_args — the call site passes provider= and base_url=; pin the kwargs surface so a future rename in run_agent doesn't silently break the in-loop call with a TypeError (same retry branch as the original NameError).
test_ra_accessor_pool_may_recover_cloudcode_returns_false — pin the cloudcode-pa:// exception path (Prefer fallback for Gemini CloudCode rate limits #13636 contract).

The MagicMock-based _fake_pool fixture mirrors the canonical helper in tests/run_agent/test_provider_fallback.py::TestPoolRotationRoom so the pool contract can't drift independently between the two test files.

How to Test

Set up the venv:

python3 -m venv .venv && source .venv/bin/activate && pip install -e ".[all,dev]"

Run the new regression suite:

scripts/run_tests.sh tests/agent/test_conversation_loop_pool_recover_reference.py -v

Expected: 11 passed.

Verify nothing in the existing _pool_may_recover_from_rate_limit direct-test suite broke:
```
scripts/run_tests.sh tests/agent/test_conversation_loop_pool_recover_reference.py tests/agent/test_gemini_fast_fallback.py
```
Expected: 18 passed (11 new + 7 pre-existing in test_gemini_fast_fallback.py).

Confirm agent/conversation_loop.py imports cleanly and the lazy accessor resolves:

python -c "from agent import conversation_loop; \
           print(callable(conversation_loop._ra()._pool_may_recover_from_rate_limit))"
# -> True

End-to-end smoke (manual): point at a provider you can deliberately rate-limit (or use monkeypatch.setattr in a unit test to raise a 429-style error from client.chat.completions.create). Pre-fix the agent crashes with NameError. Post-fix the loop either falls back to the configured fallback_model (when _pool_may_recover_from_rate_limit returns False) or rotates within the credential pool (when it returns True).

The pre-existing test_provider_fallback.py::TestFallbackChainAdvancement / TestFallbackChainDedup failures (16/22) reproduce identically on origin/main with git stash of these changes — they are unrelated and not introduced by this PR.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(conversation_loop):, test(conversation_loop):)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix (no unrelated commits)
I've run scripts/run_tests.sh and all relevant tests pass (11/11 new, no regressions on the related rate-limit test files)
I've added tests for my changes (11 new test cases pinning both the source shape and the runtime behaviour)
I've tested on my platform: macOS 15.2 (Darwin 24.6.0), Python 3.12

Documentation & Housekeeping

I've updated relevant documentation — N/A (one-line fix to an existing call site; no public API changed)
I've updated cli-config.yaml.example if I added/changed config keys — N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — pure Python module-attribute resolution; no platform-specific paths
I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

$ scripts/run_tests.sh tests/agent/test_conversation_loop_pool_recover_reference.py
4 workers [11 items]
============================== 11 passed in 1.66s ==============================

$ python -c "from agent import conversation_loop; print(callable(conversation_loop._ra()._pool_may_recover_from_rate_limit))"
True

Before / after the call site:

-                    pool_may_recover = _pool_may_recover_from_rate_limit(
+                    pool_may_recover = _ra()._pool_may_recover_from_rate_limit(
                         agent._credential_pool,
                         provider=agent.provider,
                         base_url=getattr(agent, "base_url", None),
                     )

…gh _ra() A bare ``_pool_may_recover_from_rate_limit(...)`` reference in the rate-limit branch of the conversation loop raised ``NameError`` on every 429 from a provider, because the function lives in ``run_agent`` and is not imported at module scope in ``agent/conversation_loop.py``. The crash short-circuited the eager-fallback path so users hitting a provider quota (Ollama Cloud weekly cap, etc.) got the NameError traceback instead of automatic failover to the configured next provider. The fix is one character of intent: route the call through the existing lazy ``_ra()`` accessor that the rest of the module uses for ``run_agent`` symbols (``_ra().handle_function_call``, ``_ra().AIAgent``, ``_ra()._set_interrupt``). This matches the documented "patch-friendly module reference" idiom in the ``_ra()`` docstring at line 76 -- callers that monkeypatch ``run_agent.X`` will see the patched value here too. Fixes NousResearch#27465.

…hape Add a regression suite specifically for NousResearch#27465 covering two layers: 1. Source invariant -- a regex check that no bare ``_pool_may_recover_from_rate_limit(`` call survives in ``agent/conversation_loop.py``, plus a positive assertion that the surviving call goes through ``_ra()._pool_may_recover_from_rate_limit(``. This is what catches a future refactor that drops the prefix again, without needing to drive the full retry loop into the 429 branch. 2. Symbol resolution -- exercise ``conversation_loop._ra()._pool_may_recover_from_rate_limit`` end-to-end against the same MagicMock-based pool shape used by the canonical tests in ``test_provider_fallback.py`` so the contract (None -> False, single-credential -> False, multi-credential available -> True, cooldown -> False, cloudcode-pa:// -> False, keyword args -> bool) can't drift independently. 11 new tests, no production code touched.

alt-glitch · 2026-05-17T14:44:38Z

Duplicate of #27359 — identical one-line fix for NameError on _pool_may_recover_from_rate_limit() in conversation_loop.py:2254. #27359 and #27374 already address this. Canonical issue: #27370.

JeremyDev87 · 2026-05-17T15:48:42Z

ddalggak review — superseded conclusion.

Blocking first: this is a duplicate implementation of the #27370/#27402/#27465 rate-limit NameError fix, and current test is failing.

Evidence checked:

Production change overlaps the exact same agent/conversation_loop.py call site as fix(agent): wrap _pool_may_recover_from_rate_limit with _ra() lazy import #27374/fix(agent): resolve NameError in rate-limit fallback decision #27359.
This PR closes bug: NameError '_pool_may_recover_from_rate_limit' in conversation_loop.py on rate limit (v0.14.0) #27465, but bug: NameError '_pool_may_recover_from_rate_limit' in conversation_loop.py on rate limit (v0.14.0) #27465 itself is marked/commented as duplicate of the earlier canonical issue [Bug] NameError: '_pool_may_recover_from_rate_limit' is not defined in conversation_loop.py #27370/NameError: '_pool_may_recover_from_rate_limit' is not defined in conversation_loop.py #27402.
The added regression test may be useful, but the production fix should not exist in three competing PRs.

Recommendation:

Prefer fix(agent): wrap _pool_may_recover_from_rate_limit with _ra() lazy import #27374 as the canonical narrow fix once checks run green.
If maintainers want this regression coverage, port the useful test onto the canonical PR or open it as a follow-up after the one-line fix lands.

kshitijk4poor · 2026-05-19T15:38:36Z

Closing as fixed-on-main

Thanks for the patch — appreciated! Closing as redundant: the same one-line _ra()._pool_may_recover_from_rate_limit(...) wrap landed via PR #28345 (salvage of @AceWattGit's PR #28159), merged 2026-05-19, before this PR could be reviewed. Same code change, just a different cherry-pick.

Several contributors converged on the same fix here (#27370, #27465, #27468, #27583, #27686, #27732, #27734, #27735, #27750, #27891, #27903, #28304) — your diagnosis was correct in every case. Sorry for the duplicate-work cleanup; the volume of independent reports made it hard to coordinate.

xxxigm added 2 commits May 17, 2026 21:25

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder duplicate This issue or pull request already exists labels May 17, 2026

JeremyDev87 mentioned this pull request May 17, 2026

fix(agent): wrap _pool_may_recover_from_rate_limit with _ra() lazy import #27374

Closed

2 tasks

kshitijk4poor closed this May 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(conversation_loop): NameError on rate limit — route _pool_may_recover_from_rate_limit through _ra() (#27465)#27468

fix(conversation_loop): NameError on rate limit — route _pool_may_recover_from_rate_limit through _ra() (#27465)#27468
xxxigm wants to merge 2 commits into
NousResearch:mainfrom
xxxigm:fix/27465-pool-may-recover-nameerror

xxxigm commented May 17, 2026

Uh oh!

alt-glitch commented May 17, 2026

Uh oh!

JeremyDev87 commented May 17, 2026

Uh oh!

kshitijk4poor commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

xxxigm commented May 17, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Screenshots / Logs

Uh oh!

alt-glitch commented May 17, 2026

Uh oh!

JeremyDev87 commented May 17, 2026

Uh oh!

kshitijk4poor commented May 19, 2026

Closing as fixed-on-main

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants