fix(agent): check credential pool exhaustion before restoring primary runtime (#15298) #15434

Open
Tranquil-Flow wants to merge 2 commits into NousResearch:main from Tranquil-Flow:fix/restore-primary-check-credential-pool
Tranquil-Flow commented Apr 25, 2026

Contributor

What does this PR do?

_restore_primary_runtime() checks a 60-second rate-limit timer (_rate_limited_until) but does not consult the credential pool's exhaustion state. After the 60-second timer expires, it attempts to restore the primary provider every turn even if the credential pool still marks that provider as exhausted. This burns retries and generates noise during extended outages.

Related Issue

Fixes #15298

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • run_agent.py — after the _rate_limited_until check in _restore_primary_runtime(), added a guard that consults self._credential_pool.has_available(). If the pool exists for the primary provider and reports all credentials exhausted, restoration is skipped. This integrates with the existing CredentialPool architecture (which already tracks per-credential exhaustion with cooldowns) rather than adding another independent timer.
  • tests/run_agent/test_primary_runtime_restore.py — 4 new tests in TestCredentialPoolExhaustionGate
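
The guard described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code in the diff: `FakePool`, `FakeAgent`, the `provider` and `primary_provider` attributes, the `should_attempt_restore` name, and the use of `time.monotonic()` are all hypothetical. Only `_rate_limited_until`, `_credential_pool`, and `has_available()` come from the description.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakePool:
    """Hypothetical stand-in for CredentialPool."""
    provider: str      # assumed: the pool knows which provider it serves
    available: bool

    def has_available(self) -> bool:
        return self.available


@dataclass
class FakeAgent:
    """Hypothetical stand-in holding only the state the gate reads."""
    primary_provider: str
    _rate_limited_until: float = 0.0
    _credential_pool: Optional[FakePool] = None


def should_attempt_restore(agent: FakeAgent, now: Optional[float] = None) -> bool:
    """Return True if restoring the primary runtime is worth attempting."""
    now = time.monotonic() if now is None else now
    # Existing check: still inside the rate-limit window.
    if now < agent._rate_limited_until:
        return False
    pool = agent._credential_pool
    # New gate: only a pool for the primary provider can block restoration.
    if pool is not None and pool.provider == agent.primary_provider:
        if not pool.has_available():
            return False
    return True
```

Because the gate reads the pool's own state, restoration resumes as soon as the pool reports an available credential again, with no second timer to keep in sync.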

How to Test

  1. pytest tests/run_agent/test_primary_runtime_restore.py::TestCredentialPoolExhaustionGate -v — 4 new tests cover:
    • Pool exhausted, same provider: restoration blocked
    • Pool has credentials: restoration proceeds
    • Pool for different provider: not blocked
    • No pool at all: not blocked
  2. Tested on macOS (Python 3.11)

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS (Python 3.11)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

N/A — see commit description and PR diff.

… runtime (NousResearch#15298)

After the 60-second rate-limit timer expires, restore_primary_runtime()
attempted to restore the primary provider every turn even if the
credential pool still marked all credentials as exhausted. This burned
retries and generated noise during extended outages.

Now, after the _rate_limited_until check, if the agent has a
_credential_pool for the primary provider and the pool reports no
available credentials (all exhausted), restoration is skipped. This
integrates with the existing credential pool architecture (has_available /
_mark_exhausted) rather than adding another independent timer.

Adapted from the original PR: main has refactored _restore_primary_runtime
into agent.agent_runtime_helpers.restore_primary_runtime; the gate now
lives in the helper. Tests reach the helper transparently through the
forwarder on AIAgent.
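
For readers unfamiliar with the forwarder arrangement mentioned above, here is a rough sketch of the pattern. It assumes a module-level helper that takes the agent as its argument and a thin method that delegates to it. `AgentSketch`, the `restored` flag, and the boolean return value are hypothetical and do not reflect the real `AIAgent` or helper signatures.

```python
def restore_primary_runtime(agent) -> bool:
    """Hypothetical module-level helper; returns True if restoration ran."""
    pool = getattr(agent, "_credential_pool", None)
    # The gate lives here, so every caller of the helper gets it.
    if pool is not None and not pool.has_available():
        return False
    agent.restored = True  # placeholder for the real restoration work
    return True


class AgentSketch:
    """Hypothetical stand-in for the agent class."""

    def __init__(self, pool=None):
        self._credential_pool = pool
        self.restored = False

    def _restore_primary_runtime(self) -> bool:
        # Forwarder: keeps the old method name and delegates to the helper,
        # so tests that call the method exercise the helper's gate as well.
        return restore_primary_runtime(self)
```

With this shape, a test written against the method name keeps working after the logic moves into the helper.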

Tranquil-Flow force-pushed the fix/restore-primary-check-credential-pool branch from 8c75dd0 to 287fccd on May 25, 2026 11:05
Labels

  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • P2: Medium, degraded but workaround exists
  • type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

_restore_primary_runtime() doesn't check credential cooldown — burns retries every turn while provider is exhausted

2 participants