feat(nous): re-probe single-provider primary during rate-limit cooldown by teknium1 · Pull Request #41610 · NousResearch/hermes-agent

teknium1 · 2026-06-08T00:06:26Z

Summary

A single-provider Nous setup now recovers from a rate-limit cooldown as soon as the rolling cap frees up, instead of staying silent until the worst-case reset_at.

Port of openclaw/openclaw#90717 (single-provider primary re-probe during cooldown), adapted to hermes-agent's nous_rate_guard + agent-loop architecture.

Root cause: When agent.provider == "nous" hits a rate/subscription cap and no fallback provider is configured, the agent loop returned a hard "try again after the reset" failure and stayed blocked until the recorded reset_at literally arrived. That reset is the provider's worst case for the bucket window (x-ratelimit-reset-requests-1h reports the full hour). RPH/RPM are rolling windows that usually recover earlier, so a single-provider user could sit silent for minutes-to-days on a cap that already cleared.

Changes

agent/nous_rate_guard.py: new should_probe_nous_during_cooldown(has_fallback=...) — a throttled recovery probe (one per 30s slot, coordinated cross-session via a state file). Returns False when a fallback chain exists (prefer the fallback), True for a single-provider primary when the throttle slot is open. clear_nous_rate_limit() now also clears the probe slot.
agent/conversation_loop.py: in the no-fallback branch of the Nous rate-limit guard, fall through to a real (probe) API call when the throttle slot is open instead of hard-failing. A failed probe re-records the cooldown via the existing 429 handler; a successful one clears it. If the slot isn't open yet, the original "no fallback" failure response is preserved.
tests/agent/test_nous_rate_guard.py: 7 regression tests (fallback short-circuit, first-probe-allowed, throttle suspension, post-window re-probe, far-future-reset still probes, slot persistence, clear removes slot).

Adaptation notes

OpenClaw's fix lives in model-fallback.ts's shouldProbePrimaryDuringCooldown (auth-profile usage stats, 30s probe throttle). hermes-agent has no equivalent profile-store cooldown engine — the relevant cooldown is the file-based nous_rate_guard. So the same behavior (single-provider primary re-probes on a throttle rather than suspending until reset) is implemented at the guard + agent-loop boundary. The 30s throttle and "fallback present ⇒ no probe" semantics match the source.

Validation

	Before	After
Nous cooldown, no fallback, cap recovered early	silent until `reset_at` (up to days)	re-probes every 30s, resumes on recovery
Nous cooldown, fallback configured	activates fallback	activates fallback (unchanged)
Nous cooldown, no fallback, throttle slot closed	hard-fail message	hard-fail message (unchanged)

tests/agent/test_nous_rate_guard.py: 39/39 pass (32 existing + 7 new).
tests/run_agent/test_provider_fallback.py: 22/22 pass.
E2E with real imports + isolated HERMES_HOME: throttle window, fallback short-circuit, far-future-reset probe, and clear-removes-slot all verified.

Port from openclaw/openclaw#90717. When the Nous Portal primary hits a rate/subscription cap and NO fallback provider is configured, the agent loop hard-failed until the recorded reset_at literally arrived. The recorded reset is the provider's worst case for the bucket window (x-ratelimit-reset-requests-1h reports the full hour), but RPH/RPM are rolling windows that usually recover earlier — so a single-provider setup could go silent for minutes-to-days for a cap that already freed up. Adds should_probe_nous_during_cooldown() to nous_rate_guard.py: a throttled recovery probe (one per 30s slot, cross-session via a state file) that lets a single-provider primary test whether the cap recovered ahead of reset_at. With a fallback chain available we still prefer the fallback (no probe). A failed probe re-records the cooldown via the existing 429 handler; a successful one clears it. clear_nous_rate_limit() also clears the probe slot.

github-actions · 2026-06-08T00:07:11Z

🔎 Lint report: `openclaw-port/nous-single-provider-reprobe` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10031 on HEAD, 10031 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5197 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

teknium1 · 2026-06-08T03:27:58Z

Closing — the premise doesn't hold against how the Nous guard actually works.

is_genuine_nous_rate_limit() only trips the breaker on case (a): the caller's own Nous account bucket (RPM/RPH/TPM/TPH) confirmed exhausted — remaining == 0 with a reset window ≥ 60s. Transient upstream capacity (case b) is already filtered out and never records a cooldown.

So by the time the no-fallback branch runs, we've proven the account quota is empty. Probing it every 30s then just hammers a confirmed-empty bucket — twice before the minimum reset window even elapses — against a provider explicitly signalling backoff. The PR body's justification ("reset_at is worst-case, rolling windows recover earlier") is the early-recovery case, which the guard already classifies as (b) and doesn't break on.

The natural retry already exists: the user sends their next message when they want to try again — throttled by a human, not a 30s timer, and without poking a quota that asked for space. No automated background probe needed.

AGENTS.md was almost entirely how-to/mechanics with the want/don't-want guidance implicit and scattered. Adds a single authoritative intent layer near the top, calibrated against what actually merges and what actually gets rejected. - 'What Hermes Is': framing + the two properties that drive design (prompt-cache integrity, narrow-waist core). - 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work: what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe to close on the three allowed reasons AND when NOT to close one. Taste-based 'won't implement / out of scope' closes stay human-only by design. - 'What we want' calibrated against the last ~55 merges: fix real bugs well, expand reach at the edges (platforms/channels/providers/models/desktop — large features land routinely), refactor god-files into clean modules, keep the CORE narrow. 'Expansive at the edges, conservative at the waist.' - 'What we don't want': speculative hooks, .env-for-non-secrets, needless core tools, lazy-read escape hatches, feature-destroying fixes, ungated telemetry, change-detector tests, core-touching plugins. - 'Before you call it a bug — verify the premise (and when NOT to close)': distilled from real closes (#41741 intentional-design-not-a-gap, #41610 wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission, #41999 overreach). Doubles as sweeper guidance to avoid wrongly closing legitimate PRs. - 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool > plugin > MCP server in the catalog > new core tool (last resort). Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay where readers need them.

#42641) AGENTS.md was almost entirely how-to/mechanics with the want/don't-want guidance implicit and scattered. Adds a single authoritative intent layer near the top, calibrated against what actually merges and what actually gets rejected. - 'What Hermes Is': framing + the two properties that drive design (prompt-cache integrity, narrow-waist core). - 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work: what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe to close on the three allowed reasons AND when NOT to close one. Taste-based 'won't implement / out of scope' closes stay human-only by design. - 'What we want' calibrated against the last ~55 merges: fix real bugs well, expand reach at the edges (platforms/channels/providers/models/desktop — large features land routinely), refactor god-files into clean modules, keep the CORE narrow. 'Expansive at the edges, conservative at the waist.' - 'What we don't want': speculative hooks, .env-for-non-secrets, needless core tools, lazy-read escape hatches, feature-destroying fixes, ungated telemetry, change-detector tests, core-touching plugins. - 'Before you call it a bug — verify the premise (and when NOT to close)': distilled from real closes (#41741 intentional-design-not-a-gap, #41610 wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission, #41999 overreach). Doubles as sweeper guidance to avoid wrongly closing legitimate PRs. - 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool > plugin > MCP server in the catalog > new core tool (last resort). Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay where readers need them.

NousResearch#42641) AGENTS.md was almost entirely how-to/mechanics with the want/don't-want guidance implicit and scattered. Adds a single authoritative intent layer near the top, calibrated against what actually merges and what actually gets rejected. - 'What Hermes Is': framing + the two properties that drive design (prompt-cache integrity, narrow-waist core). - 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work: what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe to close on the three allowed reasons AND when NOT to close one. Taste-based 'won't implement / out of scope' closes stay human-only by design. - 'What we want' calibrated against the last ~55 merges: fix real bugs well, expand reach at the edges (platforms/channels/providers/models/desktop — large features land routinely), refactor god-files into clean modules, keep the CORE narrow. 'Expansive at the edges, conservative at the waist.' - 'What we don't want': speculative hooks, .env-for-non-secrets, needless core tools, lazy-read escape hatches, feature-destroying fixes, ungated telemetry, change-detector tests, core-touching plugins. - 'Before you call it a bug — verify the premise (and when NOT to close)': distilled from real closes (NousResearch#41741 intentional-design-not-a-gap, NousResearch#41610 wrong-premise, NousResearch#42327 fix-never-executes, NousResearch#42393 deliberate-omission, NousResearch#41999 overreach). Doubles as sweeper guidance to avoid wrongly closing legitimate PRs. - 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool > plugin > MCP server in the catalog > new core tool (last resort). Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay where readers need them.

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder provider/nous Nous Research API (OAuth) labels Jun 8, 2026

teknium1 closed this Jun 8, 2026

teknium1 deleted the openclaw-port/nous-single-provider-reprobe branch June 8, 2026 03:27

teknium1 mentioned this pull request Jun 9, 2026

docs(agents): add Design Philosophy + Contribution Rubric to AGENTS.md #42641

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(nous): re-probe single-provider primary during rate-limit cooldown#41610

feat(nous): re-probe single-provider primary during rate-limit cooldown#41610
teknium1 wants to merge 1 commit into
mainfrom
openclaw-port/nous-single-provider-reprobe

teknium1 commented Jun 8, 2026

Uh oh!

github-actions Bot commented Jun 8, 2026

Uh oh!

teknium1 commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

teknium1 commented Jun 8, 2026

Summary

Changes

Adaptation notes

Validation

Uh oh!

github-actions Bot commented Jun 8, 2026

🔎 Lint report: openclaw-port/nous-single-provider-reprobe vs origin/main

ruff

ty (type checker)

Uh oh!

teknium1 commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🔎 Lint report: `openclaw-port/nous-single-provider-reprobe` vs `origin/main`