feat(nous): re-probe single-provider primary during rate-limit cooldown#41610
feat(nous): re-probe single-provider primary during rate-limit cooldown#41610teknium1 wants to merge 1 commit into
Conversation
Port from openclaw/openclaw#90717. When the Nous Portal primary hits a rate/subscription cap and NO fallback provider is configured, the agent loop hard-failed until the recorded reset_at literally arrived. The recorded reset is the provider's worst case for the bucket window (x-ratelimit-reset-requests-1h reports the full hour), but RPH/RPM are rolling windows that usually recover earlier — so a single-provider setup could go silent for minutes-to-days for a cap that already freed up. Adds should_probe_nous_during_cooldown() to nous_rate_guard.py: a throttled recovery probe (one per 30s slot, cross-session via a state file) that lets a single-provider primary test whether the cap recovered ahead of reset_at. With a fallback chain available we still prefer the fallback (no probe). A failed probe re-records the cooldown via the existing 429 handler; a successful one clears it. clear_nous_rate_limit() also clears the probe slot.
🔎 Lint report:
|
|
Closing — the premise doesn't hold against how the Nous guard actually works.
So by the time the no-fallback branch runs, we've proven the account quota is empty. Probing it every 30s then just hammers a confirmed-empty bucket — twice before the minimum reset window even elapses — against a provider explicitly signalling backoff. The PR body's justification ("reset_at is worst-case, rolling windows recover earlier") is the early-recovery case, which the guard already classifies as (b) and doesn't break on. The natural retry already exists: the user sends their next message when they want to try again — throttled by a human, not a 30s timer, and without poking a quota that asked for space. No automated background probe needed. |
AGENTS.md was almost entirely how-to/mechanics with the want/don't-want
guidance implicit and scattered. Adds a single authoritative intent layer
near the top, calibrated against what actually merges and what actually
gets rejected.
- 'What Hermes Is': framing + the two properties that drive design
(prompt-cache integrity, narrow-waist core).
- 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work:
what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe
to close on the three allowed reasons AND when NOT to close one. Taste-based
'won't implement / out of scope' closes stay human-only by design.
- 'What we want' calibrated against the last ~55 merges: fix real bugs well,
expand reach at the edges (platforms/channels/providers/models/desktop —
large features land routinely), refactor god-files into clean modules,
keep the CORE narrow. 'Expansive at the edges, conservative at the waist.'
- 'What we don't want': speculative hooks, .env-for-non-secrets, needless
core tools, lazy-read escape hatches, feature-destroying fixes, ungated
telemetry, change-detector tests, core-touching plugins.
- 'Before you call it a bug — verify the premise (and when NOT to close)':
distilled from real closes (#41741 intentional-design-not-a-gap, #41610
wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission,
#41999 overreach). Doubles as sweeper guidance to avoid wrongly closing
legitimate PRs.
- 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool
> plugin > MCP server in the catalog > new core tool (last resort).
Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay
where readers need them.
#42641) AGENTS.md was almost entirely how-to/mechanics with the want/don't-want guidance implicit and scattered. Adds a single authoritative intent layer near the top, calibrated against what actually merges and what actually gets rejected. - 'What Hermes Is': framing + the two properties that drive design (prompt-cache integrity, narrow-waist core). - 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work: what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe to close on the three allowed reasons AND when NOT to close one. Taste-based 'won't implement / out of scope' closes stay human-only by design. - 'What we want' calibrated against the last ~55 merges: fix real bugs well, expand reach at the edges (platforms/channels/providers/models/desktop — large features land routinely), refactor god-files into clean modules, keep the CORE narrow. 'Expansive at the edges, conservative at the waist.' - 'What we don't want': speculative hooks, .env-for-non-secrets, needless core tools, lazy-read escape hatches, feature-destroying fixes, ungated telemetry, change-detector tests, core-touching plugins. - 'Before you call it a bug — verify the premise (and when NOT to close)': distilled from real closes (#41741 intentional-design-not-a-gap, #41610 wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission, #41999 overreach). Doubles as sweeper guidance to avoid wrongly closing legitimate PRs. - 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool > plugin > MCP server in the catalog > new core tool (last resort). Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay where readers need them.
NousResearch#42641) AGENTS.md was almost entirely how-to/mechanics with the want/don't-want guidance implicit and scattered. Adds a single authoritative intent layer near the top, calibrated against what actually merges and what actually gets rejected. - 'What Hermes Is': framing + the two properties that drive design (prompt-cache integrity, narrow-waist core). - 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work: what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe to close on the three allowed reasons AND when NOT to close one. Taste-based 'won't implement / out of scope' closes stay human-only by design. - 'What we want' calibrated against the last ~55 merges: fix real bugs well, expand reach at the edges (platforms/channels/providers/models/desktop — large features land routinely), refactor god-files into clean modules, keep the CORE narrow. 'Expansive at the edges, conservative at the waist.' - 'What we don't want': speculative hooks, .env-for-non-secrets, needless core tools, lazy-read escape hatches, feature-destroying fixes, ungated telemetry, change-detector tests, core-touching plugins. - 'Before you call it a bug — verify the premise (and when NOT to close)': distilled from real closes (NousResearch#41741 intentional-design-not-a-gap, NousResearch#41610 wrong-premise, NousResearch#42327 fix-never-executes, NousResearch#42393 deliberate-omission, NousResearch#41999 overreach). Doubles as sweeper guidance to avoid wrongly closing legitimate PRs. - 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool > plugin > MCP server in the catalog > new core tool (last resort). Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay where readers need them.
Summary
A single-provider Nous setup now recovers from a rate-limit cooldown as soon as the rolling cap frees up, instead of staying silent until the worst-case
reset_at.Port of openclaw/openclaw#90717 (single-provider primary re-probe during cooldown), adapted to hermes-agent's
nous_rate_guard+ agent-loop architecture.Root cause: When
agent.provider == "nous"hits a rate/subscription cap and no fallback provider is configured, the agent loop returned a hard "try again after the reset" failure and stayed blocked until the recordedreset_atliterally arrived. That reset is the provider's worst case for the bucket window (x-ratelimit-reset-requests-1hreports the full hour). RPH/RPM are rolling windows that usually recover earlier, so a single-provider user could sit silent for minutes-to-days on a cap that already cleared.Changes
agent/nous_rate_guard.py: newshould_probe_nous_during_cooldown(has_fallback=...)— a throttled recovery probe (one per 30s slot, coordinated cross-session via a state file). ReturnsFalsewhen a fallback chain exists (prefer the fallback),Truefor a single-provider primary when the throttle slot is open.clear_nous_rate_limit()now also clears the probe slot.agent/conversation_loop.py: in the no-fallback branch of the Nous rate-limit guard, fall through to a real (probe) API call when the throttle slot is open instead of hard-failing. A failed probe re-records the cooldown via the existing 429 handler; a successful one clears it. If the slot isn't open yet, the original "no fallback" failure response is preserved.tests/agent/test_nous_rate_guard.py: 7 regression tests (fallback short-circuit, first-probe-allowed, throttle suspension, post-window re-probe, far-future-reset still probes, slot persistence, clear removes slot).Adaptation notes
OpenClaw's fix lives in
model-fallback.ts'sshouldProbePrimaryDuringCooldown(auth-profile usage stats, 30s probe throttle). hermes-agent has no equivalent profile-store cooldown engine — the relevant cooldown is the file-basednous_rate_guard. So the same behavior (single-provider primary re-probes on a throttle rather than suspending until reset) is implemented at the guard + agent-loop boundary. The 30s throttle and "fallback present ⇒ no probe" semantics match the source.Validation
reset_at(up to days)tests/agent/test_nous_rate_guard.py: 39/39 pass (32 existing + 7 new).tests/run_agent/test_provider_fallback.py: 22/22 pass.HERMES_HOME: throttle window, fallback short-circuit, far-future-reset probe, and clear-removes-slot all verified.