Skip to content

feat(nous): re-probe single-provider primary during rate-limit cooldown#41610

Closed
teknium1 wants to merge 1 commit into
mainfrom
openclaw-port/nous-single-provider-reprobe
Closed

feat(nous): re-probe single-provider primary during rate-limit cooldown#41610
teknium1 wants to merge 1 commit into
mainfrom
openclaw-port/nous-single-provider-reprobe

Conversation

@teknium1

@teknium1 teknium1 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

A single-provider Nous setup now recovers from a rate-limit cooldown as soon as the rolling cap frees up, instead of staying silent until the worst-case reset_at.

Port of openclaw/openclaw#90717 (single-provider primary re-probe during cooldown), adapted to hermes-agent's nous_rate_guard + agent-loop architecture.

Root cause: When agent.provider == "nous" hits a rate/subscription cap and no fallback provider is configured, the agent loop returned a hard "try again after the reset" failure and stayed blocked until the recorded reset_at literally arrived. That reset is the provider's worst case for the bucket window (x-ratelimit-reset-requests-1h reports the full hour). RPH/RPM are rolling windows that usually recover earlier, so a single-provider user could sit silent for minutes-to-days on a cap that already cleared.

Changes

  • agent/nous_rate_guard.py: new should_probe_nous_during_cooldown(has_fallback=...) — a throttled recovery probe (one per 30s slot, coordinated cross-session via a state file). Returns False when a fallback chain exists (prefer the fallback), True for a single-provider primary when the throttle slot is open. clear_nous_rate_limit() now also clears the probe slot.
  • agent/conversation_loop.py: in the no-fallback branch of the Nous rate-limit guard, fall through to a real (probe) API call when the throttle slot is open instead of hard-failing. A failed probe re-records the cooldown via the existing 429 handler; a successful one clears it. If the slot isn't open yet, the original "no fallback" failure response is preserved.
  • tests/agent/test_nous_rate_guard.py: 7 regression tests (fallback short-circuit, first-probe-allowed, throttle suspension, post-window re-probe, far-future-reset still probes, slot persistence, clear removes slot).

Adaptation notes

OpenClaw's fix lives in model-fallback.ts's shouldProbePrimaryDuringCooldown (auth-profile usage stats, 30s probe throttle). hermes-agent has no equivalent profile-store cooldown engine — the relevant cooldown is the file-based nous_rate_guard. So the same behavior (single-provider primary re-probes on a throttle rather than suspending until reset) is implemented at the guard + agent-loop boundary. The 30s throttle and "fallback present ⇒ no probe" semantics match the source.

Validation

Before After
Nous cooldown, no fallback, cap recovered early silent until reset_at (up to days) re-probes every 30s, resumes on recovery
Nous cooldown, fallback configured activates fallback activates fallback (unchanged)
Nous cooldown, no fallback, throttle slot closed hard-fail message hard-fail message (unchanged)
  • tests/agent/test_nous_rate_guard.py: 39/39 pass (32 existing + 7 new).
  • tests/run_agent/test_provider_fallback.py: 22/22 pass.
  • E2E with real imports + isolated HERMES_HOME: throttle window, fallback short-circuit, far-future-reset probe, and clear-removes-slot all verified.

Port from openclaw/openclaw#90717.

When the Nous Portal primary hits a rate/subscription cap and NO fallback
provider is configured, the agent loop hard-failed until the recorded
reset_at literally arrived. The recorded reset is the provider's worst case
for the bucket window (x-ratelimit-reset-requests-1h reports the full hour),
but RPH/RPM are rolling windows that usually recover earlier — so a
single-provider setup could go silent for minutes-to-days for a cap that
already freed up.

Adds should_probe_nous_during_cooldown() to nous_rate_guard.py: a throttled
recovery probe (one per 30s slot, cross-session via a state file) that lets
a single-provider primary test whether the cap recovered ahead of reset_at.
With a fallback chain available we still prefer the fallback (no probe).

A failed probe re-records the cooldown via the existing 429 handler; a
successful one clears it. clear_nous_rate_limit() also clears the probe slot.
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: openclaw-port/nous-single-provider-reprobe vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10031 on HEAD, 10031 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5197 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder provider/nous Nous Research API (OAuth) labels Jun 8, 2026
@teknium1 teknium1 closed this Jun 8, 2026
@teknium1 teknium1 deleted the openclaw-port/nous-single-provider-reprobe branch June 8, 2026 03:27
@teknium1

teknium1 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Closing — the premise doesn't hold against how the Nous guard actually works.

is_genuine_nous_rate_limit() only trips the breaker on case (a): the caller's own Nous account bucket (RPM/RPH/TPM/TPH) confirmed exhausted — remaining == 0 with a reset window ≥ 60s. Transient upstream capacity (case b) is already filtered out and never records a cooldown.

So by the time the no-fallback branch runs, we've proven the account quota is empty. Probing it every 30s then just hammers a confirmed-empty bucket — twice before the minimum reset window even elapses — against a provider explicitly signalling backoff. The PR body's justification ("reset_at is worst-case, rolling windows recover earlier") is the early-recovery case, which the guard already classifies as (b) and doesn't break on.

The natural retry already exists: the user sends their next message when they want to try again — throttled by a human, not a 30s timer, and without poking a quota that asked for space. No automated background probe needed.

teknium1 added a commit that referenced this pull request Jun 9, 2026
AGENTS.md was almost entirely how-to/mechanics with the want/don't-want
guidance implicit and scattered. Adds a single authoritative intent layer
near the top, calibrated against what actually merges and what actually
gets rejected.

- 'What Hermes Is': framing + the two properties that drive design
  (prompt-cache integrity, narrow-waist core).
- 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work:
  what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe
  to close on the three allowed reasons AND when NOT to close one. Taste-based
  'won't implement / out of scope' closes stay human-only by design.
  - 'What we want' calibrated against the last ~55 merges: fix real bugs well,
    expand reach at the edges (platforms/channels/providers/models/desktop —
    large features land routinely), refactor god-files into clean modules,
    keep the CORE narrow. 'Expansive at the edges, conservative at the waist.'
  - 'What we don't want': speculative hooks, .env-for-non-secrets, needless
    core tools, lazy-read escape hatches, feature-destroying fixes, ungated
    telemetry, change-detector tests, core-touching plugins.
  - 'Before you call it a bug — verify the premise (and when NOT to close)':
    distilled from real closes (#41741 intentional-design-not-a-gap, #41610
    wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission,
    #41999 overreach). Doubles as sweeper guidance to avoid wrongly closing
    legitimate PRs.
- 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool
  > plugin > MCP server in the catalog > new core tool (last resort).

Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay
where readers need them.
teknium1 added a commit that referenced this pull request Jun 10, 2026
#42641)

AGENTS.md was almost entirely how-to/mechanics with the want/don't-want
guidance implicit and scattered. Adds a single authoritative intent layer
near the top, calibrated against what actually merges and what actually
gets rejected.

- 'What Hermes Is': framing + the two properties that drive design
  (prompt-cache integrity, narrow-waist core).
- 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work:
  what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe
  to close on the three allowed reasons AND when NOT to close one. Taste-based
  'won't implement / out of scope' closes stay human-only by design.
  - 'What we want' calibrated against the last ~55 merges: fix real bugs well,
    expand reach at the edges (platforms/channels/providers/models/desktop —
    large features land routinely), refactor god-files into clean modules,
    keep the CORE narrow. 'Expansive at the edges, conservative at the waist.'
  - 'What we don't want': speculative hooks, .env-for-non-secrets, needless
    core tools, lazy-read escape hatches, feature-destroying fixes, ungated
    telemetry, change-detector tests, core-touching plugins.
  - 'Before you call it a bug — verify the premise (and when NOT to close)':
    distilled from real closes (#41741 intentional-design-not-a-gap, #41610
    wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission,
    #41999 overreach). Doubles as sweeper guidance to avoid wrongly closing
    legitimate PRs.
- 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool
  > plugin > MCP server in the catalog > new core tool (last resort).

Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay
where readers need them.
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
NousResearch#42641)

AGENTS.md was almost entirely how-to/mechanics with the want/don't-want
guidance implicit and scattered. Adds a single authoritative intent layer
near the top, calibrated against what actually merges and what actually
gets rejected.

- 'What Hermes Is': framing + the two properties that drive design
  (prompt-cache integrity, narrow-waist core).
- 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work:
  what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe
  to close on the three allowed reasons AND when NOT to close one. Taste-based
  'won't implement / out of scope' closes stay human-only by design.
  - 'What we want' calibrated against the last ~55 merges: fix real bugs well,
    expand reach at the edges (platforms/channels/providers/models/desktop —
    large features land routinely), refactor god-files into clean modules,
    keep the CORE narrow. 'Expansive at the edges, conservative at the waist.'
  - 'What we don't want': speculative hooks, .env-for-non-secrets, needless
    core tools, lazy-read escape hatches, feature-destroying fixes, ungated
    telemetry, change-detector tests, core-touching plugins.
  - 'Before you call it a bug — verify the premise (and when NOT to close)':
    distilled from real closes (NousResearch#41741 intentional-design-not-a-gap, NousResearch#41610
    wrong-premise, NousResearch#42327 fix-never-executes, NousResearch#42393 deliberate-omission,
    NousResearch#41999 overreach). Doubles as sweeper guidance to avoid wrongly closing
    legitimate PRs.
- 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool
  > plugin > MCP server in the catalog > new core tool (last resort).

Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay
where readers need them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists provider/nous Nous Research API (OAuth) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants