
Fix sticky fallback persistence timing and harden TTL parsing #7542

Closed

dkfjtang wants to merge 3 commits into NousResearch:main from dkfjtang:pr/sticky-fallback-runtime-fix

@dkfjtang

Summary

This PR fixes the sticky fallback persistence semantics in Hermes and hardens TTL parsing.

What changed

  • move sticky fallback persistence from fallback activation time to successful fallback completion
  • add _maybe_persist_sticky_fallback() so only successful fallback runtimes are persisted
  • keep sticky restore behavior across turns
  • fall back to 600 seconds when sticky_fallback_ttl is invalid or non-positive
  • add targeted tests for:
    • activation does not persist sticky immediately
    • successful fallback does persist sticky
    • primary runtime does not persist sticky
    • expired sticky cache is cleaned up
    • fallback A fails and fallback B succeeds, so B is restored on the next turn
    • invalid TTL parsing falls back safely
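The TTL hardening and expiry cleanup listed above can be illustrated with a small sketch. Everything in it is hypothetical: parse_sticky_ttl, StickyEntry, and StickyCache are illustrative names, not the actual Hermes code. It assumes the TTL arrives as an arbitrary config value, that expired entries are dropped lazily when a restore is attempted at the start of a turn, and (as an extra assumption) that booleans count as invalid TTLs.

```python
import math
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional

DEFAULT_STICKY_TTL_SECONDS = 600.0  # the 600 s default named in this PR


def parse_sticky_ttl(raw: Any, default: float = DEFAULT_STICKY_TTL_SECONDS) -> float:
    """Hypothetical helper: return a positive, finite TTL in seconds, else `default`."""
    if isinstance(raw, bool):  # assumption: bool is an int subclass, treat it as invalid
        return default
    try:
        ttl = float(raw)
    except (TypeError, ValueError):  # e.g. None, "abc", lists
        return default
    if not math.isfinite(ttl) or ttl <= 0:  # rejects nan, inf, zero and negatives
        return default
    return ttl


@dataclass
class StickyEntry:
    runtime: str       # hypothetical: name of the fallback runtime to reuse
    expires_at: float  # clock reading after which the entry is stale


class StickyCache:
    """Hypothetical in-memory sticky store with lazy expiry cleanup."""

    def __init__(self, ttl: Any, clock: Callable[[], float] = time.monotonic):
        self.ttl = parse_sticky_ttl(ttl)
        self.clock = clock
        self.entry: Optional[StickyEntry] = None

    def persist(self, runtime: str) -> None:
        self.entry = StickyEntry(runtime, self.clock() + self.ttl)

    def restore(self) -> Optional[str]:
        """Return the sticky runtime for a new turn, dropping it if expired."""
        if self.entry is None:
            return None
        if self.clock() >= self.entry.expires_at:
            self.entry = None  # expired: clean up so the primary is used again
            return None
        return self.entry.runtime
```

For example, parse_sticky_ttl("abc") and parse_sticky_ttl(-5) both return 600.0, while parse_sticky_ttl("30") returns 30.0. Injecting the clock keeps expiry testable without sleeping.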

Why

Previously, sticky state could be written as soon as a fallback was activated.
That was too early: activation does not guarantee success.
In failure chains, this could incorrectly pin a failed fallback as sticky for future turns.
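A minimal sketch of the corrected timing, assuming a simple ordered fallback chain and an in-memory sticky slot. TurnState, run_turn, and maybe_persist_sticky_fallback are hypothetical names; the last only mirrors the role the PR describes for _maybe_persist_sticky_fallback(). Activating a runtime writes nothing; the sticky runtime is recorded only after a fallback call succeeds, so a failed fallback A can never be pinned ahead of the fallback B that actually answered.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TurnState:
    primary: str
    sticky: Optional[str] = None  # hypothetical persisted sticky runtime
    persisted_log: List[str] = field(default_factory=list)  # every persist call


def maybe_persist_sticky_fallback(state: TurnState, runtime: str) -> None:
    """Hypothetical guard: pin a runtime only if it is a fallback.

    Callers invoke it only after a successful completion.
    """
    if runtime == state.primary:
        return  # primary succeeded: nothing to pin
    state.sticky = runtime
    state.persisted_log.append(runtime)


def run_turn(state: TurnState, chain: List[str], call: Callable[[str], bool]) -> Optional[str]:
    """Try the sticky runtime (if any) first, then the rest of the chain in order."""
    order = ([state.sticky] if state.sticky else []) + [r for r in chain if r != state.sticky]
    for runtime in order:
        # Activation alone persists nothing (the old behaviour persisted here).
        if call(runtime):
            maybe_persist_sticky_fallback(state, runtime)  # success-time persistence
            return runtime
    return None
```

With a chain of primary, fallback A, and fallback B where only B succeeds, the first turn records B (and only B), and the next turn tries B first.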

Validation

pytest tests/run_agent/test_provider_fallback.py tests/run_agent/test_primary_runtime_restore.py -q

Result:

  • 44 passed

root added 3 commits April 11, 2026 02:07
- upgrade agent-browser from ^0.13.0 to ^0.25.3
- refresh root package-lock.json after npm install
- refresh whatsapp-bridge package-lock.json via npm audit fix
- clear npm audit findings for browser tools and whatsapp bridge

- agent/anthropic_adapter.py: update adapter logic
- agent/auxiliary_client.py: update auxiliary client handling
- agent/model_metadata.py: update model metadata
- gateway/run.py: update gateway runtime logic
- tests/*: update corresponding test coverage

- move sticky fallback persistence from activation-time to success-time
- add _maybe_persist_sticky_fallback() guard for fallback-only persistence
- fall back to 600s when sticky_fallback_ttl is invalid or non-positive
- add tests for activation-vs-success persistence, expired sticky cleanup,
  fallback A fail / fallback B success restore, and TTL parsing fallback
alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) on Apr 29, 2026
dkfjtang closed this by deleting the head repository on May 21, 2026