fix(gateway): keep auto-continue recovery ephemeral #25561
Open
qWaitCrypto wants to merge 1 commit into
What does this PR do?
Fixes gateway auto-continue recovery so an interrupt recovery hint stays ephemeral and cannot be persisted or compacted into long-lived session context.
When a gateway session was interrupted with a trailing tool result, the next turn prepended an auto-continue system note directly into the user message. If that same turn triggered preflight compression, the synthetic note could be saved into the compressed child session and replayed later as if the user had authored it.
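A minimal sketch of the failure mode and the fix, assuming a simplified in-memory transcript. `Turn`, `build_turn_prefix`, `build_turn_fixed`, `compress`, `run_turn`, and the `RECOVERY_NOTE` text are hypothetical stand-ins, not the gateway's code. Only the idea of persisting a separate, clean user message mirrors `persist_user_message` from this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical recovery hint; the real note text is not shown in this PR.
RECOVERY_NOTE = "[System note: previous turn was interrupted after a tool result; continue.]"


@dataclass
class Turn:
    # Text sent to the model for this turn.
    api_message: str
    # Text written to the transcript; None means "persist api_message as-is".
    persist_user_message: Optional[str] = None

    def persisted_text(self) -> str:
        if self.persist_user_message is None:
            return self.api_message
        return self.persist_user_message


def build_turn_prefix(user_text: str, recovering: bool) -> Turn:
    # Pre-fix shape: the note is glued onto the user text and persisted with it.
    text = f"{RECOVERY_NOTE}\n\n{user_text}" if recovering else user_text
    return Turn(api_message=text)


def build_turn_fixed(user_text: str, recovering: bool) -> Turn:
    # Fixed shape: the model still sees the note, but only clean text is persisted.
    if not recovering:
        return Turn(api_message=user_text)
    return Turn(api_message=f"{RECOVERY_NOTE}\n\n{user_text}",
                persist_user_message=user_text)


def compress(transcript: list[str]) -> list[str]:
    # Stand-in for preflight compression: the child session is seeded from
    # whatever was persisted, so anything persisted gets replayed later.
    return [" | ".join(transcript)]


def run_turn(transcript: list[str], turn: Turn) -> list[str]:
    # Persist the turn, then compress into a "child session".
    return compress(transcript + [turn.persisted_text()])
```

With the prefix shape, the note survives into the compressed child session; with the fixed shape, it reaches the model once and is never written back.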
This PR keeps gateway recovery notes API-only via `persist_user_message`, records a durable acknowledgement for each recovered trailing tool batch, and propagates compression-created `session_id` changes even when the interrupted turn returns no final response.
Related Issue
Fixes #25242
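The acknowledgement and `session_id` parts of the fix can be sketched as follows, assuming a trailing tool batch can be identified by hashing its tool-call IDs. The field names `auto_continue_tool_tail_key` and `auto_continue_tool_tail_ack_at` and the method name `mark_auto_continue_tool_tail_ack()` come from this PR's `SessionEntry` changes; the key derivation, `tool_tail_key`, `should_auto_continue`, `finish_turn`, and the `Agent` class are hypothetical illustrations, not the gateway's implementation.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionEntry:
    session_id: str
    # Which trailing tool batch was last recovered, and when it was acknowledged.
    auto_continue_tool_tail_key: Optional[str] = None
    auto_continue_tool_tail_ack_at: Optional[float] = None

    def mark_auto_continue_tool_tail_ack(self, tail_key: str) -> None:
        # Intended to be called only after a recovery attempt reached the model.
        self.auto_continue_tool_tail_key = tail_key
        self.auto_continue_tool_tail_ack_at = time.time()


def tool_tail_key(tool_call_ids: list[str]) -> str:
    # Hypothetical: a stable key derived from the trailing tool results' IDs.
    return hashlib.sha256("\x00".join(tool_call_ids).encode()).hexdigest()


def should_auto_continue(entry: SessionEntry, tool_call_ids: list[str]) -> bool:
    # No trailing tool batch means there is nothing to recover.
    if not tool_call_ids:
        return False
    # An already acknowledged batch must not trigger the recovery note again.
    return entry.auto_continue_tool_tail_key != tool_tail_key(tool_call_ids)


@dataclass
class Agent:
    # Hypothetical agent; compression may swap in a child session ID.
    session_id: str


def finish_turn(entry: SessionEntry, agent: Agent,
                final_response: Optional[str]) -> Optional[str]:
    # Propagate a compression-created session ID before any early return,
    # so an empty or interrupted result cannot skip the update.
    if agent.session_id != entry.session_id:
        entry.session_id = agent.session_id
    if not final_response:
        return None
    return final_response
```

Keying the acknowledgement on the batch rather than on a simple boolean means a later, different trailing tool batch can still be recovered once.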
Type of Change
Changes Made
- `gateway/run.py`
  - Pass recovery notes through `persist_user_message` so gateway recovery notes remain API-only and do not persist into transcript history or compression summaries.
  - Propagate compression-created `agent.session_id` before empty/interrupted-result early returns.
- `gateway/session.py`
  - Track `auto_continue_tool_tail_key` and `auto_continue_tool_tail_ack_at` on `SessionEntry`.
  - Call `mark_auto_continue_tool_tail_ack()` for successful recovery attempts that reached the model.
- `run_agent.py`
- `tests/gateway/test_auto_continue_recovery.py`
How to Test
Reproduced the pre-fix persistence shape with the old gateway behavior:
Result before fix:
Verified the fixed recovery state blocks replay of the same trailing tool batch and preserves clean user text:
Result after fix:
Ran syntax checks:
Result: passed.
Ran focused gateway/session regression tests:
Result:
Ran the existing persisted-user-message override test with a writable temporary Hermes home:
Result:
Attempted broader gateway usage tests:
Result: timed out after collecting and starting `TestUsageCachedAgent::test_cached_agent_shows_detailed_usage`; no assertion output was produced before the timeout. This appears unrelated to the changed recovery/session paths.
Checklist
Code
- … (`fix(scope):`, `feat(scope):`, etc.)
- … `pytest tests/ -q` and all tests pass
Documentation & Housekeeping
- … (`docs/`, docstrings): N/A
- … `cli-config.yaml.example` if I added/changed config keys: N/A
- … `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows: N/A
Screenshots / Logs
Relevant reproduction output and test logs are included in "How to Test" above.