
fix(gateway): prevent blank auto-resume messages #23314

Open
kaiyisg wants to merge 3 commits into NousResearch:main from kaiyisg:fix/gateway-auto-resume-blank-messages



@kaiyisg commented May 10, 2026


What does this PR do?

Prevents gateway restart auto-resume from sending an empty internal message to the agent.

When a gateway is interrupted mid-turn, startup recovery can synthesize the next internal turn before the original user message has been persisted to the transcript. Previously that synthetic MessageEvent used text="", so the model could treat recovery as a user-sent blank message and reply that the message came through empty.

This change records the in-flight user message on the session entry while the turn is processing. Startup auto-resume replays that captured message when available. If no captured message exists, it uses an explicit nonblank internal continuation instruction so recovery does not look like an empty user turn.
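A minimal sketch of the recovery choice described above, assuming a simplified MessageEvent that carries only a text field. build_resume_event and RESUME_FALLBACK_TEXT are hypothetical names, and the fallback wording is illustrative rather than the PR's actual instruction.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical wording; the PR's real fallback instruction may differ.
RESUME_FALLBACK_TEXT = (
    "[internal] The gateway restarted while your previous turn was still running. "
    "Continue that interrupted request; this is not a new or empty user message."
)


@dataclass
class MessageEvent:
    # Simplified stand-in for the gateway's MessageEvent (assumed shape).
    text: str


def build_resume_event(in_flight_user_message: Optional[str]) -> MessageEvent:
    """Build the synthetic turn that startup auto-resume dispatches.

    Replays the captured in-flight message when one exists; otherwise uses an
    explicit nonblank continuation instruction instead of text="".
    """
    if in_flight_user_message and in_flight_user_message.strip():
        return MessageEvent(text=in_flight_user_message)
    return MessageEvent(text=RESUME_FALLBACK_TEXT)
```

Treating a whitespace-only capture as missing is a choice made in this sketch: it guarantees the model never receives a blank turn, at the cost of not replaying a message that was genuinely empty.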

Related Issue

No issue filed. Related restart-resume work: #18896, #20776.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/session.py: persist in_flight_user_message and in_flight_marked_at on SessionEntry.
  • gateway/session.py: add SessionStore.mark_in_flight() and SessionStore.clear_in_flight() helpers.
  • gateway/run.py: mark the prepared inbound message as in-flight before agent processing.
  • gateway/run.py: clear the in-flight marker after a completed turn.
  • gateway/run.py: use captured in-flight text for restart auto-resume, with a nonblank internal fallback instruction.
  • tests/gateway/test_restart_resume_pending.py: cover session serialization, mark/clear behavior, replaying captured text, and nonblank fallback recovery.
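The session-side changes listed above might look roughly like the sketch below. The field names in_flight_user_message and in_flight_marked_at and the method names mark_in_flight() and clear_in_flight() come from this PR; their signatures, the session_key field, the ISO-8601 timestamp format, the in-memory storage, and the process_turn() wrapper standing in for the gateway/run.py flow are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


@dataclass
class SessionEntry:
    # Simplified stand-in: only the fields relevant to this PR are shown.
    session_key: str
    in_flight_user_message: Optional[str] = None
    in_flight_marked_at: Optional[str] = None  # ISO-8601 string (assumed format)

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "SessionEntry":
        # .get() keeps entries saved before these fields existed loadable.
        return cls(
            session_key=data["session_key"],
            in_flight_user_message=data.get("in_flight_user_message"),
            in_flight_marked_at=data.get("in_flight_marked_at"),
        )


class SessionStore:
    """In-memory sketch; persistence to disk is omitted."""

    def __init__(self) -> None:
        self._entries: Dict[str, SessionEntry] = {}

    def get(self, session_key: str) -> SessionEntry:
        return self._entries.setdefault(session_key, SessionEntry(session_key))

    def mark_in_flight(self, session_key: str, text: str) -> None:
        entry = self.get(session_key)
        entry.in_flight_user_message = text
        entry.in_flight_marked_at = datetime.now(timezone.utc).isoformat()

    def clear_in_flight(self, session_key: str) -> None:
        entry = self.get(session_key)
        entry.in_flight_user_message = None
        entry.in_flight_marked_at = None


def process_turn(
    store: SessionStore, session_key: str, text: str, run_agent: Callable[[str], str]
) -> str:
    # Hypothetical wrapper: mark before processing, clear only on completion.
    store.mark_in_flight(session_key, text)
    reply = run_agent(text)
    store.clear_in_flight(session_key)
    return reply
```

The clear call is deliberately not in a finally block: if run_agent raises or the process dies mid-turn, the marker survives for startup auto-resume to find.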

How to Test

  1. Trigger a gateway turn, restart or interrupt the gateway before the turn completes, and let startup auto-resume the pending session.
  2. Confirm the resumed turn continues the interrupted user request instead of producing a "message came through empty" response.
  3. Run the focused regression checks below.
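For step 3, a focused check can assert that no pending session resumes with blank text. This sketch assumes persisted entries are plain dicts and uses a hypothetical resume_pending flag for however the gateway records an interrupted turn; pending_resume_turns() and FALLBACK are illustrative names, not the PR's code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fallback wording; the PR's real instruction may differ.
FALLBACK = "[internal] Continue the turn that was interrupted by a gateway restart."


def pending_resume_turns(sessions: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return (session_key, text) pairs that startup auto-resume would dispatch.

    Each entry is a persisted session dict; "resume_pending" is an assumed flag
    marking a turn that was interrupted before completion.
    """
    turns: List[Tuple[str, str]] = []
    for key, entry in sessions.items():
        if not entry.get("resume_pending"):
            continue  # completed sessions are not resumed
        captured: Optional[str] = entry.get("in_flight_user_message")
        text = captured if captured and captured.strip() else FALLBACK
        turns.append((key, text))
    return turns


# Example: one session with a captured message, one without, one already done.
sessions = {
    "chat-1": {"resume_pending": True, "in_flight_user_message": "translate this"},
    "chat-2": {"resume_pending": True},  # interrupted before the marker was written
    "chat-3": {"resume_pending": False, "in_flight_user_message": "done"},
}
for key, text in pending_resume_turns(sessions):
    assert text.strip(), f"{key} would resume with a blank message"
```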

Commands run locally:

venv/bin/python -m py_compile gateway/session.py gateway/run.py tests/gateway/test_restart_resume_pending.py
venv/bin/python -m pytest -o addopts= tests/gateway/test_restart_resume_pending.py
python3 scripts/check-windows-footguns.py --diff origin/main

Results:

71 passed in tests/gateway/test_restart_resume_pending.py
✓ No Windows footguns found (3 file(s) scanned).

Note: scripts/run_tests.sh tests/gateway/test_restart_resume_pending.py could not run in this local checkout because the shared venv has no pip, and the wrapper tries to install pytest-split before running tests.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 26.2

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Observed locally in a messaging gateway: after a restart interrupted an in-progress turn, the old auto-resume path produced a synthetic internal turn with blank content, and the bot replied that the message came through empty. This PR makes that synthetic turn nonblank and, when possible, replays the original in-flight message.

@kaiyisg marked this pull request as ready for review May 10, 2026 16:56
@kaiyisg force-pushed the fix/gateway-auto-resume-blank-messages branch from 526f86f to a7603e5 May 10, 2026 16:59
@alt-glitch added the type/bug, P2, and comp/gateway labels May 10, 2026