
fix(gateway): route synthetic background events to their originating session #9537

Closed
etcircle wants to merge 1 commit into NousResearch:main from etcircle:fix/background-watch-thread-routing

@etcircle etcircle commented Apr 14, 2026

Summary

  • route queued watch notifications via the originating session instead of the currently active foreground event
  • attach session_key plus routing metadata to queued watch events in tools/process_registry.py
  • resolve synthetic watch/completion event sources via session_key -> session_store.origin, with safe fallback parsing
  • preserve the correct chat_type for synthetic completion events so threaded group traffic does not get rebuilt as bogus DM sessions

This addresses the cross-topic/session bleed described in #9532 and the related synthetic-session reconstruction bug behind the screenshot symptom.
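The resolution order described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Source`, `SessionStore`, `resolve_source`, and the event field names (`platform`, `chat_type`, `chat_id`, `thread_id`) are hypothetical stand-ins, and the chat id is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for the gateway's event source."""
    platform: str
    chat_type: str
    chat_id: str
    thread_id: Optional[str] = None


class SessionStore:
    """Hypothetical store mapping session_key -> originating Source."""

    def __init__(self) -> None:
        self._origins: dict = {}

    def remember(self, session_key: str, origin: Source) -> None:
        self._origins[session_key] = origin

    def origin(self, session_key: str) -> Optional[Source]:
        return self._origins.get(session_key)


def resolve_source(evt: dict, store: SessionStore) -> Optional[Source]:
    """Resolve where a synthetic event should be delivered.

    Order: canonical origin from the store, then routing metadata carried
    on the queued event, else None so the caller can drop or log it.
    """
    session_key = evt.get("session_key") or ""
    origin = store.origin(session_key) if session_key else None
    if origin is not None:
        return origin

    # Fallback: metadata stamped on the event when it was queued.
    # chat_type is required here; guessing "dm" would recreate the bug.
    platform = evt.get("platform")
    chat_type = evt.get("chat_type")
    chat_id = evt.get("chat_id")
    if platform and chat_type and chat_id is not None:
        return Source(platform, chat_type, str(chat_id), evt.get("thread_id"))
    return None


store = SessionStore()
key = "agent:main:telegram:group:-100123:1795"  # made-up ids
store.remember(key, Source("telegram", "group", "-100123", "1795"))
print(resolve_source({"session_key": key}, store))
```

Returning `None` when neither path yields a complete identity keeps the caller from silently delivering to whatever session happens to be active.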

Test Plan

  • python -m pytest tests/tools/test_watch_patterns.py -q
  • python -m pytest tests/gateway/test_background_process_notifications.py -q
  • python -m pytest tests/gateway/test_internal_event_bypass_pairing.py -q
  • python -m pytest tests/gateway/test_background_command.py -q
  • python -m pytest tests/gateway/test_session_env.py -q

Notes:

  • tests/gateway/test_run_progress_topics.py is currently flaky (its baseline drifts) even on a pristine checkout of the latest origin/main, so it is intentionally out of scope for this fix.

Closes #9532

@etcircle


Refreshed this branch on top of latest origin/main (e69526be) and re-ran the investigation with CGC.

What changed in this refresh:

  • kept the original fix direction (watch events must not route from the current foreground event)
  • added session_key to queued watch_match / watch_disabled events
  • switched gateway injection to use the queued watch event (evt), not the current event
  • resolved the source primarily via session_key -> session_store.origin
  • kept metadata/session-key parsing fallback so the injected synthetic message still carries the correct topic/group identity
  • added regression coverage in:
    • tests/gateway/test_background_process_notifications.py
    • tests/tools/test_watch_patterns.py
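The idea behind stamping session_key at enqueue time and routing from the queued event (evt) can be illustrated with a small sketch. `WatchQueue` and `route_drained` are hypothetical names for this example and do not mirror the real tools/process_registry.py or gateway code; only the `watch_match` event type and the `session_key` field come from this thread.

```python
from collections import deque


class WatchQueue:
    """Hypothetical notification queue; not the real process registry."""

    def __init__(self) -> None:
        self._events = deque()

    def enqueue_match(self, session_key: str, pattern: str, line: str) -> None:
        # Stamp the owning session on the event at the moment it is queued.
        self._events.append({
            "type": "watch_match",
            "session_key": session_key,
            "pattern": pattern,
            "output": line,
        })

    def drain(self) -> list:
        """Return all pending events and leave the queue empty."""
        events = list(self._events)
        self._events.clear()
        return events


def route_drained(events: list) -> list:
    """Return (destination_session_key, text) pairs for drained events.

    The destination comes only from the queued event itself. The session
    whose foreground message triggered the drain plays no part in routing.
    """
    deliveries = []
    for evt in events:
        dest = evt.get("session_key")
        if not dest:
            continue  # no identity on the event: skip rather than guess
        deliveries.append((dest, f"[watch] {evt['pattern']}: {evt['output']}"))
    return deliveries


q = WatchQueue()
# A process started from topic 11 matches a watch pattern.
q.enqueue_match("agent:main:telegram:group:-100123:11", "ERROR", "disk full")
# Later, a message in topic 22 triggers the drain; delivery still targets topic 11.
for dest, text in route_drained(q.drain()):
    print(dest, text)
```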

Why the session_key part matters:

  • the original issue is cross-topic bleed from using original_event.source
  • a second local symptom was bogus synthetic session shapes like agent:main:telegram:dm:-100...:1795
  • routing from the canonical session-store origin fixes both the misdelivery and the DM-shaped transcript pollution
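To make the bogus key shape concrete, here is a small sketch of how losing chat_type during reconstruction could produce it. The key layout is only inferred from the example above, `build_session_key` is a hypothetical helper, and the chat id is made up; the real gateway may build keys differently.

```python
from typing import Optional


def build_session_key(platform: str, chat_type: str, chat_id: str,
                      thread_id: Optional[str] = None) -> str:
    """Join the parts in the layout assumed from the example in this thread."""
    parts = ["agent", "main", platform, chat_type, chat_id]
    if thread_id:
        parts.append(thread_id)
    return ":".join(parts)


# Origin of a background process started inside a group topic (made-up ids).
origin = {
    "platform": "telegram",
    "chat_type": "group",
    "chat_id": "-100123",
    "thread_id": "1795",
}

# Reconstruction that drops chat_type and falls back to "dm":
buggy = build_session_key(origin["platform"], "dm",
                          origin["chat_id"], origin["thread_id"])

# Reconstruction that carries the origin's chat_type through:
fixed = build_session_key(**origin)

print(buggy)  # agent:main:telegram:dm:-100123:1795
print(fixed)  # agent:main:telegram:group:-100123:1795
```

The two keys differ only in the chat_type segment, which is enough to split the synthetic traffic into a separate, DM-shaped session.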

Targeted verification on this refreshed branch:

  • tests/gateway/test_background_process_notifications.py
  • tests/tools/test_watch_patterns.py
  • tests/gateway/test_internal_event_bypass_pairing.py
  • tests/gateway/test_run_progress_topics.py
  • tests/gateway/test_session_env.py
  • tests/gateway/test_background_command.py

Result: 89 passed locally.

etcircle force-pushed the fix/background-watch-thread-routing branch from e7907da to de4d5b6 on April 15, 2026 15:41
etcircle changed the title from "fix(gateway): route watch notifications to their originating thread" to "fix(gateway): route synthetic background events to their originating session" on Apr 15, 2026
@etcircle


Refreshed this PR on top of the current latest origin/main and expanded it to cover the second synthetic-session bug as well.

What changed in the refresh:

  • watch queue events now include session_key + routing metadata
  • watch notification injection now routes from queued event/session identity, not the currently active foreground message
  • synthetic completion notifications now resolve source the same way, so the correct chat_type is preserved
  • added targeted regression coverage for both the watch queue payload and completion routing through group/topic sessions

Why the extra bit matters:

  • the issue is not only “wrong topic selected at drain time”
  • there was also a synthetic-session reconstruction bug that could create impossible keys like agent:main:telegram:dm:-100...:1795
  • that lines up with the screenshot symptom where background-process/system traffic appeared to spill into the wrong topic/session bucket

Local verification on the refreshed branch:

  • python -m pytest tests/tools/test_watch_patterns.py -q
  • python -m pytest tests/gateway/test_background_process_notifications.py -q
  • python -m pytest tests/gateway/test_internal_event_bypass_pairing.py -q
  • python -m pytest tests/gateway/test_background_command.py -q
  • python -m pytest tests/gateway/test_session_env.py -q

All passed locally.

One explicit non-scope note: tests/gateway/test_run_progress_topics.py is currently flaky / baseline-drifty even on pristine latest origin/main, so I did not mix that unrelated mess into this PR.

kshitijk4poor added a commit that referenced this pull request Apr 15, 2026
- Populate watcher_* routing fields for watch-only processes (not just
  notify_on_complete), so watch-pattern events carry direct metadata
  instead of relying solely on session_key parsing fallback
- Extract _parse_session_key() helper to dedupe session key parsing
  at two call sites in gateway/run.py
- Add negative test proving cross-thread leakage doesn't happen
- Add edge-case tests for _build_process_event_source returning None
  (empty evt, invalid platform, short session_key)
- Add unit tests for _parse_session_key helper
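The edge cases listed in this commit message (empty evt, invalid platform, short session_key) can be illustrated with a sketch of a parsing helper. This is not the actual `_parse_session_key` or `_build_process_event_source` from gateway/run.py: the function names, the key layout, and the `KNOWN_PLATFORMS` set below are assumptions for the example.

```python
from typing import Optional

# Illustrative only; the real set of supported platforms is not given here.
KNOWN_PLATFORMS = {"telegram", "discord", "slack"}


def parse_session_key(session_key: str) -> Optional[dict]:
    """Split a key assumed to look like
    agent:<name>:<platform>:<chat_type>:<chat_id>[:<thread_id>]."""
    parts = (session_key or "").split(":")
    if len(parts) < 5 or parts[0] != "agent":
        return None  # too short or not an agent key
    platform, chat_type, chat_id = parts[2], parts[3], parts[4]
    if not (platform and chat_type and chat_id):
        return None  # an empty segment means the key is malformed
    thread_id = parts[5] if len(parts) > 5 and parts[5] else None
    return {
        "platform": platform,
        "chat_type": chat_type,
        "chat_id": chat_id,
        "thread_id": thread_id,
    }


def build_event_source(evt: dict) -> Optional[dict]:
    """Return routing info for a queued event, or None if it cannot be trusted."""
    if not evt:
        return None
    parsed = parse_session_key(evt.get("session_key") or "")
    if parsed is None or parsed["platform"] not in KNOWN_PLATFORMS:
        return None
    return parsed


print(build_event_source({}))                                    # None
print(build_event_source({"session_key": "agent:main"}))         # None
print(build_event_source({"session_key": "agent:main:fax:dm:1"}))  # None
print(build_event_source(
    {"session_key": "agent:main:telegram:group:-100123:1795"}))
```

Keeping the parser in one helper means both call sites reject malformed keys the same way instead of drifting apart.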
teknium1 pushed a commit that referenced this pull request Apr 15, 2026
@teknium1


Merged via PR #10460 (salvaged through #10446 by @kshitijk4poor). Your commits were cherry-picked onto current main with your authorship preserved in git log. Thanks @etcircle!

@teknium1 teknium1 closed this Apr 15, 2026
kagura-agent pushed a commit to kagura-agent/hermes-agent that referenced this pull request Apr 16, 2026
…rch#9537)

ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…rch#9537)

aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…rch#9537)

02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…rch#9537)

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…rch#9537)

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…rch#9537)

Development

Successfully merging this pull request may close these issues.

[Bug]: background process watch notifications can leak into the wrong threaded session
