Skip to content

fix(mattermost): isolate thread sessions#12299

Open
imapotato123 wants to merge 1 commit into
NousResearch:mainfrom
imapotato123:fix/mattermost-threading
Open

fix(mattermost): isolate thread sessions#12299
imapotato123 wants to merge 1 commit into
NousResearch:mainfrom
imapotato123:fix/mattermost-threading

Conversation

@imapotato123

@imapotato123 imapotato123 commented Apr 18, 2026

Copy link
Copy Markdown

Summary

Refreshes the Mattermost threading fix onto current main and keeps the diff focused on the two thread/session behaviors still needed in gateway/platforms/mattermost.py:

  • Outbound thread root selection: when reply_mode=thread, prefer metadata["thread_id"] as the authoritative Mattermost root id. If metadata is absent, preserve upstream's _resolve_root_id(reply_to) fallback so reply ids are resolved to the actual thread root.
  • Per-thread session keying: when reply_mode=thread, top-level Mattermost posts now use their own post id as source.thread_id. Replies already carry root_id, so they join the same session. This prevents unrelated top-level channel threads from sharing one Hermes session/context.

When reply_mode=off, the existing flat channel behavior is preserved for top-level messages.

Supersedes #12283. Fixes #12290.

Notes

Upstream has since added _resolve_root_id(), so this refresh intentionally composes with upstream's resolver rather than replacing it.

Test plan

  • uv run --with pytest==9.0.2 --with pytest-asyncio==1.3.0 --with pytest-timeout==2.4.0 --with aiohttp==3.13.2 pytest tests/gateway/test_mattermost.py -q -o addopts=
    • 46 passed
  • python3 -m py_compile gateway/platforms/mattermost.py tests/gateway/test_mattermost.py
  • git diff --check origin/main..HEAD

@mxnstrexgl

Copy link
Copy Markdown

✓ Automated scan: no security concerns. Mattermost threading fix.

@imapotato123 imapotato123 force-pushed the fix/mattermost-threading branch from 877fe2c to 408e701 Compare April 21, 2026 18:24
@alt-glitch alt-glitch added type/bug Something isn't working comp/gateway Gateway runner, session dispatch, delivery labels Apr 21, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Supersedes #12283. Related to #12063, #12096, #6617 — same Mattermost threading code path.

@lipaysamart

Copy link
Copy Markdown

Are there any merger plans?

@imapotato123

Copy link
Copy Markdown
Author

Heads-up — the original PR included three fixes; the edit_message **kwargs hunk was independently shipped by upstream 6c0c6259 (in v2026.4.23) with a stricter *, finalize: bool = False form and a parametrized regression test across all 7 non-DingTalk adapters. A prior rebase of this branch onto upstream main already silently dropped the now-redundant hunk, so the diff currently reflects only the two genuinely-needed fixes (root_id resolution from metadata["thread_id"] and per-thread session keying for top-level messages). Updating the description to match the actual diff — both remaining fixes still apply, since upstream/main:gateway/platforms/mattermost.py still has the broken behavior at lines 273-274 and 655-656.

@imapotato123 imapotato123 force-pushed the fix/mattermost-threading branch from 408e701 to 5ffb964 Compare April 28, 2026 21:53
@imapotato123

Copy link
Copy Markdown
Author

Rebased onto current main (clean merge). Also cleaned up the commit message to accurately reflect the two fixes in the diff.

This is a small change (+19/−5) fixing two thread-handling bugs in the Mattermost platform — happy to address any feedback. Would appreciate a review when someone has a chance. 🙏

@imapotato123 imapotato123 force-pushed the fix/mattermost-threading branch from 5ffb964 to 95bae59 Compare April 29, 2026 23:26
@imapotato123

Copy link
Copy Markdown
Author

Rebased onto current main. This is a small 2-fix PR for Mattermost threading (root_id resolution + per-thread session keying). Would appreciate a review when you get a chance. cc @teknium1

@imapotato123

Copy link
Copy Markdown
Author

Rebased/refreshed this PR onto current main and narrowed the scope to the Mattermost thread/session bugs still missing upstream.

Changes in the refreshed branch:

  • Dropped the older root-id hunk that conflicted with upstream's _resolve_root_id() work.
  • Preserved upstream's _resolve_root_id(reply_to) fallback for normal replies.
  • Added support for metadata["thread_id"] as the authoritative Mattermost thread root for outbound deliveries where reply_to is absent or points at a reply.
  • In thread reply mode, top-level Mattermost posts now use their own post id as source.thread_id, so unrelated top-level channel threads do not share one Hermes session.
  • Added focused tests for metadata thread roots and top-level thread session isolation.

Validation:

  • PYTHONPATH=$PWD /Users/admin/.hermes/hermes-agent/venv/bin/python -m pytest tests/gateway/test_mattermost.py -q -o addopts=
  • Result: 46 passed

I also tried the broader tests/gateway/ suite locally, but that run exhausted the macOS open-file limit in unrelated async/voice/webhook tests. The focused Mattermost suite passes cleanly.

@imapotato123 imapotato123 changed the title fix(mattermost): complete threading support fix(mattermost): isolate thread sessions May 21, 2026
Use explicit Mattermost thread metadata when available while preserving root-id resolution for reply_to. Treat top-level posts as their own thread roots in thread mode so unrelated channel threads do not share a session.
@imapotato123 imapotato123 force-pushed the fix/mattermost-threading branch from 0a1feb9 to e54003a Compare May 24, 2026 01:34

Copy link
Copy Markdown
Author

Refreshed this PR onto current main (72ff3e909). The branch is still a single focused Mattermost threading/session commit.

Validation from the refreshed branch:

  • uv run --with pytest==9.0.2 --with pytest-asyncio==1.3.0 --with pytest-timeout==2.4.0 --with aiohttp==3.13.2 pytest tests/gateway/test_mattermost.py -q -o addopts=
    • 46 passed
  • python3 -m py_compile gateway/platforms/mattermost.py tests/gateway/test_mattermost.py
  • git diff --check origin/main..HEAD

@johnkattenhorn

Copy link
Copy Markdown

Independent confirmation from a real deployment

Just hit this bug on a self-hosted Mattermost install (LXC running Hermes off main at bc3f1f4, single-user EA setup) and can confirm this patch fixes it.

Repro before the patch (reply_mode=thread)

User posts top-level in ~hermes-home: bot replies, creating a thread. User replies inside that thread with a follow-up that requires context from turn 1 ("what was the number I asked you to remember?"). Bot loses context — runs session_search, finds nothing, replies "I don't know what you're asking about."

Inspection of ~/.hermes/sessions/sessions.json showed two distinct sessions for the same user in the same channel:

agent:main:mattermost:group:<channel>:7srqno17s7g5tg5fj6czzsc9ir
  origin.thread_id = null
  (top-level post — keyed by user_id because thread_id was None)

agent:main:mattermost:group:<channel>:55cwfjrtc7f13mhq7ca9z3uyqa
  origin.thread_id = "55cwfjrtc7f13mhq7ca9z3uyqa"
  (thread reply — keyed by Mattermost root_id)

Same user, same channel, but the second turn lands in a fresh session and the conversation is split. This is exactly the failure mode described in #18279.

Day-one workaround was MATTERMOST_REPLY_MODE=off (flat inline replies) — functional, but the thread experience is part of the point of using Mattermost.

After applying this PR

Patched gateway/platforms/mattermost.py with git apply against the diff in this PR, flipped MATTERMOST_REPLY_MODE back to thread, restarted the systemd gateway. Re-ran the same repro. Bot answered correctly with full context.

New sessions.json after the patched turn shows a single session, keyed by the top-level post id:

agent:main:mattermost:group:<channel>:nrgh4e8xwpd8indhf9811e591e
  origin.thread_id = "nrgh4e8xwpd8indhf9811e591e"
  (both top-level and thread reply share this session — keyed by post.id per the new branch in _handle_ws_event)

That matches the intent of the change exactly — thread_id = post.get("id") on top-level posts in thread mode means both turns share the session, and the Slack-style consistency note in the PR description rings true.

Test coverage

The three new tests in this PR cover the exact failure mode I hit:

  • test_top_level_post_in_thread_mode_uses_post_id_as_thread_id — top-level post side
  • test_send_thread_prefers_metadata_thread_id_over_reply_to — outbound side
  • test_send_thread_uses_metadata_thread_id_when_no_reply_to — covers the "cron post into a channel without an inbound reply_to" case (which would have bitten me on the scheduled-summary feature next)

Carrying this locally

For the record — vendored the diff as a local hot-fix on my install until this merges, and documented the carry / drop process in my repo's hermes-agent docs so it doesn't get stuck applied after upstream lands.

Happy to provide more traces / sessions.json fragments if useful. +1 to merging this — it's the smallest diff that fixes the bug, the tests directly cover the failure mode, and it's been validated on a real (if small) deployment.

johnkattenhorn added a commit to johnkattenhorn/hermes-agent that referenced this pull request Jun 4, 2026
Top-level Mattermost posts in reply_mode=thread were keyed by user_id
while thread replies were keyed by thread_id, so the same user in the same
channel landed in two separate sessions and the bot lost context across turns.
Prefer an explicit thread root from metadata on send, and key top-level posts
in thread mode by their own post id on receive.

Carries upstream PR NousResearch#12299 (still open). Runtime fix only; upstream PR test
hunks omitted (refactored test file diverged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(mattermost): edit_message rejects finalize kwarg after DingTalk AI Cards merge

5 participants