fix: apply login credentials to client after Matrix password auth #5829

Open
teyrebaz33 wants to merge 5 commits into NousResearch:main from teyrebaz33:fix/matrix-password-login-missing-device-id

Conversation

@teyrebaz33

Contributor

Problem

Fixes #5819.

When using MATRIX_PASSWORD authentication, the bot connects and processes the initial sync, but silently ignores all subsequent messages with zero log output.

Root Cause

After a successful client.login(), the LoginResponse was being discarded. matrix-nio does not automatically apply the response to the client — access_token, device_id, and user_id must be set manually. Without these, subsequent sync() calls are effectively unauthenticated and deliver no new events.

The access-token path already handles this correctly via restore_login(). The password path was missing equivalent credential application.

Fix

Apply all three credentials from LoginResponse to the AsyncClient immediately after a successful password login:

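# Runs right after client.login(); resp is the value it returned.
if isinstance(resp, nio.LoginResponse):
    if getattr(resp, "user_id", ""):
        # Keep the adapter's own copy in step with the server-canonical ID.
        self._user_id = resp.user_id
        client.user_id = resp.user_id
    if getattr(resp, "device_id", ""):
        client.device_id = resp.device_id
    if getattr(resp, "access_token", ""):
        # Later sync() calls authenticate with this token.
        client.access_token = resp.access_token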

Tests

Added TestMatrixPasswordAuth.test_connect_applies_credentials_from_login_response which verifies that all three fields from LoginResponse are propagated to the client after a successful password login.
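For illustration, a minimal sketch of the shape such a regression test can take. It uses stand-in objects rather than matrix-nio, and it is not the test added in this PR: `FakeLoginResponse`, `FakeClient`, and `apply_login_credentials` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeLoginResponse:
    """Stand-in for a login response (hypothetical, not the matrix-nio class)."""
    user_id: str = ""
    device_id: str = ""
    access_token: str = ""


@dataclass
class FakeClient:
    """Stand-in for the client object that holds the credentials."""
    user_id: Optional[str] = None
    device_id: Optional[str] = None
    access_token: Optional[str] = None


def apply_login_credentials(client, resp):
    """Copy each non-empty credential from resp onto client."""
    for field in ("user_id", "device_id", "access_token"):
        value = getattr(resp, field, "")
        if value:
            setattr(client, field, value)


def test_credentials_are_applied():
    client = FakeClient()
    resp = FakeLoginResponse("@bot:example.org", "DEVICE123", "secret-token")
    apply_login_credentials(client, resp)
    assert client.user_id == "@bot:example.org"
    assert client.device_id == "DEVICE123"
    assert client.access_token == "secret-token"


def test_empty_fields_do_not_overwrite():
    # A response that lacks a field must leave the existing value alone.
    client = FakeClient(device_id="EXISTING")
    apply_login_credentials(client, FakeLoginResponse(user_id="@bot:example.org"))
    assert client.device_id == "EXISTING"
    assert client.access_token is None
```

The second test covers the guard in the fix: a field that is empty or absent on the response leaves the client's existing value untouched.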

@Schnurzel700

Thanks for the PR! I have some additional information that might help confirm the fix:

Even when using a manually generated MATRIX_ACCESS_TOKEN, the issue persists. It’s not "silent" in the sense of zero output; rather, the bot seems to be stuck in a loop.

Every time I send a new message to the bot (even in a fresh, unencrypted room), the log immediately repeats the exact same decryption warnings from the initial sync:

    WARNING nio.crypto.log: Received a undecryptable Megolm event from a unknown device: @[USER]:matrix.org [DEVICE_ID]
    WARNING nio.crypto.log: Error decrypting megolm event, no session found with session id [SESSION_ID] for room ![ROOM]:matrix.org

The bot never responds. When I try to use a command like /sethome, the Matrix client (Element) just shows "Unknown command", indicating that the event is sent to the server, but the Hermes gateway isn't successfully "consuming" and processing it due to the loop.

The bot clearly "sees" that there is a new event on the server (hence the triggered log output), but it seems to "forget" its sync position or credentials, forcing it to re-play the old encrypted timeline instead of moving forward.

@teyrebaz33

Contributor Author

Thanks for the additional context @Schnurzel700. The symptom you describe with access-token auth (decryption warnings replaying on each new message) points to a separate issue from the one this PR fixes. When credentials are already applied correctly via restore_login(), the replay loop is caused by E2EE key management: the bot receives encrypted events it cannot decrypt (no session key), buffers them in _pending_megolm, and retries on every maintenance cycle. This is expected behavior for encrypted events with missing keys, not a credential problem.

This PR specifically fixes the password-login path where access_token, device_id, and user_id from LoginResponse were never applied to the client — making all subsequent syncs unauthenticated. That said, if your room is configured as unencrypted but Element is still sending encrypted events, that may be a separate E2EE negotiation issue outside this PR's scope.

teyrebaz33 added a commit to teyrebaz33/hermes-agent that referenced this pull request Apr 7, 2026
Comprehensive walkthrough covering:
- Three methods for finding bugs (usage, code reading, open issues)
- Verifying a real bug before writing code
- Tracing root causes through the full code path
- Opening well-structured issues
- Writing minimal, pattern-following fixes
- Writing regression tests
- Pre-PR checklist with exact commands
- Opening PRs with clear descriptions
- Handling CI failures and the salvage pattern

All examples drawn from real merged PRs (NousResearch#5080, NousResearch#5829).
teyrebaz33 force-pushed the fix/matrix-password-login-missing-device-id branch 3 times, most recently from 3f67f76 to fe8e312 (April 8, 2026 15:22)
…usResearch#5819)

When using MATRIX_PASSWORD, the LoginResponse from matrix-nio was being
discarded after a successful login. The client was left without an
access_token, user_id, and device_id, so all subsequent sync() calls
were effectively unauthenticated — silently delivering no new events.

Apply all three credentials from the LoginResponse to the AsyncClient
immediately after login, mirroring what the access-token path already
does via restore_login(). Also update self._user_id so session tracking
and deduplication use the server-canonical user ID.

Adds a regression test that asserts all three fields are propagated
from LoginResponse to the client after a successful password login.
…on restart

Without updating self._device_id from the LoginResponse, each gateway
restart passes None as ctor_device_id to AsyncClient, causing matrix-nio
to create a new device identity on every login and flooding the account
with stale sessions.

Reported in NousResearch#5819 follow-up by Schnurzel700.
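A minimal sketch of the effect this commit message describes, under simplifying assumptions: `fake_login` stands in for the homeserver (it reuses a supplied device ID and otherwise mints a new one), and `AdapterSketch` is a hypothetical adapter, not the Hermes gateway class.

```python
import itertools
from typing import List, Optional

# Hypothetical source of fresh device IDs for the stand-in server.
_device_counter = itertools.count(1)


def fake_login(device_id: Optional[str]) -> str:
    """Stand-in login: reuse the supplied device ID, else mint a new one."""
    return device_id or f"DEVICE{next(_device_counter)}"


class AdapterSketch:
    """Hypothetical adapter that may or may not remember its device ID."""

    def __init__(self, remember_device_id: bool) -> None:
        self._device_id: Optional[str] = None
        self._remember = remember_device_id
        self.devices_seen: List[str] = []

    def connect(self) -> None:
        # Pass whatever device ID we currently hold (None on the first call).
        device_id = fake_login(self._device_id)
        self.devices_seen.append(device_id)
        if self._remember:
            # The behaviour the commit adds: keep the ID from the response.
            self._device_id = device_id
```

Calling `connect()` three times on `AdapterSketch(remember_device_id=False)` records three distinct device IDs, while the remembering variant records the same one each time.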
…dd clock-skew tolerance

Two fixes for the silent Matrix gateway issue reported in NousResearch#5819:

1. receive_response() was missing from _sync_loop. matrix-nio's sync()
   only fetches data over HTTP — receive_response() is required to
   actually dispatch events to registered callbacks. Without this call,
   the bot syncs data but never fires _on_room_message or any other
   registered handler.

2. The startup grace period compared server-side UTC timestamps against
   local system time without clock-skew tolerance. Added a 60-second
   tolerance to prevent new messages from being incorrectly discarded
   as stale when local and server clocks drift.

Root cause identified and verified by Schnurzel700 in NousResearch#5819.
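The clock-skew tolerance in point 2 can be sketched as follows. This is an illustration, not the gateway's code: `is_stale` and its parameter names are hypothetical, and it assumes the event timestamp is in milliseconds since the epoch (as Matrix `origin_server_ts` is) while the local startup time is in seconds.

```python
def is_stale(event_ts_ms: int, startup_ts_s: float, tolerance_s: float = 60.0) -> bool:
    """Return True if the event predates startup by more than the tolerance."""
    event_ts_s = event_ts_ms / 1000.0
    # Subtracting the tolerance lets events through when the server clock
    # runs slightly behind the local clock.
    return event_ts_s < startup_ts_s - tolerance_s
```

With the 60-second default, an event stamped 30 seconds before startup is kept; with `tolerance_s=0` the same event would be discarded as stale.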
The initial sync call also requires receive_response() to trigger
registered callbacks, not just the sync loop. Without this, the bot
deadlocks/remains silent at startup even with the sync loop fix.

Identified from working implementation shared by Schnurzel700.
…ag and add dispatch diagnostic

- Replace clock-skew-prone timestamp grace period with _initial_sync_done flag
- Add debug log before handle_message dispatch to diagnose silent failures
- Approach based on working implementation from Schnurzel700 (NousResearch#5819)
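A minimal sketch of the flag-based approach, assuming the adapter simply drops events seen before the first sync completes. Only the `_initial_sync_done` name comes from the commit message; `SyncGate` and its methods are hypothetical.

```python
from typing import List


class SyncGate:
    """Hypothetical gate that ignores backlog until the first sync is done."""

    def __init__(self) -> None:
        self._initial_sync_done = False
        self.dispatched: List[str] = []

    def on_event(self, event_id: str) -> None:
        if not self._initial_sync_done:
            # Backlog delivered by the initial sync: skip it. No timestamps
            # are compared, so clock drift cannot affect the decision.
            return
        self.dispatched.append(event_id)

    def mark_initial_sync_done(self) -> None:
        self._initial_sync_done = True
```

Because the decision depends only on ordering relative to the first sync, it does not need the tolerance window that the timestamp comparison required.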
teyrebaz33 force-pushed the fix/matrix-password-login-missing-device-id branch from fe8e312 to 5c6855a (April 8, 2026 23:13)
alt-glitch added labels on Apr 30, 2026: type/bug (Something isn't working), P1 (High: major feature broken, no workaround), platform/matrix (Matrix adapter (E2EE)), comp/gateway (Gateway runner, session dispatch, delivery)
@alt-glitch

Collaborator

This fix appears to be superseded by #7881 (merged), which resolved the same issue #5819. If the bug persists on current main, please rebase and confirm.

@alt-glitch

Collaborator

Superseded by #7881

Development

Successfully merging this pull request may close these issues.

Bot connects and syncs old messages, but silently ignores all new messages (no log output)

3 participants