Skip to content

fix(server): update ROUTER identity on client reconnect#382

Merged
from2001 merged 7 commits intodevelopfrom
fix/update-router-identity-on-reconnect
Mar 18, 2026
Merged

fix(server): update ROUTER identity on client reconnect#382
from2001 merged 7 commits intodevelopfrom
fix/update-router-identity-on-reconnect

Conversation

@hacha
Copy link
Collaborator

@hacha hacha commented Mar 18, 2026

Summary

  • Fix stale ZMQ ROUTER identity causing RPC and NetworkVariable messages to be silently lost after client reconnect (e.g. sleep/wake cycle on Pico 4 UE)
  • When a client reconnects before the 5-second timeout, its DEALER socket gets a new identity, but the server was not updating the stored identity — sending control messages to the old, disconnected socket

Root Cause

In _handle_client_transform(), the else branch (existing client) updated transform_data, last_update, client_no, and is_stealth but not identity. Transform sync was unaffected because it uses PUB-SUB (topic-based), but RPC/NV uses ROUTER-DEALER (identity-based).

Fix

Server-side (server.py)

  1. Identity update: Always update stored ROUTER identity in the existing-client branch
  2. Reconnect detection: Compare old vs new identity to detect reconnections
  3. Targeted NV resync: On reconnect, unicast NV state only to the reconnecting client (not broadcast to entire room) via new _sync_network_variables_to_client() method
  4. ID mapping re-broadcast: Mark room_id_mapping_dirty on reconnect so the client gets the current device-to-clientNo table

Client-side (NetSyncManager.cs)

  1. State reset on resume: Reset _clientNo, _hasInvokedReady, _shouldCheckReady, _shouldSendHandshake, and NV initial sync flag in OnApplicationPause(false) before calling StartNetworking() — mirrors the same reset logic in connection error handler and room switch paths. Without this, stale state keeps IsReady == true while the new connection is being established.

Tests

  1. Reconnect regression test: New test_reconnect_identity.py verifying identity update and NV resync on reconnect with same device_id
  2. Windows test fixes: UTF-8 encoding for log file reading; relaxed monotonic time assertion for Windows timer resolution

Fixes #381

Test plan

  • black src/ tests/ && ruff check src/ tests/ && mypy src/ — all pass
  • pytest — 193 passed, 2 skipped
  • New reconnect integration test passes
  • Verify on device: connect → sleep → wake → confirm RPC/NV reception works
  • Verify repeated sleep/wake cycles do not degrade control message delivery

🤖 Generated with Claude Code

hacha and others added 4 commits March 18, 2026 19:01
…C/NV message loss

When a client reconnects before the server timeout (e.g. after a
sleep/wake cycle), its DEALER socket is recreated with a new ZMQ
identity. The server was not updating the stored identity in the
existing-client branch, causing all ROUTER-based control messages
(RPC, NetworkVariable sync, device ID mapping) to be sent to the
stale identity and silently lost.

Fixes #381

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a client reconnects (e.g. after sleep/wake) before the server
timeout removes it, the identity update alone is not sufficient —
the client may have missed NetworkVariable changes and device ID
mapping updates during sleep. Detect identity changes as reconnection
events and trigger NV full sync and ID mapping re-broadcast to the
reconnected client.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On OnApplicationPause(false), reset _clientNo, _hasInvokedReady,
_shouldCheckReady, _shouldSendHandshake, and the NV initial sync
flag before calling StartNetworking(). Without this, stale state
from the previous session keeps IsReady true while the new
connection is still being established, which can cause RPCs to be
sent prematurely and NV initial sync to be skipped.

This mirrors the same reset logic already used in the connection
error handler and room switch paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_logging_cli: read log file with explicit utf-8 encoding to
  avoid cp932 decode errors on Windows
- test_timing_monotonic: use >= instead of > for monotonic time
  comparison, as Windows timer resolution can return equal values
  for consecutive calls with short sleep intervals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses control-message loss after a client reconnects by ensuring the server updates the stored ZMQ ROUTER identity for an existing device ID, and adds reconnect-handling behavior to re-send state needed for RPC/NetworkVariables delivery.

Changes:

  • Server: detect reconnects (identity change), update stored identity, and trigger ID-mapping rebroadcast + NV sync on reconnect.
  • Unity client: reset handshake/ready/NV-initial-sync state on app resume before restarting networking.
  • Tests: relax monotonic timing assertion for Windows timer resolution and read log files with explicit UTF-8 encoding.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
STYLY-NetSync-Unity/Packages/com.styly.styly-netsync/Runtime/NetSyncManager.cs Resets client readiness/handshake/NV sync state on resume to avoid stale “ready” state across reconnects.
STYLY-NetSync-Server/src/styly_netsync/server.py Updates stored ROUTER identity on reconnect and triggers additional resync signals (ID mapping + NV).
STYLY-NetSync-Server/tests/test_timing_monotonic.py Adjusts monotonic-time test to allow non-decreasing readings (Windows resolution).
STYLY-NetSync-Server/tests/test_logging_cli.py Reads log file as UTF-8 for consistent parsing across platforms.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Address PR #382 review comments:

1. Refactor NV sync to extract payload-building helpers
   (_build_global_var_sync_payload, _build_client_var_sync_payload)
   and add _sync_network_variables_to_client() for targeted unicast.
   On reconnect, only the reconnecting client receives the NV snapshot
   instead of broadcasting to the entire room.

2. Add test_reconnect_identity.py covering the reconnect scenario:
   same device_id with new DEALER identity, verifying identity update
   and NV resync delivery.

3. Rename test_monotonic_time_always_increases to
   test_monotonic_time_never_decreases with updated docstring/print
   to match the relaxed >= assertion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@from2001
Copy link
Collaborator

@codex Review this PR.

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@from2001 from2001 merged commit dc6a55f into develop Mar 18, 2026
@from2001 from2001 deleted the fix/update-router-identity-on-reconnect branch March 18, 2026 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ROUTER identity not updated on client reconnect, causing RPC/NV messages to be lost

3 participants