Avoid client remove by timeout #274

Merged
from2001 merged 3 commits into develop from from2001/avoid-remove-by-timeout
Jan 12, 2026

Conversation

@from2001 (Collaborator)

This PR avoids timeout removals like the following:

16:01:49 | INFO     | Client 3425365f... (client number: 270) removed (timeout)
16:01:49 | INFO     | Client 9b9ac76c... (client number: 277) removed (timeout)
16:01:49 | INFO     | Client d1ff092b... (client number: 289) removed (timeout)
16:01:49 | INFO     | Client 5a806d4a... (client number: 275) removed (timeout)
16:01:49 | INFO     | Client 3ad0dac1... (client number: 290) removed (timeout)
16:01:49 | INFO     | Client 384e8f86... (client number: 291) removed (timeout)
16:01:49 | INFO     | Client 5c117e77... (client number: 280) removed (timeout)

This pull request introduces several improvements to the server's ID mapping broadcast mechanism and increases the robustness and throughput of ZeroMQ socket communication. The main focus is on debouncing (rate-limiting) ID mapping broadcasts to reduce redundant network traffic and on tuning socket options for higher message buffering and throughput.

ID Mapping Broadcast Improvements:

  • Added a debouncing mechanism for ID mapping broadcasts to rooms, reducing redundant updates by only broadcasting after a configurable interval (ID_MAPPING_DEBOUNCE_INTERVAL). This includes tracking dirty rooms and last broadcast times, and flushing eligible broadcasts in the periodic loop. [1] [2] [3] [4] [5] [6]
  • Updated client removal and new client join logic to mark rooms as "dirty" for debounced broadcasting instead of immediately broadcasting ID mappings. [1] [2]
  • Ensured cleanup of debounce tracking data when rooms are deleted.
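The debounce mechanism described above can be sketched as a small helper class. This is a minimal illustration, not the actual server code: the class name `DebouncedBroadcaster` and its attributes are hypothetical, while `ID_MAPPING_DEBOUNCE_INTERVAL`, the dirty-room tracking, and the flush-from-the-periodic-loop pattern come from the PR description.

```python
import threading
import time

ID_MAPPING_DEBOUNCE_INTERVAL = 0.5  # seconds (500 ms per the review summary)

class DebouncedBroadcaster:
    """Sketch of per-room debounced ID mapping broadcasts."""

    def __init__(self, broadcast_fn):
        self._broadcast_fn = broadcast_fn  # callable(room_id) that does the real send
        self._lock = threading.Lock()
        self._dirty = {}            # room_id -> True when a broadcast is pending
        self._last_broadcast = {}   # room_id -> monotonic time of the last send

    def mark_dirty(self, room_id):
        # Called on client join/removal instead of broadcasting immediately.
        with self._lock:
            self._dirty[room_id] = True

    def flush(self):
        # Called from the periodic loop; sends at most one broadcast
        # per room per debounce interval.
        now = time.monotonic()
        with self._lock:
            due = [
                r for r, dirty in self._dirty.items()
                if dirty and now - self._last_broadcast.get(r, 0.0)
                >= ID_MAPPING_DEBOUNCE_INTERVAL
            ]
            for room_id in due:
                self._dirty[room_id] = False
                self._last_broadcast[room_id] = now
        # Perform the actual sends outside the lock.
        for room_id in due:
            self._broadcast_fn(room_id)

    def remove_room(self, room_id):
        # Cleanup when a room is deleted so the tracking dicts don't leak.
        with self._lock:
            self._dirty.pop(room_id, None)
            self._last_broadcast.pop(room_id, None)
```

Joins and removals only flip a flag; the periodic loop's `flush()` coalesces any number of changes within the interval into a single broadcast per room.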

ZeroMQ Socket Configuration Enhancements:

  • Increased the send and receive high-water mark (HWM) for PUB, DEALER, ROUTER, and SUB sockets to 10,000 to support higher message throughput and reduce risk of message drops under load. [1] [2] [3] [4]
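With pyzmq, raising the high-water marks amounts to setting `SNDHWM`/`RCVHWM` before binding or connecting. A minimal sketch (the value 10,000 is from this PR; the endpoint is illustrative):

```python
import zmq

HWM = 10_000  # raised from the libzmq default of 1000

ctx = zmq.Context.instance()

# PUB socket used for broadcasts: larger send buffer so bursts of
# ID mapping messages are queued rather than dropped.
pub = ctx.socket(zmq.PUB)
pub.setsockopt(zmq.SNDHWM, HWM)

# ROUTER socket handling client traffic: raise both directions.
router = ctx.socket(zmq.ROUTER)
router.setsockopt(zmq.RCVHWM, HWM)
router.setsockopt(zmq.SNDHWM, HWM)
```

Note that HWM options must be set before `bind()`/`connect()` to take effect on the underlying pipes.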

Copilot AI (Contributor) left a comment


Pull request overview

This pull request aims to reduce client timeout removals by implementing a debouncing mechanism for ID mapping broadcasts and increasing ZeroMQ socket buffer sizes. The debouncing approach rate-limits ID mapping broadcasts to occur at most once every 500ms per room, which should reduce network traffic and prevent timeout issues caused by excessive broadcast activity.

Changes:

  • Added debouncing mechanism for ID mapping broadcasts with a 500ms interval
  • Increased ZeroMQ socket high-water marks (HWM) to 10,000 for improved message buffering
  • Modified client join and removal logic to mark rooms as dirty for debounced broadcasting instead of immediate broadcasts

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

  • STYLY-NetSync-Server/src/styly_netsync/server.py: Implements debouncing for ID mapping broadcasts, adds tracking dictionaries, increases socket HWM values, and integrates the debounced flush into the periodic loop
  • STYLY-NetSync-Server/src/styly_netsync/client_simulator.py: Increases socket HWM values to match the server configuration for better throughput


@claude (Contributor) commented Jan 12, 2026

Code Review

Found 1 issue that requires attention:

Race Condition: Inconsistent Lock Usage for Debounce Dictionaries

Location: STYLY-NetSync-Server/src/styly_netsync/server.py at line 931

Issue:

The new room_id_mapping_dirty and room_last_id_mapping_broadcast dictionaries are accessed from multiple threads without consistent synchronization:

Inconsistent Access Patterns:

  1. Line 931 (ReceiveThread): Sets room_id_mapping_dirty[room_id] = True OUTSIDE the lock
  2. Lines 1299-1310 (PeriodicThread): Iterates and modifies both dictionaries WITHOUT holding any lock
  3. Lines 1489-1490 (PeriodicThread): Sets the value INSIDE _rooms_lock
  4. Lines 1514-1517 (PeriodicThread): Deletes entries INSIDE _rooms_lock

Comparison with Existing Pattern:

The existing room_dirty_flags dictionary is consistently accessed within _rooms_lock (see lines 906, 927, 1407, 1426, 1487), which is the correct pattern.

Reference:

self.room_dirty_flags[room_id] = True  # Mark room as dirty
# Cache the binary data if available
if raw_payload:
    # Store by client number for efficient broadcast
    self.client_binary_cache[client_no] = raw_payload

Potential Issues:

  • Missed ID mapping broadcasts if a dirty flag set by ReceiveThread is not visible to the flush
  • KeyError exceptions if a room is deleted during cleanup while flush is iterating
  • Race conditions during the compound read-modify operations in _flush_debounced_id_mapping_broadcasts

Recommended Fix:

  1. Move line 931 inside the existing _rooms_lock block (before line 927), or
  2. Wrap the dictionary operations in _flush_debounced_id_mapping_broadcasts with with self._rooms_lock:.
  In either case, ensure all modifications to these dictionaries consistently use the lock.
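The recommended fix can be sketched as follows. This is an illustration of the locking pattern, not the actual server code: the attribute and method names (`_rooms_lock`, `room_id_mapping_dirty`, `room_last_id_mapping_broadcast`, `_flush_debounced_id_mapping_broadcasts`) are taken from the review, but the surrounding structure is assumed.

```python
import threading
import time

ID_MAPPING_DEBOUNCE_INTERVAL = 0.5  # assumed value from the PR description

class Server:
    """Sketch: every access to the debounce dicts happens under _rooms_lock."""

    def __init__(self):
        self._rooms_lock = threading.Lock()
        self.room_id_mapping_dirty = {}
        self.room_last_id_mapping_broadcast = {}

    def _mark_id_mapping_dirty(self, room_id):
        # ReceiveThread path (line 931 in the review): now inside the lock.
        with self._rooms_lock:
            self.room_id_mapping_dirty[room_id] = True

    def _flush_debounced_id_mapping_broadcasts(self):
        now = time.monotonic()
        with self._rooms_lock:
            # Snapshot the rooms that are due while holding the lock, so a
            # concurrent room deletion cannot raise KeyError mid-iteration.
            due = [
                room_id
                for room_id, dirty in self.room_id_mapping_dirty.items()
                if dirty
                and now - self.room_last_id_mapping_broadcast.get(room_id, 0.0)
                >= ID_MAPPING_DEBOUNCE_INTERVAL
            ]
            for room_id in due:
                self.room_id_mapping_dirty[room_id] = False
                self.room_last_id_mapping_broadcast[room_id] = now
        # Do the actual (potentially slow) broadcasts outside the lock.
        for room_id in due:
            self._broadcast_id_mappings(room_id)

    def _broadcast_id_mappings(self, room_id):
        pass  # placeholder for the real ZeroMQ publish
```

Snapshotting under the lock and broadcasting after releasing it keeps the compound read-modify-write atomic without holding `_rooms_lock` across network sends.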

Checked for bugs and CLAUDE.md compliance.

@from2001 (Collaborator, Author)

@copilot Review the latest code.

Copilot AI (Contributor) commented Jan 12, 2026

@from2001 I've opened a new pull request, #276, to work on those changes. Once the pull request is ready, I'll request review from you.
