Closed
Labels: enhancement (New features or improvements to existing functionality)
Description
Describe the Request
Some relay registration failures (forbidden and unauthorized) are not recoverable, so we should stop the endpoint rather than sit in the auto-reconnect loop.
Specifically, these two close codes sent by the relay server should cause an exit:
proxystore/proxystore/p2p/relay/server.py
Lines 334 to 343 in 28f76dd

    except UnauthorizedError as e:
        await websocket.close(
            code=4001,
            reason=f'{e.__class__.__name__}: {e}',
        )
    except ForbiddenError as e:
        await websocket.close(
            code=4002,
            reason=f'{e.__class__.__name__}: {e}',
        )

On the client side, we can raise an error for these cases here instead of retrying:
proxystore/proxystore/p2p/relay/client.py
Lines 290 to 305 in 28f76dd

    except (
        # Exceptions that we should wait and retry again for
        ConnectionRefusedError,
        asyncio.TimeoutError,
        websockets.exceptions.ConnectionClosed,
    ) as e:
        if not retry:
            raise
        logger.warning(
            f'Registration with relay server at {self._address} '
            f'failed because of {e}. Retrying connection in '
            f'{backoff_seconds} seconds',
        )
        await asyncio.sleep(backoff_seconds)
        backoff_seconds = min(backoff_seconds * 2, 60)

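
A minimal sketch of how a retry loop like the one above could treat the two close codes as fatal, using only the standard library. `ConnectionClosedStub`, `RelayRegistrationError`, `FATAL_CLOSE_CODES`, and `connect_with_retry` are hypothetical names, not part of ProxyStore or websockets. The stub stands in for `websockets.exceptions.ConnectionClosed`, because how the close code is read from that exception depends on the websockets version.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar('T')

# Close codes the relay server sends for UnauthorizedError and ForbiddenError.
FATAL_CLOSE_CODES = frozenset({4001, 4002})


class ConnectionClosedStub(Exception):
    """Hypothetical stand-in for a websocket close that carries a close code."""

    def __init__(self, code: int, reason: str = '') -> None:
        super().__init__(f'close code {code}: {reason}')
        self.code = code
        self.reason = reason


class RelayRegistrationError(Exception):
    """Hypothetical error for registration failures that retrying cannot fix."""


async def connect_with_retry(
    register: Callable[[], Awaitable[T]],
    *,
    retry: bool = True,
    initial_backoff: float = 1.0,
    max_backoff: float = 60.0,
) -> T:
    """Call register() until it succeeds, backing off between attempts."""
    backoff_seconds = initial_backoff
    while True:
        try:
            return await register()
        except ConnectionClosedStub as e:
            if e.code in FATAL_CLOSE_CODES:
                # Bad or missing credentials will not improve on retry.
                raise RelayRegistrationError(
                    f'Relay server rejected registration: {e}',
                ) from e
            if not retry:
                raise
        except (ConnectionRefusedError, asyncio.TimeoutError):
            # Transient failures: wait and try again unless retry is disabled.
            if not retry:
                raise
        await asyncio.sleep(backoff_seconds)
        backoff_seconds = min(backoff_seconds * 2, max_backoff)
```

With this shape, the caller that owns the endpoint could catch `RelayRegistrationError` and shut down, while transient failures stay inside the backoff loop.
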
It would also be good to catch this kind of exception and log it more cleanly. (Because it happens in a background task, it is logged and nothing crashes, but we can handle it better.)

      File "/home/coordinator/venv/lib/python3.12/site-packages/proxystore/utils/tasks.py", line 32, in _execute_and_log_traceback
        await coro(*args, **kwargs)
      File "/home/coordinator/venv/lib/python3.12/site-packages/proxystore/p2p/relay/client.py", line 226, in _reconnect_on_close
        await self.connect()
      File "/home/coordinator/venv/lib/python3.12/site-packages/proxystore/p2p/relay/client.py", line 279, in connect
        self._websocket = await self._register(
                          ^^^^^^^^^^^^^^^^^^^^^
      File "/home/coordinator/venv/lib/python3.12/site-packages/proxystore/p2p/relay/client.py", line 173, in _register
        websocket = await websockets.client.connect(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/coordinator/venv/lib/python3.12/site-packages/websockets/legacy/client.py", line 647, in __await_impl_timeout__
        return await self.__await_impl__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/coordinator/venv/lib/python3.12/site-packages/websockets/legacy/client.py", line 651, in __await_impl__
        _transport, _protocol = await self._create_connection()
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "uvloop/loop.pyx", line 1978, in create_connection
    socket.gaierror: [Errno -3] Temporary failure in name resolution

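
One way the failure above could be logged as a single readable line rather than a full traceback is sketched below. It relies on the fact that `socket.gaierror` is a subclass of `OSError`. The names `describe_connect_error` and `connect_and_log`, and the logger name, are hypothetical and not part of ProxyStore. Whether such an error should also be retried is left open here.

```python
import asyncio
import logging
import socket
from typing import Awaitable, Callable

logger = logging.getLogger('relay-client-sketch')


def describe_connect_error(address: str, error: OSError) -> str:
    """Build a one-line message for a failed connection attempt."""
    if isinstance(error, socket.gaierror):
        # DNS lookup failed, e.g. "Temporary failure in name resolution".
        return f'Could not resolve relay server address {address}: {error}'
    return f'Could not connect to relay server at {address}: {error}'


async def connect_and_log(
    connect: Callable[[], Awaitable[None]],
    address: str,
) -> bool:
    """Run connect() and log OS-level failures without a traceback.

    Returns True on success and False if an OSError was caught and logged.
    """
    try:
        await connect()
    except OSError as e:
        logger.warning(describe_connect_error(address, e))
        return False
    return True
```

Catching `OSError` also covers `ConnectionRefusedError`, so a real implementation would need to decide which subclasses get which treatment.
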
Sample Code
No response