Avoid relay registration retry for certain registration failures #619

@gpauloski

Description

Describe the Request

Some relay registration failures (forbidden and unauthorized) are not recoverable, so we should stop the endpoint rather than sit in the auto-reconnect loop.

Specifically, these two close codes should cause the endpoint to exit.

except UnauthorizedError as e:
    await websocket.close(
        code=4001,
        reason=f'{e.__class__.__name__}: {e}',
    )
except ForbiddenError as e:
    await websocket.close(
        code=4002,
        reason=f'{e.__class__.__name__}: {e}',
    )
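
As a rough sketch of the client side, the two close codes above could be mapped to an error that the reconnect loop never retries. The names `NON_RECOVERABLE_CODES`, `RelayRegistrationFatalError`, and `raise_if_non_recoverable` are hypothetical and not part of ProxyStore; only the codes 4001 and 4002 come from the snippet above.

```python
from typing import Optional

# Close codes taken from the relay server snippet above.
UNAUTHORIZED_CODE = 4001
FORBIDDEN_CODE = 4002
NON_RECOVERABLE_CODES = frozenset({UNAUTHORIZED_CODE, FORBIDDEN_CODE})


class RelayRegistrationFatalError(Exception):
    """Hypothetical error for failures that retrying cannot fix."""


def raise_if_non_recoverable(code: Optional[int], reason: str = '') -> None:
    """Raise if the relay closed the connection with a non-recoverable code.

    Any other code (or no code at all) returns normally, leaving the
    caller free to fall through to its usual retry handling.
    """
    if code in NON_RECOVERABLE_CODES:
        raise RelayRegistrationFatalError(
            f'Relay server rejected registration (code={code}): {reason}',
        )
```

How the close code is read from the connection depends on the websockets version in use, so that part is left to the caller.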

We can re-raise those errors here instead of retrying.

except (
    # Exceptions that we should wait and retry again for
    ConnectionRefusedError,
    asyncio.TimeoutError,
    websockets.exceptions.ConnectionClosed,
) as e:
    if not retry:
        raise
    logger.warning(
        f'Registration with relay server at {self._address} '
        f'failed because of {e}. Retrying connection in '
        f'{backoff_seconds} seconds',
    )
    await asyncio.sleep(backoff_seconds)
    backoff_seconds = min(backoff_seconds * 2, 60)
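
One possible shape for the change, as a standalone sketch rather than the actual `_register` code: the same backoff loop, plus a check that re-raises when the failure is judged non-recoverable. The function `register_with_retry`, the `is_fatal` predicate, and the `RETRYABLE` tuple are assumptions made for illustration; `websockets.exceptions.ConnectionClosed` is left out so the sketch needs only the standard library.

```python
import asyncio
import logging
from typing import Awaitable, Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar('T')

# Mirrors the retryable exceptions in the snippet above, minus the
# websockets-specific one so the sketch has no third-party imports.
RETRYABLE = (ConnectionRefusedError, asyncio.TimeoutError)


async def register_with_retry(
    register: Callable[[], Awaitable[T]],
    *,
    is_fatal: Callable[[BaseException], bool] = lambda e: False,
    retry: bool = True,
    backoff_seconds: float = 1.0,
    max_backoff_seconds: float = 60.0,
) -> T:
    """Call register() until it succeeds or fails in a non-recoverable way."""
    while True:
        try:
            return await register()
        except RETRYABLE as e:
            # Re-raise when retrying is disabled or would be pointless.
            if not retry or is_fatal(e):
                raise
            logger.warning(
                'Registration failed because of %r. Retrying in %s seconds',
                e,
                backoff_seconds,
            )
            await asyncio.sleep(backoff_seconds)
            # Exponential backoff, capped like the original loop.
            backoff_seconds = min(backoff_seconds * 2, max_backoff_seconds)
```

In the real client the predicate would presumably inspect the close code carried by the connection-closed exception and return true for 4001 and 4002.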

It would also be good to catch this kind of exception and log it more cleanly. (It does get logged because it happens in a background task, so nothing crashes, but we can handle it better.)

  File "/home/coordinator/venv/lib/python3.12/site-packages/proxystore/utils/tasks.py", line 32, in _execute_and_log_traceback
    await coro(*args, **kwargs)
  File "/home/coordinator/venv/lib/python3.12/site-packages/proxystore/p2p/relay/client.py", line 226, in _reconnect_on_close
    await self.connect()
  File "/home/coordinator/venv/lib/python3.12/site-packages/proxystore/p2p/relay/client.py", line 279, in connect
    self._websocket = await self._register(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/home/coordinator/venv/lib/python3.12/site-packages/proxystore/p2p/relay/client.py", line 173, in _register
    websocket = await websockets.client.connect(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coordinator/venv/lib/python3.12/site-packages/websockets/legacy/client.py", line 647, in __await_impl_timeout__
    return await self.__await_impl__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/coordinator/venv/lib/python3.12/site-packages/websockets/legacy/client.py", line 651, in __await_impl__
    _transport, _protocol = await self._create_connection()
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1978, in create_connection
socket.gaierror: [Errno -3] Temporary failure in name resolution
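
A minimal sketch of the nicer logging, assuming the connect call is wrapped somewhere in the reconnect path: catch `socket.gaierror` and emit a single warning line instead of the full traceback. The wrapper name `connect_logging_dns_failures` and its parameters are hypothetical; whether the failure should then be retried is a separate decision not made here.

```python
import asyncio
import logging
import socket
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def connect_logging_dns_failures(
    connect: Callable[[], Awaitable[None]],
    address: str,
) -> bool:
    """Run connect() and log name-resolution failures as one warning line.

    Returns True on success and False when name resolution failed.
    Any other exception propagates unchanged.
    """
    try:
        await connect()
    except socket.gaierror as e:
        # gaierror is an OSError; strerror holds the readable message
        # when the error was built from an (errno, message) pair.
        logger.warning(
            'Could not resolve relay server address %s: %s',
            address,
            e.strerror or e,
        )
        return False
    return True
```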

Sample Code

No response

Metadata

Labels

enhancement: New features or improvements to existing functionality
