Skip to content

Conversation

@gpauloski
Copy link
Collaborator

Description

This PR is a complete refactor of proxystore.connectors.dim with the aim of solving specific pain points with the previous designs and making the various implementations consistent with each other.

Unfortunately, there are still a couple problems:

  1. UCX still hangs occasionally in the integration tests. We've put a lot of dev hours into this, but the UCX library seems to create many undocumented side-effects that lead to deadlocks.
  2. The Margo integration tests are very slow. Specifically, the first RPC to a server takes about 1.5 minutes, and the test suite starts two servers to the total runtime is ~3 minutes.

In the future, we may move proxystore.connectors.dim into the proxystore-extensions package to isolate the slower/niche tests.

Fixes

Type of Change

  • Breaking Change (fix or enhancement which changes existing semantics of the public interface)
  • Enhancement (new features or improvements to existing functionality)
  • Bug (fixes for a bug or issue)
  • Internal (refactoring, style changes, testing, optimizations)
  • Documentation update (changes to documentation or examples)
  • Package (dependencies, versions, package metadata)
  • Development (CI workflows, pre-commit, linters, templates)
  • Security (security related changes)

Testing

Updated all of the unit tests. Some are heavily mocked, so integration tests have also been updated.

Pull Request Checklist

Please confirm the PR meets the following requirements.

  • Tags added to PR (e.g., breaking, bug, enhancement, internal, documentation, package, development, security).
  • Code changes pass pre-commit (e.g., black, mypy, ruff, etc.).
  • Tests have been added to show the fix is effective or that the new feature works.
  • New and existing unit tests pass locally with the changes.
  • Docs have been updated and reviewed if relevant.

@gpauloski gpauloski added internal Refactoring, style changes, testing, or code optimizations development CI workflows, PR/issue templates, repository configurations breaking Backwards incompatible change to public interfaces labels May 9, 2023
@gpauloski gpauloski force-pushed the issue-224 branch 3 times, most recently from f131243 to 88155e4 Compare May 9, 2023 15:24
charliesabino and others added 16 commits May 9, 2023 10:25
- Rename some attributes of RPC/RPCResponse for consistency.
- Update docstrings.
- Add more logging messages.
- Remove unused variables and code paths.
- Support user defined chunk length.
- Move serving logic outside of ZeroMQServer to its own function.
- Switch back to asyncio for ZeroMQServer serving as it's easier to
  handle system signals with the event loop.
- Adjusted some timeouts/polling intervals.
- Use 127.0.0.1 instead of localhost for MacOS compatibility.
- Updated/added tests for large message sizes, multiple connector
  instances for same local server, client/server side handling of server
  errors.
The connector fixtures used to return a ConnectorInfo type with the
connector type, arguments, and a contextmanager. This is no longer
needed so instead we just have the connectors fixtures return
initialized connectors themselves.
The send_rpc methods were always using the address of the local server
instead of the address associated with the key. I also factored out the
key into a standard key type shared across all DIMs so we get typing
info on RPC.key.
A lot of tests were not properly cleaning up resources, had weird side
effect, or just caused hanging when run with non-mocked Margo and UCX.

I fixed as many as I could, and just disabled some.
UCX tests are unreliable in CI because they will hang ~20% of the time
so separating the tests lets us add a timeout to UCX and allow the
overall run to pass even if UCX hangs.
@gpauloski gpauloski merged commit 0147743 into main May 9, 2023
@gpauloski gpauloski deleted the issue-224 branch May 9, 2023 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Backwards incompatible change to public interfaces development CI workflows, PR/issue templates, repository configurations internal Refactoring, style changes, testing, or code optimizations

Projects

None yet

4 participants