-
Notifications
You must be signed in to change notification settings - Fork 16
Refactor distributed in-memory connectors #293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
f131243 to
88155e4
Compare
- Rename some attributes of RPC/RPCResponse for consistency. - Update docstrings. - Add more logging messages. - Remove unused variables and code paths. - Support user defined chunk length. - Move serving logic outside of ZeroMQServer to its own function. - Switch back to asyncio for ZeroMQServer serving as it's easier to handle system signals with the event loop. - Adjusted some timeouts/polling intervals. - Use 127.0.0.1 instead of localhost for MacOS compatibility. - Updated/added tests for large message sizes, multiple connector instances for same local server, client/server side handling of server errors.
The connector fixtures used to return a ConnectorInfo type with the connector type, arguments, and a contextmanager. This is no longer needed so instead we just have the connectors fixtures return initialized connectors themselves.
The send_rpc methods were always using the address of the local server instead of the address associated with the key. I also factored out the key into a standard key type shared across all DIMs so we get typing info on RPC.key.
A lot of tests were not properly cleaning up resources, had weird side effect, or just caused hanging when run with non-mocked Margo and UCX. I fixed as many as I could, and just disabled some.
UCX tests are unreliable in CI because they will hang ~20% of the time so separating the tests lets us add a timeout to UCX and allow the overall run to pass even if UCX hangs.
12 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
breaking
Backwards incompatible change to public interfaces
development
CI workflows, PR/issue templates, repository configurations
internal
Refactoring, style changes, testing, or code optimizations
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR is a complete refactor of
proxystore.connectors.dimwith the aim of solving specific pain points with the previous designs and making the various implementations consistent with each other.Unfortunately, there are still a couple problems:
In the future, we may move
proxystore.connectors.diminto theproxystore-extensionspackage to isolate the slower/niche tests.Fixes
proxystore.connectors.dim#224Type of Change
Testing
Updated all of the unit tests. Some are heavily mocked, so integration tests have also been updated.
Pull Request Checklist
Please confirm the PR meets the following requirements.
pre-commit(e.g., black, mypy, ruff, etc.).