[Object Spilling] Initial Iteration of S3 adapter.#11379
Merged
rkooo567 merged 13 commits intoray-project:masterfrom Oct 30, 2020
Merged
[Object Spilling] Initial Iteration of S3 adapter.#11379rkooo567 merged 13 commits intoray-project:masterfrom
rkooo567 merged 13 commits intoray-project:masterfrom
Conversation
ericl
reviewed
Oct 14, 2020
Contributor
|
How about we go with smart_open for now then? Seems like it can be the
generic purpose fallback even if other options then out to be faster for s3
in the future.
…On Wed, Oct 14, 2020, 11:22 PM SangBin Cho ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In python/ray/external_storage.py
<#11379 (comment)>:
> + the default boto3.resource('s3').Object(bucket_name, key).get()
+ put_config_override(dict): Configuration dict that will override
+ the default boto3.resource('s3').Object(bucket_name, key).put()
+ Raises:
+ RayError if it fails to setup a S3 client. For example, if boto3 is
+ not downloaded, it raises an RayError. It can also raise S3 related
+ exceptions.
+ """
+
+ def __init__(self,
+ bucket_name: str,
+ prefix: str = "ray_spilled_object_",
+ get_config_override: dict = None,
+ put_config_override: dict = None):
+ try:
+ import boto3
Ok. I read some doc/source code of both projects, and smart open seems to
be more mature and has more options (one great part is it supports all
major cloud providers). One drawback is that it doesn't really have asyncio
compatible APIs (but if we only rely on Process-level parallelization or
thread pool, it should be fine).
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#11379 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAADUSUYLIIUEJVKXTR6WFLSK2ILJANCNFSM4SP2RV2A>
.
|
2 tasks
Contributor
Author
|
Finished the smart_open implementation. Here's the result. S3 is almost 600 times slower for fs Smart open Some possible root causes;
We need more performance analysis, but I won't handle them in this PR. That says, after this PR I will
|
rkooo567
commented
Oct 20, 2020
ericl
reviewed
Oct 21, 2020
ericl
reviewed
Oct 21, 2020
ericl
reviewed
Oct 21, 2020
ericl
reviewed
Oct 21, 2020
ericl
reviewed
Oct 21, 2020
Contributor
Author
|
Addressed all review. The next step;
|
ericl
approved these changes
Oct 29, 2020
Contributor
ericl
left a comment
There was a problem hiding this comment.
Looks good but I think it would be good to simplify the error handling.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why are these changes needed?
This implements the initial (most primitive) prototype to support the S3 backend. It, unfortunately, is not properly working (extremely slow). There are 3 limitations in this PR, and I will have separate PRs to fix each of them.
This PR
Follow up
restore_objectsmethods. I will write a design doc before I implement this.Related issue number
#9850
Checks
scripts/format.shto lint the changes in this PR.