-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-14345: [C++] Implement streaming reads #11436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| ~GcsInputStream() override = default; | ||
|
|
||
| Status Close() override { | ||
| stream_.Close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to check if stream_.Close() is idempotent (not sure if this is a requirement but I believe all of the other interface implementations allow it). e.g. S3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting with v1.32.0 they are idempotent. Created ARROW-14385 to remind myself of updating any dependencies.
| } | ||
|
|
||
| Result<int64_t> Tell() const override { | ||
| if (!stream_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for this check here, and not in the other methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the other methods an invalid stream would result in an error anyway. Happy to add the additional checks if you think they make the code easier to grok.
| std::string contents; | ||
| for (Result<std::shared_ptr<Buffer>> r = stream->Read(16); r.ok() && (*r)->size() != 0; | ||
| r = stream->Read(16)) { | ||
| auto buffer = *r; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is ASSERT_OK_AND_ASSIGN in Arrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did what you wanted, let me know if I missed the mark.
emkornfield
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just questions about approach. on somethings and checking for consistency with other file system APIs
|
PTAL |
|
+1 |
|
Benchmark runs are scheduled for baseline = 4ac62d5 and contender = a8e1c81. a8e1c81 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
Benchmark runs are scheduled for baseline = 4ac62d5 and contender = a8e1c81. a8e1c81 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
| if (!stream_.status().ok()) { | ||
| return internal::ToArrowStatus(stream_.status()); | ||
| } | ||
| return arrow::SliceMutableBufferSafe(std::move(buffer), 0, stream_.gcount()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. This is merely slicing the output buffer, but not resizing it. If the actual size is much smaller than nbytes, then there is a significant memory wastage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. On the other hand, a short read from a large buffer could require a lot of copying. I am not sure which is more likely to be a problem. I am happy to make a change if you have some data or intuition as to which one is worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does GCS return short reads except for the EOF case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all practical purposes, only the EOF case. The exception would be some kind of transient error that is stubborn enough to exhaust the retry policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. In any case, other implementations (such as S3) do resize the buffer, and there doesn't seem to be a reason to act differently here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created https://issues.apache.org/jira/browse/ARROW-14559 to track that change.
|
I just posted a belated comment on this PR. Sorry for being so late :-) |
No description provided.