Skip to content

Conversation

@coryan
Copy link
Contributor

@coryan coryan commented Dec 15, 2021

Recursive listing without limit is about as expensive as only listing
the top-level directories in GCS. Therefore, it is just more efficient
to filter the results on the client-side, as this requires fewer request
than listing only only 0, 1, or N levels in a directory hierarchy.

I also improved the tests to verify no objects with similar prefixes are
returned, for example, when listing objects starting with 'aaa' we do
not want 'aaab', but we want 'aaa/b'.

Recursive listing without limit is about as expensive as only listing
the top-level directories in GCS. Therefore, it is just *more* efficient
to filter the results on the client-side, as this requires fewer request
than listing only only 0, 1, or N levels in a directory hierarchy.

I also improved the tests to verify no objects with similar prefixes are
returned, for example, when listing objects starting with 'aaa' we do
not want 'aaab', but we want 'aaa/b'.
@github-actions
Copy link

@coryan coryan marked this pull request as ready for review December 16, 2021 00:35
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Just two minor comments.

return result;
}

std::size_t Depth(arrow::util::string_view path) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but as per the coding guidelines (which are copied over from the Google C++ style guide :-)), we generally use explicit-width types such as int32_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

arrow::fs::AssertFileInfo(fs.get(), PreexistingBucketPath(), FileType::Directory);
arrow::fs::AssertFileInfo(fs.get(), PreexistingObjectPath(), FileType::File);
for (auto const& info : hierarchy.contents) {
if (!fs::internal::IsAncestorOf(hierarchy.base_dir, info.path())) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could actually check that this entry still exists instead of ignoring it.
(same comment below too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

arrow::fs::AssertFileInfo(fs.get(), PreexistingBucketPath(), FileType::Directory);
arrow::fs::AssertFileInfo(fs.get(), PreexistingObjectPath(), FileType::File);
for (auto const& info : hierarchy.contents) {
if (!fs::internal::IsAncestorOf(hierarchy.base_dir, info.path())) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do the same change as well here? (check the expected type)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, sorry about that.

@pitrou pitrou closed this in 61745be Dec 20, 2021
@ursabot
Copy link

ursabot commented Dec 20, 2021

Benchmark runs are scheduled for baseline = e7c2ead and contender = 61745be. 61745be is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.31% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@coryan coryan deleted the ARROW-15121-gcsfs-implement-max-recursion branch December 20, 2021 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants