Skip to content

Conversation

@emkornfield
Copy link
Contributor

This is mostly adapted from s3. One place that it differs
is it doesn't allow for username and password. The only thing
that I could see supporting in the future is taking username as
user to impersonate, but we can see if there is demand for that feature.

I also added in two useful features in s3:

  • Option for setting a default location to create a bucket in.
  • Default key-value metadata to use for object creation if none
    is specified.

Also fixed a typo in s3fs error message.

This is mostly adapted from s3. One place that it differs
is it doesn't allow for username and password. The only thing
that I could see supporting in the future is taking username as
user to impersonate, but we can see if there is demand for that feature.

I also added in two useful features in s3:
- Option for setting a default location to create a bucket in.
- Default key-value metadata to use for object creation if none
  is specified.

Also fixed a typo in s3fs error message.
@emkornfield
Copy link
Contributor Author

CC @coryan

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for all the fixes.

@emkornfield emkornfield requested a review from pitrou February 22, 2022 03:39
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emkornfield . A couple minor comments and questions.

}
return std::make_shared<LocalFileSystem>(options, io_context);
}
if (scheme == "gs") {
Copy link
Member

@pitrou pitrou Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should we also accept "gcs" for compatibility with https://gcsfs.readthedocs.io/en/latest/#integration ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

"https"};
"https",
/*default_bucket_location=*/{},
/*default_metadata=*/nullptr};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of spelling all default parameter values explicitly, it would probably more readable and maintainable to set the non-default attributes individually e.g.;

GcsOptions options{};
options.credentials = std::make_shared<GcsCredentials>(google::cloud::MakeInsecureCredentials());
options.schema = "http";
return options;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


std::string endpoint_override;
std::string scheme;
// \brief Location to use for creating buckets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of value can a location have? Is it a "region" like in S3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are regions. I went back and forth on whether to call this location or region, but chose location because it corresponded with the API. I don't have a strong preference. @coryan any thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are locations that are not regions:

https://cloud.google.com/storage/docs/locations#location-mr

Probably not of much interest for analytics workloads, but who knows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always thought of multi-regions as magic regions instead of a different concept. Unfortunately, people do use multi-regions for analytics workloads and experience the associated pain points.

emkornfield and others added 4 commits February 22, 2022 09:55
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
This reverts commit 30bd844.
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks for the update @emkornfield

@pitrou pitrou closed this in 8244568 Feb 23, 2022
@ursabot
Copy link

ursabot commented Feb 23, 2022

Benchmark runs are scheduled for baseline = 53e1296 and contender = 8244568. 8244568 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.42% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.68% ⬆️0.34%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@grisaitis
Copy link

Is any additional work required to make this accessible from pyarrow / pyarrow.fs.GcsFileSystem? Would love to start using this from pyarrow.

@emkornfield
Copy link
Contributor Author

This change hadn't been released. I think with this change datsets should work. I'll hopefully have a PR up this week for pyarrow wrappers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants