Skip to content

Create ObjectStore from URL and Options (#4047)#4200

Merged
tustvold merged 7 commits intoapache:masterfrom
tustvold:parse-url
May 12, 2023
Merged

Create ObjectStore from URL and Options (#4047)#4200
tustvold merged 7 commits intoapache:masterfrom
tustvold:parse-url

Conversation

@tustvold
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes apache/arrow-rs-object-store#176

Rationale for this change

See ticket

What changes are included in this PR?

Adds parse_url and parse_url_opts

Are there any user-facing changes?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted to use from_env for consistency with delta-rs, and because I think it is what almost all users will expect.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually something we have been struggling a bit with, and I am unsure what the best solution is without introducing too much complexity.

Essentially we end up in a situation where any valid credentials configured in the environment could take precedence over a credentials passed via the storage options. Priority would be based on the order in which we check credentials during build.

To work around this I was contemplating several options, none of which really resonated so far. but they all boil down to having some sort of notion on what combinations of config options constitute a credential, then checking if the options contain a full credential and skipping parsing from env if they do.

Things get a bit messier if the options contain a partial credential that can be imputed via the environment, as especially in the azure case there are several permutations. Within delta-rs we are unfortunately in a situation right now, where this is affecting users.

This becomes even more relevant as we begin to introduce catalog integrations where different locations are associated with different credentials.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we just don't call from_env and leave downstreams to populate keys from the environment if they so wish?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

I can iterate on a solution in delta-rs and if we find something feasible bring this upstream?

@tustvold tustvold marked this pull request as draft May 11, 2023 14:08
/// Returns
/// - An [`ObjectStore`] of the corresponding type
/// - The [`Path`] into the [`ObjectStore`] of the addressed resource
pub fn parse_url(url: &Url) -> Result<(Box<dyn ObjectStore>, Path), super::Error> {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature is consistent with pyarrow from_uri.

It allows downstreams to determine whether they want the remaining path to be used to "namespace" the store, i.e. using PrefixStore (delta-rs), or just want the corresponding ObjectStore for the path (datafusion)

@tustvold tustvold marked this pull request as ready for review May 11, 2023 16:27
/// - `s3://<bucket>/<path>`
/// - `s3a://<bucket>/<path>`
/// - `https://s3.<bucket>.amazonaws.com`
/// - `https://s3.<region>.amazonaws.com/<bucket>`
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a holdover from #4082

@tustvold
Copy link
Copy Markdown
Contributor Author

@roeap and @chitralverma I think I am finally happy with this, PTAL

@tustvold tustvold changed the title Add parse_url function (#4047) Create ObjectStore from URL and Options (#4047) May 11, 2023
Copy link
Copy Markdown
Contributor

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice - looking forward to integrating this!

@tustvold tustvold merged commit e1e1c79 into apache:master May 12, 2023
alamb pushed a commit to alamb/arrow-rs that referenced this pull request Mar 20, 2025
* Add parse_url function (#4047)

* Clippy

* Fix copypasta

* Fix wasm32 build

* More wasm fixes

* Return remaining path

* Don't use from_env
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

object_store: Instantiate object store from provided url with store options

2 participants