Create ObjectStore from URL and Options (#4047) #4200
object_store/src/parse.rs
I opted to use from_env for consistency with delta-rs, and because I think it is what almost all users will expect.
This is actually something we have been struggling with a bit, and I am unsure what the best solution is without introducing too much complexity.
Essentially, we end up in a situation where any valid credentials configured in the environment could take precedence over credentials passed via the storage options. Priority would be based on the order in which we check credentials during build.
To work around this, I have been contemplating several options, none of which has really resonated so far. They all boil down to having some notion of which combinations of config options constitute a credential, then checking whether the options contain a full credential and skipping parsing from the environment if they do.
Things get a bit messier if the options contain a partial credential that can be completed from the environment, especially in the Azure case, where there are several permutations. Within delta-rs we are unfortunately in a situation right now where this is affecting users.
This becomes even more relevant as we begin to introduce catalog integrations where different locations are associated with different credentials.
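To make the idea concrete, here is a minimal sketch of the "skip the environment if the options already hold a full credential" check. The key names and the `CREDENTIAL_SETS` table are hypothetical and chosen only for illustration; they are not the keys object_store actually accepts, and the environment is modelled as a plain map.

```rust
use std::collections::HashMap;

/// Hypothetical groups of option keys that together form a complete credential.
/// The key names are illustrative only.
const CREDENTIAL_SETS: &[&[&str]] = &[
    &["access_key_id", "secret_access_key"],
    &["sas_token"],
    &["client_id", "client_secret", "tenant_id"],
];

/// Returns true if `options` contains every key of at least one credential set.
fn has_full_credential(options: &HashMap<String, String>) -> bool {
    CREDENTIAL_SETS
        .iter()
        .any(|set| set.iter().all(|key| options.contains_key(*key)))
}

/// Merges environment-derived settings into `options` only when the explicit
/// options do not already carry a complete credential.
fn resolve_options(
    mut options: HashMap<String, String>,
    env: &HashMap<String, String>,
) -> HashMap<String, String> {
    if !has_full_credential(&options) {
        for (key, value) in env {
            // Keys that were set explicitly are never overwritten.
            options.entry(key.clone()).or_insert_with(|| value.clone());
        }
    }
    options
}
```

Note that this sketch still merges the environment when the options hold only a partial credential, which is exactly the messy case described above: a different complete credential from the environment can end up alongside the partial one.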
Perhaps we just don't call from_env and leave downstreams to populate keys from the environment if they so wish?
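As a rough sketch of what "leave it to downstreams" could look like: the downstream collects the environment variables it cares about and layers its explicit options on top, so explicit values always win. The function names, the prefix filter, and the lowercased key convention are assumptions made for this example, not part of object_store.

```rust
use std::collections::HashMap;

/// Collects variables whose names start with `prefix` from an iterator of
/// (name, value) pairs such as `std::env::vars()`, lowercasing the names.
fn options_from_env<I>(vars: I, prefix: &str) -> HashMap<String, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter()
        .filter(|(name, _)| name.starts_with(prefix))
        .map(|(name, value)| (name.to_ascii_lowercase(), value))
        .collect()
}

/// Builds the final option set: environment values first, explicit options on top.
fn merged_options(
    env_options: HashMap<String, String>,
    explicit: HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = env_options;
    // `extend` replaces the value of any key that is already present.
    merged.extend(explicit);
    merged
}
```

A downstream could call `options_from_env(std::env::vars(), "AWS_")` and pass the merged map on as the options, keeping the precedence decision entirely in its own code.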
Sounds good!
I can iterate on a solution in delta-rs and if we find something feasible bring this upstream?
/// Returns
/// - An [`ObjectStore`] of the corresponding type
/// - The [`Path`] into the [`ObjectStore`] of the addressed resource
pub fn parse_url(url: &Url) -> Result<(Box<dyn ObjectStore>, Path), super::Error> {
This signature is consistent with pyarrow from_uri.
It allows downstreams to determine whether they want the remaining path to be used to "namespace" the store, i.e. using PrefixStore (delta-rs), or just want the corresponding ObjectStore for the path (datafusion).
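The two downstream choices can be illustrated with a toy, standard-library-only model of the returned pair. `split_url` and `resolve_prefixed` are hypothetical helpers written for this sketch; they are not the crate's parser and they ignore most real URL handling.

```rust
/// Toy stand-in for the (store, path) pair: splits `scheme://authority/path`
/// into a store identifier and the remaining path.
fn split_url(url: &str) -> Option<(String, String)> {
    let (scheme, rest) = url.split_once("://")?;
    let (authority, path) = rest.split_once('/').unwrap_or((rest, ""));
    if scheme.is_empty() || authority.is_empty() {
        return None;
    }
    Some((
        format!("{scheme}://{authority}"),
        path.trim_end_matches('/').to_string(),
    ))
}

/// "Namespaced" use: the remaining path becomes a prefix for every key,
/// which is the role a prefixing wrapper would play.
fn resolve_prefixed(prefix: &str, key: &str) -> String {
    if prefix.is_empty() {
        key.to_string()
    } else {
        format!("{prefix}/{key}")
    }
}
```

A caller that wants a namespaced store would route every key through something like `resolve_prefixed`, while a caller that only wants the store for the addressed resource would use the returned path directly.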
/// - `s3://<bucket>/<path>`
/// - `s3a://<bucket>/<path>`
/// - `https://s3.<bucket>.amazonaws.com`
/// - `https://s3.<region>.amazonaws.com/<bucket>`
@roeap and @chitralverma I think I am finally happy with this, PTAL
roeap left a comment
very nice - looking forward to integrating this!
* Add parse_url function (#4047)
* Clippy
* Fix copypasta
* Fix wasm32 build
* More wasm fixes
* Return remaining path
* Don't use from_env
Which issue does this PR close?
Closes apache/arrow-rs-object-store#176
Rationale for this change
See ticket
What changes are included in this PR?
Adds parse_url and parse_url_opts
Are there any user-facing changes?