
Are the Azure paths inconsistent with other remote store paths? #721

@roeap

Description


For remote stores, we currently support Azure, AWS, and GCP, which have the following URI schemes:

  • AWS: s3://<bucket>/path/to/table
  • GCP: gs://<bucket>/path/to/table
  • Azure: adls2://<account>/<container>/path/to/table
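To make the asymmetry concrete, here is a minimal sketch of how these URIs decompose. The helper is hypothetical (not the actual delta-rs implementation); it only illustrates how the Azure account occupies the slot that holds the bucket for s3:// and gs://, pushing the container one level deeper:

```python
from urllib.parse import urlparse

# Hypothetical helper illustrating the current URI layouts; not delta-rs code.
def parse_table_uri(uri: str) -> dict:
    parsed = urlparse(uri)
    if parsed.scheme in ("s3", "gs"):
        # s3://<bucket>/path/to/table and gs://<bucket>/path/to/table:
        # the store root (bucket) sits in the authority position.
        return {
            "scheme": parsed.scheme,
            "root": parsed.netloc,
            "path": parsed.path.lstrip("/"),
        }
    if parsed.scheme == "adls2":
        # adls2://<account>/<container>/path/to/table: the account takes the
        # authority position, so the container is shifted into the path.
        container, _, path = parsed.path.lstrip("/").partition("/")
        return {
            "scheme": parsed.scheme,
            "account": parsed.netloc,
            "root": container,
            "path": path,
        }
    raise ValueError(f"unsupported scheme: {parsed.scheme}")

print(parse_table_uri("s3://my-bucket/path/to/table"))
print(parse_table_uri("adls2://myaccount/my-container/path/to/table"))
```

Note how the same logical element (the store root) ends up in two different positions depending on the scheme.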

The main source of the difference is that - to the best of my knowledge - the concept of an account does not exist for s3/gs. Essentially, buckets must be unique per region, while containers must be unique per account. However, regions also exist in Azure. On the other hand, the root of an object store is the bucket / container, and judging by how URLs / paths are constructed, bucket and container are more or less equivalent. It seems others (see adlfs) felt that the container, not the account, is the appropriate lowest level in the path / URI, with the account (much like the region in S3) being configuration of the store.

Thus I propose to "drop" the account from our Azure paths. While this is certainly a major breaking change, my hope is that users appreciate the consistency with e.g. fsspec. Given that we aim to integrate closely with (py)arrow, it seems to me that this would be more consistent on that level as well.

From an implementation standpoint, we already pick up the account from configuration, so the path segment is effectively unused.

As a side note - this would also be consistent with how object_store treats paths.

cc @thovoll, @wjones127, @houqp

Use Case

Have a nicer user-facing API.

Related Issue(s)

Metadata

Labels: enhancement (New feature or request)
