[Python] FileSystem.from_uri doesn't decode %-encoded characters in path #33598

@asfimport

Description

When I try to create a filesystem object for a public S3 dataset whose object key contains a space ("trip data" in the example below; the bucket name "nyc-tlc" has none), an error is raised.

Here's a minimal reproducer:

from pyarrow.fs import FileSystem
result = FileSystem.from_uri("s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet") 

which fails with the following traceback:

Traceback (most recent call last):
  File "/Users/james/projects/dask/dask/test.py", line 3, in <module>
    result = FileSystem.from_uri("s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet")
  File "pyarrow/_fs.pyx", line 470, in pyarrow._fs.FileSystem.from_uri
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot parse URI: 's3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet'

Note that things work if I use a different dataset that doesn't have a space in the URI, or if I replace the portion of the URI that contains the space with a * wildcard:

from pyarrow.fs import FileSystem
result = FileSystem.from_uri("s3://ursa-labs-taxi-data/2009/01/data.parquet") # works
result = FileSystem.from_uri("s3://nyc-tlc/*/fhvhv_tripdata_2022-06.parquet") # works

The wildcard URI isn't necessarily equivalent to the original failing URI, but I think it shows that the space is what causes the parse failure.
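
A possible way to probe this further is to percent-encode the path before passing it in, since a raw space is not a valid character in a URI. The sketch below uses only the standard library; encode_uri_path is a hypothetical helper name, not part of pyarrow, and whether FileSystem.from_uri accepts or decodes the result is not verified here.

```python
from urllib.parse import quote, unquote, urlsplit


def encode_uri_path(uri: str) -> str:
    """Percent-encode the path of a URI, leaving scheme and bucket untouched.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    parts = urlsplit(uri)
    # quote() keeps "/" by default and encodes the space as %20.
    return f"{parts.scheme}://{parts.netloc}{quote(parts.path)}"


uri = "s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet"
encoded = encode_uri_path(uri)
print(encoded)

# Decoding the path again recovers the original object key.
decoded_path = unquote(urlsplit(encoded).path)
print(decoded_path)
```

If the encoded URI parses, the path that FileSystem.from_uri returns may still contain %20 (as this issue's title suggests), in which case urllib.parse.unquote would be one way to recover the original key.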

Environment: OS: macOS

Note: This issue was originally created as ARROW-18436. Please see the migration documentation for further details.
