Describe the bug, including details regarding any error messages, version, and platform.
pyarrow.fs incorrectly resolves valid S3 URIs containing whitespace to the local filesystem:
from pyarrow.fs import _resolve_filesystem_and_path, FileSystem
uri = "s3://bucket/prefix with space/a=a"
resolved_filesystem, resolved_path = _resolve_filesystem_and_path(uri, None)
resolved_filesystem
<pyarrow._fs.LocalFileSystem at 0x10316ff30>
This causes subsequent calls such as getting the file info to fail:
path_info = resolved_filesystem.get_file_info(resolved_path)
pyarrow.lib.ArrowInvalid: Expected a local filesystem path, got a URI...
A quick look into the method indicates that a LocalFileSystem is chosen by default and returned whenever no alternative filesystem is detected, which seems like a dubious strategy...
I assume this is where the S3 filesystem should be detected, but a URI containing whitespace raises an exception even though it's valid:
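To illustrate what a stricter fallback could look like, here is a minimal sketch that refuses to treat a string as a local path when it clearly carries a URI scheme. The helper name has_uri_scheme and the regular expression are hypothetical and are not part of pyarrow; this only shows the idea, not how the library is implemented.

```python
import re

# Hypothetical helper, not part of pyarrow: matches "<scheme>://" at the start
# of the string, using the scheme grammar from RFC 3986 (a letter followed by
# letters, digits, "+", "." or "-").
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def has_uri_scheme(path: str) -> bool:
    """Return True if the string starts with something like 's3://'."""
    return bool(_SCHEME_RE.match(path))


def check_local_fallback(path: str) -> None:
    """Raise instead of silently falling back to a local filesystem."""
    if has_uri_scheme(path):
        raise ValueError(
            f"{path!r} looks like a URI but could not be parsed; "
            "refusing to treat it as a local path"
        )
```

With a check like this, the URI above would surface a parse error right away instead of a LocalFileSystem that fails later in get_file_info.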
filesystem, path = FileSystem.from_uri(uri)
Cannot parse URI: 's3://bucket/prefix with space/a=a/'
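A possible workaround until this is fixed is to percent-encode the path before handing the URI to pyarrow, since %20 is the form RFC 3986 expects for a space. The sketch below uses only the standard library. The helper name percent_encode_uri_path is hypothetical, and it is an assumption (not verified here) that FileSystem.from_uri accepts the encoded form and decodes it back to the original key.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_uri_path(uri: str) -> str:
    """Percent-encode unsafe characters in the path component of a URI.

    Hypothetical helper for illustration. Caveat: keys containing '?' or '#'
    are split off as query/fragment by urlsplit and are not handled here.
    """
    parts = urlsplit(uri)
    # Keep "/" (path separator) and "=" (used in partition names like a=a).
    encoded_path = quote(parts.path, safe="/=")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


uri = "s3://bucket/prefix with space/a=a"
print(percent_encode_uri_path(uri))
# s3://bucket/prefix%20with%20space/a=a
```

The encoded string could then be passed to FileSystem.from_uri in place of the raw URI. Whether the returned path comes back with the space restored should be checked against the installed pyarrow version.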
Component(s)
Python