[C++] Allow filesystems to be implemented in separate library (+ move remote filesystems out of libarrow into their own shared libraries) #38309

@jorisvandenbossche

Currently, all our supported filesystems in C++ (local, HDFS, S3, Google Cloud Storage (gs/gcs), and soon Azure (abfs)) are built into the core libarrow library. For example, those that are enabled at build time are hardcoded in FileSystemFromUri.

There is a desire to split those filesystems out into their own libraries, so that they can be installed separately. The remote filesystems each come with their own (potentially quite large) dependencies, and one typically doesn't need all of them at the same time.
More generally, it would also be nice if filesystems that we wouldn't consider including in the main Arrow project could be implemented externally.

My understanding is that this would require:

  • Some mechanism to "register" a filesystem with the core libarrow fs utilities. In particular, the parsing from URI (i.e. when the user doesn't pass an already instantiated filesystem object) needs to know which filesystem implementation to dispatch to for each URI prefix.
  • Build some of our own filesystems as separate libraries. I think this would mainly be the cloud filesystems (S3, Google Cloud, Azure), while the local filesystem would always be included in core libarrow (+ some of the composite ones like subtree).
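
To make the first point a bit more concrete, a registry keyed by URI scheme could look roughly like the sketch below. This is only an illustration under assumptions, not a proposed API: `FileSystemRegistry`, `FileSystemFactory`, `FakeS3FileSystem` and `RegisterFakeS3` are hypothetical names, and the `FileSystem` class here is a minimal stand-in rather than the real Arrow class.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a filesystem base class (NOT the real Arrow class).
class FileSystem {
 public:
  virtual ~FileSystem() = default;
  virtual std::string type_name() const = 0;
};

// A factory receives the full URI and returns a filesystem instance.
using FileSystemFactory =
    std::function<std::shared_ptr<FileSystem>(const std::string& uri)>;

// Hypothetical registry that would live in the core library.
class FileSystemRegistry {
 public:
  static FileSystemRegistry& Instance() {
    static FileSystemRegistry registry;
    return registry;
  }

  // Returns false (and keeps the existing entry) if the scheme is taken.
  bool Register(const std::string& scheme, FileSystemFactory factory) {
    assert(factory != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    return factories_.emplace(scheme, std::move(factory)).second;
  }

  // Extracts the scheme before "://" and dispatches to its factory.
  std::shared_ptr<FileSystem> FromUri(const std::string& uri) const {
    const auto pos = uri.find("://");
    if (pos == std::string::npos) {
      throw std::invalid_argument("URI has no scheme: " + uri);
    }
    const std::string scheme = uri.substr(0, pos);
    FileSystemFactory factory;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      const auto it = factories_.find(scheme);
      if (it == factories_.end()) {
        throw std::invalid_argument("No filesystem registered for: " + scheme);
      }
      factory = it->second;
    }
    // Call the factory outside the lock so it may use the registry itself.
    return factory(uri);
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, FileSystemFactory> factories_;
};

// Hypothetical filesystem that would be built as its own shared library.
class FakeS3FileSystem : public FileSystem {
 public:
  std::string type_name() const override { return "s3"; }
};

// Assumption: the separate library calls this once when it is loaded.
bool RegisterFakeS3() {
  return FileSystemRegistry::Instance().Register(
      "s3", [](const std::string&) -> std::shared_ptr<FileSystem> {
        return std::make_shared<FakeS3FileSystem>();
      });
}
```

With something along these lines, a separately built library would call `RegisterFakeS3()` when it is loaded, and a later `FileSystemRegistry::Instance().FromUri("s3://bucket/key")` in the core would dispatch to it without libarrow itself linking against the S3 dependencies.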

cc @pitrou @zeroshade
