Currently, all of our supported filesystems in C++ (local, hdfs, s3, google cloud (gs/gcs), and soon azure (abfs)) are built into the core libarrow library. For example, when enabled, they are hardcoded in FileSystemFromUri.
There is a desire to be able to split those filesystems into their own libraries, so that they can be installed separately. The remote filesystems each come with their own (potentially quite large) dependencies, and one typically doesn't need all of them at the same time.
More generally, it would also be nice if filesystems could be implemented externally, to cover filesystems we wouldn't consider including in the main Arrow project.
My understanding is that this would require:
- Some mechanism to "register" a filesystem with the core libarrow fs utilities. In particular, parsing from a URI (i.e. when the user doesn't pass an already instantiated filesystem object) needs to know which filesystem implementation to dispatch to for each prefix (URI scheme).
- Build some of our own filesystems as separate libraries. I think this would mainly be the cloud filesystems (s3, google cloud, azure), while the local filesystem would always be included in core libarrow (+ some of the composite ones like subtree).
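
To make the first point more concrete, here is a rough sketch of what a scheme-to-factory registry could look like. This is only an illustration, not a proposed API: FileSystemRegistry, Register, FromUri and StubFileSystem are hypothetical names, the FileSystem struct is a stand-in for arrow::fs::FileSystem, and exceptions are used only for brevity where Arrow code would return Status/Result.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::fs::FileSystem.
struct FileSystem {
  virtual ~FileSystem() = default;
  virtual std::string type_name() const = 0;
};

// Placeholder implementation, standing in for a filesystem that lives
// in a separately built library (e.g. an S3 one).
struct StubFileSystem : FileSystem {
  explicit StubFileSystem(std::string name) : name_(std::move(name)) {}
  std::string type_name() const override { return name_; }
  std::string name_;
};

// A factory builds a filesystem from the full URI string.
using FileSystemFactory =
    std::function<std::shared_ptr<FileSystem>(const std::string& uri)>;

// Hypothetical registry mapping a URI scheme ("s3", "gs", ...) to a factory.
class FileSystemRegistry {
 public:
  // Returns false (and keeps the existing factory) if the scheme is taken.
  bool Register(const std::string& scheme, FileSystemFactory factory) {
    std::lock_guard<std::mutex> lock(mutex_);
    return factories_.emplace(scheme, std::move(factory)).second;
  }

  // Extracts the scheme from the URI and dispatches to the matching factory.
  std::shared_ptr<FileSystem> FromUri(const std::string& uri) const {
    const auto pos = uri.find("://");
    if (pos == std::string::npos) {
      throw std::invalid_argument("URI has no scheme: " + uri);
    }
    const std::string scheme = uri.substr(0, pos);
    FileSystemFactory factory;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      const auto it = factories_.find(scheme);
      if (it == factories_.end()) {
        throw std::invalid_argument("No filesystem registered for: " + scheme);
      }
      factory = it->second;
    }
    // Call the factory outside the lock so it can be arbitrarily slow.
    return factory(uri);
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, FileSystemFactory> factories_;
};
```

In this sketch a separately installed library would call Register("s3", ...) when it is loaded, and the URI parsing in core libarrow would consult a process-wide registry instead of a hardcoded list. How that instance is exposed and how the libraries get loaded is left open here.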
cc @pitrou @zeroshade