-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-38309: [C++] build filesystems as separate modules #39067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Ok, so I think we should provide the building blocks for lazy loading of filesystem libraries when a given URI scheme is requested. Something like: // filesystem.{h,cc}
using FileSystemFactory = std::function<
Result<std::shared_ptr<FileSystem>>(
const Uri& uri, const io::IOContext& io_context, std::string* out_path)>;
using FileSystemLoader = std::function<Status(const std::string& scheme)>;
Status RegisterFileSystemFactory(std::vector<std::string> schemes,
FileSystemFactory factory);
Status RegisterFileSystemLoader(std::vector<std::string> schemes,
FileSystemLoader loader);
Result<FileSystemLoader> MakeSharedLibraryFileSystemLoader(
std::string library_path, std::string init_function) {
using InitFuncType = Status(*)();
auto loader = [=](const std::string& scheme) -> Status {
ARROW_ASSIGN_OR_RAISE(void* ptr, SharedLibrary::LookupSymbol(library_path, init_symbol));
return reinterpret_cast<InitFuncType>(ptr)();
};
}
Status RegisterSharedLibraryFileSystemLoader(
std::vector<std::string> schemes,
std::string library_path, std::string init_function) {
ARROW_ASSIGN_OR_RAISE(auto loader,
MakeSharedLibraryFileSystemLoader(std::move(library_path), std::move(init_function)));
return RegisterFileSystemLoader(std::move(schemes), std::move(loader));
}
// io_util.h
class SharedLibrary {
public:
static Result<SharedLibrary> Open(const std::string& path);
Result<void*> LookupSymbol(const std::string& symbol);
static Result<void*> LookupSymbol(const std::string& path, const std::string& symbol) {
ARROW_ASSIGN_OR_RAISE(auto lib, Open(path));
return lib.LookupSymbol(symbol);
}
};And then you can use it such as: // libarrow_s3.so
Status RegisterS3FileSystem() {
return RegisterFileSystemFactory({"s3"}, &MakeS3FileSystem);
}
// some init code somewhere
Status InitLoadableFileSystems() {
RETURN_NOT_OK(RegisterSharedLibraryFileSystemLoader(
{"s3"}, "libarrow_s3.so", "RegisterS3FileSystem");
// other filesystems here...
return Status::OK();
}
|
2cca700 to
3e0ebd2
Compare
3e0ebd2 to
fb82438
Compare
I don't think this presents much improvement over using direct factories. The main advantage I see with a FileSystemLoader approach is that initialization code can be run as part of loading the filesystem, including calls to However
|
fb82438 to
8bff100
Compare
This depends on how |
I agree, but again: customization will not be resident in libarrow.so, and will require instructions like "ensure X before calling FileSystemFromUri". That being the case, I don't think any instructions could be simpler than "ensure the library is loaded before calling FileSystemFromUri". In the specific case of a python package which renames or moves |
|
In light of the lack of an obvious approach, I think I'll defer support for autoloading for the moment. The main feature of interest is the registry anyway, and adding autoloading will not be more difficult after the registry is defined. |
|
It makes sense. |
8bff100 to
abaed2e
Compare
abaed2e to
f95dc66
Compare
094f3a7 to
4925905
Compare
|
@github-actions crossbow submit -g cpp -g r -g python -g wheel |
|
Revision: 4925905ea42d9571f65237603b0acc7547c9730f Submitted crossbow builds: ursacomputing/crossbow @ actions-348ec81e98 |
|
The |
|
The failure is in the orc external project. Since orc doesn't depend on arrow I think this must be spurious, and the retry has succeeded. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting to look very neat. The remaining comments are minor. Thank you!
4925905 to
c5d03eb
Compare
c5d03eb to
74e0605
Compare
74e0605 to
2987f7c
Compare
### Rationale for this change Failure to rebase and build when merging #39067 (which renamed `internal::Uri` -> `util::Uri`) led to a merge conflict since #40325 added more usages of `internal::Uri` ### What changes are included in this PR? Rename internal::Uri -> util::Uri ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: #40562 Authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
|
Seems this caused some nightly failures in the minimal C++ builds, see |
#40578) ### Rationale for this change When linking statically, pkg-config doesn't pick up the new dependency on libdl introduced by #39067 This produces [unresolved symbol errors](#39067 (comment)) ### What changes are included in this PR? Addition of `-ldl` to `ARROW_PC_LIBS_PRIVATE` to ensure linkage to the necessary library ### Are these changes tested? yes ### Are there any user-facing changes? no * GitHub Issue: #40577 Authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit b235f83. There was 1 benchmark result with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them. |
### Rationale for this change Module library `arrow_filesystem_example` is introduced in #39067 for filesystem testing: https://github.com/apache/arrow/blob/6cecbab5172b2b339277dde741bfff455646eb32/cpp/src/arrow/testing/CMakeLists.txt#L25 However when built with TSAN, linker flags such as `-fsanitize=thread` is not set, causing the link error in #40863. ### What changes are included in this PR? Add necessary linker flags for module library. ### Are these changes tested? Manually tested. ### Are there any user-facing changes? None. * GitHub Issue: #40863 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…40864) ### Rationale for this change Module library `arrow_filesystem_example` is introduced in apache#39067 for filesystem testing: https://github.com/apache/arrow/blob/6cecbab5172b2b339277dde741bfff455646eb32/cpp/src/arrow/testing/CMakeLists.txt#L25 However when built with TSAN, linker flags such as `-fsanitize=thread` is not set, causing the link error in apache#40863. ### What changes are included in this PR? Add necessary linker flags for module library. ### Are these changes tested? Manually tested. ### Are there any user-facing changes? None. * GitHub Issue: apache#40863 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
Each filesystem implementation carries unique and potentially heavy dependencies, so it'd be useful to build them separately. Furthermore, one typically doesn't need all of them at the same time and building separate modules would allow them to be dynamically loaded as necessary. Finally, defining this interface allows custom filesystem implementations to be supported seamlessly.
What changes are included in this PR?
An initial sketch of a registry, with documentation as if the registry were complete to illustrate intended usage.
Are these changes tested?
A toy module is added and a single unit test too.
Are there any user-facing changes?
Users would be able to add their own filesystem implementations to the registry