Make extending VirtualFileSystem thread-safe#20547
Merged
Mytherin merged 2 commits intoduckdb:v1.5-variegatafrom Jan 15, 2026
Merged
Make extending VirtualFileSystem thread-safe#20547Mytherin merged 2 commits intoduckdb:v1.5-variegatafrom
VirtualFileSystem thread-safe#20547Mytherin merged 2 commits intoduckdb:v1.5-variegatafrom
Conversation
…s copied when changed, allowing safe concurrent modifications to the file system
Collaborator
Author
|
CC @carlopi |
carlopi
reviewed
Jan 15, 2026
Contributor
carlopi
left a comment
There was a problem hiding this comment.
I think this is the way to go, thanks!
I was thinking, but for later, whether implementing autoloading of file systems can be done via mock pre-registered file-system, that will be then disabled once the actual one is loaded, unsure if that makes more sense / simplifies the code or just adds corner cases.
Collaborator
Author
|
I think that just complicates it. I would like to make file system overrides prefix-only so we can clean-up / simplify the code for selecting a file system, though. |
evertlammerts
added a commit
to duckdb/duckdb-python
that referenced
this pull request
Jan 16, 2026
Fixes related to: * duckdb/duckdb#20564 * duckdb/duckdb#20508 * duckdb/duckdb#20547
d-justen
pushed a commit
to d-justen/duckdb
that referenced
this pull request
Jan 19, 2026
Extensions that are loaded can modify the `VirtualFileSystem`, primarily
through registering additional file systems. For example, `httpfs`
registers the `HTTPFSFileSystem` and the `S3FileSystem`. Currently this
is not thread-safe, and problems can arise when one connection loads an
extension while another connection is using the file system. While this
is a rare scenario, it can trigger more frequently with extension
auto-loading, and should be fixed.
This PR fixes the issue by moving the `VirtualFileSystem` state into a
separate struct - the `FileSystemRegistry`:
```cpp
struct FileSystemHandle {
unique_ptr<FileSystem> file_system;
};
struct FileSystemRegistry {
vector<shared_ptr<FileSystemHandle>> sub_systems;
map<FileCompressionType, shared_ptr<FileSystemHandle>> compressed_fs;
const shared_ptr<FileSystemHandle> default_fs;
unordered_set<string> disabled_file_systems;
};
```
The `FileSystemRegistry` is read-only - i.e. after it is created, we
don't modify it anymore. Instead, when registering additional file
systems, we create a copy of the current registry and apply the
modification. For example:
```cpp
shared_ptr<FileSystemRegistry> FileSystemRegistry::RegisterSubSystem(unique_ptr<FileSystem> fs) const {
auto new_registry = make_shared_ptr<FileSystemRegistry>(*this);
new_registry->sub_systems.push_back(make_shared_ptr<FileSystemHandle>(std::move(fs)));
return new_registry;
}
```
We then use an atomic operation to overwrite the current file system
registry:
```cpp
void VirtualFileSystem::RegisterSubSystem(unique_ptr<FileSystem> fs) {
lock_guard<mutex> guard(registry_lock);
auto new_registry = file_system_registry->RegisterSubSystem(std::move(fs));
file_system_registry.atomic_store(new_registry);
}
```
On read, we start off by obtaining a shared reference to the *current*
registry using an atomic load. Since the registry is read-only, we can
then use that registry as much as we want.
```cpp
FileSystem &VirtualFileSystem::FindFileSystem(const string &path, optional_ptr<FileOpener> opener) {
return FindFileSystem(path, FileOpener::TryGetDatabase(opener));
auto registry = file_system_registry.atomic_load();
// ..
}
```
#### `UnregisterSubSystem`
The `UnregisterSubSystem` is removed in this PR. This does not appear to
be used anywhere - and removing (destroying) a file-system is
potentially unsafe in the current implementation. As such, we remove
this API call to prevent it from being used in the future.
This was referenced Jan 19, 2026
Mytherin
added a commit
that referenced
this pull request
Jan 20, 2026
…n callback registration APIs to enable this (#20599) Follow-up from #20547 There are various extension callbacks that extensions can register, e.g.: * `ParserExtension` * `StorageExtension` * `OperatorExtension` * `OptimizerExtension` * `ExtensionCallback` Currently registering these extensions is not done in a thread-safe manner. As a result, loading extensions that register these callbacks while concurrent activity is taking place in the database is not safe. This usually doesn't matter as extensions are generally loaded up-front, but long-term it would be better to just use thread-safe interfaces here. We use the same trick as we did in #20547 for making these registrations thread-safe while limiting the impact on reading as much as possible - i.e. the cost of making this thread safe is almost entirely pushed onto the registration (which is generally rarely done - i.e. we only register a few extensions in total). As a consequence of this, the API for registering these extensions is now different, e.g. instead of doing this: ```cpp config.storage_extensions["ducklake"] = make_uniq<DuckLakeStorageExtension>(); ``` We now need to do this: ```cpp StorageExtension::Register(config, "ducklake", make_shared_ptr<DuckLakeStorageExtension>()); ``` The same applies for the other types: ```cpp OptimizerExtension::Register(config, std::move(postgres_optimizer)); ExtensionCallback::Register(config, make_shared_ptr<PostgresExtensionCallback>()); ParserExtension::Register(config, QuackExtension()); ``` ### Transient Includes Removal Now that these extensions are no longer part of the DBConfig, their includes have also been removed from `config.hpp`. This uncovered the fact that we were transiently including a lot of stuff in the DBConfig through these extensions. Since the DBConfig is also included in a lot of places, this meant we were including a lot of files in a lot of places. I've opted to remove these includes to limit include propagation - but this does mean that a lot of files that previously depended on these transient includes now need to actually include what they are using. Common culprits based on fixing this seem to be: * `duckdb/planner/expression/bound_columnref_expression.hpp ` * `duckdb/parser/expression/columnref_expression.hpp` * `duckdb/planner/binder.hpp` * `duckdb/catalog/catalog_search_path.hpp`
Mytherin
added a commit
that referenced
this pull request
Jan 28, 2026
…`OpenerFileSystem` (#20694) This PR adds back the `UnregisterSubSystem ` method removed in #20547. This is made thread-safe by **only** unregistering, i.e. the file system is not actually deleted. The file system is kept around as we don't have any guarantees at this point about existing files still being open that use that file system. As such it only prevents new files from using the file system. CC @evertlammerts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Extensions that are loaded can modify the
VirtualFileSystem, primarily through registering additional file systems. For example,httpfsregisters theHTTPFSFileSystemand theS3FileSystem. Currently this is not thread-safe, and problems can arise when one connection loads an extension while another connection is using the file system. While this is a rare scenario, it can trigger more frequently with extension auto-loading, and should be fixed.This PR fixes the issue by moving the
VirtualFileSystemstate into a separate struct - theFileSystemRegistry:The
FileSystemRegistryis read-only - i.e. after it is created, we don't modify it anymore. Instead, when registering additional file systems, we create a copy of the current registry and apply the modification. For example:We then use an atomic operation to overwrite the current file system registry:
On read, we start off by obtaining a shared reference to the current registry using an atomic load. Since the registry is read-only, we can then use that registry as much as we want.
UnregisterSubSystemThe
UnregisterSubSystemis removed in this PR. This does not appear to be used anywhere - and removing (destroying) a file-system is potentially unsafe in the current implementation. As such, we remove this API call to prevent it from being used in the future.