Support WebAssembly UDFs#88747

Merged
vdimir merged 125 commits into master from vdimir/wasm_udf
Feb 26, 2026

@vdimir
Member

vdimir commented Oct 17, 2025

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

  • Add experimental support for WebAssembly-based user-defined functions (UDFs), allowing custom function logic to be implemented in WebAssembly and executed within ClickHouse. Special thanks to Alexey Smirnov (@lioshik) for contributing the Wasmtime backend support.

Close #36892

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

TODOs:

@bacek
Contributor

bacek commented Feb 24, 2026

@vdimir just curious, how much effort would it take to add WASI support? Probably not in this PR, but maybe as a follow-up?

@vdimir
Member Author

vdimir commented Feb 25, 2026

> @vdimir just curious, how much effort would it take to add WASI support? Probably not in this PR, but maybe as a follow-up?

@bacek I did a bit of research on this earlier. Overall, it looks feasible and actually like a good thing to have, since it would allow us to use more tools/compilers for wasm UDFs, treating them more like regular programs.

From what I saw, wasmtime already provides some stub implementations. So in general, we would just need to decide which parts should have real implementations and which should remain restricted (e.g., we could implement guest–host interaction via stdin/stdout similar to executable UDFs, but forbid writing to arbitrary files).

That said, my research was done quite a while ago and was focused on WASI Preview 1, which is essentially a set of syscall-like functions. Now WASI Preview 2 introduces the component model, and I haven’t looked into it yet. My assumption is that it’s still about implementing a set of host-side functions with specific names and signatures, but I’d need to verify that.

So for now, I’d consider it a “good to have” rather than something urgent.
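
To make the "real implementation vs. restricted" split concrete, here is a minimal Rust sketch of such a policy. It is not part of this PR: `Decision` and `decide` are hypothetical names, and only the function names (`fd_read`, `fd_write`, `path_open`) and the 0/1/2 descriptor numbering come from WASI Preview 1. It assumes the stdin/stdout interaction model mentioned above and treats everything else as restricted.

```rust
/// Standard file descriptors as numbered by WASI Preview 1.
pub const STDIN: u32 = 0;
pub const STDOUT: u32 = 1;
pub const STDERR: u32 = 2;

/// Hypothetical host decision for a guest call (not a Wasmtime or ClickHouse type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Back the call with a real host implementation.
    Implement,
    /// Keep the import linkable, but return an error to the guest.
    Restrict,
}

/// Decide how to treat a WASI Preview 1 function for a given descriptor.
/// `fd` is `None` for calls that do not take a file descriptor.
pub fn decide(function: &str, fd: Option<u32>) -> Decision {
    match (function, fd) {
        // The guest reads its input from stdin, similar to executable UDFs.
        ("fd_read", Some(STDIN)) => Decision::Implement,
        // The guest writes results to stdout and diagnostics to stderr.
        ("fd_write", Some(STDOUT)) | ("fd_write", Some(STDERR)) => Decision::Implement,
        // Everything else, including path_open on arbitrary files, stays restricted.
        _ => Decision::Restrict,
    }
}
```

With this table, a module that only imports `fd_*` functions without using them would still link, while writes to any descriptor other than stdout/stderr would be rejected.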

vdimir added this pull request to the merge queue Feb 26, 2026
Merged via the queue into master with commit d28828e Feb 26, 2026
146 of 148 checks passed
vdimir deleted the vdimir/wasm_udf branch February 26, 2026 09:56
vdimir mentioned this pull request Feb 26, 2026
robot-ch-test-poll4 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Feb 26, 2026

@darkleaf
Contributor

Are there any plans to implement user-defined aggregate functions?

vdimir mentioned this pull request Feb 26, 2026

@vdimir
Member Author

vdimir commented Feb 26, 2026

> Are there any plans to implement user-defined aggregate functions?

@darkleaf

It would definitely be a great feature to have, but there’s currently no roadmap or timeline for it. It might even be more useful than regular UDFs; however, it also requires more design work.

At the moment, ClickHouse does not support aggregate UDFs (the existing executable UDFs are also limited to regular functions). On the guest side, it would probably require implementing a set of methods (e.g., addRows / mergeStates / finalize), which needs to be carefully considered as well.
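
For illustration only, the addRows / mergeStates / finalize idea could look roughly like the Rust sketch below on the guest side. Nothing like this exists in ClickHouse today: the `AggregateUdf` trait, its method names, and the `Mean` example are all hypothetical, and the sketch ignores how state would be serialized across the guest-host boundary.

```rust
/// Hypothetical guest-side interface for an aggregate UDF.
/// The methods mirror the addRows / mergeStates / finalize idea.
pub trait AggregateUdf: Default {
    type Input;
    type Output;

    /// Fold a batch of input rows into the partial state.
    fn add_rows(&mut self, rows: &[Self::Input]);

    /// Combine another partial state (e.g. from another thread or shard).
    fn merge_state(&mut self, other: &Self);

    /// Produce the final result from the accumulated state.
    fn finalize(&self) -> Self::Output;
}

/// Example state: an arithmetic mean kept as (sum, count) so it can be merged.
#[derive(Default, Debug, Clone, PartialEq)]
pub struct Mean {
    sum: f64,
    count: u64,
}

impl AggregateUdf for Mean {
    type Input = f64;
    type Output = Option<f64>;

    fn add_rows(&mut self, rows: &[f64]) {
        self.sum += rows.iter().sum::<f64>();
        self.count += rows.len() as u64;
    }

    fn merge_state(&mut self, other: &Self) {
        self.sum += other.sum;
        self.count += other.count;
    }

    fn finalize(&self) -> Option<f64> {
        // An empty aggregate has no mean.
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

The state is kept mergeable (sum and count rather than a running average) because partial states may be built independently and combined later, which is the part that needs the most design care.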

@bacek
Contributor

bacek commented Feb 26, 2026

BTW, I added WASI Preview 1 support (I needed it for my pet project that wires in libgeos, because it pulls in fd_* functions).

It takes about 10 lines of code plus a run of vendor.sh:

diff --git a/rust/workspace/wasmtime/CMakeLists.txt b/rust/workspace/wasmtime/CMakeLists.txt
index 41417047c46..219c3dbc4bd 100644
--- a/rust/workspace/wasmtime/CMakeLists.txt
+++ b/rust/workspace/wasmtime/CMakeLists.txt
@@ -39,6 +39,8 @@ set(WASMTIME_FEATURE_GC_DRC OFF)
 set(WASMTIME_FEATURE_ASYNC OFF)
 set(WASMTIME_FEATURE_WINCH OFF)
 
+set(WASMTIME_FEATURE_WASI ON)
+
 set(WASMTIME_HEADER_DST ${WASMTIME_BINARY_DIR}/include)
 set(WASMTIME_HEADER_SRC ${WASMTIME_SOURCE_DIR}/crates/c-api/include)
 target_include_directories(_ch_wasmtime SYSTEM INTERFACE ${WASMTIME_HEADER_DST})
diff --git a/rust/workspace/wasmtime/Cargo.toml b/rust/workspace/wasmtime/Cargo.toml
index 4e0430c7edb..6dc53b2a36d 100644
--- a/rust/workspace/wasmtime/Cargo.toml
+++ b/rust/workspace/wasmtime/Cargo.toml
@@ -9,4 +9,4 @@ crate-type = ["staticlib"]
 # Dummy package to build wasmtime within the ClickHouse workspace and re-export its symbols.
 # Wasmtime provides C/C++ header files located, so we put it to contrib, and we use it from there.
 [dependencies]
-wasmtime-c-api = { path = "../../../contrib/wasmtime/crates/c-api" , package = "wasmtime-c-api-impl" , default-features = false , features = ["addr2line", "coredump", "cranelift", "gc", "gc-null", "debug-builtins", "demangle", "wat"] }
+wasmtime-c-api = { path = "../../../contrib/wasmtime/crates/c-api" , package = "wasmtime-c-api-impl" , default-features = false , features = ["addr2line", "coredump", "cranelift", "gc", "gc-null", "debug-builtins", "demangle", "wat", "wasi"] }
diff --git a/src/Interpreters/WebAssembly/WasmTimeRuntime.cpp b/src/Interpreters/WebAssembly/WasmTimeRuntime.cpp
index 7dc10d1f3ce..8012eed4ff4 100644
--- a/src/Interpreters/WebAssembly/WasmTimeRuntime.cpp
+++ b/src/Interpreters/WebAssembly/WasmTimeRuntime.cpp
@@ -19,6 +19,8 @@
 #include <Interpreters/WebAssembly/WasmTypes.h>
 
 #include <wasmtime.hh>
+#include <wasmtime/wasi.hh>
+
 
 namespace ProfileEvents
 {
@@ -369,6 +371,15 @@ public:
     std::unique_ptr<WasmCompartment> instantiate(Config cfg) const override
     {
         wasmtime::Store store(engine);
+        wasmtime::WasiConfig wasi;
+        wasi.inherit_argv();
+        wasi.inherit_stdin();
+        wasi.inherit_stdout();
+        wasi.inherit_stderr();
+
+        store.context().set_wasi(std::move(wasi)).unwrap();
+
+
         if (cfg.memory_limit)
             store.limiter(cfg.memory_limit, -1, -1, -1, -1);
         if (cfg.fuel_limit)
@@ -379,6 +390,8 @@ public:
         }
 
         wasmtime::Linker linker(engine);
+        linker.define_wasi().unwrap();
+
         for (const auto & host_function : host_functions)
         {
             const auto & func_decl = host_function.getFunctionDeclaration();

@bacek
Contributor

bacek commented Feb 26, 2026

I also hacked getHostFunction to not choke on fd_close and co. But I think the proper way is to store the module name somewhere and skip irrelevant functions.

Here are the imports in my module, for reference:

   0x241 | 03 65 6e 76 | import [func 0] Import { module: "env", name: "clickhouse_throw", ty: Func(1) }
   0x258 | 03 65 6e 76 | import [func 1] Import { module: "env", name: "clickhouse_log", ty: Func(1) }
   0x26d | 16 77 61 73 | import [func 2] Import { module: "wasi_snapshot_preview1", name: "fd_close", ty: Func(0) }
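
A rough Rust sketch of the "store the module name and skip irrelevant functions" idea, using the import list above as input. This is not the actual getHostFunction code (which is C++). The `Import` and `Resolution` types and the `resolve` function are hypothetical, and the sketch assumes ClickHouse host functions live in the "env" module while "wasi_snapshot_preview1" imports are left to the WASI linker.

```rust
use std::collections::HashSet;

/// One entry of a module's import section (module name + function name).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Import {
    pub module: String,
    pub name: String,
}

/// What the host should do with an import (hypothetical classification).
#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    /// Resolve through the ClickHouse host-function table.
    HostFunction,
    /// Skip here and leave it to the WASI linker.
    Wasi,
    /// Unknown import: the module should be rejected.
    Unresolved,
}

/// Classify an import by its module name first, then by its function name.
pub fn resolve(import: &Import, host_functions: &HashSet<&str>) -> Resolution {
    match import.module.as_str() {
        "env" if host_functions.contains(import.name.as_str()) => Resolution::HostFunction,
        "wasi_snapshot_preview1" => Resolution::Wasi,
        _ => Resolution::Unresolved,
    }
}
```

With the three imports listed above, `clickhouse_throw` and `clickhouse_log` would resolve as host functions and `fd_close` would be skipped as a WASI import, so the host-function lookup never sees it.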

@vdimir
Member Author

vdimir commented Feb 27, 2026

@bacek this looks promising. How does it work in general? What does the define_wasi implementation do by default - does it bypass system calls, trap, or act as a no-op?

It seems that when you use wasi.inherit_*, it accesses system resources, which is probably not what we really want.

How does your module handle files? Does it actually perform read/write operations on files or stdin/stdout (in which case the implementation should provide some kind of virtualized filesystem), or does it only rely on imports, meaning a no-op implementation would also work?

By the way, if it’s more convenient, feel free to reach out to me on telegram https://t.me/vdimir

@bacek
Contributor

bacek commented Feb 28, 2026

> @bacek this looks promising. How does it work in general? What does the define_wasi implementation do by default - does it bypass system calls, trap, or act as a no-op?

Technically, it is an independent implementation based on cap-std.

> It seems that when you use wasi.inherit_*, it accesses system resources, which is probably not what we really want.

Definitely not. It's just a direct copy from https://github.com/bytecodealliance/wasmtime/blob/main/examples/wasip1/main.cc

> How does your module handle files? Does it actually perform read/write operations on files or stdin/stdout (in which case the implementation should provide some kind of virtualized filesystem), or does it only rely on imports, meaning a no-op implementation would also work?

I have no idea. It doesn't work from my C++ sample at all. The actual library I'm using is not FS-dependent, but somehow clang manages to put fd_* functions into the Imports section.

@bacek
Contributor

bacek commented Mar 2, 2026

> How does your module handle files? Does it actually perform read/write operations on files or stdin/stdout (in which case the implementation should provide some kind of virtualized filesystem), or does it only rely on imports, meaning a no-op implementation would also work?
>
> I have no idea. It doesn't work from my C++ sample at all. The actual library I'm using is not FS-dependent, but somehow clang manages to put fd_* functions into the Imports section.

Actually, I've got it working. TL;DR: it's pretty rudimentary FS support. I'm pretty sure it's not even a VFS with directories and stuff.

The CH change is this:

+        if (!wasi.preopen_dir(
+                "/tmp/ch-udf/",
+                ".",
+                WASMTIME_WASI_DIR_PERMS_READ | WASMTIME_WASI_DIR_PERMS_WRITE,
+                WASMTIME_WASI_FILE_PERMS_READ | WASMTIME_WASI_FILE_PERMS_WRITE))
+        {
+            throw Exception(ErrorCodes::WASM_ERROR, "Can't preopen_dir");
+        }
+

Rust code is this:

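use std::fs;
use std::io::Write; // needed for `writeln!` on `fs::File`

// `clickhouse_udf` is the attribute macro from the UDF bindings used in this
// project; its import is not shown in the original snippet.

// Creates test.txt in the preopened directory, writes the greeting ten times,
// then reads the file back and returns its contents.
#[clickhouse_udf]
pub fn test_fs(greet: String) -> anyhow::Result<String> {
    let mut file = fs::File::create("test.txt")?;
    for _ in 0..10 {
        writeln!(file, "{}", greet)?;
    }
    let contents = fs::read_to_string("test.txt")?;
    Ok(contents)
}

// Returns the contents of a file, with the path resolved inside the guest.
#[clickhouse_udf]
pub fn test_read(path: String) -> anyhow::Result<String> {
    let contents = fs::read_to_string(path)?;
    Ok(contents)
}

// Lists the entry names of a directory visible to the guest.
#[clickhouse_udf]
pub fn ls(path: String) -> anyhow::Result<Vec<String>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(&path)? {
        let entry = entry?;
        entries.push(entry.file_name().to_string_lossy().into_owned());
    }
    Ok(entries)
}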

Directory content:

$ ls /tmp/ch-udf/
aloha.txt

This is run inside CH after scaffolding:

:) select ls('');

...

   ┌─ls('')────────┐
1. │ ['aloha.txt'] │
   └───────────────┘

:) select test_read('aloha.txt');
...
   ┌─test_read('aloha.txt')─┐
1. │ Aloha                 ↴│
   └────────────────────────┘

1 row in set. Elapsed: 0.004 sec. 
:) select test_fs('ALOHA CLICKHOUSE');
...
   ┌─test_fs('ALOHA CLICKHOUSE')─┐
1. │ ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   │↳ALOHA CLICKHOUSE           ↴│
   └─────────────────────────────┘


After the run:

$ ls  /tmp/ch-udf/
aloha.txt  test.txt
$ cat /tmp/ch-udf/test.txt 
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE
ALOHA CLICKHOUSE

Basically, after cleaning up, deciding on stdin/stdout/stderr/env handling, and providing configuration for an optional "preopen_dir", it should be reasonably straightforward to implement something landable.
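
As a sketch of what such configuration might cover, the Rust below models the three decisions named above (stdio handling, environment pass-through, and an optional preopened directory). All type and field names are hypothetical and do not correspond to any existing ClickHouse setting. The defaults are assumed to be the most restrictive choice.

```rust
use std::path::PathBuf;

/// How standard streams are exposed to the guest (hypothetical options).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StdioMode {
    /// The streams are not available to the guest.
    Closed,
    /// The guest shares the server process's streams (what `inherit_*` does).
    Inherit,
}

/// Optional directory mapping, mirroring the arguments passed to `preopen_dir`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PreopenDir {
    pub host_path: PathBuf,
    pub guest_path: String,
    pub writable: bool,
}

/// Hypothetical per-function WASI settings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WasiSettings {
    pub stdio: StdioMode,
    pub pass_env: bool,
    pub preopen_dir: Option<PreopenDir>,
}

impl Default for WasiSettings {
    /// Most restrictive defaults: no stdio, no environment, no filesystem.
    fn default() -> Self {
        WasiSettings {
            stdio: StdioMode::Closed,
            pass_env: false,
            preopen_dir: None,
        }
    }
}

impl WasiSettings {
    /// Reject obviously incomplete directory mappings before instantiation.
    pub fn validate(&self) -> Result<(), String> {
        match &self.preopen_dir {
            Some(dir) if dir.host_path.as_os_str().is_empty() => {
                Err("preopen_dir: host_path must not be empty".to_string())
            }
            Some(dir) if dir.guest_path.is_empty() => {
                Err("preopen_dir: guest_path must not be empty".to_string())
            }
            _ => Ok(()),
        }
    }
}
```

The experiment above would correspond to `host_path = "/tmp/ch-udf/"`, `guest_path = "."`, and `writable = true`, with stdio inherited.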

Labels

pr-experimental: Experimental Feature
pr-synced-to-cloud: The PR is synced to the cloud repo
submodule changed: At least one submodule changed in this PR.

Development

Successfully merging this pull request may close these issues.

Support WASM plugins for UDF/TableFunctions/more

7 participants