Add optional WASM feature to the native library, allowing it to run wasm-compiled parsers via wasmtime #1864
Replace non-mutating `ts_parser_wasm_store` function with `ts_parser_take_wasm_store`, which removes and returns the wasm store, in order to facilitate single ownership.
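The "take" semantics described above can be sketched in Rust as follows. This is only an illustration of the ownership idea, not the library's code: `WasmStore`, `Parser`, and both method signatures here are hypothetical stand-ins, and the real `ts_parser_take_wasm_store` is a C function operating on `TSParser`.

```rust
/// Hypothetical stand-in for a wasm store; NOT the real `TSWasmStore`.
#[derive(Debug, PartialEq)]
pub struct WasmStore {
    pub id: u32,
}

/// Hypothetical stand-in for a parser that owns at most one store.
#[derive(Default)]
pub struct Parser {
    wasm_store: Option<WasmStore>,
}

impl Parser {
    /// Hand a store to the parser. Any store it held before is returned
    /// to the caller, so no store is ever silently dropped or shared.
    pub fn set_wasm_store(&mut self, store: WasmStore) -> Option<WasmStore> {
        self.wasm_store.replace(store)
    }

    /// Remove and return the store, leaving the parser without one.
    /// A non-mutating getter would instead leave two parties able to
    /// reach the same store, which is what single ownership avoids.
    pub fn take_wasm_store(&mut self) -> Option<WasmStore> {
        self.wasm_store.take()
    }
}
```

After a call to `take_wasm_store`, the caller is the only owner of the store, and a second call yields nothing until a store is set again.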
I don't exactly understand why the Rust dependency on |
Okay, it seems I completely missed the point. Depending on the C API from Rust allows the tree-sitter C lib to be compiled and linked easily without having to manually vendor wasmtime. I created a PR on wasmtime here: bytecodealliance/wasmtime#6765.
Ok, this experimental feature is complete enough for people to try using in downstream applications. There will probably be bugs, but the feature basically works.
Thanks a lot for the work! We will be happy to test it in Lapce (hopefully soon).
Huge step towards a truly platform-independent parser ecosystem. Out of interest (and I apologize if it's obvious), how does this relate to #949?
Yeah, that's a good question. That same limitation applies when using wasm-compiled parsers in this mode: there is a fixed (small) subset of the C and C++ standard libraries available that is compiled into the library, and any external scanners that rely on symbols that aren't in this subset will not work. There may be some way to change how parsers are compiled to wasm, such that they can include code that they need from the standard library, but I don't know how to do that with Emscripten (or Clang) thus far. I feel like this design, though somewhat limiting, mostly works in practice. But as part of stabilizing this feature, it would probably be good to add some tooling around detecting when external scanners use functions that are unavailable in a wasm context, and emitting warnings.
Ah, pity. That is a dealbreaker for Neovim (as the Markdown parser is now required infrastructure, and many parsers in nvim-treesitter have a scanner.c). But still, great feature!
Also, some big limitations to call out:
Yeah, I noticed ;) Lots of stabbing in the dark for me...
Just to be clear, parsers with a |
This would be fantastic. We've tried to do so in Pulsar by adding a runtime check, but that requires a user to hit the specific code path first, and serves only to present a more understandable error message. If there were (or if there is) a way to analyze a |
Another question, just to be sure: If this mode is enabled, will you still be able to use native parsers (i.e., support both |
Yes, you can still use the native parsers.
This PR adds undocumented functionality for loading custom language plugins at runtime. I don't intend to expose the functionality to end users yet, but this will allow the team to test the capability internally.

### Implementation

There isn't much new code in Zed. Most of the work here is within Tree-sitter, in PRs tree-sitter/tree-sitter#1864 and tree-sitter/tree-sitter#2840, which allow Tree-sitter to load languages from WASM blobs. I've tested the functionality in Tree-sitter's test suite and via its CLI, but having it wired into Zed allows us to test the functionality more fully.

### Details

Now, on startup, Zed will look for subdirectories inside `~/Application Support/plugins`. These subdirectories are expected to look similar to the per-language subdirectories in [`crates/zed2/src/languages`](https://github.com/zed-industries/zed/tree/main/crates/zed2/src/languages), except that they also contain a `.wasm` file for the parser itself. I'll add more details here as I go.
Any plan on shipping this?
Can (should?) the latest tree-sitter release be built against a published |
The challenge right now is that |
Background
Currently, Tree-sitter parsers can be compiled to WebAssembly (aka 'WASM') and run within a web browser, via the `web-tree-sitter` JavaScript library, which contains a WASM build of the Tree-sitter library.

In some applications that use the native Tree-sitter library, it would also be very useful to be able to load these same WASM builds of parsers, in order to support adding parsers as 🔌 plugins 🔌, without requiring users to download platform-specific native binaries or to compile C code on their own machines.
Change
This PR adds a new optional WASM feature to the core library, which can be enabled in Rust via the `wasm` cargo feature, and in C via the `TREE_SITTER_FEATURE_WASM` macro.

This feature allows you to build a native `TSLanguage` object from a WASM buffer. You can then use this language object just like any other (native-compiled) language object: parsing text that lives in native memory, constructing syntax trees on the native heap, sending it between threads, etc. Tree-sitter languages are mostly just plain immutable data, so they're easy to unmarshal from a compiled wasm module.

The only difference is that when using a wasm-based language with a `TSParser`, you must first provide the parser with a `TSWasmStore`. This wasm store object wraps a `wasmtime::Store` object, which the Tree-sitter library uses to invoke the language's lexing functions. Those functions are code, not data, so they require a WASM runtime to execute.

Notes
Wasmtime Dependency - I originally thought that it'd be cool to code against WebAssembly's standard `wasm.h` C interface, which is supposedly implemented by multiple runtimes (mainly wasmtime and V8). This way, applications using Tree-sitter would have multiple choices of which WASM runtime to link Tree-sitter against.

As I got into the details, I ended up finding that some of the C APIs that I needed were currently unimplemented in wasmtime, but that Wasmtime provides its own custom C interface, which is fully implemented and supposedly better designed from a performance perspective. So I ended up using Wasmtime's own C API. This means you can't use V8 as the wasm runtime when using this Tree-sitter feature, at least for now.
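To make the store requirement from the Change section concrete, here is a minimal Rust model of the rule "a wasm-based language needs a wasm store on the parser first". Everything in it is a hypothetical stand-in (`LexBackend`, `Language`, `WasmStore`, `Parser`, `SetLanguageError`); it does not call Tree-sitter or Wasmtime, and the choice to report the error when the language is assigned is an assumption made for the sketch.

```rust
use std::sync::Arc;

/// Hypothetical tag for where a language's lexing functions live.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LexBackend {
    Native,
    Wasm,
}

/// Hypothetical stand-in for `TSLanguage`: immutable data plus a backend tag.
/// Being plain data, it can be shared between threads behind an `Arc`.
#[derive(Debug)]
pub struct Language {
    pub name: &'static str,
    pub backend: LexBackend,
}

/// Hypothetical stand-in for `TSWasmStore` (the real one wraps a `wasmtime::Store`).
#[derive(Debug, Default)]
pub struct WasmStore;

#[derive(Debug, PartialEq, Eq)]
pub enum SetLanguageError {
    /// A wasm-backed language was assigned before any store was provided.
    MissingWasmStore,
}

#[derive(Default)]
pub struct Parser {
    language: Option<Arc<Language>>,
    wasm_store: Option<WasmStore>,
}

impl Parser {
    /// Give the parser the runtime state needed to run wasm lexing code.
    pub fn set_wasm_store(&mut self, store: WasmStore) {
        self.wasm_store = Some(store);
    }

    /// Assign a language. Native languages always succeed; wasm-backed
    /// languages are rejected unless a store was provided first.
    pub fn set_language(&mut self, language: Arc<Language>) -> Result<(), SetLanguageError> {
        if language.backend == LexBackend::Wasm && self.wasm_store.is_none() {
            return Err(SetLanguageError::MissingWasmStore);
        }
        self.language = Some(language);
        Ok(())
    }

    /// Name of the currently assigned language, if any.
    pub fn language_name(&self) -> Option<&'static str> {
        self.language.as_ref().map(|l| l.name)
    }
}
```

In this model a parser can switch freely between native and wasm-backed languages once it holds a store, which mirrors the point made in the conversation that native parsers remain usable when the feature is enabled.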
Tasks
- `wasmtime` PR for tweaks to the `wasmtime-c-api` `Cargo.toml`
- `libc` or `libc++`
- `web-tree-sitter`
- `ts_wasm_store_load_language`
- `--wasm` flag to the `tree-sitter parse` CLI command, which causes any parsers to be compiled to WASM instead of native shared libraries, and loaded via the new logic.
- `--wasm` flag to the `tree-sitter test` command that works the same way