Hello 👋
While working on an external scanner implementation, I ran into a situation where the predefined list of symbols in lib/binding_web/exports.json was not enough for my case.
Here is the list of all library functions that can currently be used in scanner.cc for languages built as wasm files without patching the exports.json file:
$ curl -s https://raw.githubusercontent.com/tree-sitter/tree-sitter/master/lib/binding_web/exports.json | jq -r '.[]|sub("^_";"")' | c++filt
calloc
free
malloc
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::copy(char*, unsigned long, unsigned long) const
std::__2::__vector_base_common<true>::__throw_length_error() const
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::__init(char const*, unsigned long)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::reserve(unsigned long)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::__grow_by(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::push_back(char)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::basic_string(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> > const&)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::~basic_string()
std::__2::basic_string<wchar_t, std::__2::char_traits<wchar_t>, std::__2::allocator<wchar_t> >::push_back(wchar_t)
std::__2::basic_string<wchar_t, std::__2::char_traits<wchar_t>, std::__2::allocator<wchar_t> >::~basic_string()
operator delete(void*)
operator new(unsigned long)
abort
iswalnum
iswalpha
iswdigit
iswlower
iswspace
memchr
memcmp
memcpy
strlen
towupper
abort
.....................................
I found that emsdk ships with llvm-objdump, which can list the symbols that a wasm object file requires:
$ docker run --rm -it -v $PWD:/src emscripten/emsdk emcc -I src -c -o src/scanner.wasm.o src/scanner.cc
$ docker run --rm -it -v $PWD:/src emscripten/emsdk bash -c "/emsdk/upstream/bin/llvm-objdump -t --demangle src/scanner.wasm.o" | grep '*UND*'
00000000 F *UND* operator new(unsigned long)
00000000 *UND* __stack_pointer
00000000 F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::basic_string(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> > const&)
00000000 F *UND* strcpy
00000000 F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::~basic_string()
00000000 F *UND* operator delete(void*)
00000000 F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::at(unsigned long)
00000000 F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::__init(char const*, unsigned long)
00000000 F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::push_back(char)
00000000 F *UND* std::__2::__vector_base_common<true>::__throw_length_error() const
00000000 F *UND* __cxa_allocate_exception
00000000 O *UND* typeinfo for std::length_error
00000000 F *UND* std::length_error::~length_error()
00000000 F *UND* __cxa_throw
00000000 F *UND* std::logic_error::logic_error(char const*)
00000000 O *UND* vtable for std::length_error
00000000 F *UND* strlen
00000000 F *UND* std::__2::__vector_base_common<true>::__throw_out_of_range() const
Actually, any version of llvm-objdump can dump these symbols, but the one from the emscripten/emsdk Docker image can be used when no system version is installed.
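To see which of the required symbols are not covered by the predefined list, the two listings can be compared. This is only a minimal sketch: it assumes both lists were saved as plain text files with one symbol per line and in the same form (both mangled or both demangled). The function name `missing_symbols` and the file names are hypothetical.

```shell
# Hypothetical helper: print every line of the "required" file that has
# no exact match in the "exported" file.
#   $1 - file with the exported symbols, one per line
#   $2 - file with the symbols the scanner requires, one per line
missing_symbols() {
  # -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file
  grep -F -x -v -f "$1" "$2"
}
```

For example, `missing_symbols exported.txt required.txt` would print only the symbols that still need to be added.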
My suggestion is to split the compilation of tree-sitter language wasm files into several steps:
- Compile any additional source files into a separate object file.
- Collect all undefined symbols from the object file into a dynamically generated exports.json file.
- Use the generated exports.json file to link the final tree-sitter language wasm file.
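The second step could be sketched roughly as follows. This is an illustration under several assumptions, not a tested build step: llvm-objdump is run without `--demangle`, so symbol names contain no spaces; its output has the `F *UND* name` shape shown above; entries in exports.json carry a leading underscore (the `jq` command at the top strips one); and only function symbols are collected. The helper name `undefined_to_exports_json` is hypothetical.

```shell
# Hypothetical helper: read `llvm-objdump -t` output (mangled names) on stdin
# and print a JSON array of "_"-prefixed undefined function symbols.
undefined_to_exports_json() {
  awk '
    # Keep only undefined function symbols: flag "F" followed by "*UND*".
    / F +\*UND\*/ { syms[n++] = "_" $NF }
    END {
      printf "["
      for (i = 0; i < n; i++) printf "%s\"%s\"", (i ? "," : ""), syms[i]
      print "]"
    }
  '
}
```

It could then be used as `llvm-objdump -t src/scanner.wasm.o | undefined_to_exports_json > exports.json`, with the resulting file passed on to the third step.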
Edit 1
It seems a similar issue was reported in #1300.