Problem with statically predefined symbols in binding_web/exports.json for wasm language files #949

@ahlinc

Hello 👋

While working on an external scanner implementation, I ran into a situation where the predefined list of symbols in lib/binding_web/exports.json wasn't enough for my case.

Here is the list of all library functions that scanner.cc can use in a language wasm file without patching exports.json:

$ curl -s https://raw.githubusercontent.com/tree-sitter/tree-sitter/master/lib/binding_web/exports.json | jq -r '.[]|sub("^_";"")' | c++filt
calloc
free
malloc
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::copy(char*, unsigned long, unsigned long) const
std::__2::__vector_base_common<true>::__throw_length_error() const
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::__init(char const*, unsigned long)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::reserve(unsigned long)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::__grow_by(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::push_back(char)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::basic_string(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> > const&)
std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::~basic_string()
std::__2::basic_string<wchar_t, std::__2::char_traits<wchar_t>, std::__2::allocator<wchar_t> >::push_back(wchar_t)
std::__2::basic_string<wchar_t, std::__2::char_traits<wchar_t>, std::__2::allocator<wchar_t> >::~basic_string()
operator delete(void*)
operator new(unsigned long)
abort
iswalnum
iswalpha
iswdigit
iswlower
iswspace
memchr
memcmp
memcpy
strlen
towupper
abort
.....................................

I found that emsdk ships with llvm-objdump, which can extract the required (undefined) symbols from a wasm object file:

$ docker run --rm -it -v $PWD:/src emscripten/emsdk emcc -I src -c -o src/scanner.wasm.o src/scanner.cc

$ docker run --rm -it -v $PWD:/src emscripten/emsdk bash -c "/emsdk/upstream/bin/llvm-objdump -t --demangle src/scanner.wasm.o" | grep '*UND*'
00000000       F *UND* operator new(unsigned long)
00000000         *UND* __stack_pointer
00000000       F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::basic_string(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> > const&)
00000000       F *UND* strcpy
00000000       F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::~basic_string()
00000000       F *UND* operator delete(void*)
00000000       F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::at(unsigned long)
00000000       F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::__init(char const*, unsigned long)
00000000       F *UND* std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >::push_back(char)
00000000       F *UND* std::__2::__vector_base_common<true>::__throw_length_error() const
00000000       F *UND* __cxa_allocate_exception
00000000       O *UND* typeinfo for std::length_error
00000000       F *UND* std::length_error::~length_error()
00000000       F *UND* __cxa_throw
00000000       F *UND* std::logic_error::logic_error(char const*)
00000000       O *UND* vtable for std::length_error
00000000       F *UND* strlen
00000000       F *UND* std::__2::__vector_base_common<true>::__throw_out_of_range() const

Actually, any version of llvm-objdump can dump these symbols, but the one from the emscripten/emsdk Docker image can be used if no system version is installed.

My suggestion is to split the compilation of tree-sitter language wasm files into several steps:

  1. Compile any additional source files into a separate object file.
  2. Collect all undefined symbols from the object file to a dynamically generated exports.json file.
  3. Use the generated exports.json file to link the final tree-sitter language wasm file.
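
Step 2 could be sketched roughly as follows. This is only an illustration, not an existing tree-sitter feature: the function name `exports_from_objdump` is hypothetical, and it assumes the symbol table was produced by `llvm-objdump -t` without `--demangle`, because exports.json stores mangled names with one extra leading underscore (which is why the jq command above strips a `_` before piping to c++filt). Keeping only function symbols (flag `F`) is also an assumption; data symbols such as `__stack_pointer` are skipped.

```python
import json


def exports_from_objdump(symbol_table: str) -> str:
    """Build exports.json content from `llvm-objdump -t` output (hypothetical helper).

    Assumes mangled (non-demangled) output with lines shaped like:
        00000000       F *UND* strlen
    """
    names = set()
    for line in symbol_table.splitlines():
        fields = line.split()
        if "*UND*" not in fields:
            continue  # symbol is defined in this object file, nothing to import
        idx = fields.index("*UND*")
        # Assumption: only undefined *function* symbols (flag "F") are needed.
        if "F" not in fields[1:idx]:
            continue
        # A mangled name is a single token; skip demangled names containing spaces.
        if idx + 1 != len(fields) - 1:
            continue
        # exports.json entries carry one extra leading underscore.
        names.add("_" + fields[-1])
    return json.dumps(sorted(names), indent=2)


if __name__ == "__main__":
    import sys

    # Reads the symbol table from stdin and prints the generated JSON array.
    print(exports_from_objdump(sys.stdin.read()))
```

The output of `llvm-objdump -t src/scanner.wasm.o` could be piped into this script and the result saved as the generated exports.json for the link step. Whether every undefined symbol can actually be resolved at link time would still need to be checked.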

Edit 1

A similar issue seems to have come up in #1300.
