Proposed Emscripten / WebAssembly API Changes
Since late September 2018, I've been implementing a straightforward translation layer that forwards host calls made by Emscripten-generated WebAssembly to the C library of the underlying host. The result today is a WebAssembly runtime that implements all the functionality required to run an unmodified version of Nginx compiled with Emscripten. While it works, there are a few modifications I would make to the existing Emscripten<->WebAssembly API to correct some inadequacies and to simplify and optimize the translation logic.
These changes are proposed as minor edits to the existing API, but in the long term it would be good to fully document the existing API with the intention of publishing it as a POSIX API standard for WebAssembly. I'm looking for feedback on these proposals before I potentially start working on a patch for Emscripten.
1. Include memory layout information in WASM binary (Critical) (DONE 0d83546)
The WASM module produced by Emscripten requires a few exports from the host to run successfully:
- memoryBase/__memory_base
- __table_base
- tempDouble
- DYNAMICTOP_PTR
- STACKTOP
- STACK_MAX
The values of these exports depend on the target STATIC_BUMP value. This value is not encoded in the generated WASM file itself and must be parsed out of the accompanying .js file.
I propose the STATIC_BUMP value be encoded in a custom "Emscripten Metadata" section in the generated WASM file.
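To illustrate how a host could consume such a section, here is a minimal sketch that scans a WASM binary for a custom section and reads one value from it. The section name `emscripten_metadata`, the payload layout (a single unsigned LEB128 integer), and the value 5264 are assumptions made for the example, not the format Emscripten actually emits.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def find_custom_section(module, wanted_name):
    """Return the payload of the first custom section with the given name."""
    if module[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        end = pos + size
        if section_id == 0:  # custom sections have id 0 and start with a name
            name_len, name_pos = read_uleb128(module, pos)
            name = module[name_pos:name_pos + name_len].decode("utf-8")
            if name == wanted_name:
                return module[name_pos + name_len:end]
        pos = end
    return None


def make_custom_section(name, payload):
    """Build a custom section (used here only to construct test input)."""
    name_bytes = name.encode("utf-8")
    body = encode_uleb128(len(name_bytes)) + name_bytes + payload
    return b"\x00" + encode_uleb128(len(body)) + body


# Hypothetical module: header plus one metadata section carrying STATIC_BUMP.
module = (b"\0asm" + (1).to_bytes(4, "little")
          + make_custom_section("emscripten_metadata", encode_uleb128(5264)))
payload = find_custom_section(module, "emscripten_metadata")
static_bump, _ = read_uleb128(payload, 0)
```

With the value available in the binary itself, the host no longer needs the .js file to compute the exports listed above.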
2. Compiling with -O3 shouldn't minify WebAssembly import names (Critical) (DONE 2852283)
This was recently changed in Emscripten: -O3 now mangles import names by default. I propose that the default -O3 not mangle import names, and that an extra option be added to enable that minification if desired. Alternatively, maintaining the current behavior and adding an option to disable the name mangling is another option.
3. Compile straight to .wasm file (Recommended) (DONE 2852283)
It should be possible to specify a .wasm file as the output file. Right now you must specify a file ending in .js, even though the .js file isn't always necessary.
4. Include table layout information in WASM binary (Critical) (DONE 0d83546)
The WASM module generated by Emscripten imports its index-zero table from the host. The expected size of this table is not generic and must be parsed out from the accompanying .js file as well.
I propose either:
- that the import parameters for the table allow a generically sized table
- that the table size be encoded in a similar custom "Emscripten Metadata" section in the generated WASM file.
5. off_t and ino_t should be 64-bits (Critical) (DONE 4e4a794)
Most file systems support 64-bit off_t and ino_t. The WASM generated by Emscripten expects a 32-bit off_t and ino_t. This is a problem when delegating the host functions required by the generated WASM file directly to the host's C library since it's not possible to return the full value produced by the host to the hosted WASM.
I propose we convert off_t and ino_t to 64-bits.
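The truncation problem can be seen with a small sketch. The 5 GiB offset is a made-up value standing in for a file position returned by the host's C library.

```python
import struct


def fits_in_32_bit_off_t(offset):
    """True if the offset is representable as a signed 32-bit integer."""
    return -(2**31) <= offset < 2**31


# Hypothetical file position reported by the host, e.g. from lseek().
host_offset = 5 * 1024**3  # 5 GiB

# What the guest would see if only the low 32 bits were returned.
low_bytes = struct.pack("<q", host_offset)[:4]
truncated = struct.unpack("<i", low_bytes)[0]

# With a 64-bit off_t the value survives unchanged.
preserved = struct.unpack("<q", struct.pack("<q", host_offset))[0]
```

Here `truncated` is 1073741824 (1 GiB) instead of 5 GiB, so the guest would silently operate on the wrong position.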
6. More POSIX types should be 64-bits (Recommended)
Emscripten ends up using 32-bit versions of many POSIX structures, whereas the host implementation usually uses 64-bit values in its corresponding structures. Pointers will necessarily remain 32 bits wide and don't make sense in the host context without first being converted to host pointers, but there are a few POSIX structures that could in theory be forwarded directly to the host implementation. Examples include struct timespec, struct stat, struct tm, and more. If Emscripten encoded these using 64-bit integer values, no conversion step would be required on LP64, the most popular data model on POSIX systems for the foreseeable future.
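The conversion step in question looks roughly like the sketch below for struct timespec. The guest layout (two 32-bit fields) and a little-endian LP64 host are assumptions for the example; the helper name is hypothetical.

```python
import struct

# Assumed guest layout: 32-bit tv_sec followed by 32-bit tv_nsec.
GUEST_TIMESPEC = struct.Struct("<ii")
# LP64 host layout (little-endian assumed): 64-bit tv_sec, 64-bit tv_nsec.
HOST_TIMESPEC = struct.Struct("<qq")


def guest_to_host_timespec(memory, ptr):
    """Read a guest timespec from linear memory and re-encode it for the host."""
    tv_sec, tv_nsec = GUEST_TIMESPEC.unpack_from(memory, ptr)
    return HOST_TIMESPEC.pack(tv_sec, tv_nsec)


# Simulated linear memory with a timespec stored at offset 16.
memory = bytearray(64)
GUEST_TIMESPEC.pack_into(memory, 16, 1_538_352_000, 500)
host_bytes = guest_to_host_timespec(memory, 16)
```

If the guest already stored both fields as 64-bit integers, the bytes in linear memory could be handed to the host as they are and `guest_to_host_timespec` would be unnecessary.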
7. POSIX API, not System Call API (Recommended)
At first, it seemed like the API between Emscripten-generated WebAssembly and the Emscripten host should be based on the system calls implied by POSIX. This doesn't work for higher-level APIs that have implementation-defined behavior. Examples include readdir() and friends, getgrent() and friends, getpwent() and friends, sem_post() and friends, and more.
Emscripten currently uses musl library code to implement readdir() and requires the host to implement the getdents64() system call. Only Linux provides an API with getdents64() semantics. Other POSIX systems like OpenBSD and macOS provide similar but incompatible APIs, and those APIs are unsupported. POSIX only requires that implementations provide readdir() and friends. It would be better for Emscripten-generated WebAssembly to call host-defined readdir() and friends instead of getdents64(), so that the API is easier to implement on all POSIX systems.
This requires some modification to Emscripten's version of musl, which adds extra complexity, but I think it is worth it overall.
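A host-side readdir()-level interface could look roughly like the following sketch, which relies only on a portable directory iterator instead of getdents64(). The class and method names are hypothetical, and returning a (name, inode) tuple is a simplification of struct dirent.

```python
import os
import tempfile


class DirStreams:
    """Hypothetical host table mapping guest handles to open directory streams."""

    def __init__(self):
        self._streams = {}
        self._next_handle = 1

    def opendir(self, path):
        handle = self._next_handle
        self._next_handle += 1
        self._streams[handle] = os.scandir(path)
        return handle

    def readdir(self, handle):
        """Return (name, inode) for the next entry, or None at end of directory."""
        entry = next(self._streams[handle], None)
        if entry is None:
            return None
        return entry.name, entry.inode()

    def closedir(self, handle):
        self._streams.pop(handle).close()


def list_names(streams, path):
    """Drive the three calls the way guest code would."""
    handle = streams.opendir(path)
    names = []
    while True:
        entry = streams.readdir(handle)
        if entry is None:
            break
        names.append(entry[0])
    streams.closedir(handle)
    return sorted(names)


# Small demonstration on a temporary directory.
streams = DirStreams()
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(tmp, name), "w").close()
    names = list_names(streams, tmp)
```

Note that `os.scandir` omits the "." and ".." entries, which a real readdir() forwarding layer would still need to report.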
8. Better function signatures for system calls (Recommended)
This is related to the previous recommendation: right now the system call API names each system call by number (e.g. _syscall220). In addition, the arguments to the system calls are passed in a buffer instead of as WebAssembly function arguments, which requires extra work in the host implementation to decode them. I propose that all APIs use descriptive names and pass arguments normally instead of in an indirect buffer.
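The difference between the two calling styles can be sketched as follows. The example assumes that number 220 maps to getdents64() with three 32-bit arguments, as in the 32-bit x86 Linux system call table; the handler names and the stub host function are hypothetical.

```python
import struct


def host_getdents64(fd, dirp, count):
    """Stub standing in for the real host call; it just echoes its arguments."""
    return ("getdents64", fd, dirp, count)


def syscall220_buffered(memory, varargs_ptr):
    """Current style: numbered entry point, arguments decoded from linear memory."""
    fd, dirp, count = struct.unpack_from("<iii", memory, varargs_ptr)
    return host_getdents64(fd, dirp, count)


def getdents64_direct(fd, dirp, count):
    """Proposed style: descriptive name, arguments passed as function parameters."""
    return host_getdents64(fd, dirp, count)


# Simulated linear memory holding the argument buffer at offset 32.
memory = bytearray(256)
struct.pack_into("<iii", memory, 32, 3, 128, 64)

buffered = syscall220_buffered(memory, 32)
direct = getdents64_direct(3, 128, 64)
```

Both paths reach the same host call, but the direct form needs no knowledge of the buffer layout and no memory reads on the host side.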
9. API function calls are responsible for setting Errno (Recommended)
Right now some Emscripten APIs expect the system call to return a negative errno number to signal error, and some return the error in the errno value. I think all APIs should behave similarly for consistency.
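The two conventions can be contrasted with a short sketch. The wrapper names and the `GuestState` class are hypothetical, and a stub that always fails with ENOENT stands in for a real host call.

```python
import errno


class GuestState:
    """Hypothetical holder for the guest's errno value."""

    def __init__(self):
        self.errno = 0


def failing_host_call():
    """Stub host call that always fails with ENOENT."""
    raise OSError(errno.ENOENT, "no such file or directory")


def call_returning_negative_errno(fn, *args):
    """Convention A: report failure as a negative errno return value."""
    try:
        return fn(*args)
    except OSError as exc:
        return -exc.errno


def call_setting_errno(state, fn, *args):
    """Convention B: set the guest's errno and return -1."""
    try:
        return fn(*args)
    except OSError as exc:
        state.errno = exc.errno
        return -1


state = GuestState()
result_a = call_returning_negative_errno(failing_host_call)
result_b = call_setting_errno(state, failing_host_call)
```

Picking one of these wrappers for every API would let a host implement error handling once instead of per call.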