Add support for loading .so files in command line runner #3098
hoodmane merged 11 commits into pyodide:main from
Conversation
ryanking13 left a comment
Thanks @hoodmane!
pip install scipy makes the command line runner extremely slow. Without scipy installed, python -c 'print(1)' runs in about 1 second, but with it installed it takes more like 10 seconds (time to load clapack_all.so and 111 different .so files in scipy, totaling 20 megabytes). We have to load all of this despite the fact that we won't use any of it.
Well, that sounds horrible. Can't we set the loadAsync flag to false when calling loadDynamicLibrary? Since in the command line runner we already have the files in the file system, maybe we can compile libraries synchronously on demand?
It's worth checking, but I think not. I think v8 has caps on the maximum size of a wasm module that can be synchronously loaded, and node inherits those caps.
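To illustrate the distinction (a sketch under the assumptions above, not pyodide's actual loading code): the synchronous `new WebAssembly.Module(bytes)` constructor can throw for large buffers in engines that cap synchronous compilation, while the asynchronous `WebAssembly.compile` path has no such cap, which is why large .so files would have to go through the async path anyway:

```javascript
// Sketch only: fall back to async compilation when the engine refuses
// to compile a large module synchronously. The size cap is an engine
// policy (v8's), not part of the WebAssembly JS API itself.
async function compileModule(bytes) {
  try {
    // Synchronous compilation; may throw for large modules.
    return new WebAssembly.Module(bytes);
  } catch (e) {
    // Asynchronous compilation has no such size cap.
    return await WebAssembly.compile(bytes);
  }
}

// Smallest valid wasm module: the "\0asm" magic number plus version 1,
// with no sections. Small enough to compile synchronously anywhere.
const empty = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
compileModule(empty).then((mod) => {
  console.log(mod instanceof WebAssembly.Module); // true
});
```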
This PR is needed in order for the command line runner to work in numpy CI.
Well, without #3054 or some other changes this won't work for any package that uses a shared library. But there is still a benefit because it makes the other packages work.
Why won't it work? Is loading a library globally causing some error?
No, the global libraries are just missing. I don't want to always load all shared libraries because loading them is really slow. But we also don't know which ones are needed.
Okay, that makes sense. How much slower is loading a library globally compared to loading it locally? I mean, even if we manage to make libraries work locally, we will still have to load all of them at the initialization step unless synchronous compilation of a WASM module is available.
I think they are pretty much the same speed. The issue is in knowing whether or not we have to load them at all: we are using `global_dso: false` in #3054 (Switch to separate setting for loading shared libraries globally). Hopefully we can gradually move more packages to explicitly linking the shared libraries they depend on.
In #3167 I opened an issue about analyzing which shared libs are used in a wheel and embedding them into the wheel. That issue is still WIP, but finding which libraries are required for each wheel or .so file is possible with the `auditwheel-emscripten` package. For example, you can:

```
$ pip install auditwheel-emscripten
$ pyodide audit show Shapely-1.8.2-cp310-cp310-emscripten_3_1_21_wasm32.whl
The following external shared libraries are required:
{
    'shapely/speedups/_speedups.cpython-310-wasm32-emscripten.so': ['libgeos_c.so'],
    'shapely/vectorized/_vectorized.cpython-310-wasm32-emscripten.so': ['libgeos_c.so']
}
```

or use the Python API:

```
>>> import auditwheel_emscripten
>>> auditwheel_emscripten.show("Shapely-1.8.2-cp310-cp310-emscripten_3_1_21_wasm32.whl")
{'shapely/speedups/_speedups.cpython-310-wasm32-emscripten.so': ['libgeos_c.so'], 'shapely/vectorized/_vectorized.cpython-310-wasm32-emscripten.so': ['libgeos_c.so']}
```
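Building on that output (a hypothetical sketch, not an existing pyodide API): a runner could use such a mapping to preload only the shared libraries that are actually referenced by a wheel, instead of loading every .so at startup:

```javascript
// Sketch: deduplicate a dependency mapping like the one produced by
// `auditwheel_emscripten.show`, so each required shared library is
// loaded once. The `deps` object mirrors the Shapely output above.
const deps = {
  "shapely/speedups/_speedups.cpython-310-wasm32-emscripten.so": ["libgeos_c.so"],
  "shapely/vectorized/_vectorized.cpython-310-wasm32-emscripten.so": ["libgeos_c.so"],
};

// Flatten the per-.so dependency lists and drop duplicates.
const needed = [...new Set(Object.values(deps).flat())];
console.log(needed); // [ 'libgeos_c.so' ]
```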
Yes, this does sound like it would likely be a solution. Thank you @ryanking13!
Thanks a lot @hoodmane!
Great, thanks @hoodmane and @ryanking13 for getting to the bottom of it. We should probably document this more. So what's the practical impact on, say, loading the CLAPACK .so in scipy from a venv (#3186)? How do I specify the location of the .so to load if it's in a separate package, or do we now expect to vendor all .so files in the wheel? cc @lesteve
When they are larger than the synchronous compilation threshold.
Notes:
- Switch to separate setting (`global_dso: false`) for loading shared libraries globally: #3054. Hopefully we can gradually move more packages to explicitly linking the shared libraries they depend on.