Skip to content

Add support for loading .so files in command line runner#3098

Merged
hoodmane merged 11 commits intopyodide:mainfrom
hoodmane:cmdline-dynlibs
Nov 9, 2022
Merged

Add support for loading .so files in command line runner#3098
hoodmane merged 11 commits intopyodide:mainfrom
hoodmane:cmdline-dynlibs

Conversation

@hoodmane
Copy link
Member

@hoodmane hoodmane commented Sep 14, 2022

When they are larger than the synchronous compilation threshold.

Notes:

  1. This only works if the package works with global_dso: false in Switch to separate setting for loading shared libraries globally #3054. Hopefully we can gradually move more to explicitly linking shared libraries they depend on.
  2. pip install scipy makes the command line runner extremely slow. Without scipy installed, python -c 'print(1)' runs in about 1 second, but with it installed it takes more like 10 seconds (time to load clapack_all.so and 111 different .so files in scipy, totaling 20 megabytes). We have to load all of this despite the fact that we won't use any of it.

Checklists

(if they are larger than the synchronous compilation threshold)
Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hoodmane!

pip install scipy makes the command line runner extremely slow. Without scipy installed, python -c 'print(1)' runs in about 1 second, but with it installed it takes more like 10 seconds (time to load clapack_all.so and 111 different .so files in scipy, totaling 20 megabytes). We have to load all of this despite the fact that we won't use any of it.

Well that sounds horrible. Can't we pass loadAsync flags to false when calling loadDynamicLibrary? Since in cmdline runner we already have files in the file system so maybe we can compile libraries synchronously on demand?

@hoodmane
Copy link
Member Author

Can't we pass loadAsync flags to false when calling loadDynamicLibrary?

It's worth checking, but I think not. I think v8 has caps on the maximum size of a wasm module that can be synchronously loaded and node inherits those caps.

@hoodmane hoodmane mentioned this pull request Sep 16, 2022
@hoodmane
Copy link
Member Author

This PR is needed in order for the command line runner to work in numpy CI.

Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @hoodmane. Overall I am +1 with merging this. I guess #3054 is not mandatory for this (though making #3054 work will make loading libraries faster)?

@hoodmane
Copy link
Member Author

Well without #3054 or some other changes this won't work for any package that uses a shared library. But there is still a benefit because it makes the other packages work

@ryanking13
Copy link
Member

Well without #3054 or some other changes this won't work for any package that uses a shared library.

Why won't it work? Is loading a library globally causing some error?

@hoodmane
Copy link
Member Author

hoodmane commented Oct 1, 2022

No the global libraries are just missing. I don't want to always load all shared libraries because loading them is really slow. But we also don't know which ones are needed.

@ryanking13
Copy link
Member

No the global libraries are just missing. I don't want to always load all shared libraries because loading them is really slow. But we also don't know which ones are needed.

Okay, that makes sense. How much does loading the library globally slower compared to loading it locally? I mean, even if we manage to make libraries work locally, we will still have to load all of them at the initialization step unless synchronously compiling a WASM module is available.

@hoodmane
Copy link
Member Author

hoodmane commented Oct 8, 2022

How much does loading the library globally slower compared to loading it locally?

I think they are pretty much the same speed. The issue is in knowing whether or not we have to load them at all: we are using pip to install things but pip has no idea whether or not numpy or scipy require some external non-wheel .so file.

@ryanking13
Copy link
Member

In #3167 I opened an issue about analyzing which shared libs are used in the wheel and embedding them into the wheel. That issue is still WIP, but finding which libraries are required for each wheel or .so file is possible with the auditwheel-emscripten package that I suggested in that issue. Do you think it would help?

For example, you can:

$ pip install auditwheel-emscripten
$ pyodide audit show Shapely-1.8.2-cp310-cp310-emscripten_3_1_21_wasm32.whl
The following external shared libraries are required:
{
│   'shapely/speedups/_speedups.cpython-310-wasm32-emscripten.so': ['libgeos_c.so'],
│   'shapely/vectorized/_vectorized.cpython-310-wasm32-emscripten.so': ['libgeos_c.so']
}
>>> import auditwheel_emscripten
>>> auditwheel_emscripten.show("Shapely-1.8.2-cp310-cp310-emscripten_3_1_21_wasm32.whl")
{'shapely/speedups/_speedups.cpython-310-wasm32-emscripten.so': ['libgeos_c.so'], 'shapely/vectorized/_vectorized.cpython-310-wasm32-emscripten.so': ['libgeos_c.so']}

@hoodmane
Copy link
Member Author

Do you think it would help?

Yes, this does sound like it would likely be a solution. Thank you @ryanking13!

@hoodmane hoodmane merged commit 2cd616d into pyodide:main Nov 9, 2022
@hoodmane hoodmane deleted the cmdline-dynlibs branch November 9, 2022 21:05
@ryanking13
Copy link
Member

Thanks a lot @hoodmane!

@rth
Copy link
Member

rth commented Nov 11, 2022

Great, thanks @hoodmane and @ryanking13 for getting to the bottom of it. We should probably document this more.

So what's the practical impact on say loading CLAPACK .so in scipy from a venv #3186? How do I specify the location of the .so to load if it's in a separate package or do we now expect to vendor all .so in the wheel? cc @lesteve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants