Introduce ExtensionVersioner for C++ extensions#11725
Introduce ExtensionVersioner for C++ extensions#11725goldsborough wants to merge 1 commit intopytorch:masterfrom
Conversation
|
Thank you! |
41162c3 to
9fff003
Compare
|
@peterjc123 could you do some digging for the Windows failure? I'm seeing which means it can't link the second version for some reason. I don't quite understand why it only fails on Windows like this. It seems to compile the first version |
|
@goldsborough It's probably a permission issue. Is this library already in use during build? |
|
@peterjc123 Sorry could you elaborate on what you mean? |
|
@goldsborough It seems that reloaded_jit_extension_v1.pyd is being used when compiling. On Windows, deleting a file that is used by another process is not allowed. |
|
@peterjc123 I see. I don't know why that would be happening. It is going through the exact same code path as version 0, with the exact same function ( |
|
One day, someone is going to deploy hot loadable C++ extensions with PyTorch in prod, and then wonder why after a week the Python process crashes XD |
|
@ezyang as long as they don't modify the source code they reference from their JIT extension running in prod and re-load it, they should be ok :) |
|
They'd have a date stamp to be extra safe...
|
|
@goldsborough Is reloaded_jit_extension_v1 built twice? When built for the first time, it should be okay from the log. But the second time, it would fail because it is still being used(loaded in python). We could not delete one file under this circumstances.
|
cc69f24 to
0ec2c32
Compare
|
This is looking healthy. Hooray! I think the one failing test is spurious. Let me retest it |
facebook-github-bot
left a comment
There was a problem hiding this comment.
goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
I made a change such that we don't even attempt to re-build an unchanged library. Previously we were just relying on Ninja for this, but now we know this messes with Windows due to access to a file from multiple processes. Thanks for your help @peterjc123. |
0ec2c32 to
9557341
Compare
facebook-github-bot
left a comment
There was a problem hiding this comment.
goldsborough is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Python never closes shared library it
dlopens. This means that callingloadorload_inline(i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an
ExtensionVersioner, which hashes the contents of the source files as well as build flags, and if this hash changed, bumps an internal version stored for each module name. A bump in the version will result in the ninja file being edited and a new shared library and effectively a new C++ extension to be compiled. For this the version name is appended as_v<version>to the extension name for all versions greater zero.One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under
/tmp. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.Fixes #11398 CC @t-vi
@ezyang @gchanan @soumith @fmassa