Skip to content

Windows: attempt to preserve DLL parent directory structure#7028

Merged
rokm merged 3 commits intopyinstaller:developfrom
rokm:keep-stuff-in-place
Aug 15, 2022
Merged

Windows: attempt to preserve DLL parent directory structure#7028
rokm merged 3 commits intopyinstaller:developfrom
rokm:keep-stuff-in-place

Conversation

@rokm
Copy link
Copy Markdown
Member

@rokm rokm commented Aug 14, 2022

When collecting a DLL that was discovered via link-time dependency analysis of a collected binary/extension, attempt to preserve
its parent directory structure instead of collecting it into application's top-level directory.

This aims to preserve the parent directory structure of DLLs bundled with python packages in PyPI wheels, while the DLLs
collected from system directories (as well as from Library\bin directory of the Anaconda's environment) are still collected into
top-level application directory (because there is no directory structure to preserve there).

Ultimately, this behavior should be fixed on all OSes, but for macOS and linux, we need to first implement support for symbolic
links. On Windows, we won't be using symbolic links anyway, so we can make the change in advance and see (and fix) the fallout this causes.

Besides, our hand is kind of forced by #6924 and the fix for it, #6925. There, we improve the tracking of DLL search paths for binary dependency analysis in order to be able to find more DLLs. Previously, similar problems were handled by hooks that collected those "unreachable" DLLs, and the hooks typically preserved the DLLs parent directory structure. On the other hand, prior to this PR, binary dependency analysis always collected the discovered DLLs into top-level directory. So a DLL discovered and collected via both mechanisms ended up duplicated; and because more DLLs can be discovered by binary dependency analysis due to #6925, this leads to duplication. But, after changes in this PR, both mechanisms should collect into the original sub-directory, and the duplication will be handled on the TOC level.

This does introduce another type of DLL duplication, if multiple packages bundle a copy of the same DLL (e.g., vcruntime140.dll). I suppose if this becomes a problem, we could refine the mechanism to collect well-known-and-common DLLs into top-level directory again. But on the other hand, such duplication might already happen when DLL collection is done by a hook, and in some cases, the package expects to find that particular DLL copy in its library (sub)directory anyway (e.g., packages using delvewheel on python 3.7 due to loading via load order file).

@rokm rokm changed the title Windows: attempt to preserve DLL parent path Windows: attempt to preserve DLL parent directory structure Aug 14, 2022
rokm added 2 commits August 14, 2022 20:13
For DLLs collected from site-packages directories during the
binary dependency analysis, attempt to preserve the parent
directory structure instead of collecting them to the top-level
directory.

Ultimately, this behavior should be fixed on all OSes, but for
macOS and linux, we need to first implement support for symbolic
links.
Due to preservation of the directory layout, the pywintypes DLLs
(pywintypes3X.dll and pythoncom3X.dll) are not collected into
top-level application directory anymore, but rather into the
pywin32_system32 sub-directory.

Add this sub-directory to sys.path to keep the pywintypes' loader
code happy...
@rokm rokm force-pushed the keep-stuff-in-place branch from 5a2fd4a to 0e74a90 Compare August 14, 2022 18:13
@rokm rokm marked this pull request as ready for review August 14, 2022 21:28
@rokm rokm requested a review from bwoodsend August 14, 2022 21:28
@bwoodsend
Copy link
Copy Markdown
Member

bwoodsend commented Aug 14, 2022

I don't quite follow why we need symlinks on the unix platforms. Is it because we're currently finding these dependent .so files via LD_LIBRARY_PATH and .dylibs with that macOS rewriting thing? If so, would that not be fixed by using the rpath/$ORIGIN approach suggested here?

@rokm
Copy link
Copy Markdown
Member Author

rokm commented Aug 15, 2022

I don't quite follow why we need symlinks on the unix platforms. Is it because we're currently finding these dependent .so files via LD_LIBRARY_PATH and .dylibs with that macOS rewriting thing?

It is really necessary only for macOS due to path rewriting. We might get rid of that altogether at a later stage, but during transition period, the idea is to make symlinks back to top-level directory to satisfy the assumptions of the path rewriting (similarly to how BUNDLE already does it for certain parts).

On linux, this should not be necessary (as existing relative rpaths should already do the trick, as long as relative path relations between involved libraries are preserved), but it is still nice to have as a fallback in case it is needed for some corner case.

(And yes, the overreaching problem here is that on POSIX systems, we cannot manipulate linker search paths dynamically, as we can on Windows by modifying PATH or using AddDllDirectory or its python binding, os.add_dll_directory).

If so, would that not be fixed by using the rpath/$ORIGIN approach suggested here?

Yeah, but that approach is something I am not particularly looking forward to, because it is essentially the same thing as our macOS path rewriting. Plus, it will introduce a new dependency (chrpath, or patchelf) that people will not have installed by default...

…DLLs

In helpers that explicitly collect DLLs from Qt binary path
(`get_qt_binaries`, `get_qt_network_ssl_binaries`), preserve the
directory structure if the DLLs are collected from the python
package directory (e.g., PyPI wheels).

The change applies only to Windows, as that is the only platform
where we explicitly collect (some) DLLs from the Qt binary path.
@rokm rokm force-pushed the keep-stuff-in-place branch from 0e74a90 to 81bee58 Compare August 15, 2022 16:03
dll_file_paths = glob.glob(dll_path)
for dll_file_path in dll_file_paths:
to_include.append((dll_file_path, dst_dll_path))
dll_file_path = pathlib.Path(dll_file_path).resolve()
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on earlier discussion about Path.resolve(), I've now added the call here as well. I don't think it should make much difference, but it makes the changes in the PR consistent (i.e., all added pathlib.Path() instances are explicitly resolved now).

dll_names = ('libeay32.dll', 'ssleay32.dll', 'libssl-1_1-x64.dll', 'libcrypto-1_1-x64.dll')
binaries = []
for location in locations:
location = pathlib.Path(location).resolve()
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a Path.resolve() call here as well, as per above comment.

Copy link
Copy Markdown
Member

@bwoodsend bwoodsend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, still approve 🙂

@rokm rokm merged commit ca645da into pyinstaller:develop Aug 15, 2022
@rokm rokm deleted the keep-stuff-in-place branch August 15, 2022 19:43
@Honghe
Copy link
Copy Markdown

Honghe commented Sep 14, 2022

Can not import PyAV after upgrade PyInstaller to this commit.
code:

import av

logs:

Traceback (most recent call last):
  File "<dist\obf\demo.py>", line 3, in <module>
  File "<frozen demo>", line 28, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 499, in exec_module
  File "<video_frame_extract.py>", line 1, in <module>
  File "<frozen video_frame_extract>", line 11, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 499, in exec_module
  File "av\__init__.py", line 52, in <module>
ImportError: DLL load failed while importing _core: The specified module could not be found.

@rokm
Copy link
Copy Markdown
Member Author

rokm commented Sep 14, 2022

Can not import PyAV after upgrade PyInstaller to this commit.

What version of python and what version of PyAV? And how did you install them; is this python.org python and pip, anaconda, etc.?

FWIW, running test_av (which is the same as your sample code) across all supported python versions and OSes, succeeds with PyAV 9.1, 9.1.1, and 9.2 (these are PyPI wheels).

@Honghe
Copy link
Copy Markdown

Honghe commented Sep 15, 2022

@rokm Environment is:

  • Windows 10
  • Miniconda 3
  • Python 3.8.12
  • av 9.2.0 via pip

code:

import av
print('hello')
print(av.__version__)

pack:

pyinstaller demo_av.py

logs:

.\demo_av.exe
Traceback (most recent call last):
  File "demo_av.py", line 1, in <module>
  File "PyInstaller\loader\pyimod02_importers.py", line 499, in exec_module
  File "av\__init__.py", line 52, in <module>
ImportError: DLL load failed while importing _core: The specified module could not be found.
[16504] Failed to execute script 'demo_av' due to unhandled exception!

dist directory tree:

> tree /F
Folder PATH listing for volume Data
D:.
│  api-ms-win-core-console-l1-1-0.dll
│  api-ms-win-core-datetime-l1-1-0.dll
│  api-ms-win-core-debug-l1-1-0.dll
│  api-ms-win-core-errorhandling-l1-1-0.dll
│  api-ms-win-core-file-l1-1-0.dll
│  api-ms-win-core-file-l1-2-0.dll
│  api-ms-win-core-file-l2-1-0.dll
│  api-ms-win-core-handle-l1-1-0.dll
│  api-ms-win-core-heap-l1-1-0.dll
│  api-ms-win-core-interlocked-l1-1-0.dll
│  api-ms-win-core-libraryloader-l1-1-0.dll
│  api-ms-win-core-localization-l1-2-0.dll
│  api-ms-win-core-memory-l1-1-0.dll
│  api-ms-win-core-namedpipe-l1-1-0.dll
│  api-ms-win-core-processenvironment-l1-1-0.dll
│  api-ms-win-core-processthreads-l1-1-0.dll
│  api-ms-win-core-processthreads-l1-1-1.dll
│  api-ms-win-core-profile-l1-1-0.dll
│  api-ms-win-core-rtlsupport-l1-1-0.dll
│  api-ms-win-core-string-l1-1-0.dll
│  api-ms-win-core-synch-l1-1-0.dll
│  api-ms-win-core-synch-l1-2-0.dll
│  api-ms-win-core-sysinfo-l1-1-0.dll
│  api-ms-win-core-timezone-l1-1-0.dll
│  api-ms-win-core-util-l1-1-0.dll
│  api-ms-win-crt-conio-l1-1-0.dll
│  api-ms-win-crt-convert-l1-1-0.dll
│  api-ms-win-crt-environment-l1-1-0.dll
│  api-ms-win-crt-filesystem-l1-1-0.dll
│  api-ms-win-crt-heap-l1-1-0.dll
│  api-ms-win-crt-locale-l1-1-0.dll
│  api-ms-win-crt-math-l1-1-0.dll
│  api-ms-win-crt-process-l1-1-0.dll
│  api-ms-win-crt-runtime-l1-1-0.dll
│  api-ms-win-crt-stdio-l1-1-0.dll
│  api-ms-win-crt-string-l1-1-0.dll
│  api-ms-win-crt-time-l1-1-0.dll
│  api-ms-win-crt-utility-l1-1-0.dll
│  base_library.zip
│  demo_av.exe
│  libcrypto-1_1-x64.dll
│  libffi-8.dll
│  libssl-1_1-x64.dll
│  python38.dll
│  select.pyd
│  ucrtbase.dll
│  unicodedata.pyd
│  VCRUNTIME140.dll
│  _bz2.pyd
│  _ctypes.pyd
│  _decimal.pyd
│  _hashlib.pyd
│  _lzma.pyd
│  _socket.pyd
│  _ssl.pyd
│
├─av
│  │  buffer.cp38-win_amd64.pyd
│  │  bytesource.cp38-win_amd64.pyd
│  │  descriptor.cp38-win_amd64.pyd
│  │  dictionary.cp38-win_amd64.pyd
│  │  enum.cp38-win_amd64.pyd
│  │  error.cp38-win_amd64.pyd
│  │  format.cp38-win_amd64.pyd
│  │  frame.cp38-win_amd64.pyd
│  │  logging.cp38-win_amd64.pyd
│  │  option.cp38-win_amd64.pyd
│  │  packet.cp38-win_amd64.pyd
│  │  plane.cp38-win_amd64.pyd
│  │  stream.cp38-win_amd64.pyd
│  │  utils.cp38-win_amd64.pyd
│  │  _core.cp38-win_amd64.pyd
│  │
│  ├─audio
│  │      codeccontext.cp38-win_amd64.pyd
│  │      fifo.cp38-win_amd64.pyd
│  │      format.cp38-win_amd64.pyd
│  │      frame.cp38-win_amd64.pyd
│  │      layout.cp38-win_amd64.pyd
│  │      plane.cp38-win_amd64.pyd
│  │      resampler.cp38-win_amd64.pyd
│  │      stream.cp38-win_amd64.pyd
│  │
│  ├─codec
│  │      codec.cp38-win_amd64.pyd
│  │      context.cp38-win_amd64.pyd
│  │
│  ├─container
│  │      core.cp38-win_amd64.pyd
│  │      input.cp38-win_amd64.pyd
│  │      output.cp38-win_amd64.pyd
│  │      pyio.cp38-win_amd64.pyd
│  │      streams.cp38-win_amd64.pyd
│  │
│  ├─data
│  │      stream.cp38-win_amd64.pyd
│  │
│  ├─filter
│  │      context.cp38-win_amd64.pyd
│  │      filter.cp38-win_amd64.pyd
│  │      graph.cp38-win_amd64.pyd
│  │      link.cp38-win_amd64.pyd
│  │      pad.cp38-win_amd64.pyd
│  │
│  ├─sidedata
│  │      motionvectors.cp38-win_amd64.pyd
│  │      sidedata.cp38-win_amd64.pyd
│  │
│  ├─subtitles
│  │      codeccontext.cp38-win_amd64.pyd
│  │      stream.cp38-win_amd64.pyd
│  │      subtitle.cp38-win_amd64.pyd
│  │
│  └─video
│          codeccontext.cp38-win_amd64.pyd
│          format.cp38-win_amd64.pyd
│          frame.cp38-win_amd64.pyd
│          plane.cp38-win_amd64.pyd
│          reformatter.cp38-win_amd64.pyd
│          stream.cp38-win_amd64.pyd
│
└─av.libs
        .load-order-av-9.2.0
        avcodec-58-ef69f9bd.dll
        avdevice-58-1b474226.dll
        avfilter-7-90346677.dll
        avformat-58-ab7200bd.dll
        avutil-56-70aa9aa7.dll
        libaom-467b7650.dll
        libass-9-0bcb21dc.dll
        libbluray-2-adf4510f.dll
        libdav1d-8e972675.dll
        libfontconfig-1-c24f5c0f.dll
        libfreetype-6-0eaef20c.dll
        libfribidi-0-ab757abd.dll
        libgcc_s_seh-1-c9790b40.dll
        libgmp-10-0546624b.dll
        libgnutls-30-821030c5.dll
        libharfbuzz-0-6eba1aa6.dll
        libhogweed-6-334d2190.dll
        libiconv-2-c40b11a3.dll
        liblzma-5-8cd9ddde.dll
        libmp3lame-0-028bfb50.dll
        libnettle-8-78729769.dll
        libogg-0-71f020c3.dll
        libopencore-amrnb-0-1a55247e.dll
        libopencore-amrwb-0-3a4aa351.dll
        libopenjp2-95059119.dll
        libopus-0-57232241.dll
        libpng16-16-455807eb.dll
        libspeex-1-76dbc82f.dll
        libstdc++-6-e192ec66.dll
        libtheoradec-1-dd45e4d4.dll
        libtwolame-0-31b0dc25.dll
        libunistring-2-06e900e6.dll
        libvorbis-0-62374c4f.dll
        libvorbisenc-2-0b31b777.dll
        libvpx-1-37f2f739.dll
        libwinpthread-1.dll
        libx264-164-df8f682a.dll
        libx265-64fe7b8a.dll
        libxml2-2-177166ef.dll
        postproc-55-566af066.dll
        swresample-3-6975e3c3.dll
        swscale-5-a27d1997.dll
        xvidcore-7c3d0e75.dll
        zlib1-d1697865.dll

@rokm
Copy link
Copy Markdown
Member Author

rokm commented Sep 15, 2022

Hmmm, I can indeed reproduce this if I mix conda python 3.8.12 and av 9.2 from PyPI.

But I don't think we'll be doing anything about it, because this is purely anaconda-induced problem:

It looks like by default (at least on that python version), anaconda python blocks os.add_dll_directory() call from modifying DLL search paths, unless CONDA_DLL_SEARCH_MODIFICATION_ENABLE environment variable is set to 1.

Therefore, av.__init__ in PyPI wheel has this block of the code at its very beginning, which tries to detect if it is running in the conda environment, and if it is, temporarily enable CONDA_DLL_SEARCH_MODIFICATION_ENABLE.

""""""# start delvewheel patch
def _delvewheel_init_patch_0_0_21():
    import os
    import sys
    libs_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir, 'av.libs'))
    if sys.version_info[:2] >= (3, 8):
        conda_workaround = sys.version_info[:3] < (3, 9, 9) and os.path.exists(os.path.join(sys.base_prefix, 'conda-meta'))
        if conda_workaround:
            # backup the state of the environment variable CONDA_DLL_SEARCH_MODIFICATION_ENABLE
            conda_dll_search_modification_enable = os.environ.get('CONDA_DLL_SEARCH_MODIFICATION_ENABLE')
            os.environ['CONDA_DLL_SEARCH_MODIFICATION_ENABLE'] = '1'
        os.add_dll_directory(libs_dir)
        if conda_workaround:
            # restore the state of the environment variable CONDA_DLL_SEARCH_MODIFICATION_ENABLE
            if conda_dll_search_modification_enable is None:
                os.environ.pop('CONDA_DLL_SEARCH_MODIFICATION_ENABLE', None)
            else:
                os.environ['CONDA_DLL_SEARCH_MODIFICATION_ENABLE'] = conda_dll_search_modification_enable
    else:
        from ctypes import WinDLL
        with open(os.path.join(libs_dir, '.load-order-av-9.2.0')) as file:
            load_order = file.read().split()
        for lib in load_order:
            WinDLL(os.path.join(libs_dir, lib))


_delvewheel_init_patch_0_0_21()
del _delvewheel_init_patch_0_0_21
# end delvewheel patch

Unfortunately, the frozen application does not contain 'conda-meta' directory, and therefore that work-around is not enabled when av is imported in the frozen application.

So if you want to mix conda python and PyPI av wheel for the frozen application, you will need to set CONDA_DLL_SEARCH_MODIFICATION_ENABLE at the start of your program (before you import av):

import os
os.environ['CONDA_DLL_SEARCH_MODIFICATION_ENABLE'] = '1'
import av

Or switch to anaconda python >= 3.9.9, which (based on the delvewheel snippet above) apparently does not need the work-around anymore.

@Honghe
Copy link
Copy Markdown

Honghe commented Sep 15, 2022

@rokm But the built exe prints nothing after adding the environment variable CONDA_DLL_SEARCH_MODIFICATION_ENABLE,

# -*- coding:utf-8 -*-
import os
os.environ['CONDA_DLL_SEARCH_MODIFICATION_ENABLE']='1'
import av
print('hello')
print(av.__version__)

@rokm
Copy link
Copy Markdown
Member Author

rokm commented Sep 15, 2022

You are right. What about:

import sys
if getattr(sys, "frozen", False):
    os.environ["PATH"] = os.path.join(sys._MEIPASS, "av.libs") + os.pathsep + os.environ["PATH"]
import av
print('hello')
print(av.__version__)

Based on another work-around found in av.__init__:

# Some Python versions distributed by Conda have a buggy `os.add_dll_directory`
# which prevents binary wheels from finding the FFmpeg DLLs in the `av.libs`
# directory. We work around this by adding `av.libs` to the PATH.
if (
    os.name == "nt"
    and sys.version_info[:2] in ((3, 8), (3, 9))
    and os.path.exists(os.path.join(sys.base_prefix, "conda-meta"))
):
    os.environ["PATH"] = (
        os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir, "av.libs"))
        + os.pathsep
        + os.environ["PATH"]
    )

@Honghe
Copy link
Copy Markdown

Honghe commented Sep 20, 2022

@rokm Great!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants