Skip to content

CI/BLD: use scipy-openblas wheel when building#20585

Merged
rgommers merged 2 commits intoscipy:mainfrom
mattip:openblas-wheels2
May 28, 2024
Merged

CI/BLD: use scipy-openblas wheel when building#20585
rgommers merged 2 commits intoscipy:mainfrom
mattip:openblas-wheels2

Conversation

@mattip
Copy link
Copy Markdown
Contributor

@mattip mattip commented Apr 26, 2024

Replaces #20074. Uses the scipy-openblas32 wheel for building Scipy

The first commit changes linux wheel builds to use the scipy-openblas32 wheel

Reference issue

What does this implement/fix?

  • Add a new cibw_before_build.sh script Change cibw_before_build*.sh scripts to use scipy-openblas wheels
  • Remove the OpenBLAS version check from tests

Additional information

cc @rgommers

@mattip mattip force-pushed the openblas-wheels2 branch from 4737ad9 to 8214296 Compare April 26, 2024 01:43
@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented Apr 26, 2024

The scipy_ BLAS_SYMBOL_PREFIX is not getting prepended to srotg. I see that scpiy-openblas 0.3.27 is found, which should result in using the cflags from the pkgconfig file. I wonder where that is getting lost.

Copy link
Copy Markdown
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This need for LD_LIBRARY_PATH is quite annoying. @ev-br pinged me about an issue with this as well yesterday. I haven't been able to reproduce it myself, but my conclusion was that Libs: -L{libdir} -lscipy_openblas in the generated .pc file is not reliable since it depends on the loader's library search path configuration as well as whether site-packages is under sysroot or not, and it should be replaced with /abspath/to/scipy_openblas.so to ensure the built extension modules will always find scipy_openblas.so at runtime.

All of the boilerplate of copying shared libraries around should be reduceable to:

$ python -c "import scipy_openblas32; print(scipy_openblas32.get_pkg_config())" > scipy-openblas.pc
$ export PKG_CONFIG_PATH=$PWD

Does that sound right to you @mattip?

@ev-br
Copy link
Copy Markdown
Member

ev-br commented Apr 26, 2024

For LD_LIBRARY_PATH, my band-aid fix was to copy the .so file from build/ to build-install:

https://github.com/ev-br/openblas-bench/blob/main/.github/workflows/codspeed.yml#L42

Still puzzling why the two libs have different linkage.

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented Apr 26, 2024

LD_LIBRARY_PATH is needed at runtime, not build time. The linker arguments in the pkg_config file are for build time, not runtime. In a proper wheel, delocate/auditwheel are used so LD_LIBRARY_PATH is no longer needed. But in the CI workflows, we need to either copy the dependencies into somwhere like /usr/local/lib, which is what the current scripts do, or use LD_LIBRARY_PATH

@rgommers
Copy link
Copy Markdown
Member

I know it's a runtime thing, I am saying that it should not be needed. It is bad for a build to succeed and then fail at runtime.

In a proper wheel, delocate/auditwheel are used

We have a goal of wanting it to work without vendoring. Also, folks are trying to use the scipy-openblas32/64 locally. And we want the cibuildwheel config to be as simple as possible. That's three reasons to make this work without LD_LIBRARY_PATH.

I don't think it's hard. It should only require passing an absolute path rather than -L/-l, as I said above. We should be able to do this in scipy_openblas32.get_pkg_config(), right? It knows exactly where the shared library is located, so writing out the .pc file containing the correct absolute path seems like only a small tweak.

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented Apr 26, 2024

writing out the .pc file containing the correct absolute path…

Makes sense. I can try this next week

@lucascolley lucascolley changed the title use scipy-openblas wheel when building [wheel build] CI: use scipy-openblas wheel when building Apr 26, 2024
@lucascolley lucascolley added the CI Items related to the CI tools such as CircleCI, GitHub Actions or Azure label Apr 26, 2024
@lucascolley lucascolley changed the title CI: use scipy-openblas wheel when building CI/BLD: use scipy-openblas wheel when building Apr 26, 2024
@lucascolley lucascolley added the Build issues Issues with building from source, including different choices of architecture, compilers and OS label Apr 26, 2024
@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 6, 2024

it should be replaced with /abspath/to/scipy_openblas.so to ensure the built extension modules will always find scipy_openblas.so at runtime

That seems to work on linux. How should this work on macos? Linking with the absolute path does not preserve the path for runtime, nor does using -Wl,-rpath,/abspath/to properly add an RPATH to the shared object (at least as far as I can tell).

@rgommers
Copy link
Copy Markdown
Member

rgommers commented May 6, 2024

nor does using -Wl,-rpath,/abspath/

This is what conda-forge compilers use on macOS, so I expect it to work:

% echo $LDFLAGS
-Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/rgommers/mambaforge/envs/scipy-dev/lib -L/Users/rgommers/mambaforge/envs/scipy-dev/lib

It would be surprising if it works when given via LDFLAGS (so for all targets) but not if it's only in the list of link args for the targets where it's needed. Are you sure it doesn't work? If so, does using LDFLAGS instead work?

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 7, 2024

The -Wl,-rpath,<path> part is needed in the Libs: line of scipy-openblas.pc to set the rpath search path.
It doesn't matter whether I use the full/path/to/libscipy_openblas.dylib or -L/path -lscipy_openblas, the compiled linalg shared objects still only have a dependency on libscipy_openblas.dylib

The missing part is that I need to do this in order to "burn" the @rpath prefix onto the libscipy_openblas in the LC_LOAD_DYLIB stanza of the dylib header in all the shared objects that use scipy_openblas:

install_name_tool -change libscipy_openblas.dylib @rpath/libscipy_openblas.dylib <path-to>/linalg/shared-object.cpython-311-darwin.so

I am still trying to figure out how to set the @rpath prefix from the linker without install_name_tool.

@rgommers
Copy link
Copy Markdown
Member

rgommers commented May 7, 2024

The -Wl,-rpath addition is correct. The problem is with libscipy_openblas.dylib. Its install name (LC_ID_DYLIB) should include @rpath. You can check with:

% otool -D libscipy_openblas.dylib
libscipy_openblas.dylib:
libscipy_openblas.dylib

or with:

% otool -l libscipy_openblas64_.dylib | grep -A 2 LC_LOAD_DYLIB

To test that, you can fix it locally like this:

% install_name_tool -id @rpath/libscipy_openblas.dylib libscipy_openblas.dylib
% codesign --force -s - libscipy_openblas.dylib

After that, the install name is correct:

% otool -l libscipy_openblas.dylib | grep -A 2 LC_ID_DYLIB
          cmd LC_ID_DYLIB
      cmdsize 56
         name @rpath/libscipy_openblas.dylib (offset 24)

and adding -Wl,-rpath,${libdir} to the .pc file will work as expected.

You can also try checking install names of for example the <prefix>/lib directories of conda-forge or Homebrew. For conda-forge, all dylibs start with @rpath. For Homebrew, they're all absolute paths. Both work - but a plain libscipy_openblas.dylib will not. The conda-forge approach is preferred (see, e.g., https://forums.developer.apple.com/forums/thread/736719 - generally a very useful article).

tl;dr rebuilding scipy-openblas with a change to the install name of the shared library on macOS, as well as including -Wl,-rpath,${libdir} for all platforms except Windows, should fix this.

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 9, 2024

All of the boilerplate of copying shared libraries around should be reduceable to:

$ python -c "import scipy_openblas32; print(scipy_openblas32.get_pkg_config())" > scipy-openblas.pc
$ export PKG_CONFIG_PATH=$PWD

I updated the wheels so those two lines are all that is needed. The error which indicates something is not working right with the -DBLAS_SYMBOL_PREFIX=scipy_ is still there:

ImportError: Error relocating /__w/scipy/scipy/build-install/lib/python3.10/site-packages/scipy/linalg/_fblas.cpython-310-x86_64-linux-gnu.so: srotg_: symbol not found

@rgommers
Copy link
Copy Markdown
Member

rgommers commented May 9, 2024

Great, much nicer! I'll try to have a look at the srotg issue tonight or tomorrow. It doesn't look like we treat it in a special way, however a comment in npy_blas_base.h says: Routines with S and D prefix only.

The macOS job failed, detection didn't work:

scipy/meson.build:191:9: ERROR: Dependency "OpenBLAS" not found, tried pkgconfig, framework and cmake

but I assume that's because this is now a pared-down PR focused on Linux?

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 9, 2024

Yes, once the speediest build job to use a scipy-openblas wheel works (which I think happens to be the musllinux one), I will clean this all up for the other build jobs and wheel builds.

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 9, 2024

If it helps to debug the problem, I can clean up the macos one so it too fails to apply the prefix?

@thalassemia
Copy link
Copy Markdown
Contributor

The reason the symbol prefix is not being honored is because we currently only generate the required BLAS/LAPACK wrappers for Accelerate builds to handle their special symbol suffixes. Since scipy-openblas has a special symbol prefix, we can fix this by enabling wrapper generation for those builds as well (the commit I added). The wrappers should only be generated when there is a symbol prefix or suffix because they result in infinite recursion when the wrapper symbols are the same as the BLAS/LAPACK symbols (as they are when there is no prefix/suffix).

Could we automatically detect whether a BLAS/LAPACK library has prefixes/suffixes instead of enabling wrapper generation on a case-by-case basis?

@rgommers
Copy link
Copy Markdown
Member

rgommers commented May 10, 2024

Ah of course, good catch @thalassemia.

Could we automatically detect whether a BLAS/LAPACK library has prefixes/suffixes instead of enabling wrapper generation on a case-by-case basis?

Not fully automatically, but we can specify a set of "these are the set of potential prefixes/suffixes to check". E.g.: https://github.com/numpy/meson/blob/4e370ca8ab73c07f7b84abe8a4b937caace050a4/mesonbuild/dependencies/blas_lapack.py#L398-L409

For now doing it manually is the way to go, but we can add a standard set to check in Meson. I think it mainly matters for OpenBLAS (and most ILP64 builds of course).

@rgommers
Copy link
Copy Markdown
Member

The musllinux job is happy now. I cancelled the two Prerelease deps jobs, because they were hanging - should be related, since they were using an older version of scipy-openblas, so the prefix change probably created a conflict there. Anyway, this should be unblocked now to update the other jobs.

@mattip mattip force-pushed the openblas-wheels2 branch from 9928b0a to 4716792 Compare May 12, 2024 05:50
@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 23, 2024

testing locally, setting OPENBLAS_NUM_THREADS=1 speeds up linalg tests by 50%, even when running tests without xdist.

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 23, 2024

That solved the slow tests. Now a test is failing on numpy/numpy#26439:

ValueError: data type <class 'numpy.int32'> not inexact

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 23, 2024

Should this PR remove the openblas preference from the meson defaults? It seems to be working when using scipy-openblas regardless of this setting

scipy/meson.build

Lines 10 to 18 in b421cd6

default_options: [
'buildtype=debugoptimized',
'b_ndebug=if-release',
'c_std=c17',
'cpp_std=c++17',
'fortran_std=legacy',
'blas=openblas',
'lapack=openblas'
],

@rgommers
Copy link
Copy Markdown
Member

Should this PR remove the openblas preference from the meson defaults? It seems to be working when using scipy-openblas regardless of this setting

I don't think so. A PR for auto-detection in the same way as NumPy does it is still to be opened. That will overhaul all this, for now I don't see a reason to touch it.

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 23, 2024

Ci (except for "owner approval") is passing. Should I trigger the wheel building, or leave it for a future PR?

@rgommers
Copy link
Copy Markdown
Member

Great! Please do trigger the wheel builds, since the changes here affect wheel builds.

@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 25, 2024

CI is passing, I think this might be ready for a review.

@mattip mattip force-pushed the openblas-wheels2 branch from bf1d5e3 to f14bf39 Compare May 27, 2024 19:50
@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 27, 2024

There was a merge conflict, so I rebased, cleaned this up a bit, and squashed all the commits.

# This does not work because pkg-config does not like backslashes,
PKG_CONFIG_PATH="{project}"
# do this instead (which will override this setting)
# set CIBW_ENVIRONMENT_WINDOWS=PKG_CONFIG_PATH=PWD.replace('\\', '/')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is happening with this comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot set PKG_CONFIG_PATH to "{project}" on windows, the path native separators confuse pkg-config. So you have to set it via CIBW_ENVIRONMENT_WINDOWS instead. This is a comment for people wishing to build wheels that they must use that alternative. Should I remove it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay. I didn't really understand that from the comment - let's just clarify it instead.

To be honest I also don't understand the "this does not work" comment (which I now think is also aimed at folks wanting to build locally?), nor what "{project}" expands to and which tool does the expanding.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's deal with this loose end post-merge. Time to get this in, so we have it in the 1.14.x branch - that'll make maintenance easier.

@rgommers
Copy link
Copy Markdown
Member

Thanks Matti! This looks quite good, and makes things significantly easier to understand.

@rgommers rgommers added this to the 1.14.0 milestone May 28, 2024
Copy link
Copy Markdown
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks for getting this over the finish line @mattip! And thanks for the reviews @andyfaff

@rgommers rgommers merged commit d474026 into scipy:main May 28, 2024
@mattip
Copy link
Copy Markdown
Contributor Author

mattip commented May 28, 2024

It seems some of the windows tests are still slower now, there are occasional CI failures due to test durations being over the limits. Was that happening before this PR?

@rgommers
Copy link
Copy Markdown
Member

I think what you're seeing is gh-20806, and that's due to CI machinery changes unrelated to this PR.


echo "REPAIR_PATH=$LIB_PATH" >> "$GITHUB_ENV"
GFORTRAN_LIB="\$(dirname \$(gfortran --print-file-name libgfortran.dylib))"
CIBW="DYLD_LIBRARY_PATH=$GFORTRAN_LIB:$LIB_PATH delocate-listdeps {wheel} &&\
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the changes in this code block seem to be causing the problems. We already had two libgfortran's here, but I bet setting DYLD_LIBRARY_PATH here made OpenBLAS also see the one from gfortran rather than the one in the openblas tarball, or something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Build issues Issues with building from source, including different choices of architecture, compilers and OS CI Items related to the CI tools such as CircleCI, GitHub Actions or Azure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants