BLD: Add Accelerate support for macOS 13.3+#19816
Conversation
|
Thanks @thalassemia. Working on going through this once more and exercising it a bit.
Agreed. I see a 10%-15% increase in build time on my M1 Macbook Pro. Which I agree is acceptable. Tracing shows that the added targets are adding a ~20 sec load of small file compiles at the start of the build, which makes sense:
The build time increase will only occur with Accelerate, so that's all good. |
|
I checked the binary size change: this approach is very lean, 54 kb extra due to the extra static libraries. Size grows from 21.78 MB to 21.84 MB for a local build. The
I think it's possible with linker flags like |
|
Good news! I think I found a way to put every wrapper in a single file and let the linker exclude unused symbols. Previously, the linker was forced to include otherwise unused symbols simply because all exported symbols are global by default. Now, by explicitly marking my wrapper symbols as hidden, the linker can remove any symbols that are not directly used within each individual module. Importantly, this PR only adds the flags needed to make this work in macOS, since that is the only platform with Accelerate. In a future ILP64 PR using this framework, I'll need to add additional compiler-specific Meson flags to achieve the same result ( |
rgommers
left a comment
There was a problem hiding this comment.
Previously, the linker was forced to include otherwise unused symbols simply because all exported symbols are global by default.
Regarding hidden symbols, it shouldn't be needed to define those manually or use __attribute__('hidden') directly I suspect. Instead, the gnu_symbol_visibility keyword to static_library can be used to do this automatically - and we wouldn't have to worry about things like Clang defining __GNU__. Did you consider this?
| 'SuperLU/SRC/zutil.c' | ||
| ], | ||
| c_args: _superlu_lib_c_args, | ||
| dependencies: blas_dep, |
There was a problem hiding this comment.
That this worked at all before looks like a bug. It should have failed without this and USE_VENDOR_BLAS=1 set. The reason it doesn't is probably all the aliasing done in slu_Cnames.h.
There was a problem hiding this comment.
Looking at this more closely, I think this line is unnecessary. Even if the BLAS dependency is missing when compiling the static library, it is included when compiling the extension module that links with this library. The same applies to the BLAS/LAPACK dependencies I added to the ARPACK static library. I'll clean those up for clarity.
|
Re |
|
Did not know about the |
rgommers
left a comment
There was a problem hiding this comment.
LGTM now. Time to give this a go. Thanks a lot @thalassemia! And thanks @h-vetinari for all the review effort!
|
Congratulations @thalassemia and thanks for splitting various pieces off from this big effort! This definitely deserves a release note - if you like, please edit one into the current draft. |
|
Yes, congratulations @thalassemia, a big effort! Do we know how much the wheel size will reduce with not having to bundle openblas? |
|
@andyfaff On my machine (M2, macOS 14.4.1, Python 3.11), 34edaf2 on Do you think it makes sense to have our macOS wheels use Accelerate by default? At the moment, Accelerate is only used if users build from source with |
It's only useful for a certain mac version and above. Below a certain version I think there are issues with the library. I think the version is 14, but I can't remember off the top of my head. This means two sets of wheels have to be made. One set for version > 10.X that bundles openblas and one set for version >14.x that doesn't. numpy is already doing this, but start from a lower macOS version. |
|
Yes indeed, we should start shipping extra wheels built against Accelerate for macOS >=14.0 for the next release, just like is done for numpy 2.0.0. They will be smaller and faster. |
|
This may be a good place to record that macOS 15 will include another update to Accelerate - it upgrade to LAPACK 3.11.0 with no change in symbol names. The extra LAPACK APIs will be available if the macOS deployment target is >= 15.0. |

I added a script (scipy/_build_utils/_generate_blas_wrapper.py) to generate C wrappers for Accelerate BLAS/LAPACK. These wrappers are compiled into a static library that other modules can link against with no code modifications (e.g. a call to
dgemmin a Fortran module will be routed through a wrapper and turned into a call todgemm$NEWLAPACK). I also added Accelerate support to the script (scipy/linalg/_generate_pyx.py) that generates the Cython for_fblas/_flapackusing similar C wrappers. In the process of making these two changes, I included special logic to handle the G77 ABI wrappers, eliminating the need for additional Fortran wrappers.Other changes:
Functions in
_fblas/_flapackthat have a return value were written as such (previously, f2py generated Fortran wrappers for these)Resolved missing prototype build warnings in scipy/integrate/lsoda.pyf and scipy/integrate/vode.pyf
Added test for
cython_blas.cladivandcython_blas.dladivDeclared separate
blas_depandlapack_depwith necessary compile arguments, leavingblasandlapackuntouched for more useful information when callingscipy.show_config()Resolved function with no return value build warnings and remove custom G77 wrappers in PROPACK (depends on separate PR on the PROPACK repo)
Downsides
The wrapper functions are all in separate source files, allowing the linker to exclude unused symbols and keep binary sizes about the same as they currently are. Since there are ~1600 BLAS/LAPACK functions, this increases build time by a noticeable amount. I tried writing everything to a single file and compiling with
-ffunction-sections -fdata-sections -Wl,-dead_strip, but this still resulted in unused symbols and larger binaries.There is no way to mix and match BLAS/LAPACK libraries at the moment (also a challenge to support both LP64 and ILP64).
Future
This wrapper generation framework makes it pretty easy to add ILP64 support. If this PR seems OK in principle, I can make a separate PR detailing some quirks I encountered trying to implement ILP64 support with this as a base.