BUG: distutils: check Fortran ABI for BLAS libraries #3439
BUG: distutils: check Fortran ABI for BLAS libraries #3439pv wants to merge 2 commits intonumpy:masterfrom
Conversation
|
Hm, this whole thing is a really messy solution. Maybe that's the only way to do it, but let me ask anyway:
Fortran 90 on the other hand is definitely not ABI compatible across compilers, not even with the major versions of the same compiler sometimes. But that would be a different issue than this PR is trying to fix, wouldn't it? @pv, if you can help me better understand the core of the problem itself, I would appreciate it. Then I could have some comments to your actual solution. |
|
F77 is is binary compatible only for Fortran SUBROUTINEs, but not for FUNCTIONs. In particular, the differences include whether a REAL return value is returned actually as DOUBLE PRECISION, and whether complex return values are returned on stack/registers or as extra pointer arguments. This can result both to invalid return values and segfaults. (Indeed, the test binary in this PR segfaults on OSX --- that doesn't seem to cause problems however.) BLAS and LAPACK have a couple of FUNCTIONs in their API, and are therefore not F77 ABI portable. In the GNU world, there are two ABIs: gfortran and g77. Linux distributions switched from g77 to gfortran several years ago. However, on OSX, Veclib/Accelerate is on g77 ABI. Not sure about Intel MKL, but I think it's also g77 at least on OSX. This PR concerns only checking F77 compatibility and only in BLAS, because this is what Scipy and 3rd party modules likely use numpy.distutils for. (GNU Octave uses a similar approach to this PR.) Arpack fails on OSX, since it calls Veclib/Accelerate functions from Fortran, and Scipy is usually compiled with gfortran ABI. However, as said, Scipy does have workaround wrappers that use the Veclib C API to step around ABI issues. This however doesn't fully work on OSX 10.7 and 10.8, possibly because some functions are missing from the custom Fortran wrappers. However, the fact that numpy.distutils does not check whether the library info and the Fortran compiler flags it provides are compatible is a somewhat dangerous situation. We can possibly deal with it in Scipy by spending more effort on fixing the C API wrappers, but I believe it is not correct behavior for A possible criticism for this PR is that an invalid Fortran configuration should not block using the library in C-only code. However, I don't really know how to avoid this --- |
numpy/distutils/fcompiler/gnu.py
Outdated
There was a problem hiding this comment.
Why this change? Your original proposal was leaving it up to the user to do export FOPT='-ff2c'. Like this you won't be able to not use -ff2c, plus it won't work the same for Clang and other compilers.
There was a problem hiding this comment.
Well, this commit is a provocation to get some proper discussion.
The upside is that it makes things work out-of-the-box on OSX.
The downside may be many unintended consequences (e.g., I'm not sure it's actually possible to disable this flag in any way after this change).
There was a problem hiding this comment.
I indeed don't see a better way to make things work out of the box besides adding -ff2c.
How about (a) moving this change into FCompiler.get_flags and (b) adding a mechanism (envvar?) to avoid adding -ff2c?
There was a problem hiding this comment.
I think ff2c isn't a recognized option e.g. for Intel Fortran, so this flag shouldn't go to FCompiler
|
Thanks a lot @pv for the explanation. So it looks like the only problem is with functions, which this PR indeed should be able to catch. I went over the Fortran program and it looks ok to me. My understanding is that Fortran codes really need to be compiled with the same compiler, or use C API. But I don't know how to link with the precompiled Accelerate framework using Fortran. I usually compile my own Lapack or OpenBlas with the same Fortran compiler that I use for everything, then things work. So since you mentioned that this test script actually fails on your Mac, what other options do you have to take advantage of the Accelerate framework? If you compile OpenBlas, will it be as fast as Accelerate? |
|
@certik: setting the environment variable FOPT="-ff2c" or including commit b9c4599 allows using Accelerate on OSX (this is tested). (Don't know about OpenBLAS vs. Accelerate speed, but would guess that they fit within a factor of 2.) The question whether to force the Gnu compilers use the g77 ABI on OSX by default (via -ff2c flag) or whether to leave this to the one in charge of the build setup is a good question. I would perhaps prefer the latter, even though it means that Accelerate is not used out-of-the-box. |
|
Possibly naive question: Shouldn't -f2c be added exactly in those cases
|
|
@njsmith: I think that is too magical, as |
|
Hi all, what is the status of this? |
|
This needs some wider testing on downstream packages that use numpy.distutils to build with BLAS+LAPACK. (Scipy on OSX works OK with this and it fixes a few bugs.) Needs also testing with Intel MKL. I don't have a license and cannot do this. Also, I'm sort of waiting for someone to voice counterarguments :) Moreover, I think commit b9c4599 should be dropped: choosing whether to use Unfortunately, the ARPACK issues in Scipy are not solved just by using a correct Fortran ABI. There is actually something dodgy going on in the LAPACK bundled with Accelerate; either it's subtly broken or ARPACK is making some assumptions that are not correct (apparently about QR decompositions produced). I have a PR on Scipy to monkeypatch numpy.distutils and avoid linking with Accelerate's LAPACK components. This however has nothing to do with this PR, we are just working around some ARPACK oddity. Some other Fortran ABI issues in Scipy however are solved by this PR. |
|
Want to put up a PR reverting b9c4599 ? |
|
I'll try to do some testing of this with and without scipy/scipy#2620 after next weekend. I do have access to MKL, although I haven't got it set up on OS X yet. |
Make numpy.distutils to check that the BLAS libraries could actually be used from Fortran, with the currently active Fortran compiler configuration. Failing to check this can result to several commonly used functions such as SDOT, DDOT, ... return invalid results if called from 3rd party Fortran code that links to the BLAS library using numpy.distutils. (Known occurrence: Scipy's ARPACK routines failed consistently on OSX due to Fortran ABI mismatch.)
|
@charris: the offending commit was the second one in this PR. Rebased + dropped it. |
|
Works as advertised for gcc. MKL testing todo. |
|
In addition to MKL, there is an unresolved question on how to allow people to link C-only code without needing to adjust their Fortran settings. |
|
No progress. The blocker is that people calling |
|
@pv Closing this, please resubmit when needed. |
Make numpy.distutils to check that the BLAS libraries can actually be used from Fortran, with the currently active Fortran compiler configuration. Failing to check this can result to several commonly used functions such as SDOT, DDOT, ..., to either return invalid results or segfault when called from Fortran code.
The second commit also turns on
-ff2cflag on OSX for gfortran, so that the default flags are compatible with Veclib/Accelerate. The other commit is independent of this. We can drop the OSX flag change, if it seems too risky (the ABI error message is informative and tells the user how to add-ff2cto the build flags).Known occurrence of problems: Scipy's ARPACK routines currently fail consistently on OSX due to Fortran ABI mismatch. Scipy has some wrapper code that uses Veclib's C API instead of the Fortran API, but this is brittle and fails for a reason or another to work properly on OSX 10.7 and 10.8. Such wrappers are IMHO better left on the responsibility of whoever is doing the building rather than bundled by us.
Moreover, allowing numpy.distutils to give an incompatible flags+library info combination is a trap for 3rd party modules, whose authors may not be aware (and should not be requried to be!) of the Fortran ABI issues.
NOTE: this PR affects also Intel MKL which I think may have similar ABI issues, so also it should be tested. I don't yet know how far the consequences stretch in this direction, so feedback would be useful.
@cournape: heads up, this probably may affect Enthought's build rig