Skip to content

BUILD: use standard build of OpenBLAS for aarch64, ppc64le, s390x#15279

Merged
tylerjereddy merged 4 commits intonumpy:masterfrom
mattip:openblas
Jan 20, 2020
Merged

BUILD: use standard build of OpenBLAS for aarch64, ppc64le, s390x#15279
tylerjereddy merged 4 commits intonumpy:masterfrom
mattip:openblas

Conversation

@mattip
Copy link
Copy Markdown
Member

@mattip mattip commented Jan 7, 2020

In MacPython/openblas-libs#9 we added aarch64, ppc64le, s390x to the standard builds of the OpenBLAS libs. On those architectures we use the multilinux2014 docker image, x86 still uses multilinux1. This PR now uses those CI builds (instead of manual builds) for numpy.

This simplifies tools/openblas_support.py and gets us one step closer to wheels for other architectures (gh-14886)

@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 7, 2020

Hmm. The OpenBLAS libs that MacPython/openblas-libs#9 built are 0.3.8dev. Are we ready to move off 0.3.7 ?

@charris
Copy link
Copy Markdown
Member

charris commented Jan 7, 2020

Are we ready to move off 0.3.7

It doesn't hurt for testing purposes as long as there aren't a bunch of errors. I assume 3.8 will be out before the 1.19 release.

@tylerjereddy
Copy link
Copy Markdown
Contributor

It looks like the new ppc64le OpenBLAS depends on a runtime libgfortran that isn't available by default on the Travis image. In fact, it looks like that libgfortran.so.5 is associated with rather new gcc version (8.x ?). Was it necessary to bump the gcc toolchain by that much upstream somewhere?

@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 12, 2020

gcc8 is a side effect of using manylinux2014. Maybe we can get the MacPython/openblas-libs recipe to statically link to libgfortran.a. @tylerjereddy I seem to remember some work in this direction, or maybe that was to statically link libopenblas to _multiarray_umath?

@tylerjereddy
Copy link
Copy Markdown
Contributor

The static linking of OpenBLAS to the gcc/gfortran runtime was relatively easy on Mac, and I even did a demo PR to the NumPy repo that contained links to the example script: #13191

But on Linux I thought we'd need to build the gcc toolchain from source because of the same problem Julia had: JuliaLang/julia#326 (comment)

I imagine Matti could build the gcc toolchain from source (I've never done it)---I suppose if you wanted that in a workflow you'd need a custom docker container image so we don't have to build from source each time?

We've had a few different scattered discussions about static linking of OpenBLAS to the gcc toolchain runtime, but it never really caught much traction with the team, I think.

There was a more recent discussion where someone was claiming the above problem for Linux doesn't actually exist, but I'm pretty sure I ran into it.

@tylerjereddy
Copy link
Copy Markdown
Contributor

On an unrelated note, I better go ahead and build the 0.3.7 OpenBLAS libs for the more exotic archs just in case they are needed for debugging, etc. 0.3.8 is clearly getting a lot of changes from what I can see on the tracker. I'll try to do that on Sunday.

@tylerjereddy
Copy link
Copy Markdown
Contributor

Ok, I've started the builds upstream in MacPython/openblas-libs#13

Hopefully that will give NumPy/SciPy etc. access to the 0.3.7 version of OpenBLAS for these new archs if they need it.

@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 13, 2020

hmm, no libgfortran5 debian package in xenial (ubuntu). So either I need to build the OpenBLAS on multilinux2014 (in MacPython/openblas-libs) with an older gcc, install libgfortran5 on the xenial image, or update to bionic

@tylerjereddy
Copy link
Copy Markdown
Contributor

If the ppc64le Travis support extends to bionic that might be worth a try. I remember the options being pretty limited for that arch, but maybe it is better now.

@mattip mattip force-pushed the openblas branch 2 times, most recently from 9523ba2 to efb5a01 Compare January 14, 2020 05:54
@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 14, 2020

There are failures on s390x and ppc64le around einsum, maybe due to upgrading to bionic from xenial. s390x is complaining about uninitialized values in arrays:

  numpy/core/src/multiarray/einsum.c.src:2155:17: warning: ‘icombinemap[label]’ \
  may be used uninitialized in this function [-Wmaybe-uninitialized]

  numpy/core/src/multiarray/einsum.c.src:2158:25: warning: ‘new_dims[i]’ \
  may be used uninitialized in this function [-Wmaybe-uninitialized]

  numpy/core/src/multiarray/einsum.c.src:2166:28: warning: ‘new_strides[i]’ \
  may be used uninitialized in this function [-Wmaybe-uninitialized]

and ppc64le is failing an einsum test at numpy/core/tests/test_einsum.py:291
when n = 9, dtype = 'c8', do_opt = False

        # inner(a,b)
        for n in range(1, 17):
            a = np.arange(2 * 3 * n, dtype=dtype).reshape(2, 3, n)
            b = np.arange(n, dtype=dtype)
            assert_equal(np.einsum("...i, ...i", a, b, optimize=do_opt),
                                 np.inner(a, b))

The other x86 failure can be fixed by adding the proper packages to the addons: apt stanza for that build in the .travis.yml

@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 15, 2020

maybe connected with signed/unsigned int confusion in PowerPC

@seberg
Copy link
Copy Markdown
Member

seberg commented Jan 16, 2020

Ah, I was thinking about this yesterday: #14693 but seems that is already fixed then...

@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 17, 2020

The ppc64le failure is due to inner(a, b) returning the wrong answer.

Details
correct (from einsum): array([[ 204.+0.j,  528.+0.j,  852.+0.j],
    [1176.+0.j, 1500.+0.j, 1824.+0.j]], dtype=complex64)

incorrect (from inner): array([[ 204.+0.j,  500.+0.j,  852.+0.j],
    [1148.+0.j, 1500.+0.j, 1796.+0.j]], dtype=complex64)

On X86_64 the calculation is equivalent to np.dot(a, b) and hits CFLOAT_dot for the 6 calculations of a[i, j, :] * b. It should be quite straightforward: np.sum(atag * b) where atag is np.range(i * 9, (i+1) * 9, dtype='c8') for i in range(6) and b is np.arange(9, dtype='c8'). For some reason the result is off by 28 for every other calculation (i odd).

@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 17, 2020

Debugging this on the gcc112 ppc64le machine on the gcc farm, it seems there is a problem with cblas_cdotu_sub but only on the ppc64le machine, maybe connected to alignment?. xref OpenMathLib/OpenBLAS#2369

@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 17, 2020

OpenBLAS devs recommend using 0.3.8dev which should fix the alignment problem

@mattip
Copy link
Copy Markdown
Member Author

mattip commented Jan 19, 2020

tests are all passing, ready for review

@mattip mattip requested a review from tylerjereddy January 19, 2020 06:20
@tylerjereddy tylerjereddy added this to the 1.19.0 release milestone Jan 20, 2020
Copy link
Copy Markdown
Contributor

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM & probably good to identify issues that may crop up as the community starts to look at supporting wheels for these archs.

env:
global:
- OpenBLAS_version=0.3.7
- OpenBLAS_version=0.3.8
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it is 0.3.8.dev. Perhaps in a follow-up PR we might clamp down on the "string in result" check in openblas_support.py to distinguish between the development branch and the stable release of 0.3.8, but that's a minor point really.

@tylerjereddy tylerjereddy changed the title WIP, BUILD: use standard build of OpenBLAS for aarch64, ppc64le, s390x BUILD: use standard build of OpenBLAS for aarch64, ppc64le, s390x Jan 20, 2020
@tylerjereddy tylerjereddy merged commit a16dfb5 into numpy:master Jan 20, 2020
@tylerjereddy
Copy link
Copy Markdown
Contributor

Thanks Matti

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants