cblas_cdotu_sub on ppc64le requires ip1 ip2 to be aligned? #2369
Description
Summary: there is a strange error in cblas_cdotu_sub on ppc64le.
I am trying to get numpy to use OpenBLAS 0.3.7.0 on aarch64, ppc64le, and s390x (xref numpy/numpy#15279). Everything passes on all architectures except ppc64le. The build uses OpenBLAS compiled via the manylinux2014 docker image in the MacPython/openblas-lib GitHub repo, which uploads the artifacts to https://3f23b170c54c2533c070-1c8a9b3114517dc5fe17b7c3f8c63a43.ssl.cf2.rackcdn.com. For ppc64le, the PR uses the openblas-v0.3.7-manylinux2014_ppc64le.tar.gz tarball. There seems to be something funky with cblas_cdotu_sub(9, ip1, 1, ip2, 1, res) where
(float)ip1@18 -> {9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0}
(float)ip2@18 -> {0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0}
The call sets res to {500, 0} instead of the expected {528, 0}. Is there a requirement that the memory for ip1 and ip2 be aligned on ppc64le? ip1 comes from iterating over the 6 rows of a 6x9 complex64 numpy array (ip2 is a 1x9 vector). Of the 6 calls, the even iterations give the right answer and the odd iterations (like the one above, where ip1 is 0x10519a78) give the wrong answer.
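To make the numbers above easy to check, here is a minimal sketch in plain Python (no numpy, no BLAS) that recomputes the expected result and looks at the alignment of the row starts. The helper names `cdotu_reference` and `row_offsets_mod16` are made up for illustration. The alignment part assumes the 6x9 array is C-contiguous and that the failing ip1 (values 9..17) is its second row; neither is confirmed above, so treat it as a hypothesis.

```python
# Reference values for the failing call, computed without any BLAS.
# cdotu is the unconjugated complex dot product: sum(x[k] * y[k]).

def cdotu_reference(x, y):
    """Naive unconjugated dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

ip1 = [complex(v, 0) for v in range(9, 18)]  # 9, 10, ..., 17
ip2 = [complex(v, 0) for v in range(0, 9)]   # 0, 1, ..., 8
expected = cdotu_reference(ip1, ip2)
print(expected)  # (528+0j)

ITEMSIZE = 8              # bytes per complex64 (two 4-byte floats)
ROW_BYTES = 9 * ITEMSIZE  # 72 bytes per row of 9 elements

def row_offsets_mod16(base, rows=6):
    """Address of each row start modulo 16, for a C-contiguous array."""
    return [(base + r * ROW_BYTES) % 16 for r in range(rows)]

FAILING_IP1 = 0x10519a78
# Assumption: the failing ip1 is row 1, so the array starts 72 bytes earlier.
assumed_base = FAILING_IP1 - ROW_BYTES
print(FAILING_IP1 % 16)                 # 8
print(row_offsets_mod16(assumed_base))  # [0, 8, 0, 8, 0, 8]
```

Under those assumptions, every other row starts on an 8-byte but not a 16-byte boundary, which matches the pattern of alternating right and wrong answers. This does not show that alignment is the cause, only that it is consistent with the observation.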
The tarball we used until now, produced by @tylerjereddy with a different compiler, does not have this error.