cblas_cdotu_sub on ppc64le requires ip1 ip2 to be aligned? #2369

@mattip

Summary: there is a strange error in cblas_cdotu_sub on ppc64le.

I am trying to get numpy to use OpenBLAS 0.3.7.0 on aarch64, ppc64le, and s390x (xref numpy/numpy#15279). Everything passes on all architectures except ppc64le. The build uses OpenBLAS compiled via the manylinux2014 docker image in the MacPython/openblas-lib github repo, which uploads the artifacts to https://3f23b170c54c2533c070-1c8a9b3114517dc5fe17b7c3f8c63a43.ssl.cf2.rackcdn.com; in particular, the PR uses the openblas-v0.3.7-manylinux2014_ppc64le.tar.gz tarball for ppc64le. Something seems to go wrong with cblas_cdotu_sub(9, ip1, 1, ip2, 1, res) where

(float)ip1@18 -> {9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0}
(float)ip2@18 -> {0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0}

It sets res to {500, 0} instead of the expected {528, 0}. Is there a requirement that the memory for ip1 and ip2 be aligned on ppc64le? The source of ip1 is 6 iterations over a 6x9 complex64 numpy array (ip2 is a 1x9 vector). The even iterations of the 6 calls give the right answer, and the odd iterations (like the one above, where ip1 is 0x10519a78) give the wrong answer.
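To make the expected value and the alignment pattern concrete, here is a small pure-Python sketch. It is not OpenBLAS code: `cdotu_reference` is a hypothetical helper that recomputes the unconjugated dot product from the interleaved values above. The second part assumes the 6x9 array is C-contiguous and that the failing call is on the row at index 1 (inferred from the values 9..17), so the first row's address is a guess derived from 0x10519a78. It only shows that the reported even/odd pattern lines up with 16-byte alignment. It does not establish the cause.

```python
# Values copied from the issue: interleaved (real, imag) float pairs.
ip1 = [9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0]
ip2 = [0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0]


def cdotu_reference(n, x, incx, y, incy):
    """Hypothetical reference: unconjugated complex dot product, sum of x[k] * y[k]."""
    total = 0j
    for k in range(n):
        xk = complex(x[2 * k * incx], x[2 * k * incx + 1])
        yk = complex(y[2 * k * incy], y[2 * k * incy + 1])
        total += xk * yk
    return total


expected = cdotu_reference(9, ip1, 1, ip2, 1)
print(expected)  # (528+0j)

# Alignment check. Assumptions: the array is C-contiguous and the address
# in the issue belongs to the row at index 1.
ROW_BYTES = 9 * 8                           # 9 complex64 elements, 8 bytes each
odd_row_addr = 0x10519A78                   # address reported for a failing call
first_row_addr = odd_row_addr - ROW_BYTES   # assumed address of row 0
row_offsets = [(first_row_addr + i * ROW_BYTES) % 16 for i in range(6)]
print(row_offsets)  # [0, 8, 0, 8, 0, 8]
```

Under those assumptions each row is 72 bytes long, so consecutive rows alternate between 16-byte-aligned addresses and addresses that are only 8-byte aligned, which matches the even/odd split between correct and incorrect results.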

The tarball we used until now, produced by @tylerjereddy with a different compiler, does not have this error.
