
Surprisingly slow dsyrk on Mac #730

@jakirkham


Here is an example on Mac OS 10.9, on a Late 2013 15" MacBook Pro with a 2.3 GHz Intel Core i7-4850HQ, using OpenBLAS 0.2.15, Python 2.7.11, NumPy 1.10.2, and SciPy 0.16.1. I have verified that NumPy and SciPy are linked against this OpenBLAS build. OpenBLAS was built with the following command, with no other options or modifications: make DYNAMIC_ARCH=1 BINARY=64 NO_LAPACK=0 NO_AFFINITY=1 NUM_THREADS=1

>>> import numpy
>>> from scipy.linalg import blas
>>> a = numpy.ones((100,101), dtype=numpy.float32)
>>> %timeit blas.ssyrk(1, a)
10000 loops, best of 3: 37.8 µs per loop
>>> a = numpy.ones((100,101), dtype=numpy.float64)
>>> %timeit blas.dsyrk(1, a)
1000 loops, best of 3: 520 µs per loop
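The session above uses IPython's %timeit magic. For anyone who wants to rerun the comparison as a plain script, here is a minimal sketch using the standard timeit module and the same scipy.linalg.blas wrappers. The helper name best_time_us is hypothetical; it mimics %timeit's "best of 3" by taking the minimum over repeats. It avoids f-strings so it also runs under Python 2.7.

```python
import timeit

import numpy as np
from scipy.linalg import blas


def best_time_us(call, number=1000, repeat=3):
    """Best per-call time of a zero-argument callable, in microseconds."""
    times = timeit.repeat(call, number=number, repeat=repeat)
    return min(times) / number * 1e6


if __name__ == "__main__":
    # Same inputs as the session above: 100 x 101 arrays of ones.
    a32 = np.ones((100, 101), dtype=np.float32)
    a64 = np.ones((100, 101), dtype=np.float64)

    # Each call computes the upper triangle of 1.0 * a @ a.T (default lower=0).
    t_s = best_time_us(lambda: blas.ssyrk(1.0, a32))
    t_d = best_time_us(lambda: blas.dsyrk(1.0, a64))
    print("ssyrk: {:.1f} us per call".format(t_s))
    print("dsyrk: {:.1f} us per call".format(t_d))
    print("dsyrk / ssyrk: {:.1f}x".format(t_d / t_s))
```

Taking the minimum rather than the mean follows the same reasoning as %timeit: the fastest run is the one least disturbed by other activity on the machine.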

I wouldn't be surprised if it took twice as long, since it is double rather than single precision; however, taking over an order of magnitude longer (about 14 times, 520 µs versus 37.8 µs) seems excessive.
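To put the gap in perspective, the reported timings can be converted into achieved throughput. This sketch assumes the usual operation count for syrk: computing one triangle of C = A·Aᵀ, with A of shape n × k, takes about n(n+1)k floating-point operations (k multiplies and k adds for each of the n(n+1)/2 entries). The function names are illustrative only.

```python
def syrk_flops(n, k):
    """Approximate flop count for one triangle of C = A @ A.T, with A of shape (n, k)."""
    # n * (n + 1) / 2 entries, each needing k multiplies and k adds.
    return n * (n + 1) * k


def gflops(flops, seconds):
    """Convert a flop count and a wall time into GFLOP/s."""
    return flops / seconds / 1e9


if __name__ == "__main__":
    flops = syrk_flops(100, 101)      # A is 100 x 101, as in the session above
    s_rate = gflops(flops, 37.8e-6)   # reported ssyrk time
    d_rate = gflops(flops, 520e-6)    # reported dsyrk time
    print("flops per call: {}".format(flops))
    print("ssyrk: {:.1f} GFLOP/s".format(s_rate))
    print("dsyrk: {:.2f} GFLOP/s".format(d_rate))
    print("slowdown: {:.1f}x".format(520 / 37.8))
```

With the reported numbers this gives 1020100 flops per call, about 27.0 GFLOP/s for ssyrk versus about 1.96 GFLOP/s for dsyrk, a slowdown of roughly 13.8x rather than the ~2x one might expect from the doubled element size.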

Following the same build procedure in a Linux VM on the same machine (VirtualBox 5.0.12), I get a much more reasonable time for dsyrk (around double that of ssyrk). I don't know whether this also happens on Windows. If someone is able to reproduce a similar example using C or Fortran, please share your steps.

After further discussion, we found that this behavior depends on the array size. The graph below shows this dependence; details on how it was made can be found in this comment: #730 (comment).

[Figure: performance of OpenBLAS syrk subroutines]
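The linked comment describes how the graph was actually produced, and that procedure is not reproduced here. As a rough sketch of one way to collect similar data, one could time ssyrk and dsyrk over a range of n. This assumes inputs of shape (n, n + 1), like the original 100 × 101 example, and an arbitrary list of sizes; both choices, as well as the helper names, are assumptions. It uses scipy.linalg.get_blas_funcs, which picks the s or d routine from the array's dtype.

```python
import timeit

import numpy as np
from scipy.linalg import get_blas_funcs


def time_syrk(n, dtype, number=20, repeat=3):
    """Best per-call time (seconds) of ?syrk on an (n, n + 1) array of ones."""
    a = np.ones((n, n + 1), dtype=dtype)
    # Resolves to ssyrk for float32 input and dsyrk for float64 input.
    syrk = get_blas_funcs("syrk", (a,))
    times = timeit.repeat(lambda: syrk(1.0, a), number=number, repeat=repeat)
    return min(times) / number


def sweep(sizes):
    """Return (n, ssyrk_time, dsyrk_time, dsyrk/ssyrk ratio) for each n."""
    rows = []
    for n in sizes:
        t_s = time_syrk(n, np.float32)
        t_d = time_syrk(n, np.float64)
        rows.append((n, t_s, t_d, t_d / t_s))
    return rows


if __name__ == "__main__":
    # Hypothetical size grid; widen it to cover the range of interest.
    for n, t_s, t_d, ratio in sweep([16, 32, 64, 100, 128, 200, 256]):
        print("n={:4d}  ssyrk={:9.1f} us  dsyrk={:9.1f} us  ratio={:5.1f}".format(
            n, t_s * 1e6, t_d * 1e6, ratio))
```

A ratio near 2 matches the expectation from doubled precision; sizes where the ratio climbs well above that are the ones worth reporting.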

For comparison, the times taken by sgemm and dgemm do not show this behavior.

[Figure: performance of OpenBLAS gemm subroutines]
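To check the gemm baseline on one's own machine, a sketch along the same lines could time gemm and syrk on the same operand. It again assumes scipy.linalg.get_blas_funcs and (n, n + 1) inputs of ones, and the helper names and size list are hypothetical. Because gemm fills the whole n × n result, it does roughly twice the arithmetic of syrk. A syrk call that is much slower than the matching gemm call therefore hints at a problem like the one reported here.

```python
import timeit

import numpy as np
from scipy.linalg import get_blas_funcs


def best_time(call, number=20, repeat=3):
    """Best per-call time of a zero-argument callable, in seconds."""
    return min(timeit.repeat(call, number=number, repeat=repeat)) / number


def gemm_and_syrk_times(n, dtype):
    """Times for A @ A.T via ?gemm (full matrix) and ?syrk (one triangle), A of shape (n, n + 1)."""
    a = np.ones((n, n + 1), dtype=dtype)
    gemm, syrk = get_blas_funcs(("gemm", "syrk"), (a,))
    # trans_b=1 asks BLAS to compute a @ a.T directly from the same operand.
    t_gemm = best_time(lambda: gemm(1.0, a, a, trans_b=1))
    t_syrk = best_time(lambda: syrk(1.0, a))
    return t_gemm, t_syrk


if __name__ == "__main__":
    for dtype in (np.float32, np.float64):
        for n in (32, 100, 256):
            t_gemm, t_syrk = gemm_and_syrk_times(n, dtype)
            print("{:8s} n={:4d}  gemm={:9.1f} us  syrk={:9.1f} us".format(
                np.dtype(dtype).name, n, t_gemm * 1e6, t_syrk * 1e6))
```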
