Here is an example on Mac OS 10.9 on a Late 2013 15" MacBook Pro with a 2.3 GHz Intel Core i7-4850HQ, using OpenBLAS 0.2.15, Python 2.7.11, NumPy 1.10.2, and SciPy 0.16.1. I have verified that NumPy and SciPy are properly linked against this OpenBLAS build. OpenBLAS was built with `make DYNAMIC_ARCH=1 BINARY=64 NO_LAPACK=0 NO_AFFINITY=1 NUM_THREADS=1`; no other options or modifications were made.
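For reference, one way to double-check which BLAS NumPy and SciPy are linked against is to print their build configuration (the exact output layout varies between versions):

```
>>> import numpy, scipy
>>> numpy.show_config()   # should list the OpenBLAS library/include paths
>>> scipy.show_config()
```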
```
>>> import numpy
>>> from scipy.linalg import blas
>>> a = numpy.ones((100,101), dtype=numpy.float32)
>>> %timeit blas.ssyrk(1, a)
10000 loops, best of 3: 37.8 µs per loop
>>> a = numpy.ones((100,101), dtype=numpy.float64)
>>> %timeit blas.dsyrk(1, a)
1000 loops, best of 3: 520 µs per loop
```
I wouldn't be surprised to see it take about twice as long, since it is double rather than single precision; however, taking over an order of magnitude longer seems excessive.
Following the same build procedure in a Linux VM on the same machine (VirtualBox 5.0.12), I get a much more reasonable time for dsyrk (roughly double that of ssyrk). I have no idea whether this carries over to Windows. If someone is able to reproduce a similar example using C or Fortran, please share your steps.
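In the meantime, roughly the same measurement can be scripted without IPython using the standard `timeit` module (a sketch; the array shape matches the example above and the repeat counts are arbitrary):

```python
# Standalone (non-IPython) version of the timing above.
import timeit

import numpy
from scipy.linalg import blas

a32 = numpy.ones((100, 101), dtype=numpy.float32)
a64 = numpy.ones((100, 101), dtype=numpy.float64)

# Best-of-3 average time per call, in seconds.
t_s = min(timeit.repeat(lambda: blas.ssyrk(1, a32), number=1000, repeat=3)) / 1000
t_d = min(timeit.repeat(lambda: blas.dsyrk(1, a64), number=1000, repeat=3)) / 1000

print("ssyrk: %.1f us" % (t_s * 1e6))
print("dsyrk: %.1f us" % (t_d * 1e6))
print("ratio: %.1fx" % (t_d / t_s))
```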
After further discussion, we found that this is array-size dependent. Below is a graph showing the dependence; more details about how it was made can be found in this comment: #730 (comment)
For comparison, the times taken by sgemm and dgemm do not show this behavior.
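A rough sketch of how such a syrk-vs-gemm comparison could be scripted across sizes is below; the sizes chosen here are arbitrary, and the exact methodology behind the graph is described in the linked comment.

```python
# Sweep over array sizes, comparing the double/single timing ratio
# of syrk against that of gemm.
import timeit

import numpy
from scipy.linalg import blas

def best_us(fn, number=100):
    """Best-of-3 average time per call, in microseconds."""
    return min(timeit.repeat(fn, number=number, repeat=3)) / number * 1e6

for n in (50, 100, 200, 400, 800):
    a32 = numpy.ones((n, n + 1), dtype=numpy.float32)
    a64 = numpy.ones((n, n + 1), dtype=numpy.float64)
    syrk_ratio = best_us(lambda: blas.dsyrk(1, a64)) / best_us(lambda: blas.ssyrk(1, a32))
    gemm_ratio = best_us(lambda: blas.dgemm(1, a64, a64.T)) / best_us(lambda: blas.sgemm(1, a32, a32.T))
    print("n=%4d  dsyrk/ssyrk=%5.1fx  dgemm/sgemm=%5.1fx" % (n, syrk_ratio, gemm_ratio))
```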

