Modern Intel architectures that support the FMA instruction set can fuse the multiply and the add in the first loop, which computes the matrix-matrix product between panels a and b, into a single instruction using _mm256_fmadd_pd. We should implement this and measure how it affects performance.
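As a starting point for discussion, here is a minimal sketch of what an FMA-based inner loop could look like. It is not the project's actual kernel: the 4x4 block size, the function names (kernel_4x4, kernel_4x4_fma, kernel_4x4_scalar) and the packed panel layout (a stored as k groups of 4 column entries, b stored as k groups of 4 row entries, c as a row-major 4x4 block) are all assumptions made for illustration. The target attribute and __builtin_cpu_supports are GCC/Clang specific and are used so the file builds without -mfma and falls back to scalar code on CPUs without FMA.

```c
#include <immintrin.h>
#include <stddef.h>

/* Scalar reference: c (4x4, row-major) += a (4 x k) * b (k x 4).
 * Assumed layout (hypothetical): a[p*4 + i] is A(i, p), b[p*4 + j] is B(p, j). */
static void kernel_4x4_scalar(size_t k, const double *a, const double *b,
                              double *c)
{
    for (size_t p = 0; p < k; ++p)
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                c[i * 4 + j] += a[p * 4 + i] * b[p * 4 + j];
}

/* Same update, but each row of c is held in one 256-bit register and
 * updated with a single fused multiply-add per step of the k loop. */
__attribute__((target("avx,fma")))
static void kernel_4x4_fma(size_t k, const double *a, const double *b,
                           double *c)
{
    __m256d c0 = _mm256_loadu_pd(c + 0);
    __m256d c1 = _mm256_loadu_pd(c + 4);
    __m256d c2 = _mm256_loadu_pd(c + 8);
    __m256d c3 = _mm256_loadu_pd(c + 12);

    for (size_t p = 0; p < k; ++p) {
        /* One row of the b panel: B(p, 0..3). */
        __m256d bv = _mm256_loadu_pd(b + p * 4);

        /* Broadcast A(i, p) to all four lanes, then c_i = a_i * bv + c_i. */
        c0 = _mm256_fmadd_pd(_mm256_broadcast_sd(a + p * 4 + 0), bv, c0);
        c1 = _mm256_fmadd_pd(_mm256_broadcast_sd(a + p * 4 + 1), bv, c1);
        c2 = _mm256_fmadd_pd(_mm256_broadcast_sd(a + p * 4 + 2), bv, c2);
        c3 = _mm256_fmadd_pd(_mm256_broadcast_sd(a + p * 4 + 3), bv, c3);
    }

    _mm256_storeu_pd(c + 0, c0);
    _mm256_storeu_pd(c + 4, c1);
    _mm256_storeu_pd(c + 8, c2);
    _mm256_storeu_pd(c + 12, c3);
}

/* Runtime dispatch: use the FMA path only when the CPU reports support. */
void kernel_4x4(size_t k, const double *a, const double *b, double *c)
{
    if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("fma"))
        kernel_4x4_fma(k, a, b, c);
    else
        kernel_4x4_scalar(k, a, b, c);
}
```

Keeping the scalar version next to the FMA version gives a baseline for both correctness checks and timing. Note that a fused multiply-add rounds once instead of twice, so results can differ from the scalar path in the last bits for general inputs.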