Hello everyone
I've noticed that the loop order in xGETC2 is not optimal. The inner loop walks along the rows of the matrix, but our matrices are stored in column-major order. Wouldn't it be better to swap the loops to get a more cache-friendly algorithm?
This can change the output when the matrix has two or more elements with the same maximum absolute value: a different pivot is selected, so we get a different LU decomposition and different ipiv/jpiv arrays. It seems, though, that both results should be correct.
For example, see SRC/sgetc2.f, lines 178-187:
         XMAX = ZERO
         DO 20 IP = I, N
            DO 10 JP = I, N
               IF( ABS( A( IP, JP ) ).GE.XMAX ) THEN
                  XMAX = ABS( A( IP, JP ) )
                  IPV = IP
                  JPV = JP
               END IF
   10       CONTINUE
   20    CONTINUE
Swapping the two loops makes IP the inner index, so consecutive accesses to A( IP, JP ) are contiguous in memory and the cache is used more effectively.
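To make the proposal concrete, here is a minimal sketch of the swapped search as a small standalone program. The program name PIVDEMO and the 3x3 matrix are hypothetical and chosen only for illustration; they are not taken from sgetc2.f. Only the loop nest corresponds to the code quoted above, with the DO 20 and DO 10 indices exchanged.

```fortran
      PROGRAM PIVDEMO
*     Hypothetical 3x3 matrix (assumption, not from LAPACK). DATA
*     fills A column by column, so A(2,1) = -5.0 and A(1,2) = 5.0
*     tie for the largest absolute value.
      INTEGER            N
      PARAMETER          ( N = 3 )
      REAL               ZERO
      PARAMETER          ( ZERO = 0.0E+0 )
      INTEGER            I, IP, JP, IPV, JPV
      REAL               XMAX
      REAL               A( N, N )
      DATA               A / 1.0, -5.0, 2.0,
     $                       5.0, 3.0, 0.5,
     $                       4.0, 1.0, 4.5 /
*
      I = 1
      XMAX = ZERO
*     JP (column) is now the outer loop and IP (row) the inner one,
*     so A( IP, JP ) is read with unit stride in column-major
*     storage.
      DO 20 JP = I, N
         DO 10 IP = I, N
            IF( ABS( A( IP, JP ) ).GE.XMAX ) THEN
               XMAX = ABS( A( IP, JP ) )
               IPV = IP
               JPV = JP
            END IF
   10    CONTINUE
   20 CONTINUE
      WRITE( *, * ) 'XMAX =', XMAX, ' IPV =', IPV, ' JPV =', JPV
      END
```

With this matrix the swapped loops select the pivot at (IPV, JPV) = (1, 2), while the original row-by-row order would select (2, 1). Both have absolute value 5, which is exactly the tie case mentioned above.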