[grid] reduce shared memory usage #2793
Conversation
mtaillefumier commented on May 23, 2023
- the coefficients are now read directly from global memory when lp_max > 4 instead of being stored in shared memory. This reduces shared memory usage for large l (a sketch of the pattern follows this list).
- grid_miniapp seemed unhappy and triggered a race condition when the hab coefficients are calculated, so an atomicAdd was added (second sketch below).
- minor changes to the CMake build system.
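The first point follows a common GPU pattern: stage small coefficient arrays in shared memory while they fit, and fall back to plain global-memory reads once they do not. A minimal CUDA sketch of that pattern, assuming illustrative names throughout (only the lp_max > 4 cutoff and the cxyz coefficients come from this PR; this is not the actual CP2K kernel):

```cuda
// Hedged sketch of the staging pattern (illustrative kernel; only the
// lp_max > 4 cutoff and the cxyz name come from this PR).
#define LP_MAX_SHARED 4
#define NCOSET_SHARED 35 // ncoset(4) = (4+1)(4+2)(4+3)/6 = 35 doubles

__global__ void collocate_sketch(const double *cxyz_global, int lp_max,
                                 int ncoset /* = ncoset(lp_max) */) {
  __shared__ double cxyz_shared[NCOSET_SHARED];

  // Pointer the inner loops dereference; defaults to global memory.
  const double *cxyz = cxyz_global;

  if (lp_max <= LP_MAX_SHARED) { // uniform across the block
    for (int i = threadIdx.x; i < ncoset; i += blockDim.x) {
      cxyz_shared[i] = cxyz_global[i]; // cooperative copy
    }
    __syncthreads(); // safe: every thread in the block takes this branch
    cxyz = cxyz_shared;
  }

  // ... grid-point loops read cxyz[i]; the code path is identical
  // whether cxyz points to shared or global memory.
}
```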
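The atomicAdd in the second point addresses a classic accumulation race: several threads updating the same hab entry. A hedged sketch with illustrative names (only hab and the use of atomicAdd come from this PR):

```cuda
// Hedged sketch of the race fix (illustrative names). Several threads may
// contribute to the same hab matrix element, so a plain "+=" loses updates.
__global__ void integrate_hab_sketch(double *hab, const double *contrib,
                                     const int *target, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // hab[target[i]] += contrib[i];          // data race: lost updates
    atomicAdd(&hab[target[i]], contrib[i]);   // safe concurrent accumulation
  }
}
// Note: atomicAdd on double requires compute capability >= 6.0 (Pascal or
// newer) on NVIDIA hardware; HIP provides the same intrinsic.
```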
Oh this is nice! I also contemplated storing the […]. Btw, this exception in the unittest should then no longer be needed. Linking #1785 for posterity.
I only suppressed the cxyz coefficients from shared memory, which means we still have to store cab and alpha. alpha isn't an issue, but cab might be: for instance ncoset(l = 10) x ncoset(l = 10) is 286^2 doubles, which is more than the shared memory available (see the back-of-the-envelope check below). One solution might be a hybrid scheme where we sort tasks into low l and high l and then use a different algorithm for each. I can certainly do this on my side (cab in global memory), since integrate/collocate are separated from the calculation of the coefficients (at the price of more global memory used). I will lift the exception asap (tomorrow). From experience, collocate/integrate will always dominate the timers, so if we lose a little with cab being in global memory then so be it. The overall gain of treating large l on the GPU instead of the CPU is worth the effort. It would be worth triggering the HIP Pascal tests; the grid_miniapp passes, but still.
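The arithmetic quoted above checks out. A small standalone check, using the standard Cartesian-set count ncoset(l) = (l+1)(l+2)(l+3)/6:

```c
// Back-of-the-envelope check of the numbers above: ncoset(l) counts the
// Cartesian monomials up to angular momentum l.
#include <stdio.h>

static int ncoset(int l) { return (l + 1) * (l + 2) * (l + 3) / 6; }

int main(void) {
  const int n = ncoset(10); // = 11 * 12 * 13 / 6 = 286
  const double kib = (double)n * n * sizeof(double) / 1024.0;
  // 286 * 286 * 8 B ~ 639 KiB, far beyond the tens of KiB of shared
  // memory per thread block that current GPUs offer.
  printf("ncoset(10) = %d, cab needs %.0f KiB\n", n, kib);
  return 0;
}
```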
I actually did the opposite and partitioned […]
Indeed. I removed the cab from shared memory entirely. Something is still puzzling me, though: I get this when I run grid_unittest (same hardware in both cases):

[GPU backend output] [HIP backend output]
That looks strangely identical. There should be at least numerical noise. What do the statistics at the end say?
[GPU backend statistics] [HIP backend statistics]
They are different, and that is not where my trouble comes from. It is more about the difference between ref/GPU and ref/HIP.
Those differences should be OK because they are below our tolerances of 1e-12 for matrix elements and 1e-8 for forces. These "warning lines" are already printed when the diffs surpass 0.01 of our thresholds (sketch below). While this was useful during development, it is now confusing. Hence, I've opened #2797 to fix this inconsistency.
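A hedged sketch of the tolerance logic as described above (illustrative, not the actual grid_unittest source): the hard tolerance decides pass/fail, while a warning line is already printed at 0.01 of it.

```c
// Hedged sketch of the tolerance logic described above (not the actual
// grid_unittest source): pass/fail uses the hard tolerance, but a warning
// is already printed at 0.01 of it.
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

static bool check_diff(const char *label, double diff, double tol) {
  if (fabs(diff) > 0.01 * tol) // warn well below the hard limit
    printf("Warning: %s diff %.3e (tolerance %.1e)\n", label, diff, tol);
  return fabs(diff) <= tol; // only this decides pass/fail
}

int main(void) {
  bool ok = true;
  ok &= check_diff("matrix element", 3.0e-13, 1.0e-12); // warns, passes
  ok &= check_diff("force", 2.0e-9, 1.0e-8);            // warns, passes
  return ok ? 0 : 1;
}
```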
Perfect, thanks for the clarification; I was searching for something wrong in the code. #2797 can be merged first and then I will update this PR, or, if there is no conflict, you can merge both. You can squash all commits into one if you wish.