Eigen openacc compatibility#690
Conversation
|
Can one of the admins verify this patch? |
|
Logfiles from GitLab pipeline #8037 (:no_entry:) have been uploaded here! Status and direct links: |
bbp-hpcteam
left a comment
There was a problem hiding this comment.
Just a quick note for failing clang-format: note that we use clang-format v11 for formatting check under CI. I have added an item for this Thursday meeting to discuss project specific clang-format version.
|
Logfiles from GitLab pipeline #8141 (:white_check_mark:) have been uploaded here! Status and direct links: |
|
Logfiles from GitLab pipeline #8142 (:white_check_mark:) have been uploaded here! Status and direct links: |
| static int count_length(const std::vector<SymbolType>& variables) { | ||
| int length = 0; | ||
| for (const auto& variable: variables) { | ||
| length += variable->get_length(); | ||
| } | ||
| return length; | ||
| }; |
There was a problem hiding this comment.
this function is not really needed since it can be replaced with a one-liner:
std::accumulate(v.begin(), v.end(), 0, [](int l, const SymbolType& variable) {return a+= variable->get_length(); })
| void CodegenCudaVisitor::print_backend_includes() { | ||
| printer->add_line("#include <cuda.h>"); | ||
|
|
||
| if (info.eigen_linear_solver_exist && count_length(info.state_vars) > 4) { |
There was a problem hiding this comment.
not blocking but a one-liner with accumulate here is better than a new static function since the code remains cleaner. As you prefer
|
|
||
|
|
||
| template <typename T> | ||
| bool test_Crout_correctness(T rtol = 1e-6, T atol = 1e-6) { |
There was a problem hiding this comment.
these tolerances are quite big. why?
There was a problem hiding this comment.
@cattabiani you mean that we can relax them even more (like 1e-3)?
There was a problem hiding this comment.
I think he meant the inverse - why not something much closer to eps ?
There was a problem hiding this comment.
Yes, I meant what Ohm said
There was a problem hiding this comment.
@ohm314 & @cattabiani : I have modified Crout solver and I have resolved an accuracy issue that I had. I computed the relative error (A*x-b)/b for both eigen and crout, and crout's relative error was consistently two orders of magnitude higher than eigen one (before the modification). Therefore, for small tolerances my tests where failing. Now (after the modification), the relative error for both solvers has the same order of magnitude.
|
|
||
|
|
||
| template <typename T> | ||
| bool test_Crout_correctness(T rtol = 1e-6, T atol = 1e-6) { |
There was a problem hiding this comment.
I think he meant the inverse - why not something much closer to eps ?
| #pragma acc routine seq | ||
| #endif | ||
| template <typename T> | ||
| EIGEN_DEVICE_FUNC inline void Crout(int d, T* S, T* D) { |
There was a problem hiding this comment.
can this not be done with anything better than C-style arrays?
There was a problem hiding this comment.
I prefered not to change the initial solver (C-style arrays) to avoid bugs. The C-style arrays are readily combined with the .data() of Eigen and therefore, I think that after an extensive testing, we could consider its modernization (if everything passes).
…resolve accuracy issues
|
Logfiles from GitLab pipeline #8300 (:white_check_mark:) have been uploaded here! Status and direct links: |
…wMajor storage order
|
Logfiles from GitLab pipeline #8373 (:white_check_mark:) have been uploaded here! Status and direct links: |
…wMajor storage order (improvement with in-place transposition)
|
Logfiles from GitLab pipeline #8413 (:white_check_mark:) have been uploaded here! Status and direct links: |
…r and its unit test (full compatibility with OpenACC/CUDA backends)
|
Logfiles from GitLab pipeline #8532 (:no_entry:) have been uploaded here! Status and direct links: |
|
Logfiles from GitLab pipeline #8531 (:no_entry:) have been uploaded here! Status and direct links: |
…ag for PGI - resolve failing CI
|
Logfiles from GitLab pipeline #8565 (:white_check_mark:) have been uploaded here! Status and direct links: |
|
Logfiles from GitLab pipeline #11210 (:no_entry:) have been uploaded here! Status and direct links: |
|
Logfiles from GitLab pipeline #15092 (:white_check_mark:) have been uploaded here! Status and direct links: |
olupton
left a comment
There was a problem hiding this comment.
Looking good, let's test it again after a merge/rebase with the latest master.
| #include <algorithm> | ||
| #include <cmath> | ||
| #include <ctime> | ||
| #include <numeric> |
| const std::string& X, | ||
| const std::string& Jm, | ||
| const std::string& F) { | ||
| // The Eigen::PartialPivLU is not compatible with GPUs (no __device__ tokens). |
There was a problem hiding this comment.
Is this still true with the latest Eigen changes we plan to use? (https://github.com/BlueBrain/eigen/)
| set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} --diag_suppress 1,82,111,115,177,186,611,997,1097,1625") | ||
|
|
||
| # Needed for Eigen | ||
| set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wc,--pending_instantiations=0") |
There was a problem hiding this comment.
The surrounding code has changed a bit in master, it would be good to merge/rebase before final review/testing.
| using VecType = Matrix<T, Dynamic, 1>; | ||
|
|
||
| std::random_device rd; // seeding | ||
| std::mt19937 mt(rd()); |
There was a problem hiding this comment.
If we're going to use a pseudorandom seed then we should at least print what it is, in case we need to reproduce/debug an issue.
|
Logfiles from GitLab pipeline #15548 (:no_entry:) have been uploaded here! Status and direct links: |
|
For the build PGI pipeline we will need to bring back in CMake the following: |
|
Superseded by #728 |
This PR solves the compatibility issues of Eigen with GPUs (I fixed both the OpenACC & CUDA backends).
The Eigen::PartialPivLU (LU decomposition to solve linear systems) is not compatible with GPUs (no device tokens).
For matrices up to 4x4, the Eigen inverse() has template specializations decorated with host & device tokens. Therefore, we use the inverse method instead of the PartialPivLU (requires an invertible matrix) which supports both CPUs & GPUs.
For matrices 5x5 and above, Eigen does not provide GPU-enabled methods to solve small linear systems. For this reason, we use the Crout LU decomposition (Legacy code : coreneuron/sim/scopmath/crout_thread.cpp). For the CPU exectutions, we use the standard PartialPivLU from Eigen.
For the Crout LU-decomposition, I have included a unit test to compare its result with Eigen::PartialPivLU.