Fixing declaration of inner OpenMP privates #3348
Conversation
Unfortunately I am not familiar enough with the internals of Cython to comment on this PR. But since I was pinged, let me just suggest writing a new non-regression test for this fix :)
bfe653c to aa3b13b
I just provided a much simpler fix, which simply ignores inner loops as temp-collecting blocks.
aa3b13b to 549ffe2
Ah, yes, that seems a very good way to resolve this.
Will think about a reliable test.
I tried, but this is rather difficult to reproduce reliably. The difficulty is that the race is inside monolithic generated code; it cannot be influenced/changed from Cython code, and the window for possible errors is tiny. See cython/tests/run/sequential_parallel.pyx, line 381 (at 0b383fa). Any suggestions appreciated.
Could you post an example of the code that fails intermittently?
I think it's already an improvement to have a test that at least exercises the code, even if it's not guaranteed to fail with a race condition. If enabling the disabled test that you pointed to is all it takes, and if the test then proves to work on Jenkins, that's great, let's just try that!
I had a thought on testing - you should be able to tell whether variables are private by their address: at the minute both …
@da-woods the problem is with temp variables, not user-visible ones. That makes it difficult to do anything in user test code. What could fail (or … work, whatever) is having thread-specific calculations that require a temp variable and use the same temp name in the C code, and then collecting the results to check whether all threads produced their correct result or whether any of them leaked.
I enabled the test. I can only reliably make something fail on my offload branch, where the parallel section is offloaded to a GPU/device using #pragma omp target. The higher degree of parallelism on a GPU makes it more sensitive to issues like this. That PR will definitely include a test that usually failed without this fix. I made this a separate PR because the problem is not device-specific but concerns how OpenMP is used in general.
tests/run/sequential_parallel.pyx (outdated):
| the first line of output being '0 0 0 0'. The generated code looks
| awfully correct though... needs investigation
| >> test_nested_break_continue()
Actually, it's this line that disabled the test (I had to take a second look, too). Note the >> at the beginning, which isn't Python's >>>.
Yes - I can see that …
Oh, I see. Fixed it.
Here's a test that I believe works. I've abused C++ assignment operators to get the addresses of the temporary variables. Private variables have different addresses in different threads, so this should be reliable and not depend on race conditions.
Cool. I just checked, and it actually behaves as expected on master and on this branch. I am not familiar enough with all the intricate details of OpenMP to be 100% sure this must work according to the spec. Still, the only critical assumption made here seems to be that the two loop iterations are actually executed on different threads. I specified schedule='static' and chunksize=1 in the first parfor to make that more explicit, and added the example to the run tests. Hope this is the right way to do this.
@fschlimb Honestly, me neither. One option I considered was to do … I wonder if it should be in a separate file, though, since it forces C++ and the other tests don't?
Yeah, adding C++ into the game makes this a bit heavy – and it also means that the C compilation mode can't be used to test this. OpenMP is a compiler feature, so that's an actual drawback. It definitely needs a separate test file, because just adding it (and especially its …). However, now that we have this new test, I think we can keep it as a separate test file and rely on the now-enabled old test to cover the C case.
OK, done.
@fschlimb It looks like you missed the …
40018c1 to abaf44c
Thanks @da-woods, this is really nice. I modified the test so that it also succeeds when OpenMP is not enabled via -fopenmp.
* omp declare privates on outer loop, since inner loops are not 'omp parallel for'ified
Currently, inner/nested pranges are not translated into an active OpenMP parallel for.
The private temporary variables identified for these inner loops were declared only on the
inner construct, but not anywhere else. This apparently causes races.
The suggested code propagates the privatization to the outer construct so that it actually takes effect.