work around slice_test compiler bug (for windows bazel opt build)#20555
Conversation
test/core/slice/slice_test.cc
Outdated
| grpc_slice src2 = grpc_slice_from_copied_string("hello123456789123456789"); | ||
|
|
||
| // Do not remove the log line. It actually supresses a compiler bug in windows | ||
| // bazel opt build. See https://github.com/grpc/grpc/issues/20519 |
There was a problem hiding this comment.
Is there a way to loudly mark the test as "skipped", rather than using this workaround? Very interesting finds though.
There was a problem hiding this comment.
We could just skip the test and file an issue to followup. I'm still trying to find the actual MSVC bug listing to make 100% sure this is actually a compiler bug, but it's very time-consuming as I don't have much to go off of.
So far:
- I've been able to reproduce under bazel with VS2015, VS2017 and VS2019
- when I build with cmake (Release build), I'm unable to reproduce - which is pretty strange, but I haven't yet compared the exact compiler flags being used (but I checked they both run with /O2)
|
@apolcyn ok I think I start to understand what's going on here.
The above explains the situation and is consistent with all my experiments. What's unclear is whether relying on src1 and src2 being different variables (and thus assuming they have a different address) is legitimate C/C++ or not - we could ask C++ experts. Depending on the answer a) either our test is wrong b) there's a compiler bug (including VS2019) where the optimizer fails to hide that we it has reduced two local variables into one and we can observe it through our code. |
|
I also figured out why I wasn't able to reproduce with cmake build before:
I was able to reproduce with cmake 64bit Release build on windows locally. |
|
I updated the PR with what I think is a better workaround. I think I now have a good understanding of the situation so I feel it should be safe to merge and unbreak the windows opt test finally. |
| // distinct (even on windows opt 64bit build). | ||
| // See https://github.com/grpc/grpc/issues/20519 | ||
| GPR_ASSERT(src1.refcount); | ||
| GPR_ASSERT(GRPC_SLICE_START_PTR(src1) != GRPC_SLICE_START_PTR(src2)); |
There was a problem hiding this comment.
alternatively we could add (&src1 != &src2) condition to the assert which also unbreaks the test.
There was a problem hiding this comment.
Personally I like this idea of adding GPR_ASSERT(&src1 != &src2), I think it makes the theory explained here more explicit.
|
@arjunroy needs to see this imo because of recent slice changes |
|
I will say that I don't see anything obviously wrong here, though I may be missing something. It is certainly valid to assume that two different objects have different addresses, and it is valid to assume objects within each of two distinct objects have different addresses. If you never ask about these addresses the compiler is free to collapse the two objects into one, under "as-if" rules. This whole thing smells like a bug along exactly those lines -- the compiler is failing to realize it can't do that optimization because we're taking the address of a sub-object. |
apolcyn
left a comment
There was a problem hiding this comment.
LGTM to fix the test. Though as mentioned this could still probably use more eyes
|
Merging to fix the test. I still plan to follow up and I was actually able to come up with an isolated and simplified repro |
|
Known failures: #18404 |
|
Link to compiler issue I filed: https://developercommunity.visualstudio.com/content/problem/779761/optimizer-bug-in-c-leads-to-incorrect-behavior-at.html |
Fixes #20519.
The fix may look a bit random but I actually spent a fair amount of time debugging the problem
and I'm now pretty sure that the failure is due to a compiler bug.
See https://github.com/grpc/grpc/pull/20554/files that demonstrates and explains the problem.