[core] Remove perf test in shutdown_coordinator_test#56033
[core] Remove perf test in shutdown_coordinator_test#56033dayshah merged 2 commits intoray-project:masterfrom
shutdown_coordinator_test#56033Conversation
Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request correctly disables a performance test under TSAN to prevent spurious failures. The change to use std::chrono::steady_clock instead of std::chrono::high_resolution_clock is an excellent improvement for ensuring monotonic time measurement in the benchmark. The code is clean and the changes are well-justified.
| volatile bool result = false; | ||
|
|
||
| for (int i = 0; i < iterations; ++i) { | ||
| result = coordinator->ShouldEarlyExit(); |
There was a problem hiding this comment.
why do we want to microbenchmark this? If we want to do proper microbenchmark, we should use libraries like https://github.com/google/benchmark since writing good and reliable benchmark is pretty hard.
There was a problem hiding this comment.
agree especially not in our ci env
There was a problem hiding this comment.
ShouldEarlyExit() sits on hot paths (e.g., CoreWorker::IsExiting() checks inside tight loops). In an earlier (unmerged) implementation of this method, I was using more complex atomic flag and memory ordering. It was simplified to simply take a lock but there was a perf concern. So, I added this test to get a sense of how much time it takes. I can remove it form unit test. Don't see a very good utility in running this now.
dayshah
left a comment
There was a problem hiding this comment.
i still don't get why this doesn't pass tsan, is there inherently something unsafe with ShouldEarlyExit. I think I'm just missing context on this test and why we need to test should exit perf
| auto start = std::chrono::high_resolution_clock::now(); | ||
| auto start = std::chrono::steady_clock::now(); | ||
| constexpr int iterations = 1000000; | ||
| volatile bool result = false; |
There was a problem hiding this comment.
To prevent the compiler from optimizing away the calls/loop. Without volatile (or another optimizer barrier), the compiler may elide repeated reads with no observable side effects.
| volatile bool result = false; | ||
|
|
||
| for (int i = 0; i < iterations; ++i) { | ||
| result = coordinator->ShouldEarlyExit(); |
There was a problem hiding this comment.
should just wrap this in void( or RAY_UNUSED instead of having the unused result var
There was a problem hiding this comment.
yeah can do, however, void/RAY_UNUSED only silence "unused variable" warnings; they don’t prevent call elision. So, i can keep volatile to force evaluation; keep (void)result or RAY_UNUSED(result); to silence the warning.
But, I am considering removing the test altogether -- #56033 (comment)
Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com>
jjyao
left a comment
There was a problem hiding this comment.
Please update PR title and description to reflect the latest
shutdown_coordinator_test under tsanshutdown_coordinator_test
| // Should be very fast (less than 100ns per call on modern hardware) | ||
| double ns_per_call = static_cast<double>(duration.count()) / iterations; | ||
| EXPECT_LT(ns_per_call, 100.0) | ||
| << "ShouldEarlyExit too slow: " << ns_per_call << "ns per call"; |
There was a problem hiding this comment.
we should not write this type of test in CI as a general rule; it's destined to be flaky.
if we want to measure performance, we should do it as a release test where we actually track the metrics over time.
testing this specific call also doesn't make much sense to me.
…6033) Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com> Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
…6033) Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com> Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
…6033) Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com> Signed-off-by: Gang Zhao <gang@gang-JQ62HD2C37.local>
…6033) Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com> Signed-off-by: sampan <sampan@anyscale.com>
…6033) Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com> Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
…6033) Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com> Signed-off-by: yenhong.wong <yenhong.wong@grabtaxi.com>
…6033) Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
…6033) Signed-off-by: Sagar Sumit <sagarsumit09@gmail.com>
Why are these changes needed?
The test microbenchmarks a method (
ShutdownCoordinator::ShouldEarlyExit) which uses a lock. It's checked at task boundaries and event loop posts. That's orders of magnitude less frequent than per-object/serialization loops. But, it is not very useful. So, removing the test.Related issue number
Closes #55801