Conversation
Checks whether executors busy-wait when they should block in spin_some or spin_all. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
This test was strange. It looked like it assumed that spin_all did not return instantly. It was also racy, as the thread could terminate instantly. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
…available Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Before, the method would not recollect available work in the case of spin_some and spin_all. This would lead to the method behaving differently from what the documentation states. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
6ff50bf to 9495fa7 (compare)
I looked further into this, as all the tests lit up. @wjwwood @mjcarroll I noticed that the spin_some_impl method did not act according to the documentation.
@jmachowinski sorry I missed this, here is new CI. CI:
RCPPUTILS_SCOPE_EXIT(this->spinning.store(false); );

// clear result, to force recollect of ready executables
wait_result_.reset();
I think this change is correct, but it doesn't address what was originally reported in #2508, which is about avoiding a busy wait in spin_all. Instead this change prevents old data from a previous spin being used at the beginning of this spin_some/all.
// In the case of spin some, then we can exit
// In the case of spin all, then we will allow ourselves to wait again.
break;
if (!work_available || !exhaustive) {
I don't think this is the right logic actually. If this is called with exhaustive = false and the first wait_for_work() collects two things to do, this function would stop after executing only one of them, when it should execute both. (!true || !false) -> true -> break.
The && ensured that you didn't consider the exhaustive option until you finished what you collected, which is the behavior of spin_some:
/// Collect work once and execute all available work, optionally within a max duration.
If exhaustive = true then it's spin_all, which not only wants to finish all collected work after waiting, but also wants to wait again and again until there's no work available just after waiting:
/// Collect and execute work repeatedly within a duration or until no more work is available.
With all of that in mind, I will rewrite the logic of this function to make it clearer what is happening and why.
You would not end up in this code path as long as work is available, as get_next_ready_executable(any_exec) would return true.
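The loop semantics being debated can be sketched in isolation. The following is a minimal toy model (hypothetical, not the actual rclcpp implementation): drain everything the last wait collected before even considering the exhaustive flag, and only in the exhaustive (spin_all) case wait again, stopping once a wait collects nothing.

```cpp
#include <cassert>
#include <deque>

// Toy model of the spin_some/spin_all loop discussed above.
// All names here are illustrative, not the real rclcpp ones.
struct ToyExecutor {
  std::deque<int> pending;    // work the next wait_for_work() will collect
  std::deque<int> collected;  // work collected by the last wait
  int executed = 0;

  bool wait_for_work() {  // collect all currently ready work
    collected = pending;
    pending.clear();
    return !collected.empty();
  }

  bool get_next_ready_executable() {  // execute one collected item
    if (collected.empty()) {return false;}
    collected.pop_front();
    ++executed;
    return true;
  }

  void spin_some_impl(bool exhaustive) {
    wait_for_work();  // initial collection
    while (true) {
      if (get_next_ready_executable()) {
        continue;  // keep draining what the last wait collected
      }
      if (exhaustive && wait_for_work()) {
        continue;  // spin_all: wait again; stop once a wait finds nothing
      }
      break;  // spin_some stops here, even if new work arrived meanwhile
    }
  }
};
```

With this shape, a non-exhaustive call that collected two items executes both before exiting, which is exactly the behavior the `||` version would have broken.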
// Long timeout, but should not block test if spin_all works as expected as we cancel the
// executor.
bool spin_exited = false;
std::thread spinner([&spin_exited, &executor, this]() {
    executor.spin_all(1s);
    executor.remove_node(this->node, true);
    spin_exited = true;
  });
// Do some work until sufficient calls to the waitable occur
auto start = std::chrono::steady_clock::now();
while (
  my_waitable->get_count() <= 1 &&
  !spin_exited &&
  (std::chrono::steady_clock::now() - start < 1s))
{
  my_waitable->trigger();
  this->publisher->publish(test_msgs::msg::Empty());
  std::this_thread::sleep_for(1ms);
}
// trigger multiple times, so that some work is available
my_waitable->trigger();
my_waitable->trigger();
my_waitable->trigger();
executor.cancel();
start = std::chrono::steady_clock::now();
while (!spin_exited && (std::chrono::steady_clock::now() - start) < 1s) {
  std::this_thread::sleep_for(1ms);
}
// Long timeout, but should not block as almost no work is available
executor.spin_all(1s);
executor.remove_node(this->node, true);
EXPECT_LT(1u, my_waitable->get_count());
waitable_interfaces->remove_waitable(my_waitable, nullptr);
ASSERT_TRUE(spin_exited);
spinner.join();
When making a change to the behavior of a function in a way that shouldn't be breaking existing users, it's best to not change the test too. Looking at these changes, I don't see why they are necessary. It might be cleaner or simpler, but it's better to do that in a separate pull request.
This test triggers as a false positive on my 7800X3D. The problem is that the spinner thread starts and terminates before the waitable is even triggered.
executor->add_node(node);

// spin a bit to get all events from the addition of the node processed
executor->spin_all(std::chrono::milliseconds(100));
Rather than do this, I think it's better to create an isolated callback group and use that for testing, so that transient node-specific stuff does not interfere. This kind of "spin a bit and hope the resulting state is what I'm expecting" is pretty fragile. And I know we do that in other places, but I'd rather not add more cases of it.
// we allow a big number here, as in case of real active waiting
// it will be off by a few thousands anyway.
ASSERT_LT(waitable->get_is_ready_call_count() - start_is_ready_count, 30);
This is a pretty fuzzy assertion. It would be better to use a number based on the number of things we expect need handling, rather than arbitrarily using 30. I think avoiding adding the entire node and instead using a single callback group might help with that, but we also want to avoid coupling the test too tightly to the current implementation details, as well as leaving some flexibility for the executor implementations.

I think in general testing for busy waits is going to be really challenging, given that this is not a guarantee made by the interface; e.g. someone could create a busy-wait/polling executor because that met their needs, and that would break this test. It would be like making a test for std::condition_variable::wait_for that stated that the predicate could not be called more than X number of times. For some value of X that might be a reasonable check, but if X is too low you get too many false positives for the test, and if it's too high you might get false negatives, given that you don't know the implementation details.
That all being said, I do appreciate trying to make a test for this, and I think we should keep it until some executor breaks the mold.
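The wait_for analogy can be made concrete. A minimal sketch (the function name is made up for illustration) that counts how often std::condition_variable::wait_for evaluates its predicate while waiting out a timeout with no notification:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Counts predicate evaluations during an un-notified wait_for. The standard
// only guarantees the predicate is checked on entry and once more when the
// timeout expires; spurious wakeups can add arbitrarily many calls in
// between, which is why a fixed upper bound like the 30 above is fuzzy.
int count_predicate_calls(std::chrono::milliseconds timeout) {
  std::mutex m;
  std::condition_variable cv;
  int calls = 0;
  std::unique_lock<std::mutex> lock(m);
  cv.wait_for(lock, timeout, [&calls] {
      ++calls;
      return false;  // never satisfied, so the full timeout elapses
    });
  return calls;
}
```

A lower bound of 2 is guaranteed by the interface; any particular upper bound depends on the implementation and the machine, which mirrors the problem with asserting a hard cap on is_ready calls.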
Closing in favor of #2517, but we can reopen this one if needed.
Fixes #2508