Fixup Executor::spin_all() regression fix#2517
Conversation
Checks if executors are busy waiting while they should block in spin_some or spin_all. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
This test was strange. It looked like, it assumed that spin_all did not return instantly. Also it was racy, as the thread could terminate instantly. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
…available Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Before, the method would not recollect available work in case of spin_some, spin_all. This would lead to the method behaving differently than to what the documentation states. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: William Woodall <william@osrfoundation.org>
Signed-off-by: William Woodall <william@osrfoundation.org>
wjwwood
left a comment
There was a problem hiding this comment.
Just some notes for anyone reviewing this.
| // clear the wait result and wait for work without blocking to collect the work | ||
| // for the first time | ||
| // both spin_some and spin_all wait for work at the beginning | ||
| wait_result_.reset(); | ||
| wait_for_work(std::chrono::milliseconds(0)); | ||
| bool just_waited = true; |
There was a problem hiding this comment.
This change wasn't part of the original report (busy wait in spin_all) but it is related and is needed. It essentially ensures that we don't start out spin_some or spin_all by iterating over existing work from a previous spin method. Ideally the other spin methods would reset the result before returning so that this wouldn't be necessary, but this is safer. We do want to do this because in the case of spin_some this could cause it to never wait, iterating over old work and exiting after. This isn't really what we want.
Also, it is safe to throw away a wait result and wait again as all the things we wait on need to be able to be waited on, be ready, not be handled, be waited on again, and then be ready again. Or else they need to be ok with not being handled. Put another way, being waited on and then being ready does not guarantee being executed, and it may require being waited on again.
mjcarroll
left a comment
There was a problem hiding this comment.
Overall change and test looks good. My only nit is that we need to probably put some of your comments here in the file for posterity.
Signed-off-by: William Woodall <william@osrfoundation.org>
|
I added a comment in 8587653, so if the Rpr job comes back green I'll merge without new CI, since it was just a comment and Rpr checks style and linting. |
|
I will then follow up with a jazzy backport. |
|
@Mergifyio backport jazzy |
✅ Backports have been createdDetails
|
* test(Executors): Added tests for busy waiting Checks if executors are busy waiting while they should block in spin_some or spin_all. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * fix: Reworked spinAll test This test was strange. It looked like, it assumed that spin_all did not return instantly. Also it was racy, as the thread could terminate instantly. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * fix(Executor): Fixed spin_all not returning instantly is no work was available Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * Update rclcpp/test/rclcpp/executors/test_executors.cpp Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com> * test(executors): Added test for busy waiting while calling spin Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * fix(executor): Reset wait_result on every call to spin_some_impl Before, the method would not recollect available work in case of spin_some, spin_all. This would lead to the method behaving differently than to what the documentation states. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * restore previous test logic for now Signed-off-by: William Woodall <william@osrfoundation.org> * refactor spin_some_impl's logic and improve busy wait tests Signed-off-by: William Woodall <william@osrfoundation.org> * added some more comments about the implementation Signed-off-by: William Woodall <william@osrfoundation.org> --------- Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com> Signed-off-by: William Woodall <william@osrfoundation.org> Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com> Co-authored-by: jmachowinski <jmachowinski@users.noreply.github.com> Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> (cherry picked from commit f7185dc)
* test(Executors): Added tests for busy waiting Checks if executors are busy waiting while they should block in spin_some or spin_all. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * fix: Reworked spinAll test This test was strange. It looked like, it assumed that spin_all did not return instantly. Also it was racy, as the thread could terminate instantly. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * fix(Executor): Fixed spin_all not returning instantly is no work was available Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * Update rclcpp/test/rclcpp/executors/test_executors.cpp Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com> * test(executors): Added test for busy waiting while calling spin Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * fix(executor): Reset wait_result on every call to spin_some_impl Before, the method would not recollect available work in case of spin_some, spin_all. This would lead to the method behaving differently than to what the documentation states. Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> * restore previous test logic for now Signed-off-by: William Woodall <william@osrfoundation.org> * refactor spin_some_impl's logic and improve busy wait tests Signed-off-by: William Woodall <william@osrfoundation.org> * added some more comments about the implementation Signed-off-by: William Woodall <william@osrfoundation.org> --------- Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com> Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com> Signed-off-by: William Woodall <william@osrfoundation.org> Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com> Co-authored-by: jmachowinski <jmachowinski@users.noreply.github.com> Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> (cherry picked from commit f7185dc) Co-authored-by: William Woodall <william@osrfoundation.org>
| } | ||
| EXPECT_GT(this->waitable->get_count(), 0u); | ||
|
|
||
| first_check_passed = true; |
There was a problem hiding this comment.
you need to lock cv_m while accessing the share state (e.g. first_check_passed). If you don't do this, you get race conditions
| cv.wait_for(lk, 10s); | ||
| } | ||
| EXPECT_GT(this->waitable->get_count(), 0u); | ||
|
|
There was a problem hiding this comment.
This test is racy / does not really check for busy waits. You need to wait a bit here, to give a busy waiting executor time to call is_ready a lot of times. As the test is implemented right now, it could pass even if the executor is busy waiting.
This is a follow up of #2509 which addresses #2508.