Skip to content

Fixup Executor::spin_all() regression fix#2517

Merged
wjwwood merged 9 commits intorollingfrom
wjwwood/executor_spin_all_regression_fix
May 2, 2024
Merged

Fixup Executor::spin_all() regression fix#2517
wjwwood merged 9 commits intorollingfrom
wjwwood/executor_spin_all_regression_fix

Conversation

@wjwwood
Copy link
Member

@wjwwood wjwwood commented Apr 30, 2024

This is a follow up of #2509 which addresses #2508.

Janosch Machowinski and others added 8 commits April 30, 2024 03:27
Checks if executors are busy waiting while they should block
in spin_some or spin_all.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
This test was strange. It looked like, it assumed that spin_all did
not return instantly. Also it was racy, as the thread could terminate
instantly.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
…available

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Before, the method would not recollect available work in case of
spin_some, spin_all. This would lead to the method behaving differently
than to what the documentation states.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: William Woodall <william@osrfoundation.org>
Signed-off-by: William Woodall <william@osrfoundation.org>
@wjwwood wjwwood self-assigned this Apr 30, 2024
@wjwwood wjwwood mentioned this pull request Apr 30, 2024
@wjwwood
Copy link
Member Author

wjwwood commented Apr 30, 2024

CI:

  • Linux Build Status (updated after job got canceled)
  • Linux-aarch64 Build Status (known issues with mimic)
  • Windows Build Status (restarted after the job hung)

Copy link
Member Author

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some notes for anyone reviewing this.

Comment on lines +369 to +374
// clear the wait result and wait for work without blocking to collect the work
// for the first time
// both spin_some and spin_all wait for work at the beginning
wait_result_.reset();
wait_for_work(std::chrono::milliseconds(0));
bool just_waited = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change wasn't part of the original report (busy wait in spin_all) but it is related and is needed. It essentially ensures that we don't start out spin_some or spin_all by iterating over existing work from a previous spin method. Ideally the other spin methods would reset the result before returning so that this wouldn't be necessary, but this is safer. We do want to do this because in the case of spin_some this could cause it to never wait, iterating over old work and exiting after. This isn't really what we want.

Also, it is safe to throw away a wait result and wait again as all the things we wait on need to be able to be waited on, be ready, not be handled, be waited on again, and then be ready again. Or else they need to be ok with not being handled. Put another way, being waited on and then being ready does not guarantee being executed, and it may require being waited on again.

Copy link
Member

@mjcarroll mjcarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall change and test looks good. My only nit is that we need to probably put some of your comments here in the file for posterity.

Signed-off-by: William Woodall <william@osrfoundation.org>
@wjwwood
Copy link
Member Author

wjwwood commented May 2, 2024

I added a comment in 8587653, so if the Rpr job comes back green I'll merge without new CI, since it was just a comment and Rpr checks style and linting.

@wjwwood
Copy link
Member Author

wjwwood commented May 2, 2024

I will then follow up with a jazzy backport.

@wjwwood wjwwood merged commit f7185dc into rolling May 2, 2024
@wjwwood wjwwood deleted the wjwwood/executor_spin_all_regression_fix branch May 2, 2024 21:06
@wjwwood
Copy link
Member Author

wjwwood commented May 2, 2024

@Mergifyio backport jazzy

@mergify
Copy link
Contributor

mergify bot commented May 2, 2024

backport jazzy

✅ Backports have been created

Details

mergify bot pushed a commit that referenced this pull request May 2, 2024
* test(Executors): Added tests for busy waiting

Checks if executors are busy waiting while they should block
in spin_some or spin_all.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* fix: Reworked spinAll test

This test was strange. It looked like, it assumed that spin_all did
not return instantly. Also it was racy, as the thread could terminate
instantly.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* fix(Executor): Fixed spin_all not returning instantly is no work was available

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* Update rclcpp/test/rclcpp/executors/test_executors.cpp

Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>

* test(executors): Added test for busy waiting while calling spin

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* fix(executor): Reset wait_result on every call to spin_some_impl

Before, the method would not recollect available work in case of
spin_some, spin_all. This would lead to the method behaving differently
than to what the documentation states.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* restore previous test logic for now

Signed-off-by: William Woodall <william@osrfoundation.org>

* refactor spin_some_impl's logic and improve busy wait tests

Signed-off-by: William Woodall <william@osrfoundation.org>

* added some more comments about the implementation

Signed-off-by: William Woodall <william@osrfoundation.org>

---------

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>
Signed-off-by: William Woodall <william@osrfoundation.org>
Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: jmachowinski <jmachowinski@users.noreply.github.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
(cherry picked from commit f7185dc)
wjwwood added a commit that referenced this pull request May 3, 2024
* test(Executors): Added tests for busy waiting

Checks if executors are busy waiting while they should block
in spin_some or spin_all.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* fix: Reworked spinAll test

This test was strange. It looked like, it assumed that spin_all did
not return instantly. Also it was racy, as the thread could terminate
instantly.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* fix(Executor): Fixed spin_all not returning instantly is no work was available

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* Update rclcpp/test/rclcpp/executors/test_executors.cpp

Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>

* test(executors): Added test for busy waiting while calling spin

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* fix(executor): Reset wait_result on every call to spin_some_impl

Before, the method would not recollect available work in case of
spin_some, spin_all. This would lead to the method behaving differently
than to what the documentation states.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>

* restore previous test logic for now

Signed-off-by: William Woodall <william@osrfoundation.org>

* refactor spin_some_impl's logic and improve busy wait tests

Signed-off-by: William Woodall <william@osrfoundation.org>

* added some more comments about the implementation

Signed-off-by: William Woodall <william@osrfoundation.org>

---------

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>
Signed-off-by: William Woodall <william@osrfoundation.org>
Co-authored-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Co-authored-by: jmachowinski <jmachowinski@users.noreply.github.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
(cherry picked from commit f7185dc)

Co-authored-by: William Woodall <william@osrfoundation.org>
}
EXPECT_GT(this->waitable->get_count(), 0u);

first_check_passed = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to lock cv_m while accessing the share state (e.g. first_check_passed). If you don't do this, you get race conditions

cv.wait_for(lk, 10s);
}
EXPECT_GT(this->waitable->get_count(), 0u);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is racy / does not really check for busy waits. You need to wait a bit here, to give a busy waiting executor time to call is_ready a lot of times. As the test is implemented right now, it could pass even if the executor is busy waiting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants