Executer regression fix #2509

Closed
jmachowinski wants to merge 6 commits into ros2:rolling from cellumation:executer_regression_fix
Conversation

@jmachowinski
Collaborator

Fixes #2508

Janosch Machowinski added 3 commits April 17, 2024 14:53
Checks if executors are busy waiting while they should block
in spin_some or spin_all.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
This test was strange: it seemed to assume that spin_all did
not return instantly. It was also racy, as the thread could
terminate instantly.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
…available

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Collaborator

@fujitatomoya left a comment

lgtm!

@fujitatomoya
Collaborator

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

jmachowinski and others added 3 commits April 19, 2024 11:29
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: jmachowinski <jmachowinski@users.noreply.github.com>
Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
Before, the method would not recollect available work in the case of
spin_some and spin_all. This led to the method behaving differently
from what the documentation states.

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
@jmachowinski force-pushed the executer_regression_fix branch from 6ff50bf to 9495fa7 on April 19, 2024 09:33
@jmachowinski
Collaborator Author

I looked further into this, as all the tests lit up.
I pushed an updated version; can someone rerun the CI with this?

@wjwwood @mjcarroll I noticed that the spin_some_impl method did not act according to the documentation.
It was not recollecting ready executables on every call. Is this intentional, or a bug?

@fujitatomoya
Collaborator

@jmachowinski sorry I missed this, here is a new CI.

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Member

@wjwwood left a comment

I couldn't fix up this PR directly because of push permissions, but I made a new PR that builds on this one: #2517

I think we should take my pr, but I'm open to discussing even more changes.

RCPPUTILS_SCOPE_EXIT(this->spinning.store(false); );

// clear result, to force recollect of ready executables
wait_result_.reset();
Member

I think this change is correct, but it doesn't address what was originally reported in #2508, which is about avoiding a busy wait in spin_all. Instead this change prevents old data from a previous spin being used at the beginning of this spin_some/all.
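
The effect can be modeled without rclcpp. Below is a minimal standalone sketch (MiniExecutor, its members, and the int "executables" are hypothetical stand-ins, not the real rclcpp types): if the cached wait result survives across spin calls, a later spin skips recollection and misses newly available work, while resetting it first forces a fresh collect.

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for an executor that caches collected work in
// wait_result_ (ints model ready executables; none of this is rclcpp API).
struct MiniExecutor {
  std::optional<std::queue<int>> wait_result_;
  std::vector<int> executed;

  // Models wait_for_work(): collect whatever is currently ready.
  void wait_for_work(std::queue<int> ready) { wait_result_ = std::move(ready); }

  // Models the spin_some entry; reset_first models the wait_result_.reset() fix.
  void spin_some(std::queue<int> ready_now, bool reset_first) {
    if (reset_first) {
      wait_result_.reset();  // clear result, to force recollect of ready executables
    }
    if (!wait_result_.has_value()) {
      wait_for_work(std::move(ready_now));  // only collects when no cached result
    }
    while (!wait_result_->empty()) {  // execute everything that was collected
      executed.push_back(wait_result_->front());
      wait_result_->pop();
    }
  }
};
```

Without the reset, a second spin_some call finds a stale (empty but still engaged) cached result, skips collection, and executes nothing even though new work is ready; with the reset it recollects on every call.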

// In the case of spin some, then we can exit
// In the case of spin all, then we will allow ourselves to wait again.
break;
if (!work_available || !exhaustive) {
Member

I don't think this is the right logic actually. If this is called with exhaustive = false and the first wait_for_work() collects two things to do, this function would stop after executing only one of them, when it should execute both. (!true || !false) -> true -> break.

The && ensured that you didn't consider the exhaustive option until you finished what you collected, which is the behavior of spin_some:

/// Collect work once and execute all available work, optionally within a max duration.

If exhaustive = true then it's spin_all which not only wants to finish all collected work after waiting, but also wants to wait again and again until there's no work available just after waiting:

/// Collect and execute work repeatedly within a duration or until no more work is available.

With all of that in mind, I will rewrite the logic of this function to make it clearer what is happening and why.
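
This reading of the condition can be checked with a standalone sketch (plain C++, no rclcpp; run_spin and make_queue are hypothetical, and the loop is a simplification of spin_some_impl as described above, with the exit check placed after each executed item):

```cpp
#include <cassert>
#include <initializer_list>
#include <queue>
#include <vector>

// Helper to build a queue of "collected executables" for the sketch.
std::queue<int> make_queue(std::initializer_list<int> xs) {
  std::queue<int> q;
  for (int x : xs) { q.push(x); }
  return q;
}

// Simplified spin loop: `collected` is what the first wait_for_work() gathered,
// `work_available` its initial state, and `use_or` selects the disputed `||`
// condition instead of the original `&&`.
std::vector<int> run_spin(std::queue<int> collected, bool exhaustive, bool use_or) {
  std::vector<int> executed;
  const bool work_available = !collected.empty();
  while (!collected.empty()) {
    executed.push_back(collected.front());
    collected.pop();
    const bool stop = use_or ? (!work_available || !exhaustive)
                             : (!work_available && !exhaustive);
    if (stop) {
      break;  // spin_some exits here; spin_all would wait for more work instead
    }
  }
  return executed;
}
```

With exhaustive = false and two collected items, the `||` variant returns after one item while the `&&` variant drains both, matching the (!true || !false) -> true -> break walkthrough.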

Collaborator Author

You would not end up in this code path as long as work is available, as get_next_ready_executable(any_exec) would return true.

Comment on lines -443 to -473
// Long timeout, but should not block test if spin_all works as expected as we cancel the
// executor.
bool spin_exited = false;
std::thread spinner([&spin_exited, &executor, this]() {
executor.spin_all(1s);
executor.remove_node(this->node, true);
spin_exited = true;
});

// Do some work until sufficient calls to the waitable occur
auto start = std::chrono::steady_clock::now();
while (
my_waitable->get_count() <= 1 &&
!spin_exited &&
(std::chrono::steady_clock::now() - start < 1s))
{
my_waitable->trigger();
this->publisher->publish(test_msgs::msg::Empty());
std::this_thread::sleep_for(1ms);
}
// trigger multiple times, so that some work is available
my_waitable->trigger();
my_waitable->trigger();
my_waitable->trigger();

executor.cancel();
start = std::chrono::steady_clock::now();
while (!spin_exited && (std::chrono::steady_clock::now() - start) < 1s) {
std::this_thread::sleep_for(1ms);
}
// Long timeout, but should not block as almost no work is available
executor.spin_all(1s);
executor.remove_node(this->node, true);

EXPECT_LT(1u, my_waitable->get_count());
waitable_interfaces->remove_waitable(my_waitable, nullptr);
ASSERT_TRUE(spin_exited);
spinner.join();
Member

When making a change to the behavior of a function in a way that shouldn't be breaking existing users, it's best to not change the test too. Looking at these changes, I don't see why they are necessary. It might be cleaner or simpler, but it's better to do that in a separate pull request.

Collaborator Author

This test triggers as a false positive on my 7800X3D. The problem is that the spinner thread starts and terminates before the waitable is even triggered.

Comment on lines +878 to +881
executor->add_node(node);

// spin a bit to get all events from the addition of the node processed
executor->spin_all(std::chrono::milliseconds(100));
Member

Rather than do this, I think it's better to create an isolated callback group and use that for testing, so that transient node-specific stuff does not interfere. This kind of "spin a bit and hope the resulting state is what I'm expecting" is pretty fragile. And I know we do that in other places, but I'd rather not add more cases of it.

Comment on lines +898 to +900
// we allow a big number here, as in case of real active waiting
// it will be off by a few thousands anyway.
ASSERT_LT(waitable->get_is_ready_call_count() - start_is_ready_count, 30);
Member

This is a pretty fuzzy assertion. It would be better to use a number based on the number of things we expect need handling, rather than arbitrarily using 30. I think avoiding adding the entire node and instead using a single callback group might help with that, but we also want to avoid coupling the test too tightly to the current implementation details, as well as leaving some flexibility for the executor implementations.

In general, I think testing for busy waits is going to be really challenging, given that this is not a guarantee made by the interface; e.g. someone could create a busy-wait/polling executor because that met their needs, and that would break this test. It would be like making a test for std::condition_variable::wait_for that stated that the predicate could not be called more than X times. For some value of X that might be a reasonable check, but if it's too high you get too many false positives for the test, and if it's too low you might get false negatives, given that you don't know the implementation details.

That all being said, I do appreciate trying to make a test for this, and I think we should keep it until some executor breaks the mold.
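
The condition-variable analogy above can be made concrete with a standalone sketch (count_predicate_calls is hypothetical and the timings are illustrative): a blocking wait_for evaluates its predicate only a handful of times, whereas a busy-wait/polling loop would evaluate it thousands of times over the same window, which is exactly why any fixed bound like 30 is implementation-sensitive.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Count how many times the predicate runs while blocking until `done` is set.
int count_predicate_calls() {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  int calls = 0;

  std::thread setter([&] {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    {
      std::lock_guard<std::mutex> lock(m);
      done = true;
    }
    cv.notify_one();
  });

  std::unique_lock<std::mutex> lock(m);
  // A blocking implementation checks the predicate once up front and once per
  // wakeup; a busy-wait implementation would check it far more often.
  cv.wait_for(lock, std::chrono::seconds(5), [&] { ++calls; return done; });
  setter.join();
  return calls;
}
```

On a blocking implementation the count is small (at least two: the initial check plus the check after wakeup, modulo spurious wakeups), so any hard upper bound chosen for a test is a judgment call about implementation behavior, not a guarantee of the interface.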

@wjwwood
Member

wjwwood commented May 1, 2024

Closing in favor of #2517, but we can reopen this one if needed.


Development

Successfully merging this pull request may close these issues.

Regression: Executor::spin_some_impl is active waiting

4 participants