Skip to content

Fix high CPU usage due to rcl_wait waking quickly#355

Merged
jacobperron merged 2 commits intomasterfrom
bug_action_server_cpu_usage
Dec 11, 2018
Merged

Fix high CPU usage due to rcl_wait waking quickly#355
jacobperron merged 2 commits intomasterfrom
bug_action_server_cpu_usage

Conversation

@jacobperron
Copy link
Copy Markdown
Member

@jacobperron jacobperron commented Dec 8, 2018

Resolves #354

Passing a max integer value to rmw_wait() was leading to integer overflow in the rmw layer causing a quick timeout. Instead, pass NULL to block in the case "block until ready".

Also fixed another small bug in rcl_action found along the way.

Big thanks to @sloretz for helping find the issue!

@jacobperron jacobperron added in progress Actively being worked on (Kanban column) in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Dec 8, 2018
@jacobperron jacobperron self-assigned this Dec 8, 2018
@jacobperron
Copy link
Copy Markdown
Member Author

jacobperron commented Dec 8, 2018

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@jacobperron jacobperron force-pushed the bug_action_server_cpu_usage branch from c85ea91 to bfb4378 Compare December 8, 2018 02:35
@nuclearsandwich
Copy link
Copy Markdown
Member

Changes had been made since the last CI run so I optimistically started fresh builds

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@jacobperron jacobperron force-pushed the bug_action_server_cpu_usage branch from bfb4378 to 7f31f7e Compare December 10, 2018 17:55
@jacobperron
Copy link
Copy Markdown
Member Author

@wjwwood @mjcarroll Please take a look again. I realized the problem was that number_of_valid_timers was greater than zero even though min_timeout was not set in this block:

rcl/rcl/src/rcl/wait.c

Lines 504 to 508 in 7f31f7e

if (NULL != rmw_gcs->guard_conditions[gc_idx]) {
// This timer has a guard condition, so move it to make a legal wait set.
rmw_gcs->guard_conditions[rmw_gcs->guard_condition_count] =
rmw_gcs->guard_conditions[gc_idx];
++(rmw_gcs->guard_condition_count);

I've replaced it's usage with a check if min_timeout was set. Of course, if someone sets a timer to be INT64_MAX then we'll still have the same issue. What do you think about changing the default timeout value to something smaller (like INT64_MAX / 2) ?

@jacobperron
Copy link
Copy Markdown
Member Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@wjwwood
Copy link
Copy Markdown
Member

wjwwood commented Dec 10, 2018

Of course, if someone sets a timer to be INT64_MAX then we'll still have the same issue. What do you think about changing the default timeout value to something smaller (like INT64_MAX / 2) ?

Could this also be solved with an extra boolean that is set by checking the timers before (or during) calculation of the min_timeout? If so, then I wouldn't change the default timer (implication is that INT64_MAX is no longer a valid timer period) just for that. If not (I honestly can't say off-hand), then changing the timer default is ok, but I'd do INT64_MAX - 1 and then document that INT64_MAX is reserved and should not be used with the timer period (and then put code in the timer init to check that it is not).

…meout is not set

Otherwise, there is a risk of integer overflow (e.g. in rmw_fastrtps) and rmw_wait() will wake immediately.
@jacobperron jacobperron force-pushed the bug_action_server_cpu_usage branch from 7f31f7e to a4248da Compare December 10, 2018 20:06
@jacobperron
Copy link
Copy Markdown
Member Author

Could this also be solved with an extra boolean that is set by checking the timers before (or during) calculation of the min_timeout?

@wjwwood Done.

@jacobperron
Copy link
Copy Markdown
Member Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@jacobperron
Copy link
Copy Markdown
Member Author

Rebuild Windows: Build Status

@jacobperron jacobperron merged commit 1360c82 into master Dec 11, 2018
@jacobperron jacobperron deleted the bug_action_server_cpu_usage branch December 11, 2018 19:07
@jacobperron jacobperron removed the in review Waiting for review (Kanban column) label Dec 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

100% Action Server CPU usage after goal is done until it is expired

5 participants