Race conditions when updating PlanningScene: fixup #716 #728
rhaschke wants to merge 8 commits into indigo-devel
Conversation
Moved validatePlan() from MoveGroupInterface (client side) to MoveGroup's ExecutionServiceCapability (server side). This checks validity of the plan no matter which client called. Placing it in MoveGroupContext will allow re-use by other capabilities.
The primary objective of syncSceneUpdates() is to receive a recent robot state. If a current state monitor is active, that is all we need to monitor. If not, scene updates are monitored instead, in particular the timestamp of their RobotState member.
- new CurrentStateMonitor::waitForCurrentState(ros::Time)
- simply wait for state update to reach scene
- fixed validatePlan(): use new CSM::waitForCurrentState()
If the robot doesn't move, an update in the CSM is not forwarded to the planning scene. Hence, we always wait until the timeout for a recent last_robot_motion_time_. If we ensure that an update triggered by the CSM is directly passed to the scene, we can return true early. This partially reverts commit 907980a392d352f5e2ebb6a0b6906c85d9c7d72c.
I think the integration tests make this PR way too big - they should be separated out, as discussed on the call.
new syncSceneUpdates() indeed also waits for all pending scene updates to be processed
Review comment on the diff:

```cpp
#define MOVEIT_MOVE_GROUP_CONTEXT_

#include <moveit/macros/class_forward.h>
#include <trajectory_msgs/JointTrajectory.h>
```

please add the new dependency to the package.xml
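For illustration, the requested change would look roughly like this in the package's package.xml. This is a sketch assuming package format 1 (common in the indigo era); with format 2 the tags would be `<depend>` or `<exec_depend>` instead:

```xml
<!-- package.xml: declare the dependency pulled in by the new
     #include <trajectory_msgs/JointTrajectory.h> -->
<build_depend>trajectory_msgs</build_depend>
<run_depend>trajectory_msgs</run_depend>
```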
I'm afraid that I cannot handle the comments before next week. Please be patient.

Hopefully before August 5th...
@davetcoleman, @130s, @mikeferguson / whoever feels responsible: this is a bit critical, so please read and comment. @davetcoleman, this is not going to happen before Friday. He will be really pissed because of that, but at the moment I'm considering a revert of the whole list of commits for #442 up to now. Maybe that's what we should have done directly after the unreviewed merge... Overall, this is no state in which we should let people work on "minor" issues, and the whole "fix the start state" patch is clearly not ready for a release. I therefore propose to revert these change-sets and try to add only the part that makes the ExecutionController check whether the trajectory it is about to execute starts somewhere around the current state. This should be possible without most of this code and without breakage.
Yes, after seeing #736 and the possibly related moveit/moveit#10, I have been worried about the state of the PlanningSceneMonitor. Considering it was never fully reviewed, I do not think we should feel guilty about the revert. +1

+1 for the revert, ideally before the merge. Unless it is shown that #736 is not an issue in general, or is independent from this, we cannot release.
Looks like AsyncSpinner isn't really usable as intended. As I can't work on the issue in the coming days, go for the revert as suggested by @v4hn.

@v4hn can you do the revert today, before tomorrow's merge?
I opened a request to revert the previous merge commits in #742. I'll close this request. We will have to look through the changes again anyway and rebase/reformat them for the merged repo.
@v4hn @dornhege @davetcoleman Obviously any use of an AsyncSpinner is affected. I will try to fix that issue upstream in ros_comm.
On Mon, Aug 15, 2016 at 12:32:20AM -0700, Robert Haschke wrote:

Thanks! This is clearly a bug in ros_comm, and the commit that introduced the global mutex there is at fault.
Welcome back @rhaschke, hope vacation was good! Thanks for continuing to work on this |
As for the other usage, I prepared ros/ros_comm#867 and hope that they will consider it in a timely fashion.
Wow, given the scope of your patch, @rhaschke, I'm pretty sure this will not be merged into indigo; not sure about kinetic. But we should probably fix the safety issue in indigo...
This continues #716 and #724 which were both merged already. I repeat here my comments to #724:
@v4hn Realizing the simplification we agreed on in the phone call turned out to be rather difficult:
- Simplifying stateUpdateTimerCallback() (removing the additional timestamp check) caused deadlocks. Hence, I reverted these changes.
- Moved validatePlan() from MoveGroupInterface (client side) to MoveGroup's ExecutionServiceCapability (server side) - as agreed.
- Simplification of syncSceneUpdates(): we definitely need both while loops and the timeout. When the robot doesn't move at all, we will never receive a scene update. The first loop is required to check for a recent robot state update directly in the CSM, while the second loop (in case there is no CSM) waits - with a timeout - for a direct scene update. There is no way to omit this timeout; the only chance would be to trigger all input channels of the PSM to send updates, which is not possible. However, in move_group this doesn't pose a problem, because there is a CSM instantiated.
- Calling syncSceneUpdates() after trajectory execution makes sense. Doing so initially led to failure of the test suggested in #442 ("Trajectory start doesn't match Current Position, when using plan/execute"); now I fixed it.
- CSM::waitForCurrentState() didn't do what I expected from the name...

Some more thoughts:
- syncSceneUpdates() should be renamed to waitForCurrentRobotState(), to avoid confusion with waitForCurrentState() in CurrentStateMonitor, which in turn should be renamed to waitForCompleteState(), as these methods don't look at recent timestamps.
- The new syncSceneUpdates() incorporates all pending scene updates in the callback queue. I agree with @v4hn that this doesn't guarantee that all scene updates up to a given timestamp are incorporated, because - due to network delays - ROS messages might be received with a delay. The only guarantee for a synchronous scene update is the new method applyPlanningScene().
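The two-loop wait discussed in this thread (consult the CSM directly if one exists, otherwise fall back to a timed wait on scene updates) could be sketched, stripped of ROS, as follows. `WaitContext`, its members, and the polling approach are illustrative stand-ins, not the actual PSM/CSM internals:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Illustrative sketch only; the real PlanningSceneMonitor uses different types.
using Clock = std::chrono::steady_clock;

struct WaitContext {
  // Yields the stamp of the newest robot state known to the CSM;
  // left empty when no CurrentStateMonitor is running.
  std::function<bool(Clock::time_point&)> csmLastState;
  // Yields the stamp of the last robot state received via scene updates.
  std::function<Clock::time_point()> lastSceneRobotState;
};

// Wait until a robot state at least as new as 'target' is available.
bool waitForCurrentRobotState(const WaitContext& ctx, Clock::time_point target,
                              std::chrono::milliseconds timeout) {
  const Clock::time_point deadline = Clock::now() + timeout;
  if (ctx.csmLastState) {
    // Loop 1: a CurrentStateMonitor is active; poll it directly and
    // return early as soon as it has seen a recent enough state.
    Clock::time_point stamp;
    while (Clock::now() < deadline) {
      if (ctx.csmLastState(stamp) && stamp >= target)
        return true;
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return false;
  }
  // Loop 2: no CSM; fall back to scene updates. If the robot doesn't move,
  // no update will ever arrive, so this loop must run into the timeout.
  while (Clock::now() < deadline) {
    if (ctx.lastSceneRobotState() >= target)
      return true;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return false;
}
```

This mirrors the argument above: only the CSM path can return early, while the scene-update path has no way to distinguish "robot not moving" from "update not yet delivered" and therefore must honor the full timeout.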