Removes slow robotStateMsgToRobotState call from pilz by riv-tnoble · Pull Request #3589 · moveit/moveit

riv-tnoble · 2024-04-19T08:24:57Z

Description

I've been spending some time profiling our motion planning pipeline, which has become quite slow recently.

Using a collision object we use to generate raster paths (~66k faces), and a path of ~400 cartestian points, we were achieving planning times of ~30s.

I did some profiling of our own code, and found that around half the time was spent converting from moveit_msgs::RobotState to moveit::core::RobotState via movit::core::robotStateMsgToRobotState() for every point in our path. I suspect the main bottleneck was converting all of the attachment bodies (and their meshes). Rewriting the code to perform this conversion once at the start of path generation has reduced the time to ~30s, which appears to be spent mainly on the MoveIt side.

Of this remaining 30s, 23s are spent within pilz's TrajectoryGenerator::generate() in the MotionPlanInfo constructor, which, again, spends the majority of its time calling moveit::core::robotStateMsgToRobotState().

Time analysis of TrajectoryGenerator::generate() (Data & Pandas Summary)

TrajectoryGenerator__generate.csv

Time analysis of MotionPlanInfo::MotionPlanInfo() (Data & Pandas Summary)

MotionPlanInfo__MotionPlanInfo.csv

In my understanding, the purpose of the constructor is to set the start_scene and start_joint_position fields of this class. I've rewritten the implementation to take these values from the MotionPlanRequest parameter, rather than converting. Doing so has reduced the time spent in TrajectoryGenerator::generate() from ~23s to ~0.5s.

New time analysis of TrajectoryGenerator::generate() (Data & Pandas Summary)

TrajectoryGenerator::generate.csv

Since only the start_scene, which is const, and start_joint_position are set in this constructor, the resulting MotionPlanInfo should be unaffected by the changes.

My only concern is whether the (now removed) call to moveit::core::RobotState::update may be having side effects beyond the scope of the method.

I'd appreciate any thoughts and suggestions regarding this change,

Tom

Checklist

Required by CI: Code is auto formatted using clang-format
Extend the tutorials / documentation reference ( No user-facing changes )
Document API changes relevant to the user in the MIGRATION.md notes (No API changes)
Create tests, which fail without this PR reference
Include a screenshot if changing a GUI (No GUI changes. Profiling reports provided)
While waiting for someone to review your request, please help review another open pull request to support the maintainers

riv-mjohnson · 2024-04-19T08:54:27Z

For reference, we are already using #3518, so this PR is the second round of profiling and optimisation compared to master.

riv-tnoble · 2024-04-19T11:05:58Z

Unsure why the pipelines are failing. Looks like networking issues with apt?

rhaschke

I'm afraid your approach is not viable: The request's start_state not only comprises initial joint values, but also attached collision objects.

rhaschke · 2024-04-19T12:07:11Z

moveit_planners/pilz_industrial_motion_planner/src/trajectory_generator.cpp

    const auto& names = jm->getVariableNames();
    for (std::size_t i = 0, j = jm->getFirstVariableIndex(); i < jm->getVariableCount(); ++i, ++j)
-      start_joint_position[names[i]] = positions[j];
+      start_joint_position[names[i]] = req.start_state.joint_state.position[j];


This only considers joint_state and ignores multi_dof_joint_state. Moreover, I think the order of joints is mangled if there are multi-dof joints.

Does JointModel::getVariableNames() handle multi-dof joints? I may have (mistakenly) assumed that the code was designed to work with single-dof joints only, and that the extra check in the above if was redundant.

Yes. getVariableNames() will return a vector with multiple elements for multi-dof joints.

rhaschke · 2024-04-19T12:08:11Z

moveit_planners/pilz_industrial_motion_planner/src/trajectory_generator.cpp

-  start_state.update();
-  start_scene = std::move(ps);
-
+  start_scene = std::move(scene->diff());


std::move can be dropped here. This is only relevant for passing arguments, isn't it?

Potentially, yes. I decided to leave it as it was already there, but happy to remove if not necessary.

riv-tnoble · 2024-04-19T12:49:44Z

I'm afraid your approach is not viable: The request's start_state not only comprises initial joint values, but also attached collision objects.

I agree that this specific method ignores attached collision objects, however it seems the previous code was doing that anyway? The RobotState being built previously would contain any attached collision objects (hence the long conversion), but ultimately this object is only used to populate start_joint_position. The robot state itself is not persisted, if I'm understanding correctly, therefore I was assuming the collision objects are handled outside of this code (i.e. it's only really the joints that are relevant for this method)?

riv-tnoble · 2024-04-19T12:58:32Z

As an addition to the above point, I tried planning a contact motion with a collision object, and the output appears to be what I'd expect:

[12:54:28.221328] [/move_group/rviz]: Detected collision in planned motion: components with ee23.

For reference, "components" is the part in green, "ee23" is the tool in red, and the thin tip at the end of the tool (also red, just touching the surface in white) is a separate collision object with contacts disabled with the "component", therefore the collision is due to the bulk of the tool colliding with the back half of the "component"

riv-tnoble · 2024-04-19T13:19:07Z

I'll do some more checks with collisions in the meantime. We do some postprocessing after we get output from MoveIt, so I want to make sure the collisions are being detected before then (though I don't think we do any collision checks ourselves).

Would you mind re-running the pipelines? Hopefully if this is an issue, it'll be caught by a test which checks for collisions.

rhaschke · 2024-04-19T16:55:45Z

I agree that this specific method ignores attached collision objects, however it seems the previous code was doing that anyway?

Previously, the start_state was including attached collision objects from the request's start_state. This was important (and only recently changed) to consider these collision objects during planning.

codecov · 2024-04-19T17:33:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.31%. Comparing base (c174715) to head (1fc2f91).
Report is 10 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3589      +/-   ##
==========================================
- Coverage   47.15%   46.31%   -0.83%     
==========================================
  Files         603      603              
  Lines       58999    59104     +105     
  Branches     6969     6998      +29     
==========================================
- Hits        27814    27368     -446     
- Misses      30773    31320     +547     
- Partials      412      416       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

rhaschke · 2024-04-20T08:25:24Z

I re-triggered CI. The current failures are due to real issues.

TSNoble · 2024-04-21T07:08:10Z

I agree that this specific method ignores attached collision objects, however it seems the previous code was doing that anyway?

Previously, the start_state was including attached collision objects from the request's start_state. This was important (and only recently changed) to consider these collision objects during planning.

I was more meaning that the start_state built from the request previously only persists within the scope of the method. Only the start_scene and the start_joint_positions are persisted to the resulting MotionPlanInfo object. The attached collision objects and anything else associated with the RobotState appear to have nothing to do with the MotionPlanInfo being constructed.

I did notice that PlanningScene::getCurrentState() returns a RobotState reference. Is it the case that this method was indirectly updating the scene with the objects through a side-effect?

As an aside, we're side-stepping using Pilz's plan sequence service due to error propagation documented here #3569, so we're basically doing iteration over the poses ourselves, and calling MoveGroupInterface::plan() on each waypoint, hence the expensive conversion performed per waypoint.

Perhaps a better solution would be to fix #3569, and ensure that Pilz's plan sequence service only performs the conversion of the expensive parts of the RobotState once at the start of the sequence for efficiency (which it may already be doing) so that we can continue to use that service.

I'd still like to know that this full conversion of the RobotState message in MotionPlanInfo::MotionPlanInfo() is definitely necessary though (more so I can understand the Pilz code better than anything else 😅)

TSNoble · 2024-04-21T07:13:58Z

I re-triggered CI. The current failures are due to real issues.

Thanks for doing that. I'll wait for a reply to my previous comment to decide whether this code should be fixed, or whether the proposed alternative is a better idea.

moveit#3569 describes propagation of error from updating collision objects with each update of the start state during a sequence motion plan. moveit#3589 attempts to speed up plans by updating only joints positions when updating the start state, though this may cause the RobotState to ignore collision objects entirely if the state being updated has no knowledge of them. This PR attempts to fix both issues by only updating the joints of a sequence motion, without copying across attached collision objects, in the hope that the start state being updated is already populated with attached collision object information.

TSNoble · 2024-04-21T08:22:42Z

I've raised #3590 as an alternative solution which I hope will fix both #3569, and the issue of efficiency while planning.

rhaschke · 2024-04-21T14:51:16Z

Is it the case that this method was indirectly updating the scene with the objects through a side-effect?

Yes, the planning scene's state was updated through this method.

TSNoble · 2024-04-21T16:14:34Z

Is it the case that this method was indirectly updating the scene with the objects through a side-effect?

Yes, the planning scene's state was updated through this method.

Thanks, that helps me understand the impact of the change, though I'm still a little unsure whether this poses an issue. My understanding is that the PlanningScene maintains an up-to-date instance of the RobotState, which should include any collision objects that are attached before the planner is called.

Requiring the planner to update the scene's state with objects would seem to suggest that the objects have changed without the scene knowing beforehand, or that they are expected to change mid-plan. The latter seems unlikely to me, and the former (i.e. ensuring that objects are up-to-date at the start of a plan) feels like a responsibility that should lie with the code calling the planner, rather than part of the planner itself.

I suppose my confusion boils down to the following question: would it be a valid use case to take a scene with no attached objects, and attempt to create a plan with objects that are not already known to the scene?

rhaschke · 2024-04-21T16:35:12Z

My understanding is that the PlanningScene maintains an up-to-date instance of the RobotState, which should include any collision objects that are attached before the planner is called.

This is correct. However, you can pass a different start state for planning via the planning request (planning doesn't need to use the current state). This start state should be represented as a state diff, which only requires to specify diffs w.r.t. the current scene state.

TSNoble · 2024-04-21T16:54:07Z

My understanding is that the PlanningScene maintains an up-to-date instance of the RobotState, which should include any collision objects that are attached before the planner is called.

This is correct. However, you can pass a different start state for planning via the planning request (planning doesn't need to use the current state). This start state should be represented as a state diff, which only requires to specify diffs w.r.t. the current scene state.

Ah, yeah, that makes sense and thanks for the clarification.

Would you mind taking a look at #3590? I think that might be a more suitable solution. We're only calling Pilz to plan individual motions successively to get around issues we were seeing with the built-in sequence planning functionality. As mentioned there, the idea would be to do a full copy of the start state once before the sequence, then only for joints as each waypoint is planned.

riv-mjohnson · 2024-04-22T07:43:30Z

My understanding is that the PlanningScene maintains an up-to-date instance of the RobotState, which should include any collision objects that are attached before the planner is called.

This is correct. However, you can pass a different start state for planning via the planning request (planning doesn't need to use the current state). This start state should be represented as a state diff, which only requires to specify diffs w.r.t. the current scene state.

Ah, I think this is where our problem lies. We're currently passing the whole state in, not a diff. Thank you for your help.

riv-tnoble · 2024-04-22T12:15:40Z

@rhaschke

I've tried rewriting our code to simply pass a RobotState msg with updated joints and is_diff = true, however, the planner is unable to find the subframes for the attached collision objects.

I thought maybe the planning scene was incorrect and didn't have this info, but oddly (after adding some print statements to the function changed in this PR, and reverting it back to its original form) it seems that the attached bodies are present, but get wiped after the robot state diff is applied:

TrajectoryGenerator::MotionPlanInfo::MotionPlanInfo(const planning_scene::PlanningSceneConstPtr& scene,
                                                    const planning_interface::MotionPlanRequest& req)
{
  auto ps = scene->diff();
  auto& start_state = ps->getCurrentStateNonConst();
  const auto& rs = req.start_state;
  std::vector<const moveit::core::AttachedBody*> attached_bodies;
  start_state.getAttachedBodies(attached_bodies);
  ROS_WARN("NUMBER OF ATTACHED BODIES (Before robotStateMsgToRobotState): %ld", attached_bodies.size());
  if (!rs.joint_state.name.empty() || !rs.multi_dof_joint_state.joint_names.empty())
    // update start state from request's start state
    moveit::core::robotStateMsgToRobotState(req.start_state, start_state);
  start_state.getAttachedBodies(attached_bodies);
  ROS_WARN("NUMBER OF ATTACHED BODIES (After robotStateMsgToRobotState): %ld", attached_bodies.size());
  start_state.update();
  start_state.getAttachedBodies(attached_bodies);
  ROS_WARN("NUMBER OF ATTACHED BODIES (After RobotState::update): %ld", attached_bodies.size());
  start_scene = std::move(ps);
  // initialize info.start_joint_position with active joint values from start_state
  const double* positions = start_state.getVariablePositions();
  for (const auto* jm : start_state.getRobotModel()->getJointModelGroup(req.group_name)->getActiveJointModels())
  {
    const auto& names = jm->getVariableNames();
    for (std::size_t i = 0, j = jm->getFirstVariableIndex(); i < jm->getVariableCount(); ++i, ++j)
      start_joint_position[names[i]] = positions[j];
  }
}

[12:14:04.899245] [/move_group/move_group]: NUMBER OF ATTACHED BODIES (Before robotStateMsgToRobotState): 6
[12:14:04.899271] [/move_group/move_group]: NUMBER OF ATTACHED BODIES (After robotStateMsgToRobotState): 0
[12:14:04.899285] [/move_group/move_group]: NUMBER OF ATTACHED BODIES (After update): 0
[12:14:04.899340] [/move_group/move_group]: Pose frame 'tb_4_26/burr_tip' does not exist.
[12:14:04.899372] [/move_group/move_group]: Inverse kinematics for pose 
  1.38525
0.0182948
 0.603565 has no solution.
[12:14:04.899426] [/move_group/move_group]: Failed to compute inverse kinematics for link: tb_4_26/burr_tip of goal pose

riv-tnoble · 2024-04-22T15:05:58Z

I think I've found the issue. We set the start state using MoveGroupInterface::setStartState() with a moveit_msgs::RobotState with is_diff = true. This converts the diff into a robot_msgs::RobotState, and uses the other implementation of setStartState() to set considered_start_state_ internally.

We then call MoveGroupInterface::plan(), which calls constructGoal(), which in turn calls constructMotionPlanRequest(). constructMotionPlanRequest turns the start state back into a message, but only sets is_diff to true if considered_start_state_ is unset.

So essentially, we're setting the start state as a diff, but the flag gets flipped before being passed to pilz, which then interprets the lack of attached collision objects in the message as a sign that it should clear them all from the state, rather than treating it as a diff. This is verified by adding some extra prints:

[14:23:24.623789] [/ns_motion/motion_server]: JOINTS (Our Code):
header: 
  seq: 0
  stamp: 0.000000000
  frame_id: 
name[]
  name[0]: yaskawa_joint_1_s
  name[1]: yaskawa_joint_2_l
  name[2]: yaskawa_joint_3_u
  name[3]: yaskawa_joint_4_r
  name[4]: yaskawa_joint_5_b
  name[5]: yaskawa_joint_6_t
  name[6]: cell_frame_tilt_arm
  name[7]: tilt_arm_fixture_plate
position[]
  position[0]: 0.013077
  position[1]: 0.268662
  position[2]: -0.476286
  position[3]: -0.0539252
  position[4]: 0.766893
  position[5]: -0.00960335
  position[6]: 0
  position[7]: -0.142854
velocity[]
  velocity[0]: 0
  velocity[1]: 0
  velocity[2]: 0
  velocity[3]: 0
  velocity[4]: 0
  velocity[5]: 0
  velocity[6]: 0
  velocity[7]: 0
effort[]
  effort[0]: 0
  effort[1]: 0
  effort[2]: 0
  effort[3]: 0
  effort[4]: 0
  effort[5]: 0
  effort[6]: 0
  effort[7]: 0
[14:23:24.623822] [/ns_motion/motion_server]: ATTACHED BODIES (Our Code):0
[14:23:24.623836] [/ns_motion/motion_server]: IS DIFF (Our Code): 1
[14:23:24.623853] [/ns_motion/motion_server]: SETTING START STATE FROM MSG

[14:23:24.630504] [/move_group/move_group]: JOINTS (MotionPlanInfo):
header: 
  seq: 0
  stamp: 0.000000000
  frame_id: world
name[]
  name[0]: cell_frame_tilt_arm
  name[1]: tilt_arm_fixture_plate
  name[2]: yaskawa_joint_1_s
  name[3]: yaskawa_joint_2_l
  name[4]: yaskawa_joint_3_u
  name[5]: yaskawa_joint_4_r
  name[6]: yaskawa_joint_5_b
  name[7]: yaskawa_joint_6_t
position[]
  position[0]: 0
  position[1]: -0.142854
  position[2]: 0.013077
  position[3]: 0.268662
  position[4]: -0.476286
  position[5]: -0.0539252
  position[6]: 0.766893
  position[7]: -0.00960335
velocity[]
  velocity[0]: 0
  velocity[1]: 0
  velocity[2]: 0
  velocity[3]: 0
  velocity[4]: 0
  velocity[5]: 0
  velocity[6]: 0
  velocity[7]: 0
effort[]
[14:23:24.630521] [/move_group/move_group]: ATTACHED BODIES (MotionPlanInfo):0
[14:23:24.630529] [/move_group/move_group]: IS DIFF (MotionPlanInfo): 0
[14:23:24.630539] [/move_group/move_group]: Not diff. Clearing attached bodies

[14:23:24.630566] [/move_group/move_group]: Pose frame 'tb_4_26/burr_tip' does not exist.
[14:23:24.630580] [/move_group/move_group]: Inverse kinematics for pose 
  1.38525
0.0182948
 0.603565 has no solution.
[14:23:24.630608] [/move_group/move_group]: Failed to compute inverse kinematics for link: tb_4_26/burr_tip of goal pose

TSNoble · 2024-04-22T17:37:50Z

Specifically, this line explicitly sets is_diff to false, even if the RobotState was originally built from a message where it was true. I'm unsure if this is intended behaviour i.e. diffs should be turned into RobotStates when robotStateMsgToRobotState() is called by applying them to the RobotState they are a diff with respect to.

I'd advocate for the opposite for a few reasons:

If this is the case, it doesn't appear to be working as intended. Applying robotStateMsgToRobotState() followed by robotStateToRobotStateMsg() appears to give back a msg with diff contents, but the flag set incorrectly.
Even if the code were modified to implement the above behaviour, this leads to the big drops in performance we're seeing when conversions are applied many times. I think it's nice to be able to pass around and convert RobotStates as diffs for efficiency, and only produce a full RobotState by applying the diff only when necessary.
I think having robotStateMsgToRobotState() and robotStateToRobotStateMsg() be true inverses of one another is a desirable property. The naming convention seems to imply this is the intention, but it appears not to do so in practice.

I can add a test along the lines of...

TEST(TestConversions, RobotStateConversionsAreInverses) {
   moveit_msgs::RobotState original_msg, converted_msg;
   robot_state::RobotState state(ps.getRobotModel());
   ... /// Set msg values and diff flag 
   moveit::core::robotStateMsgToRobotState(original_msg, state);
   moveit::core::robotStateToRobotStateMsg(state, converted_msg);
   EXPECT_EQ(converted_msg, original_msg);
}

...and rework the logic of these methods to implement this functionality, however, I'm unsure whether changing the outputs of these functions will have downstream implications.

That being said, I think only code that falls under point 1 will be affected, and this seems to be a case where the output is invalid anyway (it contains the contents of a diff, but is not considered so by the flag)

rhaschke · 2024-04-23T06:37:37Z

The RobotState class instance doesn't have an is_diff member. It always represents a full state, including all attached collision objects. Thus robotStateMsgToRobotState() and robotStateToRobotStateMsg() cannot be true inverses of each other: a diff cannot be retained.
I suggest changing the behavior of MoveGroupInterface: instead of converting a msg into a RobotState instance, the message should be retained as is.

riv-tnoble · 2024-04-23T08:04:04Z

I suggest changing the behavior of MoveGroupInterface: instead of converting a msg into a RobotState instance, the message should be retained as is.

Thanks for the guidance. Currently working on modifying the code and I'll raise a separate PR.

The RobotState class instance doesn't have an is_diff member. It always represents a full state, including all attached collision objects. Thus robotStateMsgToRobotState() and robotStateToRobotStateMsg() cannot be true inverses of each other: a diff cannot be retained.

With this in mind, my understanding would be:

Calling robotStateMsgToRobotState() on a diff message applies the diff and results in a fully populated RobotState
Calling robotStateToRobotStateMsg() on the output should produce msg containing a full RobotState with the diff applied

Even so, the logs seem to show that the final message does not contain a full state. It contains the contents of the original diff, simply with the diff flag set to false. This seems like a bug to me?

rhaschke · 2024-04-23T10:35:17Z

If your message doesn't comprise the full state, you probably passed an empty RobotState, with no attached collision objects defined yet. The existing objects are only cleared when the is_diff flag is not set:
https://github.com/ros-planning/moveit/blob/509ee493a76a99188dff050da903ee13f38a219e/moveit_core/robot_state/src/conversions.cpp#L369-L370

riv-tnoble · 2024-04-23T12:37:43Z

Closing in favour of #3592

Removes slow robotStateMsgToRobotState call from pilz

f4f8516

riv-tnoble requested a review from jschleicher as a code owner April 19, 2024 08:24

Fixes formatting

1fc2f91

rhaschke reviewed Apr 19, 2024

View reviewed changes

riv-tnoble requested a review from rhaschke April 19, 2024 13:22

TSNoble mentioned this pull request Apr 21, 2024

Disables copying of attached collision objects when updating start state #3590

Merged

6 tasks

riv-tnoble closed this Apr 23, 2024

Conversation

riv-tnoble commented Apr 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

riv-mjohnson commented Apr 19, 2024

Uh oh!

riv-tnoble commented Apr 19, 2024

Uh oh!

rhaschke left a comment

Choose a reason for hiding this comment

Uh oh!

rhaschke Apr 19, 2024

Choose a reason for hiding this comment

Uh oh!

riv-tnoble Apr 19, 2024

Choose a reason for hiding this comment

Uh oh!

rhaschke Apr 19, 2024

Choose a reason for hiding this comment

Uh oh!

rhaschke Apr 19, 2024

Choose a reason for hiding this comment

Uh oh!

riv-tnoble Apr 19, 2024

Choose a reason for hiding this comment

Uh oh!

riv-tnoble commented Apr 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

riv-tnoble commented Apr 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

riv-tnoble commented Apr 19, 2024

Uh oh!

rhaschke commented Apr 19, 2024

Uh oh!

codecov bot commented Apr 19, 2024

Codecov Report

Uh oh!

rhaschke commented Apr 20, 2024

Uh oh!

TSNoble commented Apr 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TSNoble commented Apr 21, 2024

Uh oh!

TSNoble commented Apr 21, 2024

Uh oh!

rhaschke commented Apr 21, 2024

Uh oh!

TSNoble commented Apr 21, 2024

Uh oh!

rhaschke commented Apr 21, 2024

Uh oh!

TSNoble commented Apr 21, 2024

Uh oh!

riv-mjohnson commented Apr 22, 2024

Uh oh!

riv-tnoble commented Apr 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

riv-tnoble commented Apr 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TSNoble commented Apr 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rhaschke commented Apr 23, 2024

Uh oh!

riv-tnoble commented Apr 23, 2024

Uh oh!

rhaschke commented Apr 23, 2024

Uh oh!

riv-tnoble commented Apr 23, 2024

Uh oh!

Reviewers

Assignees

Labels

riv-tnoble commented Apr 19, 2024 •

edited

Loading

riv-tnoble commented Apr 19, 2024 •

edited

Loading

riv-tnoble commented Apr 19, 2024 •

edited

Loading

TSNoble commented Apr 21, 2024 •

edited

Loading

riv-tnoble commented Apr 22, 2024 •

edited

Loading

riv-tnoble commented Apr 22, 2024 •

edited

Loading

TSNoble commented Apr 22, 2024 •

edited

Loading