Skip to content

Removes slow robotStateMsgToRobotState call from pilz#3589

Closed
riv-tnoble wants to merge 2 commits intomoveit:masterfrom
riv-tnoble:patch-1
Closed

Removes slow robotStateMsgToRobotState call from pilz#3589
riv-tnoble wants to merge 2 commits intomoveit:masterfrom
riv-tnoble:patch-1

Conversation

@riv-tnoble
Copy link
Copy Markdown
Contributor

@riv-tnoble riv-tnoble commented Apr 19, 2024

Description

I've been spending some time profiling our motion planning pipeline, which has become quite slow recently.

Using a collision object we use to generate raster paths (~66k faces), and a path of ~400 cartestian points, we were achieving planning times of ~30s.

I did some profiling of our own code, and found that around half the time was spent converting from moveit_msgs::RobotState to moveit::core::RobotState via movit::core::robotStateMsgToRobotState() for every point in our path. I suspect the main bottleneck was converting all of the attachment bodies (and their meshes). Rewriting the code to perform this conversion once at the start of path generation has reduced the time to ~30s, which appears to be spent mainly on the MoveIt side.

Of this remaining 30s, 23s are spent within pilz's TrajectoryGenerator::generate() in the MotionPlanInfo constructor, which, again, spends the majority of its time calling moveit::core::robotStateMsgToRobotState().

Time analysis of TrajectoryGenerator::generate() (Data & Pandas Summary)

TrajectoryGenerator__generate.csv
trajectory_generate_before

Time analysis of MotionPlanInfo::MotionPlanInfo() (Data & Pandas Summary)

MotionPlanInfo__MotionPlanInfo.csv
motion_plan_info_before

In my understanding, the purpose of the constructor is to set the start_scene and start_joint_position fields of this class. I've rewritten the implementation to take these values from the MotionPlanRequest parameter, rather than converting. Doing so has reduced the time spent in TrajectoryGenerator::generate() from ~23s to ~0.5s.

New time analysis of TrajectoryGenerator::generate() (Data & Pandas Summary)

TrajectoryGenerator::generate.csv
trajectory_generate_after

Since only the start_scene, which is const, and start_joint_position are set in this constructor, the resulting MotionPlanInfo should be unaffected by the changes.

My only concern is whether the (now removed) call to moveit::core::RobotState::update may be having side effects beyond the scope of the method.

I'd appreciate any thoughts and suggestions regarding this change,

Tom

Checklist

  • Required by CI: Code is auto formatted using clang-format
  • Extend the tutorials / documentation reference ( No user-facing changes )
  • Document API changes relevant to the user in the MIGRATION.md notes (No API changes)
  • Create tests, which fail without this PR reference
  • Include a screenshot if changing a GUI (No GUI changes. Profiling reports provided)
  • While waiting for someone to review your request, please help review another open pull request to support the maintainers

@riv-tnoble riv-tnoble requested a review from jschleicher as a code owner April 19, 2024 08:24
@riv-mjohnson
Copy link
Copy Markdown
Contributor

For reference, we are already using #3518, so this PR is the second round of profiling and optimisation compared to master.

@riv-tnoble
Copy link
Copy Markdown
Contributor Author

Unsure why the pipelines are failing. Looks like networking issues with apt?

Copy link
Copy Markdown
Contributor

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid your approach is not viable: The request's start_state not only comprises initial joint values, but also attached collision objects.

const auto& names = jm->getVariableNames();
for (std::size_t i = 0, j = jm->getFirstVariableIndex(); i < jm->getVariableCount(); ++i, ++j)
start_joint_position[names[i]] = positions[j];
start_joint_position[names[i]] = req.start_state.joint_state.position[j];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only considers joint_state and ignores multi_dof_joint_state. Moreover, I think the order of joints is mangled if there are multi-dof joints.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does JointModel::getVariableNames() handle multi-dof joints? I may have (mistakenly) assumed that the code was designed to work with single-dof joints only, and that the extra check in the above if was redundant.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. getVariableNames() will return a vector with multiple elements for multi-dof joints.

start_state.update();
start_scene = std::move(ps);

start_scene = std::move(scene->diff());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move can be dropped here. This is only relevant for passing arguments, isn't it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially, yes. I decided to leave it as it was already there, but happy to remove if not necessary.

@riv-tnoble
Copy link
Copy Markdown
Contributor Author

riv-tnoble commented Apr 19, 2024

I'm afraid your approach is not viable: The request's start_state not only comprises initial joint values, but also attached collision objects.

I agree that this specific method ignores attached collision objects, however it seems the previous code was doing that anyway? The RobotState being built previously would contain any attached collision objects (hence the long conversion), but ultimately this object is only used to populate start_joint_position. The robot state itself is not persisted, if I'm understanding correctly, therefore I was assuming the collision objects are handled outside of this code (i.e. it's only really the joints that are relevant for this method)?

@riv-tnoble
Copy link
Copy Markdown
Contributor Author

riv-tnoble commented Apr 19, 2024

As an addition to the above point, I tried planning a contact motion with a collision object, and the output appears to be what I'd expect:

Screenshot from 2024-04-19 13-51-56

[12:54:28.221328] [/move_group/rviz]: Detected collision in planned motion: components with ee23.

For reference, "components" is the part in green, "ee23" is the tool in red, and the thin tip at the end of the tool (also red, just touching the surface in white) is a separate collision object with contacts disabled with the "component", therefore the collision is due to the bulk of the tool colliding with the back half of the "component"

@riv-tnoble
Copy link
Copy Markdown
Contributor Author

I'll do some more checks with collisions in the meantime. We do some postprocessing after we get output from MoveIt, so I want to make sure the collisions are being detected before then (though I don't think we do any collision checks ourselves).

Would you mind re-running the pipelines? Hopefully if this is an issue, it'll be caught by a test which checks for collisions.

@riv-tnoble riv-tnoble requested a review from rhaschke April 19, 2024 13:22
@rhaschke
Copy link
Copy Markdown
Contributor

I agree that this specific method ignores attached collision objects, however it seems the previous code was doing that anyway?

Previously, the start_state was including attached collision objects from the request's start_state. This was important (and only recently changed) to consider these collision objects during planning.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.31%. Comparing base (c174715) to head (1fc2f91).
Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3589      +/-   ##
==========================================
- Coverage   47.15%   46.31%   -0.83%     
==========================================
  Files         603      603              
  Lines       58999    59104     +105     
  Branches     6969     6998      +29     
==========================================
- Hits        27814    27368     -446     
- Misses      30773    31320     +547     
- Partials      412      416       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rhaschke
Copy link
Copy Markdown
Contributor

I re-triggered CI. The current failures are due to real issues.

@TSNoble
Copy link
Copy Markdown
Contributor

TSNoble commented Apr 21, 2024

I agree that this specific method ignores attached collision objects, however it seems the previous code was doing that anyway?

Previously, the start_state was including attached collision objects from the request's start_state. This was important (and only recently changed) to consider these collision objects during planning.

I was more meaning that the start_state built from the request previously only persists within the scope of the method. Only the start_scene and the start_joint_positions are persisted to the resulting MotionPlanInfo object. The attached collision objects and anything else associated with the RobotState appear to have nothing to do with the MotionPlanInfo being constructed.

I did notice that PlanningScene::getCurrentState() returns a RobotState reference. Is it the case that this method was indirectly updating the scene with the objects through a side-effect?

As an aside, we're side-stepping using Pilz's plan sequence service due to error propagation documented here #3569, so we're basically doing iteration over the poses ourselves, and calling MoveGroupInterface::plan() on each waypoint, hence the expensive conversion performed per waypoint.

Perhaps a better solution would be to fix #3569, and ensure that Pilz's plan sequence service only performs the conversion of the expensive parts of the RobotState once at the start of the sequence for efficiency (which it may already be doing) so that we can continue to use that service.

I'd still like to know that this full conversion of the RobotState message in MotionPlanInfo::MotionPlanInfo() is definitely necessary though (more so I can understand the Pilz code better than anything else 😅)

@TSNoble
Copy link
Copy Markdown
Contributor

TSNoble commented Apr 21, 2024

I re-triggered CI. The current failures are due to real issues.

Thanks for doing that. I'll wait for a reply to my previous comment to decide whether this code should be fixed, or whether the proposed alternative is a better idea.

TSNoble added a commit to TSNoble/moveit that referenced this pull request Apr 21, 2024
moveit#3569 describes propagation of error from updating collision objects with each update of the start state during a sequence motion plan.

moveit#3589 attempts to speed up plans by updating only joints positions when updating the start state, though this may cause the RobotState to ignore collision objects entirely if the state being updated has no knowledge of them.

This PR attempts to fix both issues by only updating the joints of a sequence motion, without copying across attached collision objects, in the hope that the start state being updated is already populated with attached collision object information.
@TSNoble
Copy link
Copy Markdown
Contributor

TSNoble commented Apr 21, 2024

I've raised #3590 as an alternative solution which I hope will fix both #3569, and the issue of efficiency while planning.

@rhaschke
Copy link
Copy Markdown
Contributor

Is it the case that this method was indirectly updating the scene with the objects through a side-effect?

Yes, the planning scene's state was updated through this method.

@TSNoble
Copy link
Copy Markdown
Contributor

TSNoble commented Apr 21, 2024

Is it the case that this method was indirectly updating the scene with the objects through a side-effect?

Yes, the planning scene's state was updated through this method.

Thanks, that helps me understand the impact of the change, though I'm still a little unsure whether this poses an issue. My understanding is that the PlanningScene maintains an up-to-date instance of the RobotState, which should include any collision objects that are attached before the planner is called.

Requiring the planner to update the scene's state with objects would seem to suggest that the objects have changed without the scene knowing beforehand, or that they are expected to change mid-plan. The latter seems unlikely to me, and the former (i.e. ensuring that objects are up-to-date at the start of a plan) feels like a responsibility that should lie with the code calling the planner, rather than part of the planner itself.

I suppose my confusion boils down to the following question: would it be a valid use case to take a scene with no attached objects, and attempt to create a plan with objects that are not already known to the scene?

@rhaschke
Copy link
Copy Markdown
Contributor

My understanding is that the PlanningScene maintains an up-to-date instance of the RobotState, which should include any collision objects that are attached before the planner is called.

This is correct. However, you can pass a different start state for planning via the planning request (planning doesn't need to use the current state). This start state should be represented as a state diff, which only requires to specify diffs w.r.t. the current scene state.

@TSNoble
Copy link
Copy Markdown
Contributor

TSNoble commented Apr 21, 2024

My understanding is that the PlanningScene maintains an up-to-date instance of the RobotState, which should include any collision objects that are attached before the planner is called.

This is correct. However, you can pass a different start state for planning via the planning request (planning doesn't need to use the current state). This start state should be represented as a state diff, which only requires to specify diffs w.r.t. the current scene state.

Ah, yeah, that makes sense and thanks for the clarification.

Would you mind taking a look at #3590? I think that might be a more suitable solution. We're only calling Pilz to plan individual motions successively to get around issues we were seeing with the built-in sequence planning functionality. As mentioned there, the idea would be to do a full copy of the start state once before the sequence, then only for joints as each waypoint is planned.

@riv-mjohnson
Copy link
Copy Markdown
Contributor

My understanding is that the PlanningScene maintains an up-to-date instance of the RobotState, which should include any collision objects that are attached before the planner is called.

This is correct. However, you can pass a different start state for planning via the planning request (planning doesn't need to use the current state). This start state should be represented as a state diff, which only requires to specify diffs w.r.t. the current scene state.

Ah, I think this is where our problem lies. We're currently passing the whole state in, not a diff. Thank you for your help.

@riv-tnoble
Copy link
Copy Markdown
Contributor Author

riv-tnoble commented Apr 22, 2024

@rhaschke

I've tried rewriting our code to simply pass a RobotState msg with updated joints and is_diff = true, however, the planner is unable to find the subframes for the attached collision objects.

I thought maybe the planning scene was incorrect and didn't have this info, but oddly (after adding some print statements to the function changed in this PR, and reverting it back to its original form) it seems that the attached bodies are present, but get wiped after the robot state diff is applied:

TrajectoryGenerator::MotionPlanInfo::MotionPlanInfo(const planning_scene::PlanningSceneConstPtr& scene,
                                                    const planning_interface::MotionPlanRequest& req)
{
  auto ps = scene->diff();
  auto& start_state = ps->getCurrentStateNonConst();
  const auto& rs = req.start_state;
  std::vector<const moveit::core::AttachedBody*> attached_bodies;
  start_state.getAttachedBodies(attached_bodies);
  ROS_WARN("NUMBER OF ATTACHED BODIES (Before robotStateMsgToRobotState): %ld", attached_bodies.size());
  if (!rs.joint_state.name.empty() || !rs.multi_dof_joint_state.joint_names.empty())
    // update start state from request's start state
    moveit::core::robotStateMsgToRobotState(req.start_state, start_state);
  start_state.getAttachedBodies(attached_bodies);
  ROS_WARN("NUMBER OF ATTACHED BODIES (After robotStateMsgToRobotState): %ld", attached_bodies.size());
  start_state.update();
  start_state.getAttachedBodies(attached_bodies);
  ROS_WARN("NUMBER OF ATTACHED BODIES (After RobotState::update): %ld", attached_bodies.size());
  start_scene = std::move(ps);
  // initialize info.start_joint_position with active joint values from start_state
  const double* positions = start_state.getVariablePositions();
  for (const auto* jm : start_state.getRobotModel()->getJointModelGroup(req.group_name)->getActiveJointModels())
  {
    const auto& names = jm->getVariableNames();
    for (std::size_t i = 0, j = jm->getFirstVariableIndex(); i < jm->getVariableCount(); ++i, ++j)
      start_joint_position[names[i]] = positions[j];
  }
}
[12:14:04.899245] [/move_group/move_group]: NUMBER OF ATTACHED BODIES (Before robotStateMsgToRobotState): 6
[12:14:04.899271] [/move_group/move_group]: NUMBER OF ATTACHED BODIES (After robotStateMsgToRobotState): 0
[12:14:04.899285] [/move_group/move_group]: NUMBER OF ATTACHED BODIES (After update): 0
[12:14:04.899340] [/move_group/move_group]: Pose frame 'tb_4_26/burr_tip' does not exist.
[12:14:04.899372] [/move_group/move_group]: Inverse kinematics for pose 
  1.38525
0.0182948
 0.603565 has no solution.
[12:14:04.899426] [/move_group/move_group]: Failed to compute inverse kinematics for link: tb_4_26/burr_tip of goal pose

@riv-tnoble
Copy link
Copy Markdown
Contributor Author

riv-tnoble commented Apr 22, 2024

I think I've found the issue. We set the start state using MoveGroupInterface::setStartState() with a moveit_msgs::RobotState with is_diff = true. This converts the diff into a robot_msgs::RobotState, and uses the other implementation of setStartState() to set considered_start_state_ internally.

We then call MoveGroupInterface::plan(), which calls constructGoal(), which in turn calls constructMotionPlanRequest(). constructMotionPlanRequest turns the start state back into a message, but only sets is_diff to true if considered_start_state_ is unset.

So essentially, we're setting the start state as a diff, but the flag gets flipped before being passed to pilz, which then interprets the lack of attached collision objects in the message as a sign that it should clear them all from the state, rather than treating it as a diff. This is verified by adding some extra prints:

[14:23:24.623789] [/ns_motion/motion_server]: JOINTS (Our Code):
header: 
  seq: 0
  stamp: 0.000000000
  frame_id: 
name[]
  name[0]: yaskawa_joint_1_s
  name[1]: yaskawa_joint_2_l
  name[2]: yaskawa_joint_3_u
  name[3]: yaskawa_joint_4_r
  name[4]: yaskawa_joint_5_b
  name[5]: yaskawa_joint_6_t
  name[6]: cell_frame_tilt_arm
  name[7]: tilt_arm_fixture_plate
position[]
  position[0]: 0.013077
  position[1]: 0.268662
  position[2]: -0.476286
  position[3]: -0.0539252
  position[4]: 0.766893
  position[5]: -0.00960335
  position[6]: 0
  position[7]: -0.142854
velocity[]
  velocity[0]: 0
  velocity[1]: 0
  velocity[2]: 0
  velocity[3]: 0
  velocity[4]: 0
  velocity[5]: 0
  velocity[6]: 0
  velocity[7]: 0
effort[]
  effort[0]: 0
  effort[1]: 0
  effort[2]: 0
  effort[3]: 0
  effort[4]: 0
  effort[5]: 0
  effort[6]: 0
  effort[7]: 0
[14:23:24.623822] [/ns_motion/motion_server]: ATTACHED BODIES (Our Code):0
[14:23:24.623836] [/ns_motion/motion_server]: IS DIFF (Our Code): 1
[14:23:24.623853] [/ns_motion/motion_server]: SETTING START STATE FROM MSG

[14:23:24.630504] [/move_group/move_group]: JOINTS (MotionPlanInfo):
header: 
  seq: 0
  stamp: 0.000000000
  frame_id: world
name[]
  name[0]: cell_frame_tilt_arm
  name[1]: tilt_arm_fixture_plate
  name[2]: yaskawa_joint_1_s
  name[3]: yaskawa_joint_2_l
  name[4]: yaskawa_joint_3_u
  name[5]: yaskawa_joint_4_r
  name[6]: yaskawa_joint_5_b
  name[7]: yaskawa_joint_6_t
position[]
  position[0]: 0
  position[1]: -0.142854
  position[2]: 0.013077
  position[3]: 0.268662
  position[4]: -0.476286
  position[5]: -0.0539252
  position[6]: 0.766893
  position[7]: -0.00960335
velocity[]
  velocity[0]: 0
  velocity[1]: 0
  velocity[2]: 0
  velocity[3]: 0
  velocity[4]: 0
  velocity[5]: 0
  velocity[6]: 0
  velocity[7]: 0
effort[]
[14:23:24.630521] [/move_group/move_group]: ATTACHED BODIES (MotionPlanInfo):0
[14:23:24.630529] [/move_group/move_group]: IS DIFF (MotionPlanInfo): 0
[14:23:24.630539] [/move_group/move_group]: Not diff. Clearing attached bodies

[14:23:24.630566] [/move_group/move_group]: Pose frame 'tb_4_26/burr_tip' does not exist.
[14:23:24.630580] [/move_group/move_group]: Inverse kinematics for pose 
  1.38525
0.0182948
 0.603565 has no solution.
[14:23:24.630608] [/move_group/move_group]: Failed to compute inverse kinematics for link: tb_4_26/burr_tip of goal pose

@TSNoble
Copy link
Copy Markdown
Contributor

TSNoble commented Apr 22, 2024

Specifically, this line explicitly sets is_diff to false, even if the RobotState was originally built from a message where it was true. I'm unsure if this is intended behaviour i.e. diffs should be turned into RobotStates when robotStateMsgToRobotState() is called by applying them to the RobotState they are a diff with respect to.

I'd advocate for the opposite for a few reasons:

  1. If this is the case, it doesn't appear to be working as intended. Applying robotStateMsgToRobotState() followed by robotStateToRobotStateMsg() appears to give back a msg with diff contents, but the flag set incorrectly.

  2. Even if the code were modified to implement the above behaviour, this leads to the big drops in performance we're seeing when conversions are applied many times. I think it's nice to be able to pass around and convert RobotStates as diffs for efficiency, and only produce a full RobotState by applying the diff only when necessary.

  3. I think having robotStateMsgToRobotState() and robotStateToRobotStateMsg() be true inverses of one another is a desirable property. The naming convention seems to imply this is the intention, but it appears not to do so in practice.

I can add a test along the lines of...

TEST(TestConversions, RobotStateConversionsAreInverses) {
   moveit_msgs::RobotState original_msg, converted_msg;
   robot_state::RobotState state(ps.getRobotModel());
   ... /// Set msg values and diff flag 
   moveit::core::robotStateMsgToRobotState(original_msg, state);
   moveit::core::robotStateToRobotStateMsg(state, converted_msg);
   EXPECT_EQ(converted_msg, original_msg);
}

...and rework the logic of these methods to implement this functionality, however, I'm unsure whether changing the outputs of these functions will have downstream implications.

That being said, I think only code that falls under point 1 will be affected, and this seems to be a case where the output is invalid anyway (it contains the contents of a diff, but is not considered so by the flag)

@rhaschke
Copy link
Copy Markdown
Contributor

The RobotState class instance doesn't have an is_diff member. It always represents a full state, including all attached collision objects. Thus robotStateMsgToRobotState() and robotStateToRobotStateMsg() cannot be true inverses of each other: a diff cannot be retained.
I suggest changing the behavior of MoveGroupInterface: instead of converting a msg into a RobotState instance, the message should be retained as is.

@riv-tnoble
Copy link
Copy Markdown
Contributor Author

I suggest changing the behavior of MoveGroupInterface: instead of converting a msg into a RobotState instance, the message should be retained as is.

Thanks for the guidance. Currently working on modifying the code and I'll raise a separate PR.

The RobotState class instance doesn't have an is_diff member. It always represents a full state, including all attached collision objects. Thus robotStateMsgToRobotState() and robotStateToRobotStateMsg() cannot be true inverses of each other: a diff cannot be retained.

With this in mind, my understanding would be:

  1. Calling robotStateMsgToRobotState() on a diff message applies the diff and results in a fully populated RobotState
  2. Calling robotStateToRobotStateMsg() on the output should produce msg containing a full RobotState with the diff applied

Even so, the logs seem to show that the final message does not contain a full state. It contains the contents of the original diff, simply with the diff flag set to false. This seems like a bug to me?

@rhaschke
Copy link
Copy Markdown
Contributor

If your message doesn't comprise the full state, you probably passed an empty RobotState, with no attached collision objects defined yet. The existing objects are only cleared when the is_diff flag is not set:
https://github.com/ros-planning/moveit/blob/509ee493a76a99188dff050da903ee13f38a219e/moveit_core/robot_state/src/conversions.cpp#L369-L370

@riv-tnoble
Copy link
Copy Markdown
Contributor Author

Closing in favour of #3592

@riv-tnoble riv-tnoble closed this Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants