Skip to content

Fix Constraint Planning Segfault#2130

Merged
sjahr merged 7 commits intomoveit:mainfrom
MarqRazz:pr-fix_constraint_segfault
May 11, 2023
Merged

Fix Constraint Planning Segfault#2130
sjahr merged 7 commits intomoveit:mainfrom
MarqRazz:pr-fix_constraint_segfault

Conversation

@MarqRazz
Copy link
Copy Markdown
Contributor

@MarqRazz MarqRazz commented Apr 21, 2023

Description

This PR is an attempt to understand and hopefully fix a segfault I have been running into when asking MoveIt to solve constrained motion as reported in #2103.

While debugging we have found that the ompl::base::PlannerPtr generated on the first planning request is going out of scope causing the next planning request to fault the program. As you can see in my initial commit I have removed the ability for the planning allocator to reuse the planner between requests which fixes the segfault. This is not ideal if the user is attempting to build a roadmap database because it would only contain the map of a single plan if written to disk, or thrown away on the next request. The planner is created on this line by passing it the SpaceInformationPtr that is created here. If I'm understanding OMPL correctly this SpaceInformationPtr is being moved to the Planner object here and the ConfiguredPlannerAllocator containing it should be kept alive/in context by the known_planners_ object of the Planning Context Manager.

I would really appreciate any help in identifying why the pointer, or objects inside of it, is going out of scope of if my understanding of the program is incorrect! As I have been learning this section of the software I have also wondered if the current implementation is worth fixing / there is a better way to preserve the planning scene between motion requests. Because the planner is constructed with the initial planning request goal's path_constraint does this mean that if it is reused it could have invalid constraints specified if they change between requests?

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 21, 2023

Codecov Report

Patch coverage has no change and project coverage change: +0.01 🎉

Comparison is base (237e918) 50.48% compared to head (0f852cf) 50.48%.

❗ Current head 0f852cf differs from pull request most recent head a1a5925. Consider uploading reports for the commit a1a5925 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2130      +/-   ##
==========================================
+ Coverage   50.48%   50.48%   +0.01%     
==========================================
  Files         387      387              
  Lines       31785    31790       +5     
==========================================
+ Hits        16044    16047       +3     
- Misses      15741    15743       +2     
Impacted Files Coverage Δ
...pl/ompl_interface/src/planning_context_manager.cpp 64.60% <0.00%> (-1.28%) ⬇️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@sjahr sjahr self-assigned this Apr 24, 2023
@henningkayser henningkayser self-requested a review April 25, 2023 14:15
// auto planner_map_it = planners_.find(new_name);
// if (planner_map_it != planners_.end())
// {
// return planner_map_it->second;
Copy link
Copy Markdown
Member

@AndyZe AndyZe Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

@MarqRazz MarqRazz Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

->second is a ompl::base::PlannerPtr that is created on this line by passing it the SpaceInformationPtr that is created here and from my understanding does not include the planning_scene. It does use a robot_model which I think comes from that locked scene so that could be the issue!

We do create a locked planning_scene on each planning request here but it gets unlocked after the goal succeeds.

@henningkayser
Copy link
Copy Markdown
Member

Please see my comment here #2103 (comment)

@mamoll
Copy link
Copy Markdown
Contributor

mamoll commented May 4, 2023

If this fixes the crash, then this seems like a useful workaround that should be merged. Recreating a planner instance from planner data adds a bit of overhead, but shouldn't be too bad. I'd add a comment along these lines so that we don't forget why this is done:

// FIXME: make reusing PlannerPtr not crash, so that we don't have to reconstruct a PlannerPtr instance

{
return planner_map_it->second;
ob::PlannerData data(si);
planner_map_it->second->getPlannerData(data);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused now. How can we access the planner data if the planner instance is invalid? Or are there actually any members inside the planner instance that go out of scope? This version looks like it should produce the same segfaults

Copy link
Copy Markdown
Contributor Author

@MarqRazz MarqRazz May 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the stack traces I uploaded to the issue it faults using the planning_context here which contains the planner and a bunch of other stuff. Some of its objects go out of scope the but the planner data does not.

planning_context_->getOMPLStateSpace()->copyToRobotState(*robot_state, wrapped_state);

Copy link
Copy Markdown
Member

@henningkayser henningkayser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with merging this as a temporary solution. The overhead of creating a new planner instance is really not too much of an issue, but we should keep on looking into the underlying cause anyway.

@MarqRazz
Copy link
Copy Markdown
Contributor Author

MarqRazz commented May 8, 2023

Just so everyone knows, I have done some profiling and re-constructing the planner is taking a little under 0.2 seconds which is not ideal for application that are going for fast cycle times!

ompl::geometric::LazyPRM::LazyPRM(const base::PlannerData &data, bool starStrategy)
*** Profiling statistics. Total counted time : 0.181746 seconds
Blocks of time:
lazyPRMConstructor: 0.181721s (99.9865%), [0.181721s --> 0.181721 s], 1 parts, 0.181721 s on average
Unaccounted time : 2.4583e-05 (0.013526 %)

I have also found that re-using the planning graph does not speed up LazyPRM. If you give it the exact same problem to solve over and over again the graph will increase in size with each attempt but the planning time always uses the max allowed set by the user. This could be an error with my configuration so I will update this PR with more details if I can get it working better.

Copy link
Copy Markdown
Contributor

@sjahr sjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this fix and open a follow-up issue 👍

{
// If we already have an instance, use that one
// If we already have an instance, reuse it's planning data
// FIXME: make reusing PlannerPtr not crash, so that we don't have to reconstruct a PlannerPtr instance
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open a follow-up issue for this? With a summery of you findings and (if possible) a path forward so somebody else can take this up in the future?

Copy link
Copy Markdown
Contributor

@sjahr sjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this fix and open a follow-up issue 👍

@sjahr sjahr enabled auto-merge (squash) May 11, 2023 07:38
@sjahr sjahr merged commit 92a7951 into moveit:main May 11, 2023
@MarqRazz MarqRazz deleted the pr-fix_constraint_segfault branch May 11, 2023 14:38
@MarqRazz
Copy link
Copy Markdown
Contributor Author

MarqRazz commented May 11, 2023

@sjahr can we back port this to Humble? I don't have permissions to ask mergify.

@sjahr sjahr added the backport-humble Mergify label that triggers a PR backport to Humble label May 12, 2023
mergify bot pushed a commit that referenced this pull request May 12, 2023
* Fix Constraint Planning Segfault

* Reuse planner data

* apply clang formatting

* apply clang formatting round 2

* add FIXME note and verbose output of planning graph size

---------

Co-authored-by: Sebastian Jahr <sebastian.jahr@picknik.ai>
(cherry picked from commit 92a7951)
sjahr pushed a commit that referenced this pull request May 12, 2023
* Fix Constraint Planning Segfault

* Reuse planner data

* apply clang formatting

* apply clang formatting round 2

* add FIXME note and verbose output of planning graph size

---------

Co-authored-by: Sebastian Jahr <sebastian.jahr@picknik.ai>
(cherry picked from commit 92a7951)

Co-authored-by: Marq Rasmussen <marq.razz@gmail.com>
sjahr added a commit to PickNikRobotics/moveit2 that referenced this pull request May 12, 2023
* Fix Constraint Planning Segfault

* Reuse planner data

* apply clang formatting

* apply clang formatting round 2

* add FIXME note and verbose output of planning graph size

---------

Co-authored-by: Sebastian Jahr <sebastian.jahr@picknik.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-humble Mergify label that triggers a PR backport to Humble

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants