Adds a MeshCache to _msgsToAttachedBody to prevent duplicate attached collision objects being stored in FCLShapeCache by riv-mjohnson · Pull Request #3518 · moveit/moveit

riv-mjohnson · 2023-10-30T12:05:19Z

Description

Previously, when planning a sequence motion, the attached collision objects would be converted to separate shapes::Mesh* for each waypoint along the path. Then each instance would be come a separate fcl::BVHModel instance in the FCLShapeCache, leading to large computational costs and memory usage when using large attached collision objects.

This PR adds caching of shapes constructed from meshes to _msgsToAttachedBody in conversions.cpp. This means that the shapes::Mesh* pointers passed into the FCLShapeCache are correctly aliased, and fcl::BVHModels do not need to be regenerated.

The mesh cache has an (approximate) maximum memory usage, and evicts cache entries using a last-used-first-out strategy when this memory usage is exceeded.

Checklist

Required by CI: Code is auto formatted using clang-format
Extend the tutorials / documentation reference N/A (I think?)
Document API changes relevant to the user in the MIGRATION.md notes N/A (no API changes)
Create tests, which fail without this PR reference
Include a screenshot if changing a GUI N/A (no GUI changes)
While waiting for someone to review your request, please help review another open pull request to support the maintainers

…l needs documenting and testing.

…omments.

welcome · 2023-10-30T12:05:23Z

Thanks for helping in improving MoveIt and open source robotics!

codecov · 2023-10-30T12:18:07Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (984a0ea) 61.81% compared to head (f8566e6) 61.89%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3518      +/-   ##
==========================================
+ Coverage   61.81%   61.89%   +0.08%     
==========================================
  Files         385      387       +2     
  Lines       34121    34176      +55     
==========================================
+ Hits        21088    21149      +61     
+ Misses      13033    13027       -6

Files	Coverage Δ
...obot_state/include/moveit/robot_state/mesh_cache.h	`100.00% <100.00%> (ø)`
moveit_core/robot_state/src/conversions.cpp	`65.14% <66.67%> (+0.14%)`	⬆️
moveit_core/robot_state/src/mesh_cache.cpp	`96.23% <96.23%> (ø)`

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

simonschmeisser · 2023-10-30T12:38:02Z

I really recommend you to install pre-commit, saves some round trips with the ci. https://moveit.ros.org/documentation/contributing/code/

riv-mjohnson · 2023-10-30T12:39:20Z

I really recommend you to install pre-commit, saves some round trips with the ci. https://moveit.ros.org/documentation/contributing/code/

Yeah, sorry, my editor had the wrong clang-format loaded and kept undoing my changes. I think it's fixed now.

riv-mjohnson · 2023-10-30T13:30:44Z

Some open questions I have on this:

What values should be used for min_size_to_cache and max_cache size? The 1Mb and 1Gb values are just guesses. I feel like these should probably be user-configurable, although I don't know how that process works here.
Is there a better way to hash a mesh than converting it to string and hashing the string?
The MeshCacheControlBlock currently stores both the ShapeConstPtr and the std::string mesh, which makes it twice as large as it maybe needs to be. Any smart ideas on how to guarantee meshes are correct without storing both?

…ppy.

simonschmeisser · 2023-10-30T21:56:26Z

Very nice PR, thanks!

Some open questions I have on this:

* What values should be used for min_size_to_cache and max_cache size? The 1Mb and 1Gb values are just guesses. I feel like these should probably be user-configurable, although I don't know how that process works here.

Did you measure the overhead? Is there a size where it does not help to use the cache? I would suspect the turn-even << 1MB?

* Is there a better way to hash a mesh than converting it to string and hashing the string?

Maybe calculating hashes directly on mesh.triangles.data() and mesh.vertices.data()?

* The MeshCacheControlBlock currently stores both the ShapeConstPtr and the std::string mesh, which makes it twice as large as it maybe needs to be. Any smart ideas on how to guarantee meshes are correct without storing both?

So this basically boils down to the question if hash collisions are likely (enough) to warrant the storage costs. Maybe storing something like number of vertices and/or edges would be sufficient?

riv-mjohnson · 2023-10-31T09:43:26Z

What values should be used for min_size_to_cache and max_cache size? The 1Mb and 1Gb values are just guesses. I feel like these should probably be user-configurable, although I don't know how that process works here.

Did you measure the overhead? Is there a size where it does not help to use the cache? I would suspect the turn-even << 1MB?

I haven't measured any overhead, and I'm not sure if my use cases are representative of wider use. I think the use case for the minimum size is if someone has a few large collision objects and many small ones, then the idea is to not cache the small objects to avoid kicking the large objects out of the cache. But I guess that use case is relatively specific, so the min size should default to zero?

For the maximum cache size, I guess it depends on how much RAM the user's computer has.

Is there a better way to hash a mesh than converting it to string and hashing the string?

Maybe calculating hashes directly on mesh.triangles.data() and mesh.vertices.data()?

Makes sense. I'll try that and see how it goes.

The MeshCacheControlBlock currently stores both the ShapeConstPtr and the std::string mesh, which makes it twice as large as it maybe needs to be. Any smart ideas on how to guarantee meshes are correct without storing both?

So this basically boils down to the question if hash collisions are likely (enough) to warrant the storage costs. Maybe storing something like number of vertices and/or edges would be sufficient?

I was assuming hash collisions were very unlikely, but that the cache returning the wrong shape would be critically bad.

…faults min_size_to_cache to 0 and adds a todo.

riv-mjohnson · 2023-10-31T13:39:04Z

What values should be used for min_size_to_cache and max_cache size? The 1Mb and 1Gb values are just guesses. I feel like these should probably be user-configurable, although I don't know how that process works here.

Did you measure the overhead? Is there a size where it does not help to use the cache? I would suspect the turn-even << 1MB?

I haven't measured any overhead, and I'm not sure if my use cases are representative of wider use. I think the use case for the minimum size is if someone has a few large collision objects and many small ones, then the idea is to not cache the small objects to avoid kicking the large objects out of the cache. But I guess that use case is relatively specific, so the min size should default to zero?

For the maximum cache size, I guess it depends on how much RAM the user's computer has.

Is there a better way to hash a mesh than converting it to string and hashing the string?

Maybe calculating hashes directly on mesh.triangles.data() and mesh.vertices.data()?

Makes sense. I'll try that and see how it goes.

The MeshCacheControlBlock currently stores both the ShapeConstPtr and the std::string mesh, which makes it twice as large as it maybe needs to be. Any smart ideas on how to guarantee meshes are correct without storing both?

So this basically boils down to the question if hash collisions are likely (enough) to warrant the storage costs. Maybe storing something like number of vertices and/or edges would be sufficient?

I was assuming hash collisions were very unlikely, but that the cache returning the wrong shape would be critically bad.

I've changed the default min_size_to_cache to 0, and added a TODO to consider exposing these parameters to the user (I don't think I know enough about how user-defined parameters in moveit work to do a good job of this myself).

I looked into changing the hash to use the raw message type rather than the string representation, but I think this is just going to add a lot of code complexity for little gain.

riv-mjohnson · 2023-11-07T11:08:59Z

It looks like there's a problem using this PR with OMPL:

When two attached collision objects in the scene have the same mesh (and so their shape::Mesh* pointers are aliased by the MeshCache in this PR), the OMPL planner hangs forever on ompl_parallel_plan_.solve(...).

Before I investigate further, I'd like to know if either:

my approach in this PR is right, and this is a bug in OMPL (or the OMPL wrapper in Moveit), which I should investigate
OR the aliasing of meshes in this PR is not desirable behaviour, and I should try some other method of de-duplicating the entries in the FCLShapeCache
OR some other course of action would be better

riv-mjohnson · 2023-11-20T08:07:57Z

@rhaschke could I get your thoughts on this PR and the issue with OMPL when you have the time? I'm keen to get this ready to merge, but not sure how to proceed.

simonschmeisser · 2023-11-20T08:49:02Z

This is a bit surprising as I couldn't spot any obvious new locks/mutexes and the cache is thread-local.

Could you provide a backtrace from when it hangs?

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
gdb --pid=$(pidof move_group) -ex 'set logging on' -ex 'set pagination off' -ex 'thread apply all bt full' -ex detach -ex quit

riv-mjohnson · 2023-11-20T09:32:21Z

Could you provide a backtrace from when it hangs?

Sure, I'll give that a go.

This is a bit surprising as I couldn't spot any obvious new locks/mutexes and the cache is thread-local.

I don't think it's my code that's hanging, but rather that OMPL is hanging as a result of the aliased meshes.

riv-mjohnson · 2023-11-20T11:16:16Z

I am struggling to attach gdb to the move_group in our system due to our docker and dependency management setup. I have passed this over to @rr-tom-noble who knows more about these systems. We should have a backtrace for you soon.

While trying to attach gdb, I did notice that it is not a full hang. OMPL just goes a lot slower than it should. I'm wondering if it is some kind of cache thrashing on the OMPL end. Hopefully we'll know more shortly.

riv-mjohnson · 2023-11-20T15:42:07Z

gdb.txt

From which I think the relevant section is

Thread 13 (Thread 0x7f197affd700 (LWP 833)):
#0  0x00007f19a795a5e4 in void fcl::getRadiusAndOriginAndRectangleSize<double>(Eigen::Matrix<double, 3, 1, 0, 3, 1> const*, Eigen::Matrix<double, 3, 1, 0, 3, 1> const*, fcl::Triangle*, unsigned int*, int, Eigen::Matrix<double, 3, 3, 0, 3, 3> const&, Eigen::Matrix<double, 3, 1, 0, 3, 1>&, double*, double&) () from /opt/ros/noetic/lib/x86_64-linux-gnu/libfcl.so.0.6
No symbol table info available.
#1  0x00007f19a8d00e68 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#2  0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#3  0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#4  0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#5  0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#6  0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#7  0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#8  0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#9  0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#10 0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#11 0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#12 0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#13 0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#14 0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#15 0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#16 0x00007f19a8d014bb in fcl::BVHModel<fcl::OBBRSS<double> >::buildTree() () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#17 0x00007f19a8d01812 in fcl::BVHModel<fcl::OBBRSS<double> >::endModel() () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#18 0x00007f19a8d02623 in std::shared_ptr<collision_detection::FCLGeometry const> collision_detection::createCollisionGeometry<fcl::OBBRSS<double>, moveit::core::AttachedBody>(std::shared_ptr<shapes::Shape const> const&, moveit::core::AttachedBody const*, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#19 0x00007f19a8cf857d in collision_detection::createCollisionGeometry(std::shared_ptr<shapes::Shape const> const&, double, double, moveit::core::AttachedBody const*, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#20 0x00007f19a8d08047 in collision_detection::CollisionEnvFCL::getAttachedBodyObjects(moveit::core::AttachedBody const*, std::vector<std::shared_ptr<collision_detection::FCLGeometry const>, std::allocator<std::shared_ptr<collision_detection::FCLGeometry const> > >&) const () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#21 0x00007f19a8d08a89 in collision_detection::CollisionEnvFCL::constructFCLObjectRobot(moveit::core::RobotState const&, collision_detection::FCLObject&) const () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#22 0x00007f19a8d09bfe in collision_detection::CollisionEnvFCL::allocSelfCollisionBroadPhase(moveit::core::RobotState const&, collision_detection::FCLManager&) const () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#23 0x00007f19a8d09d86 in collision_detection::CollisionEnvFCL::checkSelfCollisionHelper(collision_detection::CollisionRequest const&, collision_detection::CollisionResult&, moveit::core::RobotState const&, collision_detection::AllowedCollisionMatrix const*) const () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#24 0x00007f19893d59d0 in planning_scene::PlanningScene::checkCollision (robot_state=..., res=..., req=..., this=0x55fe0b1d4ec0) at /opt/ros/noetic/include/moveit/planning_scene/planning_scene.h:473
No locals.
#25 ompl_interface::StateValidityChecker::isValid (this=0x7f19680964d0, state=0x7f196852e130, verbose=<optimized out>) at ./ompl_interface/src/detail/state_validity_checker.cpp:110
        __PRETTY_FUNCTION__ = "bool ompl_interface::StateValidityChecker::isValid(const ompl::base::State*, bool) const"
        robot_state = 0x7f1968097480
        kset = <optimized out>
        res = {collision = false, distance = 1.7976931348623157e+308, contact_count = 0, contacts = std::map with 0 elements, cost_sources = std::set with 0 elements}
#26 0x00007f1988deab27 in ompl::base::DiscreteMotionValidator::checkMotion(ompl::base::State const*, ompl::base::State const*) const () from /opt/ros/noetic/lib/x86_64-linux-gnu/libompl.so.17
No symbol table info available.
#27 0x00007f19890e8460 in ompl::geometric::PathSimplifier::smoothBSpline(ompl::geometric::PathGeometric&, unsigned int, double) () from /opt/ros/noetic/lib/x86_64-linux-gnu/libompl.so.17
No symbol table info available.
#28 0x00007f19890ecb54 in ompl::geometric::PathSimplifier::simplify(ompl::geometric::PathGeometric&, ompl::base::PlannerTerminationCondition const&, bool) () from /opt/ros/noetic/lib/x86_64-linux-gnu/libompl.so.17
No symbol table info available.
#29 0x00007f19890eda5c in ompl::geometric::SimpleSetup::simplifySolution(ompl::base::PlannerTerminationCondition const&) () from /opt/ros/noetic/lib/x86_64-linux-gnu/libompl.so.17
No symbol table info available.
#30 0x00007f19893b01c5 in ompl_interface::ModelBasedPlanningContext::simplifySolution (this=0x7f196804d7f0, timeout=-14.141440099) at /usr/include/c++/9/bits/shared_ptr_base.h:1020
        start = {__d = {__r = 1700494864349087955}}
        ptc = {impl_ = warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl, std::allocator<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl, std::allocator<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl> (use count 1, weak count 0) = {get() = 0x7f1968c5d290}}
#31 0x00007f19893b627c in ompl_interface::ModelBasedPlanningContext::solve (this=0x7f196804d7f0, res=...) at ./ompl_interface/src/model_based_planning_context.cpp:691

riv-mjohnson · 2023-11-20T15:50:51Z

I'm not quite sure what to make of that. It looks like these changes make OMPL's use of fcl::BVHModel much worse than before?

riv-mjohnson · 2023-11-24T11:56:35Z

I wonder if this is the wrong solution to the problem, and if something a bit more targeted to FCL would be more appropriate?

If I can get FCL to hash the meshes in its cache, I could change that cache to identify identical meshes which don't share a pointer, and change that cache to look more like the cache I have implemented here.

That would hopefully solve the problem without so many possible downstream consequences, and not introduce a second cache where a second cache isn't really needed.

I'll probably look into doing this sometime next week.

simonschmeisser · 2023-11-24T15:49:49Z

When two attached collision objects in the scene have the same mesh (and so their shape::Mesh* pointers are aliased by the MeshCache in this PR), the OMPL planner hangs forever on ompl_parallel_plan_.solve(...).

I missed this particular point (two) until now. The cache used in the FCL collision checker plugin (not actually part of FCL but MoveIt) assumes there's at most one attached and one non-attached instance of a certain mesh. Your use-case breaks this assumption.

https://github.com/ros-planning/moveit/blob/d8e0a72e9a69202521aad251a9fa8e2696d8934e/moveit_core/collision_detection_fcl/src/collision_common.cpp#L370

https://github.com/ros-planning/moveit/blob/d8e0a72e9a69202521aad251a9fa8e2696d8934e/moveit_core/collision_detection_fcl/src/collision_common.cpp#L695

Not sure how to fix/extend this actually ...

riv-mjohnson · 2024-10-15T08:31:11Z

Since this PR introduces downstream problems, and the performance issues appear to be fixed by #3590, I'm going to close it.

riv-mjohnson added 3 commits October 27, 2023 14:43

Adds cache for ShapeConstPtrs generated from shape_msgs::Meshes. Stil…

598f1c4

…l needs documenting and testing.

Corrects approximate memory use calculation. Adds documentation and c…

d199d9b

…omments.

Adds unit tests.

7f36a60

riv-mjohnson requested review from felixvd, mlautman and rhaschke as code owners October 30, 2023 12:05

Fixes lint.

20c54cd

riv-mjohnson added 2 commits October 30, 2023 12:30

Adds newlines to ends of files.

788e9a5

Fixes newlines.

e74b763

Makes shared pointer arguments const references to make clang-tidy ha…

e4d20d3

…ppy.

Adds tests for ThreadLocalCache; deletes unused hashMesh function; de…

f8566e6

…faults min_size_to_cache to 0 and adds a todo.

riv-mjohnson mentioned this pull request Nov 8, 2023

Fix pilz planners to consider and plan Cartesian motions w.r.t. subframes #3519

Merged

6 tasks

riv-mjohnson mentioned this pull request Apr 19, 2024

Removes slow robotStateMsgToRobotState call from pilz #3589

Closed

6 tasks

riv-mjohnson closed this Oct 15, 2024

Conversation

riv-mjohnson commented Oct 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

welcome bot commented Oct 30, 2023

Uh oh!

codecov bot commented Oct 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

simonschmeisser commented Oct 30, 2023

Uh oh!

riv-mjohnson commented Oct 30, 2023

Uh oh!

riv-mjohnson commented Oct 30, 2023

Uh oh!

simonschmeisser commented Oct 30, 2023

Uh oh!

riv-mjohnson commented Oct 31, 2023

Uh oh!

riv-mjohnson commented Oct 31, 2023

Uh oh!

riv-mjohnson commented Nov 7, 2023

Uh oh!

riv-mjohnson commented Nov 20, 2023

Uh oh!

simonschmeisser commented Nov 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

riv-mjohnson commented Nov 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

riv-mjohnson commented Nov 20, 2023

Uh oh!

riv-mjohnson commented Nov 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

riv-mjohnson commented Nov 20, 2023

Uh oh!

riv-mjohnson commented Nov 24, 2023

Uh oh!

simonschmeisser commented Nov 24, 2023

Uh oh!

riv-mjohnson commented Oct 15, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

riv-mjohnson commented Oct 30, 2023 •

edited

Loading

codecov bot commented Oct 30, 2023 •

edited

Loading

simonschmeisser commented Nov 20, 2023 •

edited

Loading

riv-mjohnson commented Nov 20, 2023 •

edited

Loading

riv-mjohnson commented Nov 20, 2023 •

edited

Loading