Skip to content

Adds a MeshCache to _msgsToAttachedBody to prevent duplicate attached collision objects being stored in FCLShapeCache#3518

Closed
riv-mjohnson wants to merge 8 commits intomoveit:masterfrom
riv-mjohnson:master
Closed

Adds a MeshCache to _msgsToAttachedBody to prevent duplicate attached collision objects being stored in FCLShapeCache#3518
riv-mjohnson wants to merge 8 commits intomoveit:masterfrom
riv-mjohnson:master

Conversation

@riv-mjohnson
Copy link
Copy Markdown
Contributor

@riv-mjohnson riv-mjohnson commented Oct 30, 2023

Description

Fixes #3515

Previously, when planning a sequence motion, the attached collision objects would be converted to separate shapes::Mesh* for each waypoint along the path. Then each instance would be come a separate fcl::BVHModel instance in the FCLShapeCache, leading to large computational costs and memory usage when using large attached collision objects.

This PR adds caching of shapes constructed from meshes to _msgsToAttachedBody in conversions.cpp. This means that the shapes::Mesh* pointers passed into the FCLShapeCache are correctly aliased, and fcl::BVHModels do not need to be regenerated.

The mesh cache has an (approximate) maximum memory usage, and evicts cache entries using a last-used-first-out strategy when this memory usage is exceeded.

Checklist

  • Required by CI: Code is auto formatted using clang-format
  • Extend the tutorials / documentation reference N/A (I think?)
  • Document API changes relevant to the user in the MIGRATION.md notes N/A (no API changes)
  • Create tests, which fail without this PR reference
  • Include a screenshot if changing a GUI N/A (no GUI changes)
  • While waiting for someone to review your request, please help review another open pull request to support the maintainers

@welcome
Copy link
Copy Markdown

welcome bot commented Oct 30, 2023

Thanks for helping in improving MoveIt and open source robotics!

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 30, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (984a0ea) 61.81% compared to head (f8566e6) 61.89%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3518      +/-   ##
==========================================
+ Coverage   61.81%   61.89%   +0.08%     
==========================================
  Files         385      387       +2     
  Lines       34121    34176      +55     
==========================================
+ Hits        21088    21149      +61     
+ Misses      13033    13027       -6     
Files Coverage Δ
...obot_state/include/moveit/robot_state/mesh_cache.h 100.00% <100.00%> (ø)
moveit_core/robot_state/src/conversions.cpp 65.14% <66.67%> (+0.14%) ⬆️
moveit_core/robot_state/src/mesh_cache.cpp 96.23% <96.23%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@simonschmeisser
Copy link
Copy Markdown
Contributor

I really recommend you to install pre-commit, saves some round trips with the ci. https://moveit.ros.org/documentation/contributing/code/

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

I really recommend you to install pre-commit, saves some round trips with the ci. https://moveit.ros.org/documentation/contributing/code/

Yeah, sorry, my editor had the wrong clang-format loaded and kept undoing my changes. I think it's fixed now.

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

Some open questions I have on this:

  • What values should be used for min_size_to_cache and max_cache size? The 1Mb and 1Gb values are just guesses. I feel like these should probably be user-configurable, although I don't know how that process works here.
  • Is there a better way to hash a mesh than converting it to string and hashing the string?
  • The MeshCacheControlBlock currently stores both the ShapeConstPtr and the std::string mesh, which makes it twice as large as it maybe needs to be. Any smart ideas on how to guarantee meshes are correct without storing both?

@simonschmeisser
Copy link
Copy Markdown
Contributor

Very nice PR, thanks!

Some open questions I have on this:

* What values should be used for min_size_to_cache and max_cache size? The 1Mb and 1Gb values are just guesses. I feel like these should probably be user-configurable, although I don't know how that process works here.

Did you measure the overhead? Is there a size where it does not help to use the cache? I would suspect the turn-even << 1MB?

* Is there a better way to hash a mesh than converting it to string and hashing the string?

Maybe calculating hashes directly on mesh.triangles.data() and mesh.vertices.data()?

* The MeshCacheControlBlock currently stores both the ShapeConstPtr and the std::string mesh, which makes it twice as large as it maybe needs to be. Any smart ideas on how to guarantee meshes are correct without storing both?

So this basically boils down to the question if hash collisions are likely (enough) to warrant the storage costs. Maybe storing something like number of vertices and/or edges would be sufficient?

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

  • What values should be used for min_size_to_cache and max_cache size? The 1Mb and 1Gb values are just guesses. I feel like these should probably be user-configurable, although I don't know how that process works here.

Did you measure the overhead? Is there a size where it does not help to use the cache? I would suspect the turn-even << 1MB?

I haven't measured any overhead, and I'm not sure if my use cases are representative of wider use. I think the use case for the minimum size is if someone has a few large collision objects and many small ones, then the idea is to not cache the small objects to avoid kicking the large objects out of the cache. But I guess that use case is relatively specific, so the min size should default to zero?

For the maximum cache size, I guess it depends on how much RAM the user's computer has.

  • Is there a better way to hash a mesh than converting it to string and hashing the string?

Maybe calculating hashes directly on mesh.triangles.data() and mesh.vertices.data()?

Makes sense. I'll try that and see how it goes.

  • The MeshCacheControlBlock currently stores both the ShapeConstPtr and the std::string mesh, which makes it twice as large as it maybe needs to be. Any smart ideas on how to guarantee meshes are correct without storing both?

So this basically boils down to the question if hash collisions are likely (enough) to warrant the storage costs. Maybe storing something like number of vertices and/or edges would be sufficient?

I was assuming hash collisions were very unlikely, but that the cache returning the wrong shape would be critically bad.

…faults min_size_to_cache to 0 and adds a todo.
@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

  • What values should be used for min_size_to_cache and max_cache size? The 1Mb and 1Gb values are just guesses. I feel like these should probably be user-configurable, although I don't know how that process works here.

Did you measure the overhead? Is there a size where it does not help to use the cache? I would suspect the turn-even << 1MB?

I haven't measured any overhead, and I'm not sure if my use cases are representative of wider use. I think the use case for the minimum size is if someone has a few large collision objects and many small ones, then the idea is to not cache the small objects to avoid kicking the large objects out of the cache. But I guess that use case is relatively specific, so the min size should default to zero?

For the maximum cache size, I guess it depends on how much RAM the user's computer has.

  • Is there a better way to hash a mesh than converting it to string and hashing the string?

Maybe calculating hashes directly on mesh.triangles.data() and mesh.vertices.data()?

Makes sense. I'll try that and see how it goes.

  • The MeshCacheControlBlock currently stores both the ShapeConstPtr and the std::string mesh, which makes it twice as large as it maybe needs to be. Any smart ideas on how to guarantee meshes are correct without storing both?

So this basically boils down to the question if hash collisions are likely (enough) to warrant the storage costs. Maybe storing something like number of vertices and/or edges would be sufficient?

I was assuming hash collisions were very unlikely, but that the cache returning the wrong shape would be critically bad.

I've changed the default min_size_to_cache to 0, and added a TODO to consider exposing these parameters to the user (I don't think I know enough about how user-defined parameters in moveit work to do a good job of this myself).

I looked into changing the hash to use the raw message type rather than the string representation, but I think this is just going to add a lot of code complexity for little gain.

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

It looks like there's a problem using this PR with OMPL:

When two attached collision objects in the scene have the same mesh (and so their shape::Mesh* pointers are aliased by the MeshCache in this PR), the OMPL planner hangs forever on ompl_parallel_plan_.solve(...).

Before I investigate further, I'd like to know if either:

  • my approach in this PR is right, and this is a bug in OMPL (or the OMPL wrapper in Moveit), which I should investigate
  • OR the aliasing of meshes in this PR is not desirable behaviour, and I should try some other method of de-duplicating the entries in the FCLShapeCache
  • OR some other course of action would be better

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

@rhaschke could I get your thoughts on this PR and the issue with OMPL when you have the time? I'm keen to get this ready to merge, but not sure how to proceed.

@simonschmeisser
Copy link
Copy Markdown
Contributor

simonschmeisser commented Nov 20, 2023

This is a bit surprising as I couldn't spot any obvious new locks/mutexes and the cache is thread-local.

Could you provide a backtrace from when it hangs?

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
gdb --pid=$(pidof move_group) -ex 'set logging on' -ex 'set pagination off' -ex 'thread apply all bt full' -ex detach -ex quit

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

riv-mjohnson commented Nov 20, 2023

Could you provide a backtrace from when it hangs?

Sure, I'll give that a go.

This is a bit surprising as I couldn't spot any obvious new locks/mutexes and the cache is thread-local.

I don't think it's my code that's hanging, but rather that OMPL is hanging as a result of the aliased meshes.

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

I am struggling to attach gdb to the move_group in our system due to our docker and dependency management setup. I have passed this over to @rr-tom-noble who knows more about these systems. We should have a backtrace for you soon.

While trying to attach gdb, I did notice that it is not a full hang. OMPL just goes a lot slower than it should. I'm wondering if it is some kind of cache thrashing on the OMPL end. Hopefully we'll know more shortly.

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

riv-mjohnson commented Nov 20, 2023

gdb.txt

From which I think the relevant section is

Thread 13 (Thread 0x7f197affd700 (LWP 833)):
#0  0x00007f19a795a5e4 in void fcl::getRadiusAndOriginAndRectangleSize<double>(Eigen::Matrix<double, 3, 1, 0, 3, 1> const*, Eigen::Matrix<double, 3, 1, 0, 3, 1> const*, fcl::Triangle*, unsigned int*, int, Eigen::Matrix<double, 3, 3, 0, 3, 3> const&, Eigen::Matrix<double, 3, 1, 0, 3, 1>&, double*, double&) () from /opt/ros/noetic/lib/x86_64-linux-gnu/libfcl.so.0.6
No symbol table info available.
#1  0x00007f19a8d00e68 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#2  0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#3  0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#4  0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#5  0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#6  0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#7  0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#8  0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#9  0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#10 0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#11 0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#12 0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#13 0x00007f19a8d011a7 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#14 0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#15 0x00007f19a8d01187 in fcl::BVHModel<fcl::OBBRSS<double> >::recursiveBuildTree(int, int, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#16 0x00007f19a8d014bb in fcl::BVHModel<fcl::OBBRSS<double> >::buildTree() () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#17 0x00007f19a8d01812 in fcl::BVHModel<fcl::OBBRSS<double> >::endModel() () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#18 0x00007f19a8d02623 in std::shared_ptr<collision_detection::FCLGeometry const> collision_detection::createCollisionGeometry<fcl::OBBRSS<double>, moveit::core::AttachedBody>(std::shared_ptr<shapes::Shape const> const&, moveit::core::AttachedBody const*, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#19 0x00007f19a8cf857d in collision_detection::createCollisionGeometry(std::shared_ptr<shapes::Shape const> const&, double, double, moveit::core::AttachedBody const*, int) () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#20 0x00007f19a8d08047 in collision_detection::CollisionEnvFCL::getAttachedBodyObjects(moveit::core::AttachedBody const*, std::vector<std::shared_ptr<collision_detection::FCLGeometry const>, std::allocator<std::shared_ptr<collision_detection::FCLGeometry const> > >&) const () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#21 0x00007f19a8d08a89 in collision_detection::CollisionEnvFCL::constructFCLObjectRobot(moveit::core::RobotState const&, collision_detection::FCLObject&) const () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#22 0x00007f19a8d09bfe in collision_detection::CollisionEnvFCL::allocSelfCollisionBroadPhase(moveit::core::RobotState const&, collision_detection::FCLManager&) const () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#23 0x00007f19a8d09d86 in collision_detection::CollisionEnvFCL::checkSelfCollisionHelper(collision_detection::CollisionRequest const&, collision_detection::CollisionResult&, moveit::core::RobotState const&, collision_detection::AllowedCollisionMatrix const*) const () from /opt/ros/noetic/lib/libmoveit_collision_detection_fcl.so.1.1.13
No symbol table info available.
#24 0x00007f19893d59d0 in planning_scene::PlanningScene::checkCollision (robot_state=..., res=..., req=..., this=0x55fe0b1d4ec0) at /opt/ros/noetic/include/moveit/planning_scene/planning_scene.h:473
No locals.
#25 ompl_interface::StateValidityChecker::isValid (this=0x7f19680964d0, state=0x7f196852e130, verbose=<optimized out>) at ./ompl_interface/src/detail/state_validity_checker.cpp:110
        __PRETTY_FUNCTION__ = "bool ompl_interface::StateValidityChecker::isValid(const ompl::base::State*, bool) const"
        robot_state = 0x7f1968097480
        kset = <optimized out>
        res = {collision = false, distance = 1.7976931348623157e+308, contact_count = 0, contacts = std::map with 0 elements, cost_sources = std::set with 0 elements}
#26 0x00007f1988deab27 in ompl::base::DiscreteMotionValidator::checkMotion(ompl::base::State const*, ompl::base::State const*) const () from /opt/ros/noetic/lib/x86_64-linux-gnu/libompl.so.17
No symbol table info available.
#27 0x00007f19890e8460 in ompl::geometric::PathSimplifier::smoothBSpline(ompl::geometric::PathGeometric&, unsigned int, double) () from /opt/ros/noetic/lib/x86_64-linux-gnu/libompl.so.17
No symbol table info available.
#28 0x00007f19890ecb54 in ompl::geometric::PathSimplifier::simplify(ompl::geometric::PathGeometric&, ompl::base::PlannerTerminationCondition const&, bool) () from /opt/ros/noetic/lib/x86_64-linux-gnu/libompl.so.17
No symbol table info available.
#29 0x00007f19890eda5c in ompl::geometric::SimpleSetup::simplifySolution(ompl::base::PlannerTerminationCondition const&) () from /opt/ros/noetic/lib/x86_64-linux-gnu/libompl.so.17
No symbol table info available.
#30 0x00007f19893b01c5 in ompl_interface::ModelBasedPlanningContext::simplifySolution (this=0x7f196804d7f0, timeout=-14.141440099) at /usr/include/c++/9/bits/shared_ptr_base.h:1020
        start = {__d = {__r = 1700494864349087955}}
        ptc = {impl_ = warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl, std::allocator<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl, std::allocator<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr<ompl::base::PlannerTerminationCondition::PlannerTerminationConditionImpl> (use count 1, weak count 0) = {get() = 0x7f1968c5d290}}
#31 0x00007f19893b627c in ompl_interface::ModelBasedPlanningContext::solve (this=0x7f196804d7f0, res=...) at ./ompl_interface/src/model_based_planning_context.cpp:691

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

I'm not quite sure what to make of that. It looks like these changes make OMPL's use of fcl::BVHModel much worse than before?

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

I wonder if this is the wrong solution to the problem, and if something a bit more targeted to FCL would be more appropriate?

If I can get FCL to hash the meshes in its cache, I could change that cache to identify identical meshes which don't share a pointer, and change that cache to look more like the cache I have implemented here.

That would hopefully solve the problem without so many possible downstream consequences, and not introduce a second cache where a second cache isn't really needed.

I'll probably look into doing this sometime next week.

@simonschmeisser
Copy link
Copy Markdown
Contributor

When two attached collision objects in the scene have the same mesh (and so their shape::Mesh* pointers are aliased by the MeshCache in this PR), the OMPL planner hangs forever on ompl_parallel_plan_.solve(...).

I missed this particular point (two) until now. The cache used in the FCL collision checker plugin (not actually part of FCL but MoveIt) assumes there's at most one attached and one non-attached instance of a certain mesh. Your use-case breaks this assumption.

https://github.com/ros-planning/moveit/blob/d8e0a72e9a69202521aad251a9fa8e2696d8934e/moveit_core/collision_detection_fcl/src/collision_common.cpp#L370

https://github.com/ros-planning/moveit/blob/d8e0a72e9a69202521aad251a9fa8e2696d8934e/moveit_core/collision_detection_fcl/src/collision_common.cpp#L695

Not sure how to fix/extend this actually ...

@riv-mjohnson
Copy link
Copy Markdown
Contributor Author

Since this PR introduces downstream problems, and the performance issues appear to be fixed by #3590, I'm going to close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FCLShapeCache does not cache attached collision objects between different robot states on the same trajectory

2 participants