Skip to content

[test] move_group pick place test#2031

Merged
henningkayser merged 2 commits intomoveit:masterfrom
PickNikRobotics:p/tylerjw/pick_place_test
May 19, 2020
Merged

[test] move_group pick place test#2031
henningkayser merged 2 commits intomoveit:masterfrom
PickNikRobotics:p/tylerjw/pick_place_test

Conversation

@tylerjw
Copy link
Copy Markdown
Member

@tylerjw tylerjw commented Apr 19, 2020

Description

Another test for basic functionality from our tutorials to protect the experience of new users.

There is a bunch of hard coded strings and magic numbers for the geometry for this test. I would appreciate any way to reduce the complexity of the test code. Most of the code for this was copied from the tutorial that does the same thing. @mlautman do you see anyway to reduce the complexity of this test and the tutorial?

The actual test itself is just the return values of the pick and place method calls in the move_group. Another potential improvement to this test would be to get the new location of the object that was placed and validate it is in the correct new position. I'm not sure what the return value of the move_group methods actually mean (I'm assuming it means the trajectory ran without error) and if testing the location of the moved object would be redundant.

The last thing is the tutorial sets the planning time to 45s so I did that here too. I'm assuming needs to be long because on a debug build (what the codecov build is) planning takes a long time. Is there a good way to know how long to set this to avoid the test being flaky?

Another thing to point out is that I'm using moveit_resources but I have tested this with my PR to update it to the current state of the upstream panda_moveit_config which is what the tutorial uses so this test should work after my change to moveit_resources is updated.

Checklist

  • Required by CI: Code is auto formatted using clang-format
  • While waiting for someone to review your request, please help review another open pull request to support the maintainers
  • Create tests, which fail without this PR reference
  • Moves the CodeCov report in the right direction ⬆️ ⬆️ ⬆️

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented Apr 19, 2020

Sadness, this is failing on kinetic with a timeout and a ton this in the output of running the test:
eigen: too many iterations in Jacobi transform.

I tried increasing the length of the test but I don't like that solution even if it does work. Has anyone seen this? Any ideas how to troubleshoot this?

It also looks like fake_controller_joint_states is crashing during the test:


Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(/move_group/fake_controller_joint_states, stopped daemon 140006693263104)>>
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 814, in __bootstrap_inner
    (self.name, _format_exc()))
  File "/usr/lib/python2.7/traceback.py", line 241, in format_exc
    etype, value, tb = sys.exc_info()
AttributeError: 'NoneType' object has no attribute 'exc_info'

@v4hn
Copy link
Copy Markdown
Contributor

v4hn commented Apr 20, 2020 via email

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented Apr 20, 2020

@v4hn I'll do that and get back to you. It'd be nice to squash the kinetic bug too if possible.

One other note is that @henningkayser pointed out that there are talks about deprecating this pick and place interface in favor of something more powerful (MTC maybe). I think it is still useful to have a test for it until we do that work though.

The primary reason is I'm concerned about the experience of new users and I think that this is a likely interface for them to start playing with. The tutorial is the first result to the google search "moveit pick and place".

@tylerjw tylerjw force-pushed the p/tylerjw/pick_place_test branch from f6b9052 to 058b75f Compare April 30, 2020 20:29
@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented Apr 30, 2020

I sunk a couple hours into this and don't feel any closer to a solution. I can reproduce the error locally by checking out docker and running the test locally. I'm pasting my set of commands here in case someone else with more knowledge of the issue would find these steps useful in debugging the bug in kinetic this exposed:

## start docker image
docker pull moveit/moveit:kinetic-ci
docker run -it moveit/moveit:kinetic-ci /bin/bash

## Inside docker container
apt-get update
apt-get dist-upgrade
apt-get install -y python-catkin-tools ssh git gdb valgrind vim

## set enviroment variables
export ROS_DISTRO=kinetic

## Craete ROS workspace
CATKIN_WS=/root/ros_ws
mkdir -p ${CATKIN_WS}/src
cd ${CATKIN_WS}/src

## checkout moveit and dependencies library into workspace
cd ${CATKIN_WS}/src
wstool init .
git clone https://github.com/PickNikRobotics/moveit.git -b p/tylerjw/pick_place_test
wstool merge -t . moveit/moveit.rosinstall
wstool update  # skip pulling moveit

## Install dependencies
cd ${CATKIN_WS}/src
rosdep install -y -q -n --from-paths . --ignore-src --rosdistro ${ROS_DISTRO}

## Config and build
cd $CATKIN_WS
export CXXFLAGS="-Wall -Wextra -Wwrite-strings -Wunreachable-code -Wpointer-arith -Wredundant-decls"
catkin config --extend /opt/ros/${ROS_DISTRO} --no-install --cmake-args -DCMAKE_BUILD_TYPE=Debug
catkin build --summarize

## build the tests
cd ${CATKIN_WS}
catkin build --summarize --make-args tests

## source the workspace
cd ${CATKIN_WS}
source devel/setup.bash

## run my test that is failing
rostest moveit_ros_planning_interface move_group_pick_place_test.test --text

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 1, 2020

More poking the bear I found that the move_group binary is crashing when I launch it. Here is the valgrind output. It looks to be something eigen related:

==2495== Invalid read of size 4
==2495==    at 0x45C378: __gnu_cxx::__atomic_add(int volatile*, int) (atomicity.h:53)
==2495==    by 0x45C437: __gnu_cxx::__atomic_add_dispatch(int*, int) (atomicity.h:96)
==2495==    by 0x4652B4: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy() (shared_ptr_base.h:134)
==2495==    by 0x461A7C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:676)
==2495==    by 0xB28C392: std::__shared_ptr<bodies::ConvexMesh::MeshData, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<bodies::ConvexMesh::MeshData, (__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:867)
==2495==    by 0xB28C3BC: std::shared_ptr<bodies::ConvexMesh::MeshData>::operator=(std::shared_ptr<bodies::ConvexMesh::MeshData> const&) (shared_ptr.h:93)
==2495==    by 0xB288E67: bodies::ConvexMesh::cloneAt(Eigen::Transform<double, 3, 1, 0> const&, double, double) const (bodies.cpp:1104)
==2495==    by 0x21E586AE: point_containment_filter::ShapeMask::addShape(std::shared_ptr<shapes::Shape const> const&, double, double) (in /opt/ros/kinetic/lib/libmoveit_point_containment_filter.so.0.9.18)
==2495==    by 0x219C51F9: occupancy_map_monitor::PointCloudOctomapUpdater::excludeShape(std::shared_ptr<shapes::Shape const> const&) (in /opt/ros/kinetic/lib/libmoveit_pointcloud_octomap_updater_core.so.0.9.18)
==2495==    by 0x99F21C4: occupancy_map_monitor::OccupancyMapMonitor::excludeShape(std::shared_ptr<shapes::Shape const> const&) (occupancy_map_monitor.cpp:237)
==2495==    by 0x5364E5E: planning_scene_monitor::PlanningSceneMonitor::excludeRobotLinksFromOctree() (planning_scene_monitor.cpp:692)
==2495==    by 0x53697A5: planning_scene_monitor::PlanningSceneMonitor::startWorldGeometryMonitor(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (planning_scene_monitor.cpp:1076)
==2495==  Address 0x3efdb90000000008 is not stack'd, malloc'd or (recently) free'd
==2495== 
==2495== 
==2495== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2495==  General Protection Fault
==2495==    at 0x45C378: __gnu_cxx::__atomic_add(int volatile*, int) (atomicity.h:53)
==2495==    by 0x45C437: __gnu_cxx::__atomic_add_dispatch(int*, int) (atomicity.h:96)
==2495==    by 0x4652B4: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy() (shared_ptr_base.h:134)
==2495==    by 0x461A7C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:676)
==2495==    by 0xB28C392: std::__shared_ptr<bodies::ConvexMesh::MeshData, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<bodies::ConvexMesh::MeshData, (__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:867)
==2495==    by 0xB28C3BC: std::shared_ptr<bodies::ConvexMesh::MeshData>::operator=(std::shared_ptr<bodies::ConvexMesh::MeshData> const&) (shared_ptr.h:93)
==2495==    by 0xB288E67: bodies::ConvexMesh::cloneAt(Eigen::Transform<double, 3, 1, 0> const&, double, double) const (bodies.cpp:1104)
==2495==    by 0x21E586AE: point_containment_filter::ShapeMask::addShape(std::shared_ptr<shapes::Shape const> const&, double, double) (in /opt/ros/kinetic/lib/libmoveit_point_containment_filter.so.0.9.18)
==2495==    by 0x219C51F9: occupancy_map_monitor::PointCloudOctomapUpdater::excludeShape(std::shared_ptr<shapes::Shape const> const&) (in /opt/ros/kinetic/lib/libmoveit_pointcloud_octomap_updater_core.so.0.9.18)
==2495==    by 0x99F21C4: occupancy_map_monitor::OccupancyMapMonitor::excludeShape(std::shared_ptr<shapes::Shape const> const&) (occupancy_map_monitor.cpp:237)
==2495==    by 0x5364E5E: planning_scene_monitor::PlanningSceneMonitor::excludeRobotLinksFromOctree() (planning_scene_monitor.cpp:692)
==2495==    by 0x53697A5: planning_scene_monitor::PlanningSceneMonitor::startWorldGeometryMonitor(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (planning_scene_monitor.cpp:1076)

While it says that it dumped a core... even though I have core dumps turned on there is no core file produced. If you run it in gdb it crashes before it ever reaches main. This is something that is going bad while it is trying to load the executable. I'm going to try to build with debug symbols to see if I can get more detail.

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 1, 2020

I might be just complaining to a black hole... but this is the sort of bug you invite when you use dynamic memory everywhere (shared_ptr all the things). 😢

My best guess so far is we are copying a shared pointer that doesn't have a valid control block.

My best guess why this policy of using dynamic lifetime managed memory everywhere took off is the syntax is easy to use (easier than learning how to live without it in many cases) and problems like this are kind of rare (or worse, they show up years after you wrote the code for someone else to fix). To make these bugs worse we have really low code coverage. Because we've built this system where errors are pushed from compile time frustration into run-time bugs we need more complete and robust testing.

When you push your bugs into run-time without tests you invite "works on my machine" attitudes and sad users. Running move_group on kinetic feels like a really really basic use case, which has a run-time bug in it. sadness

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 1, 2020

This might be a good clue for this one:

==17790==    by 0x21DC51F9: occupancy_map_monitor::PointCloudOctomapUpdater::excludeShape(std::shared_ptr<shapes::Shape const> const&) (in /opt/ros/kinetic/lib/libmoveit_pointcloud_octomap_updater_core.so.0.9.18)
==17790==    by 0x99F21C4: occupancy_map_monitor::OccupancyMapMonitor::excludeShape(std::shared_ptr<shapes::Shape const> const&) (occupancy_map_monitor.cpp:237)

Why would the kinetic build be using a debian of something that originates from this package and not the source built one. That seems wrong, what could cause that?

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 1, 2020

When I try to add the dependency to moveit_ros_perception to moveit_ros_occupancy_map_monitor I introduce a circular dependency. Is there a good way to fix this problem so they can depend on eachother's source builds instead of having moveit_ros_occupancy_map_monitor depend on the debaian for moveit_ros_preception which is who-knows-how-old in kenetic.

The only way I can think to untangle this mess is to refactor the code so there isn't a circular dependency. Since I don't have any experience with either of these packages does someone else have a better idea?

Maybe the occupancy_map_monitor should just be part of moveit_ros_preception?

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

The _preception blacklist issue is no longer a problem because the failing test was fixed and therefore blacklist isn't needed. That being said there is an unrelated move_group crash on kinetic. I will report back if I find more information on this.

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

I found our smoking gun. The length of the array is 6 and the code is accessing the 6th element (one past the end of the array) of it (from current_state_monitor.cpp:450 with a ros warn added by me):

      std::vector<double> new_values(joint->getStateSpaceDimension());
      ROS_WARN_STREAM("new_values size: " << new_values.size());
      const moveit::core::LinkModel* link = joint->getChildLinkModel();
      if (link->jointOriginTransformIsIdentity())
        joint->computeVariablePositions(tf2::transformToEigen(transf), new_values.data());

This seems fine except computeVariablePositions takes a raw pointer to a double and has this code (floating_joint_model.cpp:223):

void FloatingJointModel::computeVariablePositions(const Eigen::Isometry3d& transf, double* joint_values) const
{
  joint_values[0] = transf.translation().x();
  joint_values[1] = transf.translation().y();
  joint_values[2] = transf.translation().z();
  Eigen::Quaterniond q(transf.rotation());
  joint_values[3] = q.x();
  joint_values[4] = q.y();
  joint_values[5] = q.z();
  joint_values[6] = q.w();
}

The logs I get before the segfault have the length of that array at 6.

[ WARN] [1588755613.247947347]: new_values size: 6

@rhaschke
Copy link
Copy Markdown
Contributor

rhaschke commented May 6, 2020

I think the culprit is that the FloatingJointModel reports the wrong state-space dimension (6 instead of 7):
https://github.com/ros-planning/moveit/blob/2bebf052a76b1dfe06eab3731bf7b460dd883563/moveit_core/robot_model/src/floating_joint_model.cpp#L193-L196

What is really weird, is that this bug was never experienced before... The above code line hasn't changed since the very beginnings of MoveIt...

We should run our CI tests with sanitizing enabled to spot such issues!
ASan is a much better sanitizer than valgrind by the way 😉

@v4hn
Copy link
Copy Markdown
Contributor

v4hn commented May 6, 2020

Why would the FloatingJoint have 7 DOF?
It represents an isometry.

Edit: .... I see, we store the full quaternion for rotation...

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

I don't know why it would be 7, if it should be 6 then we shouldn't write into the 7th position.

To me, the interface design of computeVariablePositions seems like we set ourselves up to fail. It has never been a good idea to accept a raw pointer to an array without a length and then go about writing into it.

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

How have we not discovered this before? Maybe in most builds, it just harmlessly corrupts some memory that doesn't matter?

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

If I change the magic number from 6 to 7... what other code depends on a 6 length type?

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

ASan is a much better sanitizer than valgrind by the way

I'll look into that... ty for the tip.

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

After searching through the code I now see that this is the only place it is being used to construct an array that is then passed into functions in the joint model. I made some minor changes to the floating joint model that should fix that bug... now onto the next one.

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

Here is a stack trace of the current issue here. It looks like it is something in fcl:

move_group: /var/lib/jenkins/workspace/fcl0.5-pkg_builder-master-generic/repo/src/math/transform.cpp:90: void fcl::Quaternion3f::toRotation(fcl::Matrix3f&) const: Assertion `.99 < data [0]*data [0] + data [1]*data [1] + data [2]*data [2] + data [3]*data [3]' failed.

Thread 16 "move_group" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffb2aaa700 (LWP 25431)]
0x00007ffff52fa428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff52fa428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff52fc02a in __GI_abort () at abort.c:89
#2  0x00007ffff52f2bd7 in __assert_fail_base (fmt=<optimized out>, 
    assertion=assertion@entry=0x7fffea0fd878 ".99 < data [0]*data [0] + data [1]*data [1] + data [2]*data [2] + data [3]*data [3]", 
    file=file@entry=0x7fffea0fd818 "/var/lib/jenkins/workspace/fcl0.5-pkg_builder-master-generic/repo/src/math/transform.cpp", line=line@entry=90, 
    function=function@entry=0x7fffea0fd940 "void fcl::Quaternion3f::toRotation(fcl::Matrix3f&) const")
    at assert.c:92
#3  0x00007ffff52f2c82 in __GI___assert_fail (
    assertion=0x7fffea0fd878 ".99 < data [0]*data [0] + data [1]*data [1] + data [2]*data [2] + data [3]*data [3]", 
    file=0x7fffea0fd818 "/var/lib/jenkins/workspace/fcl0.5-pkg_builder-master-generic/repo/src/math/transform.cpp", line=90, 
    function=0x7fffea0fd940 "void fcl::Quaternion3f::toRotation(fcl::Matrix3f&) const") at assert.c:101
#4  0x00007fffe9e7eb0e in fcl::Quaternion3f::toRotation(fcl::Matrix3fX<fcl::details::Matrix3Data<double> >&) const () from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#5  0x00007fffe9e7fdd3 in fcl::Transform3f::getRotationInternal() const ()
   from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#6  0x00007fffe9e5f290 in fcl::details::boxBoxIntersect(fcl::Box const&, fcl::Transform3f const&, fcl::Box const&, fcl::Transform3f const&, std::vector<fcl::ContactPoint, std::allocator<fcl::ContactPoint> >*) () from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#7  0x00007fffe9d1e5f0 in fcl::ShapeCollisionTraversalNode<fcl::Box, fcl::Box, fcl::GJKSolver_libccd>::leafTesting(int, int) const () from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#8  0x00007fffe9bd33de in unsigned long fcl::ShapeShapeCollide<fcl::Box, fcl::Box, fcl::GJKSolver_libccd>(fcl::CollisionGeometry const*, fcl::Transform3f const&, fcl::CollisionGeometry const*, fcl::Transform3f const&, fcl::GJKSolver_libccd const*, fcl::CollisionRequest const&, fcl::CollisionResult&) ()
   from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#9  0x00007fffe9e7e2f0 in unsigned long fcl::collide<fcl::GJKSolver_libccd>(fcl::CollisionGeometry const*, fcl::Transform3f const&, fcl::CollisionGeometry const*, fcl::Transform3f const&, fcl::GJKSolver_libccd const*, fcl::CollisionRequest const&, fcl::CollisionResult&) ()
   from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#10 0x00007fffe9e7df33 in fcl::collide(fcl::CollisionObject const*, fcl::CollisionObject const*, fcl::CollisionRequest const&, fcl::CollisionResult&) () from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#11 0x00007fffed96ef15 in collision_detection::collisionCallback (o1=0x7fffb40048b0, 
    o2=0x7fffa801c630, data=0x7fffb2aa74f0)
    at /root/ros_ws/src/moveit/moveit_core/collision_detection_fcl/src/collision_common.cpp:306
#12 0x00007fffea0d53e3 in fcl::details::dynamic_AABB_tree::collisionRecurse(fcl::NodeBase<fcl::AABB>*, fcl::CollisionObject*, void*, bool (*)(fcl::CollisionObject*, fcl::CollisionObject*, void*)) ()
   from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#13 0x00007fffea0d9be2 in fcl::DynamicAABBTreeCollisionManager::collide(fcl::CollisionObject*, void*, bool (*)(fcl::CollisionObject*, fcl::CollisionObject*, void*)) const ()
   from /usr/lib/x86_64-linux-gnu/libfcl.so.0.5
#14 0x00007fffed98d16c in collision_detection::CollisionEnvFCL::checkRobotCollisionHelper (
    this=0x7fffa8005830, req=..., res=..., state=..., acm=0x7fffa4002b00)
    at /root/ros_ws/src/moveit/moveit_core/collision_detection_fcl/src/collision_env_fcl.cpp:297
#15 0x00007fffed98cbdb in collision_detection::CollisionEnvFCL::checkRobotCollision (
    this=0x7fffa8005830, req=..., res=..., state=..., acm=...)
    at /root/ros_ws/src/moveit/moveit_core/collision_detection_fcl/src/collision_env_fcl.cpp:269
#16 0x00007ffff21b2c47 in planning_scene::PlanningScene::checkCollision (this=0x7fffa8012b90, req=..., 
    res=..., robot_state=..., acm=...)
    at /root/ros_ws/src/moveit/moveit_core/planning_scene/src/planning_scene.cpp:507
#17 0x00007fffb3ce050d in planning_scene::PlanningScene::checkCollision (this=0x7fffa8012b90, req=..., 
    res=..., robot_state=..., acm=...)
    at /root/ros_ws/src/moveit/moveit_core/planning_scene/include/moveit/planning_scene/planning_scene.h

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

The bug behaves the same with both the current version of the upstream panda_moveit_config and our released moveit_resources one in kinetic. I was hoping this was just a config error.

Maybe the bug is in the libfcl that is in kinetic as the moveit code is from master and works in melodic? Is there some way to get a newer version of libfcl for kinetic to test that theory?

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 6, 2020

I found this: flexible-collision-library/fcl#190 and I'm also getting the same error @davetcoleman calls out in the output before the crash. I don't know if it is related.

@tylerjw tylerjw force-pushed the p/tylerjw/pick_place_test branch from 5906ba9 to 9ef6346 Compare May 13, 2020 18:05
@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 13, 2020

I found a clue:

(gdb) print (*state.attached_body_map_.begin()->second)
$20 = {parent_link_model_ = 0x8a6910, id_ = "object", shapes_ = std::vector of length 1, capacity 1 = {
    std::shared_ptr (count 4, weak 2) 0x7fffcc004060}, 
  attach_trans_ = std::vector of length 1, capacity 1 = {{
      m_matrix = {<Eigen::PlainObjectBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = {<Eigen::MatrixBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = {<Eigen::DenseBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = {<Eigen::internal::special_scalar_op_base<Eigen::Matrix<double, 4, 4, 0, 4, 4>, double, double, Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 3>, false>> = {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 3>> = {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 1>> = {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 0>> = {<Eigen::EigenBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, m_storage = {m_data = {array = {
                -nan(0x8000000000000), -nan(0x8000000000000), -nan(0x8000000000000), 0, 
                -nan(0x8000000000000), -nan(0x8000000000000), -nan(0x8000000000000), 0, 
                -nan(0x8000000000000), -nan(0x8000000000000), -nan(0x8000000000000), 0, 
                0.00044243504144125811, 0.00035635795687904981, 0.084564638127313407, 
                1}}}}, <No data fields>}}}, touch_links_ = std::set with 3 elements = {
    [0] = "panda_hand", [1] = "panda_leftfinger", [2] = "panda_rightfinger"}, detach_posture_ = {
    header = {seq = 0, stamp = {<ros::TimeBase<ros::Time, ros::Duration>> = {sec = 0, 
          nsec = 0}, <No data fields>}, frame_id = ""}, 
    joint_names = std::vector of length 0, capacity 0, points = std::vector of length 0, capacity 0}, 
  global_collision_body_transforms_ = std::vector of length 1, capacity 1 = {{
      m_matrix = {<Eigen::PlainObjectBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = {<Eigen::MatrixBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = {<Eigen::DenseBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = {<Eigen::internal::special_scalar_op_base<Eigen::Matrix<double, 4, 4, 0, 4, 4>, double, double, Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 3>, false>> = {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 3>> = {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 1>> = {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 0>> = {<Eigen::EigenBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, m_storage = {m_data = {array = {
                -nan(0x8000000000000), -nan(0x8000000000000), -nan(0x8000000000000), 0, 
                -nan(0x8000000000000), -nan(0x8000000000000), -nan(0x8000000000000), 0, 
                -nan(0x8000000000000), -nan(0x8000000000000), -nan(0x8000000000000), 0, 
                0.49999990181087917, 3.0970443763039292e-07, 0.51923044559676901, 
                1}}}}, <No data fields>}}}, subframe_poses_ = std::map with 0 elements, 
  global_subframe_poses_ = std::map with 0 elements}

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 13, 2020

When this pose input:

{position = {x = 0.5, y = 0, z = 0.5}, orientation = {x = 0, y = 0, z = 0, w = 0}}

is sent into PlanningScene::poseMsgToEigen we get this output:

{
  m_matrix = 
  {<Eigen::PlainObjectBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = 
    {<Eigen::MatrixBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = 
    {<Eigen::DenseBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = 
    {<Eigen::internal::special_scalar_op_base<Eigen::Matrix<double, 4, 4, 0, 4, 4>, double, double, Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 3>, false>> = 
    {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 3>> = 
    {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 1>> = 
    {<Eigen::DenseCoeffsBase<Eigen::Matrix<double, 4, 4, 0, 4, 4>, 0>> = 
    {<Eigen::EigenBase<Eigen::Matrix<double, 4, 4, 0, 4, 4> >> = 
    {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, m_storage = 
  {m_data = 
    {array = 
    {-nan(0x8000000000000), 
            -nan(0x8000000000000), -nan(0x8000000000000), 0, -nan(0x8000000000000), 
            -nan(0x8000000000000), -nan(0x8000000000000), 0, -nan(0x8000000000000), 
            -nan(0x8000000000000), -nan(0x8000000000000), 0, 0.5, 0, 0.5, 1}}}}, <No data fields>}
}

I think all those -nan values look suspect to me.

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 13, 2020

for more context this happens when you send this to planning_scene::PlanningScene::processCollisionObjectAdd

$12 = (const moveit_msgs::CollisionObject &) @0x7fffcc0016a0: {header = {seq = 0, 
    stamp = {<ros::TimeBase<ros::Time, ros::Duration>> = {sec = 0, nsec = 0}, <No data fields>}, 
    frame_id = "panda_link0"}, pose = {position = {x = 0, y = 0, z = 0}, orientation = {x = 0, y = 0, 
      z = 0, w = 0}}, id = "object", type = {key = "", db = ""}, 
  primitives = std::vector of length 1, capacity 1 = {{type = 1 '\001', 
      dimensions = std::vector of length 3, capacity 3 = {0.02, 0.02, 0.20000000000000001}}}, 
  primitive_poses = std::vector of length 1, capacity 1 = {{position = {x = 0.5, y = 0, z = 0.5}, 
      orientation = {x = 0, y = 0, z = 0, w = 0}}}, meshes = std::vector of length 0, capacity 0, 
  mesh_poses = std::vector of length 0, capacity 0, planes = std::vector of length 0, capacity 0, 
  plane_poses = std::vector of length 0, capacity 0, 
  subframe_names = std::vector of length 0, capacity 0, 
  subframe_poses = std::vector of length 0, capacity 0, operation = 0 '\000'}

What does it mean to normalize a quaternion or {0,0,0,0}? Should I instead be looking to see why such a quaternion was created in the first place or is there something suspect here?

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 13, 2020

For more context, here is the code from this test that is triggering that:

  /* Define the primitive and its dimensions. */
  collision_objects[2].primitives.resize(1);
  collision_objects[2].primitives[0].type = collision_objects[1].primitives[0].BOX;
  collision_objects[2].primitives[0].dimensions.resize(3);
  collision_objects[2].primitives[0].dimensions[0] = 0.02;
  collision_objects[2].primitives[0].dimensions[1] = 0.02;
  collision_objects[2].primitives[0].dimensions[2] = 0.2;

  /* Define the pose of the object. */
  collision_objects[2].primitive_poses.resize(1);
  collision_objects[2].primitive_poses[0].position.x = 0.5;
  collision_objects[2].primitive_poses[0].position.y = 0;
  collision_objects[2].primitive_poses[0].position.z = 0.5;

  collision_objects[2].operation = collision_objects[2].ADD;

  planning_scene_interface_.applyCollisionObjects(collision_objects);

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 13, 2020

To be clear, where it is barfing inside of fcl is when doing some operation on this object. That is why I suspect it.

@tylerjw tylerjw changed the title WIP: [test] move_group pick place test [test] move_group pick place test May 14, 2020
@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 14, 2020

I believe this is ready as it passes Travis. there is a problem with the moveit_tutorials that I copied from I discovered the hard way. I'll make a PR to fix that. A default constructed quaternion message is invalid and can cause FCL on Kinetic to fail during collision checking.

@rhaschke

@codecov-io
Copy link
Copy Markdown

codecov-io commented May 14, 2020

Codecov Report

Merging #2031 into master will increase coverage by 3.67%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2031      +/-   ##
==========================================
+ Coverage   54.11%   57.78%   +3.67%     
==========================================
  Files         328      328              
  Lines       25658    25660       +2     
==========================================
+ Hits        13884    14828     +944     
+ Misses      11774    10832     -942     
Impacted Files Coverage Δ
...veit_core/robot_model/src/floating_joint_model.cpp 48.35% <100.00%> (ø)
...meterization/work_space/pose_model_state_space.cpp 82.31% <0.00%> (-0.69%) ⬇️
...pl/ompl_interface/src/planning_context_manager.cpp 70.94% <0.00%> (-0.17%) ⬇️
...raint_samplers/src/default_constraint_samplers.cpp 82.90% <0.00%> (+0.36%) ⬆️
...nning_scene_monitor/src/planning_scene_monitor.cpp 71.70% <0.00%> (+0.74%) ⬆️
moveit_core/robot_model/src/robot_model.cpp 78.36% <0.00%> (+0.78%) ⬆️
...ution_manager/src/trajectory_execution_manager.cpp 47.88% <0.00%> (+1.02%) ⬆️
...mpl_interface/src/model_based_planning_context.cpp 59.62% <0.00%> (+1.06%) ⬆️
...e/collision_detection_fcl/src/collision_common.cpp 79.38% <0.00%> (+1.23%) ⬆️
moveit_core/planning_scene/src/planning_scene.cpp 59.37% <0.00%> (+1.68%) ⬆️
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 50e3c21...d3bd523. Read the comment docs.

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 14, 2020

In case anyone finds this in the future because you are seeing a ton of these errors and then a crash in FCL: too many iterations in Jacobi transform Make sure you are not adding any objects to the planning scene with a default constructed rotation in the pose. {0, 0, 0, 0} is not a valid pose and when it tries to normalize that quaternion you get a bunch of nan in the resulting transform. FCL barfs on this when it is trying to do collision checking and you get the crash.

@v4hn
Copy link
Copy Markdown
Contributor

v4hn commented May 14, 2020 via email

@felixvd
Copy link
Copy Markdown
Contributor

felixvd commented May 14, 2020

Yeah, it is a super old problem, and evidently everyone makes the mistake once in a while. I filed the PR already, it just didn't appear in the email thread you responded to: #2089

@v4hn
Copy link
Copy Markdown
Contributor

v4hn commented May 18, 2020

@tylerjw could you please cleanup the commit history, I would like to keep separate commits but you added fixups.

@tylerjw tylerjw force-pushed the p/tylerjw/pick_place_test branch from d25bf5b to d3bd523 Compare May 18, 2020 20:34
@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 18, 2020

@v4hn I rebased and separated the two logical changes in this PR into separate commits, the test, and the bug fix. lmk if that is what you wanted. I originally assumed this would be squashed, but I could see how separating these two things would make for back-porting them simpler if they were separate commits.

Situations like this make me really miss working on a project with gerrit. Maybe sometime github will support a gerrit like workflow.

@v4hn
Copy link
Copy Markdown
Contributor

v4hn commented May 18, 2020

Situations like this make me really miss working on a project with gerrit. Maybe sometime github will support a gerrit like workflow.

What exactly are you missing?

separated the two logical changes in this PR into separate commits, the test, and the bug fix. lmk if that is what you wanted. I originally assumed this would be squashed, but I could see how separating these two things would make for back-porting them simpler

True, although I simply want to keep commits with new test cases and actual bug fixes apart, also to reference them in years to come.
Thanks, good to go after CI finished.

@tylerjw
Copy link
Copy Markdown
Member Author

tylerjw commented May 19, 2020

What exactly are you missing?

Gerrit made it seamless to keep your changes in a clean history. Each PR can only be one commit that can be cleanly cherry-picked and when you push updates you force push. However, the force push doesn't destroy the history of the PR as gerrit keeps a history of all your changes so it can be easily code reviewed and there is no need for fixup commits to make the review process easy.

The one commit per PR also makes it so that when you have something like a bug fix and a new feature that depends on that bug fix you stack the PRs putting the bug fix first and they are reviewed separately. GitHub does not provide a nice interface for stacked PRs.

With github you have to first do fixup commits until you are through code review so that the reviewers can see the new changes (force push destroys the history on github), and then once the reviewers are happy you can rebase and clean up the history of the commits.

@henningkayser henningkayser merged commit a5864cc into moveit:master May 19, 2020
@v4hn
Copy link
Copy Markdown
Contributor

v4hn commented May 19, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants