Description
As detailed in #656 (comment) and eProsima/Fast-DDS#359, there are some serious RTPS discovery issues with the version of Fast-RTPS released in Crystal (1.7.0). This is a tracking issue for investigating how those fixes could be backported into Crystal for all users.
Problem Statement
When using Fast-RTPS with ROS 2, it can sometimes take quite a while for nodes to discover each other, particularly when there are many topics or the machine is under heavy load. In testing, I've sometimes seen discovery take upwards of 30 seconds. There also appear to be memory leaks in certain situations that may need to be addressed.
Investigation
Several bugs in Fast-RTPS have been discovered that contribute to this problem:
- Discovery takes too long: "ROS2 w/ Fast-RTPS: Maximum number of subscriptions to a topic [4190]" (eProsima/Fast-DDS#359). Fixed by "Improved discovery time [4661]" (eProsima/Fast-DDS#411). Breaks ABI.
- Race condition: "FastRTPS drops messages under stress" (rmw_fastrtps#258). Fixed by richiware/rmw_fastrtps@b262caa, no PR yet. Breaks ABI, though it may be possible to make it ABI compatible.
- Memory leak: "Strange RAM usage when creating subscriptions" (rmw_fastrtps#257). Fixed by "Made Subscriber history resource limits dependent on history depth [4782]" (eProsima/Fast-DDS#434). Probably does not break ABI.
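Several of these fixes are flagged as breaking ABI. A common way that happens is a fix adding a data member to a type exposed in a public header: the object's size (and possibly its field offsets) changes, so code compiled against the old header reserves or reads the wrong number of bytes when it runs against the new library. The minimal sketch below illustrates the size change using Python's `ctypes`; `ReaderAttrsV1`, `ReaderAttrsV2` and their fields are hypothetical stand-ins, not actual Fast-RTPS types.

```python
import ctypes


# Hypothetical "1.7.0" layout of a struct exposed in a public header.
class ReaderAttrsV1(ctypes.Structure):
    _fields_ = [
        ("history_depth", ctypes.c_int32),
        ("reliable", ctypes.c_bool),
    ]


# Hypothetical "patched" layout: the fix appends one new member.
class ReaderAttrsV2(ctypes.Structure):
    _fields_ = [
        ("history_depth", ctypes.c_int32),
        ("reliable", ctypes.c_bool),
        ("max_samples", ctypes.c_int32),  # new field added by the fix
    ]


old_size = ctypes.sizeof(ReaderAttrsV1)
new_size = ctypes.sizeof(ReaderAttrsV2)

# An application built against V1 reserves old_size bytes for this object,
# while a library built against V2 reads and writes new_size bytes.
print(f"V1: {old_size} bytes, V2: {new_size} bytes")
```

Appending the member at the end keeps the existing offsets intact but still grows the object; inserting it in the middle would additionally shift every later field.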
For some context, I'm currently working on a moderately complex project (16 nodes, 28 topics publishing at rates between 5 Hz and 100 Hz). On Crystal patch release 3, it can take up to 30 seconds after launch for all of the nodes to discover each other and start communicating. With the current state of Dashing (which includes the fixes above, plus one applied by hand), all of the nodes discover each other within about 2 seconds of launch.
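Comparing Crystal against Dashing comes down to measuring the time from launch until discovery is complete. A minimal polling harness for that measurement might look like the sketch below. It is hypothetical: `time_until` and the stand-in predicate are illustrative names, and in a real ROS 2 setup the predicate would instead query the graph for the expected number of endpoints on each topic (for example via rclpy's `Node.count_publishers()` / `Node.count_subscribers()`, assuming they are available in the distribution under test).

```python
import time
from typing import Callable, Optional


def time_until(predicate: Callable[[], bool],
               timeout_s: float = 60.0,
               poll_interval_s: float = 0.1) -> Optional[float]:
    """Poll `predicate` until it returns True.

    Returns the elapsed wall-clock time in seconds, or None if the
    predicate did not become true within `timeout_s`.
    """
    start = time.monotonic()
    while True:
        if predicate():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(poll_interval_s)


# Hypothetical stand-in for "all nodes have discovered each other":
# it reports success on the third poll.
calls = {"n": 0}


def fake_discovery_complete() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


elapsed = time_until(fake_discovery_complete, timeout_s=5.0,
                     poll_interval_s=0.01)
print(f"discovery completed after {elapsed:.2f} s")
```

Using `time.monotonic()` rather than wall-clock time keeps the measurement unaffected by system clock adjustments during the run.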
Potential Solutions
- Release Fast-RTPS 1.7.2 into Crystal Patch release 4
  - Take Fast-RTPS 1.7.2 into Crystal Patch release 4
  - Get the fix for "FastRTPS drops messages under stress" (rmw_fastrtps#258) into master, then backport it to Crystal
  - Pros
    - Uses the tested release from eProsima
    - Least amount of backporting work
  - Cons
    - Causes an ABI break
- Backport selected fixes from Fast-RTPS into Crystal Patch release 4
  - Get the fix for "FastRTPS drops messages under stress" (rmw_fastrtps#258) into master, then backport it to Crystal
  - Cherry-pick specific fixes from Fast-RTPS 1.7.2 into our copy of Fast-RTPS 1.7.0
  - Pros
    - Avoids an ABI break
    - Gets fixes out to users
  - Cons
    - Uses a configuration that is not supported upstream
    - Not yet clear that the backports can be made ABI compatible
    - Large number of changes from 1.7.0 to 1.7.2 (175 files changed, 14064 insertions(+), 10397 deletions(-))
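Whether a cherry-picked fix stays ABI compatible depends, among other things, on whether it leaves the layout of publicly visible types unchanged. The sketch below is a hypothetical, deliberately simplified check for a single struct: it uses `ctypes` to compare the total size and the offset and size of every pre-existing field. It does not cover vtables, inline functions, exported symbols, or other ABI concerns; for real binaries, tools such as libabigail's `abidiff` perform a far more thorough comparison. The struct names are illustrative only.

```python
import ctypes
from typing import List, Type


def layout_problems(old: Type[ctypes.Structure],
                    new: Type[ctypes.Structure]) -> List[str]:
    """Return layout differences between two versions of a struct.

    Simplified model: the struct must keep the same total size, and every
    field present in `old` must keep its offset and size in `new`.
    """
    problems = []
    if ctypes.sizeof(old) != ctypes.sizeof(new):
        problems.append(
            f"size changed: {ctypes.sizeof(old)} -> {ctypes.sizeof(new)}")
    new_names = {name for name, *_ in new._fields_}
    for name, *_ in old._fields_:
        if name not in new_names:
            problems.append(f"field removed: {name}")
            continue
        before, after = getattr(old, name), getattr(new, name)
        if (before.offset, before.size) != (after.offset, after.size):
            problems.append(f"field moved or resized: {name}")
    return problems


# Hypothetical 1.7.0 layout.
class AttrsOld(ctypes.Structure):
    _fields_ = [("depth", ctypes.c_int32), ("reliable", ctypes.c_bool)]


# Hypothetical backport that places a 1-byte flag in former trailing padding.
class AttrsReusedPadding(ctypes.Structure):
    _fields_ = [("depth", ctypes.c_int32), ("reliable", ctypes.c_bool),
                ("new_flag", ctypes.c_bool)]


# Hypothetical backport that appends a 4-byte field.
class AttrsAppended(ctypes.Structure):
    _fields_ = [("depth", ctypes.c_int32), ("reliable", ctypes.c_bool),
                ("max_samples", ctypes.c_int32)]


for candidate in (AttrsReusedPadding, AttrsAppended):
    print(candidate.__name__,
          layout_problems(AttrsOld, candidate) or "no layout problems found")
```

An empty result is a necessary condition, not a sufficient one: in the padding case, code compiled against the old header would never initialize `new_flag`, so the library would still have to cope with an indeterminate value there.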