Bug report
Required Info:
- Operating System:
- Shown using official ros:humble docker image
- Installation type:
- Official ros:humble docker image
- Version or commit hash:
- ros-humble-cyclonedds/jammy,now 0.10.3-1jammy.20230822.172333 amd64 [installed,automatic]
- ros-humble-rmw-cyclonedds-cpp/jammy,now 1.3.4-1jammy.20230919.205940 amd64 [installed]
- ros-humble-iceoryx-binding-c/jammy,now 2.0.3-1jammy.20230623.023950 amd64 [installed,automatic]
- ros-humble-iceoryx-hoofs/jammy,now 2.0.3-1jammy.20230623.020506 amd64 [installed,automatic]
- ros-humble-iceoryx-posh/jammy,now 2.0.3-1jammy.20230623.021542 amd64 [installed,automatic]
- DDS implementation:
- Client library (if applicable):
Steps to reproduce issue
I prepared a separate repo with two very simple applications (publisher/subscriber) to show the issue: https://github.com/ksuszka/cyclonedds_iceoryx_memory_leak/tree/chunks-leak
Using this repo:
Build docker image:
docker build -f Dockerfile -t cyclone-leak-test .
Open four terminal windows.
In the first terminal window run:
docker run -it --shm-size 1GB --rm --name cyclone-leak-test cyclone-leak-test iox-roudi
In the second terminal window run:
docker exec -it cyclone-leak-test bash -c ". /ws/install/setup.bash && ros2 run test_listener test_listener"
In the third terminal window run:
docker exec -it cyclone-leak-test bash -c ". /ws/install/setup.bash && while true; do ros2 run test_publisher test_publisher; done"
In the fourth terminal window again run:
docker exec -it cyclone-leak-test bash -c ". /ws/install/setup.bash && while true; do ros2 run test_publisher test_publisher; done"
And wait a minute.
After some time you will most likely start to get errors:
1696239369.966511 [0] test_liste: DDS reader with topic rt/topic : iceoryx subscriber - TOO_MANY_CHUNKS_HELD_IN_PARALLEL -could not take sample
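To see when the errors begin and which topics they affect, the listener output can be captured to a file and scanned. The following is a minimal sketch, assuming every error line has the same layout as the one shown above; the function name `count_errors_by_topic` and the regular expression are illustrative and not part of Cyclone DDS or Iceoryx.

```python
import re
from collections import Counter

# Pattern derived from the single error line quoted in this report;
# other Cyclone DDS log lines may be laid out differently (assumption).
ERROR_LINE = re.compile(
    r"^(?P<timestamp>\d+\.\d+) \[\d+\] (?P<process>\S+): "
    r"DDS reader with topic (?P<topic>\S+) : "
    r"iceoryx subscriber - (?P<error>[A-Z_]+)"
)


def count_errors_by_topic(lines, error="TOO_MANY_CHUNKS_HELD_IN_PARALLEL"):
    """Count occurrences of the given iceoryx subscriber error per topic."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match and match.group("error") == error:
            counts[match.group("topic")] += 1
    return counts
```

For example, passing the lines of a saved listener log to `count_errors_by_topic` returns a mapping from topic name to the number of matching errors.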
Expected behavior
Messages which cannot be processed due to a slow subscriber are dropped silently.
Actual behavior
Messages which cannot be processed due to a slow subscriber are dropped silently for a few seconds and then Iceoryx errors start to appear.
Additional information
In this example, a very slow subscriber uses a QoS history depth (250) slightly smaller than the maximum history depth available in the precompiled ros-humble-iceoryx-* packages (256). Once the subscriber's queue fills up, it should stay at a constant size, and this is mostly the case with a single very fast publisher. With multiple parallel publishers, however, the number of held chunks starts to slowly leak, which can be observed with the iox-introspection-client (if it is compiled separately).
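The expected behavior can be illustrated with a toy model of a keep-last history. This is only a sketch of the behavior we expect, not Cyclone DDS or Iceoryx code: the class `KeepLastQueue`, the function `simulate`, and its parameters are hypothetical, and only the two constants (250 and 256) come from this report.

```python
from collections import deque

HISTORY_DEPTH = 250    # subscriber QoS history depth used in the example repo
MAX_CHUNKS_HELD = 256  # limit in the precompiled ros-humble-iceoryx-* packages


class KeepLastQueue:
    """Toy keep-last history: the oldest sample is released when full."""

    def __init__(self, depth):
        self._samples = deque()
        self._depth = depth
        self.dropped = 0

    def push(self, sample):
        if len(self._samples) == self._depth:
            self._samples.popleft()  # silently drop (release) the oldest sample
            self.dropped += 1
        self._samples.append(sample)

    def take(self):
        return self._samples.popleft() if self._samples else None

    @property
    def held(self):
        return len(self._samples)


def simulate(publishers, samples_per_publisher, take_every):
    """Interleave several publishers against one slow reader.

    The reader takes one sample for every `take_every` published samples.
    Returns the peak number of held samples and the number dropped.
    """
    queue = KeepLastQueue(HISTORY_DEPTH)
    peak = 0
    published = 0
    for i in range(samples_per_publisher):
        for p in range(publishers):
            queue.push((p, i))
            published += 1
            peak = max(peak, queue.held)
            if published % take_every == 0:
                queue.take()
    return peak, queue.dropped
```

In this model the number of held samples never exceeds `HISTORY_DEPTH`, regardless of the number of publishers, so it always stays below `MAX_CHUNKS_HELD`. The observed leak means the real system deviates from this: chunks stay held beyond the history depth until the Iceoryx limit is reached.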
The issue is easy to reproduce when publishers abort execution abruptly (the example repository uses this method); however, AFAIK this is not a requirement for the issue to occur. We first noticed the leaking chunks in our system, which has a few dozen nodes, and then looked for a simple, easily reproducible case.
For more background, we found this issue because of another possible bug: ros2/geometry2#630. That bug makes tf_buffer a very slow reader of the /parameter_events topic. This topic has a QoS history depth of 1000, so at the moment it cannot even be handled with the default Iceoryx limits. We recompiled Iceoryx with a history depth of 4096 and the system seemed to work fine for a few hours, but then we started to get errors that too many chunks were held in parallel on the /parameter_events topic, which didn't make sense. We then observed with the iox-introspection-client that if you start a simple node with the default parameter handling and then spawn and close other, unrelated nodes in parallel that broadcast their parameters, the number of memory chunks held by the first node slowly and randomly increases over time.