Possible memory leak in Cyclone/Iceoryx subscriber history queue #471

@ksuszka

Description

Bug report

Required Info:

  • Operating System:
    • Shown using official ros:humble docker image
  • Installation type:
    • Official ros:humble docker image
  • Version or commit hash:
    • ros-humble-cyclonedds/jammy,now 0.10.3-1jammy.20230822.172333 amd64 [installed,automatic]
    • ros-humble-rmw-cyclonedds-cpp/jammy,now 1.3.4-1jammy.20230919.205940 amd64 [installed]
    • ros-humble-iceoryx-binding-c/jammy,now 2.0.3-1jammy.20230623.023950 amd64 [installed,automatic]
    • ros-humble-iceoryx-hoofs/jammy,now 2.0.3-1jammy.20230623.020506 amd64 [installed,automatic]
    • ros-humble-iceoryx-posh/jammy,now 2.0.3-1jammy.20230623.021542 amd64 [installed,automatic]
  • DDS implementation:
    • rmw_cyclonedds_cpp
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

I prepared a separate repo with two very simple applications (a publisher and a subscriber) that demonstrates the issue: https://github.com/ksuszka/cyclonedds_iceoryx_memory_leak/tree/chunks-leak

Using this repo, build the docker image:

docker build -f Dockerfile -t cyclone-leak-test .

Open four terminal windows.
In the first terminal window run:

docker run -it --shm-size 1GB --rm --name cyclone-leak-test cyclone-leak-test iox-roudi

In the second terminal window run:

docker exec -it cyclone-leak-test bash -c ". /ws/install/setup.bash && ros2 run test_listener test_listener"

In the third terminal window run:

docker exec -it cyclone-leak-test bash -c ". /ws/install/setup.bash && while true; do ros2 run test_publisher test_publisher; done"

In the fourth terminal window again run:

docker exec -it cyclone-leak-test bash -c ". /ws/install/setup.bash && while true; do ros2 run test_publisher test_publisher; done"

And wait a minute.

After some time you will most likely start to see errors such as:

1696239369.966511 [0] test_liste: DDS reader with topic rt/topic : iceoryx subscriber - TOO_MANY_CHUNKS_HELD_IN_PARALLEL -could not take sample

Expected behavior

Messages which cannot be processed due to a slow subscriber are dropped silently.

Actual behavior

Messages which cannot be processed due to a slow subscriber are dropped silently for a few seconds and then Iceoryx errors start to appear.

Additional information

In this example a really slow subscriber has a QoS history depth (250) slightly smaller than the maximum history depth available in the precompiled ros-humble-iceoryx-* packages (256). Once the subscriber's queue fills up it should stay at a constant size, and that is mostly the case with a single very fast publisher. With multiple parallel publishers, however, the number of held chunks starts to leak slowly, which can be observed with the iox-introspection-client (if it is compiled separately).

The issue is easily reproducible if publishers abort execution abruptly (this method is used in the example repository); however, AFAIK that is not a requirement for the issue to occur. We first noticed the leaking chunks in our system, which has a few dozen nodes, and then tried to distill it into a simple, easily reproducible case.
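The behavior described above can be illustrated with a toy model. This is NOT iceoryx's actual implementation, and the failure rate is purely illustrative; it only shows the arithmetic of the report: with a history depth of 250 and a hard limit of 256 chunks held in parallel, a full queue normally releases one chunk per delivery, so the held count plateaus, but if a concurrent publisher dies between delivering a chunk and the release bookkeeping, each lost release permanently raises the plateau until the limit is exceeded.

```python
# Toy model (hypothetical, not iceoryx's real accounting): a bounded
# history queue where each enqueued sample "holds" a shared-memory chunk.
import random

HISTORY_DEPTH = 250   # subscriber QoS history depth (from this report)
CHUNK_LIMIT = 256     # max chunks held in parallel (precompiled iceoryx limit)

held = 0              # chunks currently attributed to the subscriber
queue = 0             # samples sitting in the history queue

def push(lost_release: bool) -> bool:
    """Deliver one sample; return False once the chunk limit is exceeded."""
    global held, queue
    held += 1
    queue += 1
    if queue > HISTORY_DEPTH:
        queue -= 1            # the oldest sample is evicted...
        if not lost_release:
            held -= 1         # ...and its chunk is normally released
    return held <= CHUNK_LIMIT

random.seed(0)
pushes = 0
ok = True
while ok:
    # Assume 1 in 5000 deliveries races with a dying publisher
    # (an invented rate, chosen only to keep the demo short).
    ok = push(lost_release=(random.random() < 1 / 5000))
    pushes += 1
print(f"chunk limit exceeded after {pushes} deliveries")
```

With a single well-behaved publisher (`lost_release` always False) the held count never passes `HISTORY_DEPTH`, which matches the observation that the leak only appears with multiple parallel, abruptly terminating publishers.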

For more background, we found this issue while investigating another possible bug: ros2/geometry2#630. That bug makes tf_buffer a really slow reader of the /parameter_events topic. This topic has a QoS history depth of 1000, so it cannot even be handled with the default Iceoryx limits. We recompiled Iceoryx with a history depth of 4096 and the system seemed to work fine for a few hours, but then we started to get errors that too many chunks were held in parallel on the /parameter_events topic, which didn't make sense. We then observed with the iox-introspection-client that if you start a simple node with default parameter handling, and in parallel repeatedly spawn and close other, unrelated nodes that broadcast their parameters, the number of memory chunks held by the first node slowly and randomly increases over time.
