Conversation
added support for unique ptr and messageT queues in design doc
wjwwood
left a comment
I'm not quite done reviewing this, but I wanted to post what I have now. I'll keep going and hopefully finish up tomorrow some time.
> RMW implementation.
>
> At the moment, all the ROS2 supported middlewares (Fast-RTPS, OpenSplice, Connext, DPS) do not make
> distinctions whether the `Publisher` and the `Subscription` are in the same process or not. In consequence,
> At the moment, all the ROS2 supported middlewares (Fast-RTPS, OpenSplice, Connext, DPS) do not make
> distinctions whether the `Publisher` and the `Subscription` are in the same process or not.
I'm not sure that's correct either. Needs citation at the very least.
For now, I removed the sentence.
Looking at Fast-RTPS and DPS (and their RMW) I'm pretty sure that's correct.
It's also supported by experimental evaluations, where having a publisher and a subscription in the same process or in two processes on the same machine does not affect the performance.
For what concerns OpenSplice, that statement is not correct, as this middleware uses the same ports for all the entities in the same process. This results in an almost instantaneous discovery, but I haven't seen any changes in latency for example.
No idea about Connext.
Either way, they could do that if they wanted. I think it's worth separating whether or not they could make optimizations versus whether or not they do. Not being able to make these optimizations is a strong case for improving our intra-process system, but in the current situation, a valid question is "why not improve the rmw implementations using the information they have first"? I'm not saying we should do that, it's just a weak justification for adding more stuff into our intra-process pipeline instead, whereas if it were not possible for them to do those optimizations, that would be a strong reason to do it in rclcpp.
For example, a strong reason to do it in rclcpp (in my opinion) is that when we add type masquerading (using OpenCV Image or PCL point cloud instead of the ROS message type) it will be easier to support intra-process with those types if it is in rclcpp, though that doesn't imply that we use an envelope message or have our own queueing system.
"Optimization" is not a binary operation though. There's no doubt that DDS vendors could do some optimizations and improve the efficiency of intra-process communication but there's only so much that can be done at the rmw layer. Unless some APIs are re-designed, there would always be the overhead of some additional copies. This overhead can be avoided entirely by doing this optimization in rclcpp directly without depending on the DDS vendor. IMO that is a strong enough reason since it guarantees a minimum level of performance irrespective of the DDS library.
I agree, but my point is that it's not a strong supporting reason to put more complexity into rclcpp rather than try to improve the existing design, since we haven't even tried to improve performance within the rmw layer. If performance (latency and cpu/memory usage) were improved for small messages within the same process at the rmw layer, then the drawbacks of using a "meta-message" decrease significantly, even if they will perhaps never be as good as a queue in rclcpp itself.
It's a bit of a moot point since the work is essentially already done, but adding things to the rclcpp layer adds complexity to a part of the system that is not optional (or at least not without removing code manually -- they don't have to use intra process after all).
I'm thinking towards certification efforts where they might have a much simpler or already certified rmw/middleware layer below.
To be clear, I think the approach described here is the best option, I'm just trying to make sure it's properly motivated as it does cost something now and in the future (in terms of maintenance). For example, each time we expose a QoS in ROS 2 we'll have to update the intra process system, and I think the proposed system will be more fragile/complex to update than the meta-message approach we have now.
I agree that the proposed system will require an effort every time a new QoS is added.
However, I don't think that the current one handles this in a better way.
The RMW delivers the meta-messages enforcing the QoS; however, if the IntraProcessManager "thinks" that a message has to be delivered to N subscriptions, it will wait for all of them before releasing the memory in the ring buffer. This could be a potential source of problems.
Moreover, even if this proposal adds complexity to the rclcpp layer, I think that it's important that the intra-process communication is implemented in a way that is agnostic to the RMW.
Implementing it in the RMW will potentially lead to have different versions of intra-process communication for each RMW implementation...
> RMW which does not have a notion of inter and intra-process communication. Consequently, the real message will
> be delivered to both `Subscription`s present in the system. The `Subscription` that is in the same process as
> the `Publisher` will actually discard the message, but it will be able to do that only after receiving and
> deserializing it.
There is a potential way to fix this, e.g.:
- Search for `ignore_participant`, `ignore_publication`, or `ignore_subscription` in the DDS standard: https://www.omg.org/spec/DDS/1.4/
- Note that we didn't use this option because at the time, OpenSplice didn't support it (years ago), and I don't know if Fast-RTPS supports it or not.
But this might be affected by our decision to change the relationship between nodes and participants (again trying to save resources).
This is the missing piece that prevents me from recommending we enable intra-process by default, by the way.
wjwwood
left a comment
The approach sounds reasonable to me, though I would add to the open issue that we're not reusing resources as much as we could because we create a ring buffer or FIFO queue for each subscription, but in reality we could have them share a single buffer in many cases. The overhead isn't extreme, as these ring buffers/queues only store pointers, but still in cases where there are many subscriptions in the same process (e.g. a process with hundreds of nodes that all subscribe to /tf) it could be a really good optimization.
I'll have another look later, but that's my first pass review. Sorry it took so long.
> However, it does not know whether the entities have intra-process communication enabled or not.
> Consequently, the real message will be delivered to both `Subscription`s present in the system. The
> `Subscription` that is in the same process as the `Publisher` will actually discard the message, but it will
> be able to do that only after receiving and deserializing it.
I still think it's worth pointing out that it should be possible to avoid this problem in the future, as I discussed here:
Otherwise it seems like an insurmountable technical issue (which I don't think is accurate).
I updated this section explaining that these problems are only present in the current implementation and pointing to possible solutions.
> `shared_ptr<const MessageT>` and `unique_ptr<MessageT>` that better fits with its callback type. This is
> deduced looking at the output of `AnySubscriptionCallback::use_take_shared_method()`.
>
> If the history QoS is set to `keep all`, the buffers are dynamically allocated. On the other hand, if the
This isn't how KEEP_ALL works in DDS, btw. There's another QoS that we're not currently exposing (but we should) which is max_samples, max_instances, and max_samples_per_instance, which is the actual limit for a KEEP_ALL since it doesn't have a depth. See:
Mmm could you clarify the difference between sample and instance?
See my last comment #1 (comment), I decoupled all the different components and in this way it will be much simpler to support different options for the length of the queue
Here's what the documentation for Connext [1] says:
> Keep all the samples.
> On the publishing side, RTI Connext will attempt to keep all samples (representing each value written) of each instance of data (identified by its key) managed by the DDSDataWriter until they can be delivered to all subscribers.
> On the subscribing side, RTI Connext will attempt to keep all samples of each instance of data (identified by its key) managed by the DDSDataReader. These samples are kept until the application takes them from RTI Connext via the take() operation.
@wjwwood seems to be suggesting that there's another limit above which the size of the history does not grow. That's reasonable I think.
In this intra-process proposal, the messages are delivered directly to the subscriber's buffer, so the publishing side does not need to do anything different. On the sub side, we would have to allow dynamic resizing of the buffer (i.e. a standard queue instead of a ring buffer) if this QoS is enabled.
Do you agree @wjwwood?
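To make the distinction concrete, here is a hedged sketch (plain C++ with invented names, not actual rclcpp or DDS code) of a subscription-side buffer: under `keep last` it overwrites the oldest sample once the history depth is reached, while under `keep all` it grows dynamically and is bounded only by a separate resource limit analogous to DDS `max_samples`:

```cpp
#include <cstddef>
#include <deque>
#include <utility>

enum class HistoryPolicy { KeepLast, KeepAll };

// Hypothetical sketch: one buffer honoring both history policies.
template <typename T>
class HistoryBuffer {
public:
  HistoryBuffer(HistoryPolicy policy, std::size_t depth, std::size_t max_samples)
  : policy_(policy), depth_(depth), max_samples_(max_samples) {}

  // Returns false if the sample was rejected (KeepAll at its resource limit).
  bool push(T value) {
    if (policy_ == HistoryPolicy::KeepLast) {
      if (data_.size() == depth_) {
        data_.pop_front();  // overwrite the oldest sample
      }
      data_.push_back(std::move(value));
      return true;
    }
    // KeepAll: no history depth, only a resource limit (akin to max_samples).
    if (data_.size() == max_samples_) {
      return false;
    }
    data_.push_back(std::move(value));
    return true;
  }

  T pop() {
    T v = std::move(data_.front());
    data_.pop_front();
    return v;
  }

  std::size_t size() const { return data_.size(); }

private:
  HistoryPolicy policy_;
  std::size_t depth_;        // used by KeepLast only
  std::size_t max_samples_;  // used by KeepAll only
  std::deque<T> data_;
};
```

The point of the sketch is that `depth` and `max_samples` are independent knobs, matching the observation that KEEP_ALL has no depth of its own.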
> intra-process `Subscription`s when it is created, but, eventually, a `transient local` `Subscription` in a
> different process joins. Initially, published messages are not passed to the middleware, since all the
> `Subscription`s are in the same process. This means that the middleware is not able to store old messages
> for eventual late-joiners.
Yeah, this one is a kind of a deal-breaker. We just have to publish to inter-process anytime this QoS is used. There's no choice.
In addition to the intra-process buffers, we can unconditionally forward the messages to the rmw layer if the transient_local QoS is used. That will address this issue, since the rmw layer should take care of sending these messages to late joiners from a different process.
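This rule can be captured in a tiny decision helper (a sketch with hypothetical names, not the actual rclcpp logic): a publisher may skip the rmw layer only when the durability is volatile and every matched subscription is intra-process.

```cpp
#include <cstddef>

// Hypothetical sketch of the publish-path decision. Names are illustrative,
// not actual rclcpp API.
struct PublisherState {
  std::size_t inter_process_subscription_count;
  bool transient_local;  // durability QoS
};

bool must_publish_through_rmw(const PublisherState & s) {
  // A transient_local publisher always hands the message to the middleware,
  // so it can store samples for late-joining subscriptions in other
  // processes, even if no inter-process subscription currently exists.
  return s.transient_local || s.inter_process_subscription_count > 0;
}
```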
> - The proposal does not take into account the problem of having a queue with twice the size when both inter
>   and intra-process communication are used. A node with a history depth of 10 will be able to store up to
>   20 messages without processing them (10 intra-process and 10 inter-process). This issue
>   is also present in the current implementation, since each `Subscription` is doubled.
I don't think this can be addressed though right? Also what does "A node with a history depth of 10" mean? Shouldn't you say publisher or subscription?
Yes, I meant subscription or publisher.
I don't have any nice idea for addressing this without entangling RMW and Intra-Process Manager
…ignore publications

I made a big update of the intra-process implementation in order to fix the type-erasing and to better prepare for the future. Briefly, this is how I implemented it now. Intra-process communication involves potentially different data-types, so by default I defined a hierarchy of classes for the buffers. This potentially allows using different buffer implementations (i.e. simple queues, ring buffers, wait-free implementations or whatever).
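The buffer hierarchy described above might look like the following sketch (plain C++ with invented names, only meant to show the shape of the abstraction): an abstract base class behind which different storage policies can sit, plus one concrete ring-buffer implementation that overwrites the oldest element when full.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of the buffer abstraction; not actual rclcpp classes.
template <typename T>
class BufferBase {
public:
  virtual ~BufferBase() = default;
  virtual void enqueue(T value) = 0;
  virtual T dequeue() = 0;
  virtual bool has_data() const = 0;
};

// One possible implementation: a fixed-capacity ring buffer that
// overwrites the oldest element when full.
template <typename T>
class RingBuffer : public BufferBase<T> {
public:
  explicit RingBuffer(std::size_t capacity)
  : data_(capacity), capacity_(capacity) {}

  void enqueue(T value) override {
    data_[(head_ + size_) % capacity_] = std::move(value);
    if (size_ == capacity_) {
      head_ = (head_ + 1) % capacity_;  // full: drop the oldest element
    } else {
      ++size_;
    }
  }

  T dequeue() override {
    T v = std::move(data_[head_]);
    head_ = (head_ + 1) % capacity_;
    --size_;
    return v;
  }

  bool has_data() const override { return size_ > 0; }

private:
  std::vector<T> data_;
  std::size_t head_ = 0;
  std::size_t size_ = 0;
  std::size_t capacity_;
};
```

A simple queue or a wait-free buffer would plug in behind the same `BufferBase` interface.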
> show that creating a `Publisher` or a `Subscription` has a non-negligible memory cost.
>
> Moreover, when using the default RMW implementation, Fast-RTPS, the memory consumed by a ROS2 application
> increases almost exponentially with the number of `Publisher`s and `Subscription`s.
I think this argument alone is not a good rationale for avoiding the use of publishers and subscribers. ROS is a publish / subscribe middleware - instead of avoiding the use of publishers and subscribers, the memory consumption should be reduced where possible.
No disagreement there. It's more of a practical consideration (than philosophical). I hope you agree that exponential increase in memory with the number of pubs (or subs) is definitely something that can't be ignored. And every core design decision that increases this number makes ROS2 less and less friendly for a resource constrained system (e.g. 32-bit arm linux with 100s of MB of RAM).
This statement (and others about rmw) are simply mentioned as our motivation for creating a new intra-process design.
> In order to avoid the issue previously described, i.e., that a `Subscription` would receive both inter and
> intra-process messages, it is necessary to inform the underlying middleware that some "connections" will be
> handled at the `rclcpp` level, enabling the RMW to neglect them.
This functionality could be implemented independently of the separate subscriber-specific buffers to address the problem described in scenario 1. I think the proposal should separate this into two independent changes.
> | unique_ptr\<MessageT\> @1 | <ul><li>shared_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li></ul> | <ul><li>@1</li><li>@1</li></ul> |
> | unique_ptr\<MessageT\> @1 | <ul><li>unique_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li></ul> | <ul><li>@1</li><li>@2</li></ul> |
> | unique_ptr\<MessageT\> @1 | <ul><li>unique_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li></ul> | <ul><li>@1</li><li>@2</li><li>@2</li></ul> |
> | unique_ptr\<MessageT\> @1 | <ul><li>unique_ptr\<MessageT\></li><li>unique_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li></ul> | <ul><li>@1</li><li>@2</li><li>@3</li><li>@3</li></ul> |
The proposed approach with subscriber-specific buffers does come with a drawback, though, which isn't described in the current document. In the existing implementation the potential duplication of the message happens on demand when a subscriber actually takes the message. In the proposal the duplication happens directly at publishing time.
Consider a scenario (3) with N subscribers all taking a unique pointer. If the subscribers don't actually take the message (e.g. a subscriber is busy and the message is being overwritten due to QoS settings), the existing implementation doesn't require a copy, whereas the proposal does result in the copy taking place anyway. So in a high-load scenario, a message being skipped because the subscriber is e.g. too slow still adds to the system load.
In another scenario (4), the separate buffers with the already duplicated data can't optimize the number of copies in all cases. Consider two subscribers - one taking a shared pointer and one taking a unique pointer. In the current proposal the message is always copied once. In the existing code, with a single buffer serving both subscribers, the logic can avoid this if e.g. the first subscriber takes the shared pointer and already gives away its ownership of the shared pointer before the second subscriber even tries to take the message. In that example there is no requirement to actually copy the data.
One way to delay creating additional copies would be to always store shared pointers in the subscribers' buffers. Then, any additional copies (say, to convert to a unique pointer) are only done when the subscriber removes the message from the buffer. This directly addresses (3).
Regarding (4), I don't see how you can avoid the copy if two subscribers want the same message but one wants a shared pointer and the other a unique pointer. Could you perhaps elaborate?
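The deferred-copy idea can be sketched as follows (self-contained C++ with hypothetical names, not actual rclcpp code): the buffer always stores `shared_ptr<const MessageT>`, so a shared take never copies, and the copy for a unique take happens only at the moment the subscriber actually removes the message from the buffer.

```cpp
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch: a per-subscription buffer that stores shared
// pointers and defers any copy until a unique take actually happens.
template <typename MessageT>
class IntraProcessBuffer {
public:
  void push(std::shared_ptr<const MessageT> msg) {
    queue_.push_back(std::move(msg));
  }

  bool empty() const { return queue_.empty(); }

  // No copy: hand out shared ownership of the stored message.
  std::shared_ptr<const MessageT> take_shared() {
    auto msg = std::move(queue_.front());
    queue_.pop_front();
    return msg;
  }

  // Copy deferred to take time: only a subscriber that actually takes a
  // unique_ptr ever pays for a copy; skipped/overwritten messages never do.
  std::unique_ptr<MessageT> take_unique() {
    auto msg = take_shared();
    return std::make_unique<MessageT>(*msg);  // the copy happens here
  }

private:
  std::deque<std::shared_ptr<const MessageT>> queue_;
};
```

Note this sketch still always copies on a unique take; the move-instead-of-copy optimization for a single remaining subscriber discussed below is not attempted here.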
Could you perhaps elaborate?
As mentioned in the scenario: the first subscriber can get a const shared pointer of the existing message. After reading some data from it the subscriber releases the shared pointer. So the only reference of the message is the one in the buffer. If now the second subscriber wants to get a unique pointer the ownership can be passed from the buffer to the second subscriber. There was no need to copy the data because the usage of the message between the subscribers didn't overlap.
@dirk-thomas sorry, but I still don't understand exactly.
The first subscriber will call the method get, while the second will call pop.
Regardless whether the ring buffer contains a unique_ptr or a shared_ptr, for the first subscriber no copies are needed and the internal message is promoted to shared_ptr.
When the second subscriber asks for the message, there is now a shared_ptr inside the buffer, and I don't see the mechanism that allows avoiding the copy.
From what I can understand, @dirk-thomas is trying to describe the following scenario (note: the temporal aspect of the scenario is important):
time_0: subscriber_a wants a const shared pointer to the message (hence, no copy)
time_1: subscriber_a is done processing the message and relinquishes ownership of the message.
time_2: subscriber_b wants a unique pointer to the message (shared pointer is converted to unique pointer, thus avoiding the copy).
Requirement: time_2 comes after time_1
Is this correct, @dirk-thomas?
I think it is possible to do this optimization (essentially, move instead of copy for a single remaining sub), though it may not be trivial. The shared pointer counter alone cannot be used as a proxy for the number of subs, since the message may not have been copied to all the subscribers' buffers when this check is done, so this counter may underestimate the number of remaining subs.
Would you be okay with this being addressed as a future optimization, considering it may not be a very common case (large message + multiple subs with at least one sub, say A, requiring a unique ptr + A tries to get the message only after the other subs are done processing the message)?
I am not suggesting that this must be addressed. But I think this case should be described in the document and that the separate buffers make that kind of optimization more difficult.
> intra-process `Subscription`s when it is created, but, eventually, a `transient local` `Subscription` in a
> different process joins. Initially, published messages are not passed to the middleware, since all the
> `Subscription`s are in the same process. This means that the middleware is not able to store old messages
> for eventual late-joiners.
That is a good example for why a transient local publisher must publish to the RMW impl even if at the time of publishing no inter process subscribers exist.
Thanks @wjwwood for taking the time to review! The design to have independent buffers for each of the subscribers was deliberate:
@wjwwood @dirk-thomas Thanks for taking time to review the design proposal. We want to figure out what should be the next step(s). Ideally, we would want to close this PR and open an official design proposal and continue conversations there. We can address some of the topics that you have brought up before that:
Sound good?
dgoel
left a comment
I am closing this review in favor of the official ros2 design PR.
New IPC proposal
CC: @clalancette @raghaprasad