
Ipc proposal #1

Merged
dgoel merged 16 commits into master from ipc_proposal on Jun 14, 2019

Conversation


@irobot-ros commented May 31, 2019

New IPC proposal

CC: @clalancette @raghaprasad


@wjwwood left a comment


I'm not quite done reviewing this, but I wanted to post what I have now. I'll keep going and hopefully finish up tomorrow some time.

RMW implementation.

At the moment, all the ROS2 supported middlewares (Fast-RTPS, OpenSplice, Connext, DPS) do not make
distinctions whether the `Publisher` and the `Subscription` are in the same process or not. In consequence,

> At the moment, all the ROS2 supported middlewares (Fast-RTPS, OpenSplice, Connext, DPS) do not make
> distinctions whether the Publisher and the Subscription are in the same process or not.

I'm not sure that's correct either. Needs citation at the very least.


@alsora commented Jun 6, 2019


For now, I removed the sentence.

Looking at Fast-RTPS and DPS (and their RMW implementations), I'm pretty sure that's correct.
It's also supported by experimental evaluations: having a publisher and a subscription in the same process, or in two processes on the same machine, does not affect performance.

As for OpenSplice, that statement is not correct, since this middleware uses the same ports for all the entities in the same process. This results in almost instantaneous discovery, but I haven't seen any change in latency, for example.

No idea about Connext.


Either way, they could do that if they wanted. I think it's worth separating whether or not they could make optimizations versus whether or not they do. Not being able to make these optimizations is a strong case for improving our intra-process system, but in the current situation, a valid question is "why not improve the rmw implementations using the information they have first"? I'm not saying we should do that, it's just a weak justification for adding more stuff into our intra-process pipeline instead, whereas if it were not possible for them to do those optimizations, that would be a strong reason to do it in rclcpp.

For example, a strong reason to do it in rclcpp (in my opinion) is that when we add type masquerading (using OpenCV Image or PCL point cloud instead of the ROS message type) it will be easier to support intra-process with those types if it is in rclcpp, though that doesn't imply that we use an envelope message or have our own queueing system.


"Optimization" is not an all-or-nothing proposition, though. There's no doubt that DDS vendors could do some optimizations and improve the efficiency of intra-process communication, but there's only so much that can be done at the rmw layer. Unless some APIs are re-designed, there would always be the overhead of some additional copies. This overhead can be avoided entirely by doing the optimization in rclcpp directly, without depending on the DDS vendor. IMO that is a strong enough reason, since it guarantees a minimum level of performance irrespective of the DDS library.


I agree, but my point is that it's not a strong supporting reason to put more complexity into rclcpp rather than try to improve the existing design, since we haven't even tried to improve performance within the rmw layer. If performance (latency and CPU/memory usage) were improved for small messages within the same process at the rmw layer, then the drawbacks of using a "meta-message" would decrease significantly, even if they would perhaps never be as good as a queue in rclcpp itself.

It's a bit of a moot point since the work is essentially already done, but adding things to the rclcpp layer adds complexity to a part of the system that is not optional (or at least not without removing code manually -- they don't have to use intra process after all).

I'm thinking towards certification efforts where they might have a much simpler or already certified rmw/middleware layer below.

To be clear, I think the approach described here is the best option, I'm just trying to make sure it's properly motivated as it does cost something now and in the future (in terms of maintenance). For example, each time we expose a QoS in ROS 2 we'll have to update the intra process system, and I think the proposed system will be more fragile/complex to update than the meta-message approach we have now.


I agree that the proposed system will require an effort every time a new QoS is added.
However, I don't think that the current one handles this in a better way.

The RMW delivers the meta-messages enforcing the QoS; however, if the IntraProcessManager "thinks" that a message has to be delivered to N subscriptions, it will wait for all of them before releasing the memory in the ring buffer. This could be a potential source of problems.

Moreover, even if this proposal adds complexity to the rclcpp layer, I think it's important that the intra-process communication is implemented in a way that is agnostic to the RMW.
Implementing it in the RMW would potentially lead to a different version of intra-process communication for each RMW implementation...

RMW which does not have a notion of inter and intra-process communication. Consequently, the real message will
be delivered to both `Subscription`s present in the system. The `Subscription` that is in the same process as
the `Publisher` will actually discard the message, but it will be able to do that only after receiving and
deserializing it.

There is a potential way to fix this, e.g.:

  • Search for ignore_participant or ignore_publication or ignore_subscription in the DDS Standard:
    • https://www.omg.org/spec/DDS/1.4/
    • Note that we didn't use this option because at the time, OpenSplice didn't support it (years ago), and I don't know if Fast-RTPS supports it or not.

But this might be affected by our decision to change the relationship between nodes and participants (again trying to save resources).


This is the missing piece that prevents me from recommending we enable intra-process by default, by the way.


@clalancette left a comment


I had a few comments inline.


@wjwwood left a comment


The approach sounds reasonable to me, though I would add to the open issue that we're not reusing resources as much as we could because we create a ring buffer or FIFO queue for each subscription, but in reality we could have them share a single buffer in many cases. The overhead isn't extreme, as these ring buffers/queues only store pointers, but still in cases where there are many subscriptions in the same process (e.g. a process with hundreds of nodes that all subscribe to /tf) it could be a really good optimization.

I'll have another look later, but that's my first pass review. Sorry it took so long.

However, it does not know whether the entities have intra-process communication enabled or not.
Consequently, the real message will be delivered to both `Subscription`s present in the system. The
`Subscription` that is in the same process as the `Publisher` will actually discard the message, but it will
be able to do that only after receiving and deserializing it.

I still think it's worth pointing out that it should be possible to avoid this problem in the future, as I discussed here:

#1 (comment)

Otherwise it seems like an insurmountable technical issue (which I don't think is accurate).


I updated this section, explaining that these problems are only present in the current implementation and pointing to possible solutions.

`shared_ptr<const MessageT>` and `unique_ptr<MessageT>` that better fits with its callback type. This is
deduced by looking at the output of `AnySubscriptionCallback::use_take_shared_method()`.

If the history QoS is set to `keep all`, the buffers are dynamically allocated. On the other hand, if the

This isn't how KEEP_ALL works in DDS, btw. There's another QoS that we're not currently exposing (but we should) which is max_samples, max_instances, and max_samples_per_instance, which is the actual limit for a KEEP_ALL since it doesn't have a depth. See:

https://community.rti.com/static/documentation/connext-dds/5.3.0/doc/api/connext_dds/api_cpp/structDDS__ResourceLimitsQosPolicy.html


Mmm, could you clarify the difference between a sample and an instance?

See my last comment #1 (comment). I decoupled all the different components, and in this way it will be much simpler to support different options for the length of the queue.


Here's what the documentation for Connext [1] says:

> Keep all the samples.
>
> On the publishing side, RTI Connext will attempt to keep all samples (representing each value written) of each instance of data (identified by its key) managed by the DDSDataWriter until they can be delivered to all subscribers.
>
> On the subscribing side, RTI Connext will attempt to keep all samples of each instance of data (identified by its key) managed by the DDSDataReader. These samples are kept until the application takes them from RTI Connext via the take() operation.

@wjwwood seems to be suggesting that there's another limit above which the size of the history does not grow. That's reasonable I think.

In this intra-process proposal, the messages are delivered directly to the subscriber's buffer, so the publishing side does not need to do anything different. On the subscribing side, we would have to allow dynamic resizing of the buffer (i.e. a standard queue instead of a ring buffer) if this QoS is enabled.

Do you agree @wjwwood?

[1] https://community.rti.com/static/documentation/connext-dds/5.3.0/doc/api/connext_dds/api_cpp/group__DDSHistoryQosModule.html#gga96c54dadb2beb87b23776e766e282185a9e526d876dae232aa083125c72135f5b

intra-process `Subscription`s when it is created, but, eventually, a `transient local` `Subscription` in a
different process joins. Initially, published messages are not passed to the middleware, since all the
`Subscription`s are in the same process. This means that the middleware is not able to store old messages
for eventual late-joiners.

Yeah, this one is a kind of a deal-breaker. We just have to publish to inter-process anytime this QoS is used. There's no choice.


In addition to the intra-process buffers, we can unconditionally forward the messages to the rmw layer if the transient_local QoS is used. That will address this issue, since the rmw layer should take care of sending these messages to late joiners from a different process.
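The rule being discussed can be sketched as a simple predicate (purely illustrative; the function name and parameters are hypothetical, not rclcpp API):

```cpp
// Hypothetical sketch of the publish dispatch rule discussed above: a
// transient_local publisher always hands the message to the rmw layer,
// even when every current subscription is intra-process, so that the
// middleware can replay it to late joiners from other processes.
bool should_publish_inter_process(bool has_inter_process_subs, bool is_transient_local)
{
  return has_inter_process_subs || is_transient_local;
}
```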

- The proposal does not take into account the problem of having a queue with twice the size when both inter
and intra-process communication are used. A node with a history depth of 10 will be able to store up to
20 messages without processing them (10 intra-process and 10 inter-process). This issue
is also present in the current implementation, since each `Subscription` is doubled.

I don't think this can be addressed, though, right? Also, what does "A node with a history depth of 10" mean? Shouldn't you say publisher or subscription?


Yes, I meant subscription or publisher.

I don't have any nice idea for addressing this without entangling the RMW and the IntraProcessManager.


alsora commented Jun 12, 2019

I made a big update of the intra-process implementation in order to fix the type-erasing and to better prepare for the future.

Briefly, this is how I implemented it now.

Intra-process communication potentially involves different data types:

  • MessageT: the "basic" data type, i.e. something like std_msgs::msg::int
  • PublicationT: the data type published by the Publisher
  • BufferT: the data type stored in a buffer
  • CallbackT: the data type used in the Subscription callback

By default, BufferT = CallbackT.
I implemented a SubscriptionOption that can be used to manually specify the BufferT.
This can be used, for example, to set BufferT = MessageT, which can be useful for very small messages.

I defined a hierarchy of classes for the IntraProcessBuffer:

  • IntraProcessBufferBase
  • IntraProcessBuffer<MessageT> inherits from IntraProcessBufferBase
  • TypedIntraProcessBuffer<MessageT, BufferT> inherits from IntraProcessBuffer<MessageT>

Once the TypedIntraProcessBuffer is created, the classes that have to use it only access an IntraProcessBuffer<MessageT> instance, thus remaining agnostic with respect to the type that is actually stored in the buffer.
There are methods for pushing or popping any possible variant of MessageT.

The IntraProcessBuffer contains a buffer implementation, which can be any object implementing this minimal interface:

```cpp
template<typename BufferT>
class BufferImplementationBase
{
public:
  // A virtual destructor so derived buffers can be deleted through a
  // base-class pointer.
  virtual ~BufferImplementationBase() = default;

  virtual void dequeue(BufferT & request) = 0;
  virtual void enqueue(BufferT request) = 0;

  virtual void clear() = 0;
  virtual bool has_data() const = 0;
};
```

This potentially allows using different buffer implementations (e.g. simple queues, ring buffers, wait-free implementations, or whatever).
This is also a good place to put the logic related to the depth of the queue, for example, and maybe other QoS as well.
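As an illustration of that interface, here is a minimal sketch of a ring-buffer implementation with overwrite-on-full semantics (my own sketch, assuming the oldest message is dropped once the history depth is exceeded; this is not the actual rclcpp code):

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <utility>

// Sketch only: a fixed-capacity ring buffer matching the minimal interface
// above, where Capacity would come from the history depth QoS.
template<typename BufferT, std::size_t Capacity>
class RingBufferImplementation
{
public:
  // Insert an element; when full, the oldest element is overwritten
  // (KEEP_LAST-style semantics).
  void enqueue(BufferT request)
  {
    buffer_[head_] = std::move(request);
    head_ = (head_ + 1) % Capacity;
    if (size_ == Capacity) {
      tail_ = (tail_ + 1) % Capacity;  // drop the oldest message
    } else {
      ++size_;
    }
  }

  // Remove the oldest element into `request`.
  void dequeue(BufferT & request)
  {
    if (size_ == 0) {
      throw std::runtime_error("dequeue on empty buffer");
    }
    request = std::move(buffer_[tail_]);
    tail_ = (tail_ + 1) % Capacity;
    --size_;
  }

  void clear() {head_ = tail_ = size_ = 0;}
  bool has_data() const {return size_ > 0;}

private:
  std::array<BufferT, Capacity> buffer_;
  std::size_t head_ = 0;
  std::size_t tail_ = 0;
  std::size_t size_ = 0;
};
```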

show that creating a `Publisher` or a `Subscription` has a non-negligible memory cost.

Moreover, when using the default RMW implementation, Fast-RTPS, the memory consumed by a ROS2 application
increases almost exponentially with the number of `Publisher` and `Subscriptions`.

I think this argument alone is not a good rationale for avoiding the use of publishers and subscribers. ROS is a publish/subscribe middleware - instead of avoiding publishers and subscribers, the memory consumption should be reduced where possible.


No disagreement there. It's more of a practical consideration (than philosophical). I hope you agree that exponential increase in memory with the number of pubs (or subs) is definitely something that can't be ignored. And every core design decision that increases this number makes ROS2 less and less friendly for a resource constrained system (e.g. 32-bit arm linux with 100s of MB of RAM).

This statement (and others about rmw) are simply mentioned as our motivation for creating a new intra-process design.


In order to avoid the issue previously described, i.e., that a `Subscription` would receive both inter and
intra-process messages, it is necessary to inform the underlying middleware that some "connections" will be
handled at the `rclcpp` level, enabling the RMW to neglect them.

This functionality could be implemented independently of the separate subscriber-specific buffers to address the problem described in scenario 1. I think the proposal should separate this into two independent changes.

| unique_ptr\<MessageT\> @1 | <ul><li>shared_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li></ul> | <ul><li>@1</li><li>@1</li></ul> |
| unique_ptr\<MessageT\> @1 | <ul><li>unique_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li></ul> | <ul><li>@1</li><li>@2</li></ul> |
| unique_ptr\<MessageT\> @1 | <ul><li>unique_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li></ul> | <ul><li>@1</li><li>@2</li><li>@2</li></ul> |
| unique_ptr\<MessageT\> @1 | <ul><li>unique_ptr\<MessageT\></li><li>unique_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li><li>shared_ptr\<MessageT\></li></ul> | <ul><li>@1</li><li>@2</li><li>@3</li><li>@3</li></ul> |

The proposed approach with subscriber-specific buffers does come with a drawback, though, which isn't described in the current document. In the existing implementation, the potential duplication of the message happens on demand, when a subscriber actually takes the message. In the proposal, the duplication happens directly at publishing time.

Consider a scenario (3) with N subscribers all taking a unique pointer. If a subscriber doesn't actually take the message (e.g. it is busy and the message is being overwritten due to QoS settings), the existing implementation doesn't require a copy, whereas the proposal does result in the copy taking place anyway. So in a high-load scenario, a message being skipped because the subscriber is e.g. too slow still adds to the system load.

In another scenario (4), the separate buffers with the already-duplicated data can't optimize the number of copies in all cases. Consider two subscribers - one taking a shared pointer and one taking a unique pointer. In the current proposal, the message is always copied once. In the existing code, with a single buffer serving both subscribers, the logic can avoid this if e.g. the first subscriber takes the shared pointer and gives away its ownership of the shared pointer before the second subscriber even tries to take the message. In that example there is no requirement to actually copy the data.


One way to delay creating additional copies would be to always store shared pointers in the subscribers' buffers. Then, any additional copies (say, to convert to a unique pointer) are only done when the subscriber removes the message from the buffer. This directly addresses (3).

Regarding (4), I don't see how you can avoid the copy if two subscribers want the same message but one wants a shared pointer and the other a unique pointer. Could you perhaps elaborate?


> Could you perhaps elaborate?

As mentioned in the scenario: the first subscriber can get a const shared pointer to the existing message. After reading some data from it, the subscriber releases the shared pointer, so the only reference to the message is the one in the buffer. If the second subscriber now wants a unique pointer, the ownership can be passed from the buffer to the second subscriber. There was no need to copy the data, because the usage of the message by the two subscribers didn't overlap.


@alsora commented Jun 12, 2019


@dirk-thomas sorry, but I still don't understand exactly.

The first subscriber will call the method `get`, while the second will call `pop`.

Regardless of whether the ring buffer contains a `unique_ptr` or a `shared_ptr`, for the first subscriber no copies are needed and the internal message is promoted to a `shared_ptr`.
When the second subscriber asks for the message, there is now a `shared_ptr` inside the buffer, and I don't see the mechanism that allows avoiding the copy.


From what I can understand, @dirk-thomas is trying to describe the following scenario (note: the temporal aspect of the scenario is important):

time_0: subscriber_a wants a const shared pointer to the message (hence, no copy)
time_1: subscriber_a is done processing the message and relinquishes ownership of the message.
time_2: subscriber_b wants a unique pointer to the message (shared pointer is converted to unique pointer, thus avoiding the copy).

Requirement: time_2 comes after time_1

Is this correct, @dirk-thomas?
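That temporal scenario can be sketched in isolation (a hypothetical helper, not rclcpp code; note that in a real implementation the reference count alone may not be a reliable proxy for remaining subscribers):

```cpp
#include <memory>
#include <string>
#include <utility>

// Sketch of the scenario above (not rclcpp code). The message type counts
// its copy constructions so we can observe whether a copy happened.
struct Msg
{
  static int copies;  // number of copy constructions so far
  std::string data;
  explicit Msg(std::string d) : data(std::move(d)) {}
  Msg(const Msg & other) : data(other.data) {++copies;}
  Msg(Msg && other) noexcept : data(std::move(other.data)) {}
};
int Msg::copies = 0;

// Hypothetical take: hand out a unique_ptr from a buffer slot. If the slot
// is the sole remaining owner (subscriber_a already released its reference),
// move the message out instead of copying it.
std::unique_ptr<Msg> take_unique(std::shared_ptr<Msg> & slot)
{
  std::unique_ptr<Msg> out;
  if (slot.use_count() == 1) {
    out = std::make_unique<Msg>(std::move(*slot));  // no copy needed
  } else {
    out = std::make_unique<Msg>(*slot);             // usage overlaps: copy
  }
  slot.reset();
  return out;
}
```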


I think it is possible to do this optimization (essentially, move instead of copy for a single remaining sub), though it may not be trivial. The shared pointer counter alone cannot be used as a proxy for the number of subs, since the message may not have been copied to all the subscribers' buffers when this check is done, so this counter may underestimate the number of remaining subs.

Would you be okay with this being addressed as a future optimization, considering it may not be a very common case (large message + multiple subs with at least one sub, say A, requiring a unique ptr + A tries to get the message only after the other subs are done processing it)?


I am not suggesting that this must be addressed. But I think this case should be described in the document and that the separate buffers make that kind of optimization more difficult.


Fair enough.

intra-process `Subscription`s when it is created, but, eventually, a `transient local` `Subscription` in a
different process joins. Initially, published messages are not passed to the middleware, since all the
`Subscription`s are in the same process. This means that the middleware is not able to store old messages
for eventual late-joiners.

That is a good example of why a transient local publisher must publish to the RMW implementation even if, at the time of publishing, no inter-process subscribers exist.


Agreed.


dgoel commented Jun 12, 2019

> The approach sounds reasonable to me, though I would add to the open issue that we're not reusing resources as much as we could because we create a ring buffer or FIFO queue for each subscription, but in reality we could have them share a single buffer in many cases. The overhead isn't extreme, as these ring buffers/queues only store pointers, but still in cases where there are many subscriptions in the same process (e.g. a process with hundreds of nodes that all subscribe to /tf) it could be a really good optimization.
>
> I'll have another look later, but that's my first pass review. Sorry it took so long.

Thanks @wjwwood for taking the time to review!

The design to have independent buffers for each of the subscribers was deliberate:

  1. It's easy to support a different QoS for each subscriber. Overall, the implementation is less complex.
  2. Since the buffers are separate, multiple subscribers can remove messages from their own buffers in parallel without blocking each other (except for the overhead of synchronizing the shared pointer counter). If you have 100s of subs to a topic, this design can provide higher throughput at the expense of slightly more memory. (I also think that if you have 100s of subscribers, a slight memory increase is probably a tiny delta compared to the overall memory usage.)
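The memory trade-off in point 2 can be illustrated with a small sketch (hypothetical, not rclcpp code): per-subscription buffers duplicate only pointers, never the payload.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Sketch: fan a single (large) message out to `n` per-subscription queues.
// Each queue stores a shared_ptr, so the ~1 MiB payload exists exactly once;
// the returned use_count shows n buffered references plus the local handle.
long fan_out_use_count(std::size_t n)
{
  auto msg = std::make_shared<const std::string>(std::string(1 << 20, 'x'));
  std::vector<std::deque<std::shared_ptr<const std::string>>> buffers(n);
  for (auto & buf : buffers) {
    buf.push_back(msg);  // copies a pointer, not the payload
  }
  return msg.use_count();
}
```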


dgoel commented Jun 12, 2019

@wjwwood @dirk-thomas Thanks for taking the time to review the design proposal. We want to figure out what the next step(s) should be.

Ideally, we would like to close this PR, open an official design proposal, and continue the conversation there. Before that, we can address some of the topics that you have brought up:

  1. Forward messages to rmw unconditionally if transient_local QoS is set by the publisher. This would allow late joiners from a different process to get old messages.
  2. Move the proposal to add disconnect APIs in rmw into a separate design.
  3. Future optimizations:
    a. Move instead of copy if there's only one sub left and it wants a unique pointer.

Sound good?


@dgoel left a comment


I am closing this review in favor of the official ros2 design PR.

@dgoel dgoel merged commit b5ce943 into master Jun 14, 2019
@alsora alsora deleted the ipc_proposal branch February 8, 2022 10:48