Fix a bug when publishing unique_ptr#504
Conversation
A do_inter_process_publish is always called when publishing a unique_ptr, even in the intra-process situation. This degrades intra-process publishing performance to that of inter-process publishing. Since the other intra-process publishing cases (i.e. publishing a shared_ptr, a message reference, or a message pointer) all call this version of the publish method, this effectively hurts all intra-process publishing performance. This fix brings the expected intra-process low-latency performance.
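The publish path in question can be sketched as a minimal C++ model. The names here (DemoPublisher, store_intra_process_message) are hypothetical stand-ins, not the actual rclcpp source; the point is that every publish(unique_ptr) unconditionally pays the inter-process cost before handing the message to the intra-process machinery:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified model of the behavior described above (hypothetical names):
// publish(unique_ptr) always performs an inter-process publish, even when
// the only subscribers are in the same process.
struct DemoPublisher {
  int inter_process_publishes = 0;
  int intra_process_publishes = 0;

  void publish(std::unique_ptr<std::string> msg) {
    // Inter-process publish happens first, while msg is still valid...
    do_inter_process_publish(*msg);
    // ...then ownership is handed to the intra-process machinery.
    store_intra_process_message(std::move(msg));
  }

  void do_inter_process_publish(const std::string & /*msg*/) {
    ++inter_process_publishes;  // serialization + middleware cost paid here
  }

  void store_intra_process_message(std::unique_ptr<std::string> /*msg*/) {
    ++intra_process_publishes;  // buffer takes ownership of the pointer
  }
};
```

Even with zero inter-process subscribers, the counter for inter-process publishes increments on every call, which is the overhead this fix targets.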
|
Have you tried to subscribe to the message from a different process with this patch applied? |
|
This is not an appropriate fix, external subscriptions will not work when intra-process is on, as @dirk-thomas suggested. |
|
Yes, you are right, @wjwwood @dirk-thomas. Sorry, I missed the requirement that intra- and inter-process publishing should be able to work together. |
Add a do_inter_process_publish call AFTER the intra-process message is published. This enables intra- and inter-process publishing to work together.
|
OK. I added a do_inter_process_publish call AFTER the intra-process message is published. This enables intra- and inter-process publishing to work together. It could still be improved, I think. I guess the subscriber should subscribe only to the intra-process version of the DDS handle, and not to the normal DDS handle. |
  if (RCL_RET_OK != status) {
    rclcpp::exceptions::throw_from_rcl_error(status, "failed to publish intra process message");
  }
  this->do_inter_process_publish(msg_ptr);
The function call store_intra_process_message_() above takes ownership of the msg_ptr, so we can't rely on msg_ptr still being valid here. That is the reason why the "inter" call currently happens before the "intra" call.
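The ownership problem can be demonstrated in isolation. This is a minimal sketch with hypothetical names (g_buffer standing in for the intra-process ring buffer), not the rclcpp implementation: once the unique_ptr is moved into the store call, the caller's pointer is null, so a subsequent inter-process publish has nothing to send.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the intra-process ring buffer (hypothetical).
std::unique_ptr<int> g_buffer;

// Models store_intra_process_message_: the buffer takes ownership.
void store_intra_process_message(std::unique_ptr<int> msg) {
  g_buffer = std::move(msg);
}

// After a move, the caller's unique_ptr is null, so this returns false
// and an inter-process publish placed after the store would fail.
bool can_still_publish_inter(const std::unique_ptr<int> & msg_ptr) {
  return static_cast<bool>(msg_ptr);
}
```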
It would certainly be good to address the latency issue you describe but it will probably be more complicated than this.
The subscriber might additionally receive a message from another process. So it still needs to subscribe to the DDS topic in general. |
OK, I get your point. |
|
@Jrdevil-Wang it could be more granular so that you could control it on a per topic or even per publisher or subscription basis, but that doesn't change the need to do interprocess communication as well. In ROS 1, the passing of pointers instead of sending over the network within a node or between nodelets was on by default and you didn't need to do anything special to turn that on, so having it on a Node level seems natural to me.
Can you clarify what feature or API in ROS 1 you're referring to? I just want to make sure we're talking about the same thing. |
@wjwwood, please correct me if I am wrong. It seems that in ros2, whether intra-process communication is used is decided by a parameter on Node construction (use_intra_process_comms in the following code). rclcpp/rclcpp/include/rclcpp/node.hpp Lines 77 to 81 in 4d67a86 In ros1, whether a message is published in intra-process mode is decided per subscriber link. |
But the user usually doesn't influence that. What may not be obvious from your linked code is that there are many subscriber links and some will not be intra-process, so in all cases the serialized version of the message is sent to the publish queue for inter-process communication. You can see the various kinds of links created, for intra-process here: And for transport subscriber links: |
|
Furthermore, you can see that the "local" connection is made for all publishers and subscriptions unconditionally in their respective creation functions:
|
|
OK, let me give an example to explain. Say there are two processes: process A has one publisher and one subscriber, and process B has one publisher and one subscriber. All these publishers and subscribers use the same topic. From my understanding, in ros1 there will be four subscriber links: two of them are intra-process links and the other two are inter-process links. When A.pub publishes a message, the message is sent to A.sub through an intra-process link, and to B.sub through an inter-process link. That is fine with me. However, in ros2, all publishers and subscribers share the same DDS topic. Therefore, when A.pub publishes a message, the message is sent to A.sub AND B.sub through DDS (DDS doesn't know we have another channel), and also to A.sub via the intra-process mechanism. The problem is, I should prevent DDS from sending this message from A.pub to A.sub, but I can't, right? |
|
Yes, that's the double delivery problem. Borrowing from your example, A.sub will receive the message from A.pub twice, once via intra-process and once via inter-process. DDS doesn't know about our "side channel", but we can tell it not to send data to itself when we have a side channel. That's what this option is for in our middleware API: https://github.com/ros2/rmw/blob/b4e6596161f122d520e5085742dd2e48dcdca82b/rmw/include/rmw/rmw.h#L246

Even in the case that there is double delivery, subscriptions which have intra-process enabled will ignore inter-process messages from publishers in the same node, see: rclcpp/rclcpp/include/rclcpp/subscription.hpp Lines 218 to 224 in 4d67a86 This is inefficient because we've needlessly sent the message to ourselves over the middleware (DDS in this case), but we don't deliver data to the user twice. That's the remaining work that needs to be done (in my opinion) in order to make intra-process on by default in Nodes.

A further enhancement would be to have publishers watch the "matched subscriptions" (we don't expose this information from DDS yet) in order to know if publishing to DDS would actually end up sending data to anyone, and then, as an optimization, not doing the inter-process publish if there are no matched subscriptions from other participants. Each of these is related, but they are different topics than the original question or proposed enhancement in my opinion, and perhaps deserve their own issues.

To the best of my knowledge, the original proposal was to publish the intra-process message first to improve latency, but that cannot be done without copying the message first to send to the inter-process manager. The other option is to convert the unique pointer to a shared pointer before giving it to the intra-process manager, so that the inter-process publish which is called later may be assured the message will still exist. This causes other issues however, and makes truly zero-copy across several pub/sub links impossible, because you can never be sure that the user has given up read-write access to a message they received in a subscription callback and then published again. |
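The double-delivery guard described above can be sketched as follows. This is a simplified model with hypothetical names; the real check in rclcpp compares publisher GIDs obtained from the middleware rather than string IDs:

```cpp
#include <cassert>
#include <string>

// A delivered message, tagged with where it came from (hypothetical model).
struct Delivery {
  std::string publisher_process;
  bool via_intra_process;
};

// Models a subscription with intra-process enabled: it drops the redundant
// inter-process copy of a message that originated in its own process, so
// the user callback never sees the same message twice.
struct DemoSubscription {
  std::string own_process;
  bool intra_process_enabled = true;

  // Returns true if the delivery should reach the user callback.
  bool accept(const Delivery & d) const {
    if (d.via_intra_process) {
      return true;  // intra-process path is always delivered
    }
    if (intra_process_enabled && d.publisher_process == own_process) {
      return false;  // duplicate of a message already delivered intra-process
    }
    return true;  // genuine inter-process message from another process
  }
};
```

Note the inefficiency mentioned above remains: the local message still travels through the middleware before being discarded here.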
|
Happy to learn the plan.
Yes, that's true. There was no problem in my test only because I used the SingleThreadedExecutor, so there was no chance for the intra-process subscriber to consume the message before the message was sent through DDS in the changed publisher code. By the same principle, maybe I could add some kind of mutex to protect the msg_ptr? |
|
@dirk-thomas, @wjwwood. I have an idea for protecting the msg_ptr. Please check whether it is acceptable before I implement it. During store_intra_process_message_, a record is created in a MappedRingBuffer. rclcpp/rclcpp/include/rclcpp/mapped_ring_buffer.hpp Lines 219 to 224 in 4d67a86 I plan to add another field "bool is_locked;". Initially it is set to true, and it will be set to false after do_inter_process_publish finishes. After that, nothing is different. |
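The "is_locked" proposal might look something like this minimal sketch (hypothetical, greatly simplified from the real MappedRingBuffer): while an entry is locked, a subscriber taking it gets a copy so the original remains valid for do_inter_process_publish; once unlocked, the message can be moved out zero-copy.

```cpp
#include <cassert>
#include <memory>

// Simplified ring-buffer entry with the proposed lock flag (hypothetical).
struct Entry {
  std::shared_ptr<int> msg;
  bool is_locked = true;  // set to false after do_inter_process_publish
};

// Take the message out of the buffer: copy it while locked (the original
// must survive for the pending inter-process publish), move it otherwise.
std::shared_ptr<int> take(Entry & e) {
  if (e.is_locked) {
    return std::make_shared<int>(*e.msg);  // copy; original stays in buffer
  }
  return std::move(e.msg);  // zero-copy hand-off; entry becomes empty
}
```

This also illustrates the race discussed below: whether a given take happens before or after the unlock decides whether a copy is made.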
|
Sorry for the long delay in replying, and thanks for your patience :) @Jrdevil-Wang I believe your proposal will work, however it has the drawback that zero-copy pub/sub in a chain becomes a race condition. For example, consider the simple pipeline demo (https://github.com/ros2/ros2/wiki/Intra-Process-Communication#simple-pipeline), which generates an image, and in each of three nodes the image is "stamped" with the address of the image message itself. The point of the demo is that the three addresses stamped on the image are always the same, implying zero-copy between several pub/sub links (and separately developed nodes). With your proposed change, if the inter-process copying of the message isn't fast enough, then the pointers on the image would not be the same.

So we'll go from "you'll get zero-copy so long as there's only one subscriber" to "you'll sometimes (but non-deterministically) get zero-copy if there's only one subscriber". I also question whether or not this is a good trade-off. If most of the latency comes from copying the message before doing intra-process, then in the cases where the race fails (and a copy is made anyway) the latency is just as bad.

That all being said, perhaps this is something we could add and make optional (something users could turn on with a flag of some sort). It does provide something between the unique pointer and shared pointer.

Alternatively, I can see two other things we might do to address this problem. We could avoid copying the message for use with the inter-process publish if we could determine that there are no matched inter-process subscribers (avoiding the inter-process work all together). That would allow us to pass the message along without the copy. To do this we would need to implement the "count matched subscriptions" feature I described above. We have (in the past) discussed providing an option for intra-process to behave like ROS 1, where a call to publish directly invokes the subscription callbacks.

This has a few drawbacks; however, if the user opts into it, then it is more efficient, lower latency, and more deterministic, because there's no scheduling (callbacks are called as a direct result of calling publish). I don't think this is the right behavior by default, but it might be preferable in specific scenarios. |
|
Thanks for the reviews! @wjwwood
I think the image_pipeline_all_in_one demo will still work with my proposed change, because it uses the SingleThreadedExecutor. The executor has to complete the publishing process before any subscriber gets the chance to execute. But yes, when using a multi-threaded executor, my proposed change will introduce uncertainty.
This is a good solution, but I don't think it is easy to keep this count accurate, considering inter-machine subscriptions. |
Actually the only case where the copy is always avoided is when the publish call comes from a callback in the single-threaded executor (like a timer callback) and the subscription callback is in the same single-threaded executor. If the publish call is made from a different thread, from a different executor, or in a multi-threaded executor (assuming different callback groups for the subscription and timer), then it is a race to see whether or not a copy is delivered to the subscription callback.
Well, so long as we use the same information the middleware keeps, it should be accurate; after all, this is how the middleware knows to create unicast connections to other endpoints. But we still need the feature. So what should we do with this pull request? Personally I'd prefer to keep the copy in place, as there are only specific cases where we might be able to avoid it with the current proposal, and instead we should strive to remove the copy when we can do it always, i.e. when there are no inter-process subscriptions. Separately we could consider merging this proposed optimization, but I think we should have an opt-out option so that people can select determinism versus lower average latency (and so demos like the image pipeline demo stay deterministic). |
|
Sorry, I made a mistake in the previous comment: with a single-threaded executor it will always avoid the copy, which is pretty good. I'll update the comment, but I still think having the ability to opt out of this optimization would be best. |
|
Actually it's more complicated than even what I said in the previous comment: it is still possible that a copy is made on delivery with a single-threaded executor, if the publish call is made outside of the executor thread (asynchronously). |
|
@wjwwood Yeah, this problem is much more complicated than I thought, and there seems to be no easy way to solve it here. Maybe I should just close this PR and wait for a more thorough solution? |
|
@Jrdevil-Wang if you find the enhancement a benefit for your use case, I'd be OK iterating on this to make it opt-in or opt-out, but otherwise I would agree that we should close this and look towards one of the other optimizations/improvements I described above. I do appreciate you looking into it, though! It was good to get this written down and explained somewhere. |
|
I'll let you decide to close it or not, or at the very least I'll wait for your feedback before closing it myself. |
|
@wjwwood Thanks! |
|
Ok, so first you'd need to implement what you proposed above (#504 (comment)). Then we'd want to expose an option (probably at creation of the publisher) which opts into or out of the new behavior. Then in the demos (like the image pipeline demo) we'd need to ensure we're using the current behavior (because zero-copy is more important than the latency there). At best we'd update the demo (and wiki page) to highlight this trade-off (lower average latency in exchange for occasionally needing to copy the message). One thing to watch out for in the implementation of the suggestion above is that the message still needs to get removed from the ring buffer eventually, even if the "last" call to take it never happens. |
|
Ok, I see. I will implement the proposal soon, hopefully. |
|
@Jrdevil-Wang #664 together with #690 have solved what you were proposing here. I will summarize a little bit:

#664 avoided doing the inter-process publish when there are only intra-process subscriptions. It used the count-matching API introduced in #628.
#664 also allowed configuring intra-process comm at the publisher/subscription level. By default, the node's intra-process configuration is used. The API for doing that will probably change with #713.
#690 allowed storing shared_ptr messages in the IPM. Using that, the intra-process publish is now always done before the inter-process publish: https://github.com/ros2/rclcpp/blob/2b3e4d9c98b17bbed6a57e36a0466008454a1e4d/rclcpp/include/rclcpp/publisher.hpp#L91-L120
#690 also allowed zero-copy intra-process comm with multiple subscriptions, using a shared_ptr<const T> callback signature and a unique_ptr<T> publish signature.
I think this PR could be closed now, but I will wait for your feedback before doing it. |
|
Dear ivanpauno,
Good to know that the issue is solved! Please close PR #504.
Best regards,
Jrdevil-Wang |
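The resolution summarized above (store a shared_ptr in the IPM so intra-process publish can happen first, and skip inter-process publish when nothing is matched) can be sketched as a minimal model. The names (ResolvedPublisher, have_inter_subscriptions) are hypothetical, not the rclcpp API:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified model of the post-#664/#690 behavior (hypothetical names):
// the unique_ptr is promoted to a shared_ptr, so the intra-process publish
// can run first while the inter-process publish still has a valid message,
// and the inter-process publish is skipped when no remote subscription is
// matched (per the count-matching API).
struct ResolvedPublisher {
  std::shared_ptr<const std::string> last_intra;
  int inter_publishes = 0;
  bool have_inter_subscriptions = true;  // would come from count-matching

  void publish(std::unique_ptr<std::string> msg) {
    std::shared_ptr<const std::string> shared = std::move(msg);
    last_intra = shared;   // intra-process publish happens first
    if (have_inter_subscriptions) {
      ++inter_publishes;   // inter-process publish only when needed
    }
  }
};
```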
|
Signed-off-by: ivanpauno <ivanpauno@ekumenlabs.com>
