Skip to content
This repository was archived by the owner on May 31, 2025. It is now read-only.

Topic subscription scalability fix#1162

Closed
guillaumeautran wants to merge 1 commit intolunar-develfrom
lunar-edge
Closed

Topic subscription scalability fix#1162
guillaumeautran wants to merge 1 commit intolunar-develfrom
lunar-edge

Conversation

@guillaumeautran
Copy link
Copy Markdown
Contributor

@guillaumeautran guillaumeautran commented Sep 12, 2017

Switching to using epoll system calls to improve performance of the topic polling code by a factor of 2. Addresses #1078.

Some previous discussion/review/numbers here: clearpathrobotics#12

@mikepurvis mikepurvis changed the title Topic subscription scalability fix (#1078) (#12) Topic subscription scalability fix Sep 12, 2017
@mikepurvis
Copy link
Copy Markdown
Member

This has been working well for us across several weeks of testing. 👍

@guillaumeautran
Copy link
Copy Markdown
Contributor Author

I'm not sure to understand the unit test here. The test add 100 threads who all register the same FD for POLLIN and POLLOUT. The previous implementation behaviour was that a single handler was going to be called (single FD) while the new code behaviour calls ALL of the registered hander on that socket. In this case, all 100 threads will be called granted the ability to write to the socket and all 100 threads will be called to read from the socket.

I am not sure how the code is expected to behave in the case where multiple handlers are registering for the same socket...

Test code is here: https://github.com/ros/ros_comm/blob/lunar-devel/test/test_roscpp/test/test_poll_set.cpp#L285

@guillaumeautran
Copy link
Copy Markdown
Contributor Author

guillaumeautran commented Sep 25, 2017

After further investigations, it appears that the original code (prior to my change) was behaving differently when multiple handlers are registered for the same socket. Based on the code here: poll_set.cpp#L76 multiple handlers are added to the internal data structure but when registering for events on the socket, poll_set.cpp#L125, only the first handler is marked to receive the event.
This actually IMHO is a bug of the original implementation because the event handler call that is supposed to receive one type of event may not be able to handle all events correctly (write event on a read-only handler, etc). In addition, deleting the event from the handler will remove the flag from the first handler found in the list, poll_set.cpp#L144, which might not be correct either as that particular handler may still be looking to receive that particular event.

IMHO, the proper behaviour with multiple event handlers should really be such that the handler registering for a particular event should be the one receiving that event. For example, a single handler could be used for socket reads while a separate handler could be used for socket writes.

Because the code still contains all three versions (Windows, poll, epoll), I suggest we disable this particular unit test and leave it as that.

@mikepurvis
Copy link
Copy Markdown
Member

@dirk-thomas Thoughts on the unit test discussion above?

Switching to using epoll system calls to improve performance of the topic polling code by a factor of 2.
@mikepurvis
Copy link
Copy Markdown
Member

Actually, going to close this now so that I can resume use of the lunar-edge branch for testing. @guillaumeautran Please make a new PR from a specific feature branch, and we can continue the conversation there.

@mikepurvis mikepurvis closed this Nov 3, 2017
@dirk-thomas
Copy link
Copy Markdown
Member

Since this patch breaks ABI should it target ROS Melodic?

@mikepurvis
Copy link
Copy Markdown
Member

Is it mutating any symbols which are linked from user code? I think it's all internal stuff that's being changed, so that should be safe, right?

@dirk-thomas
Copy link
Copy Markdown
Member

Is it mutating any symbols which are linked from user code? I think it's all internal stuff that's being changed, so that should be safe, right?

How can we assume that the changed headers (which are public and nowhere marked as impl / internal) are not used by any userland code?

@mikepurvis
Copy link
Copy Markdown
Member

I suppose we cannot be completely assured of that, but the two affected headers (ros/poll_set.h and ros/io.h) are pretty clearly deep implementation details, and there's not a puff about either of them in the wiki API documentation, nor are the classes involved meant to be extended— PollSet has no virtual methods, and there's no other functions which accept a PollSet instance.

I think the main risk with PollSet is that there could be a member of that type in an API class, which would cause the size of that class to change. However, on a quick grep through the repo, it looks like all instances (TransportTCP, TransportUDP, PollManager), are maintained as pointers or references, so those are safe from size change issues.

Anyway, this is mostly academic— we can actively test/ship this for now, and if it's doesn't merge until melodic-devel, that's acceptable to us. But it is also a change which could really benefit a lot of Kinetic/Lunar users, and I wouldn't want to hold it back on an unlikely technicality.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants