Skip to content

Topic subscription scalability fix(#1078)#12

Merged
mikepurvis merged 1 commit intolunar-develfrom
CORE-7220-epoll_raw
Aug 17, 2017
Merged

Topic subscription scalability fix(#1078)#12
mikepurvis merged 1 commit intolunar-develfrom
CORE-7220-epoll_raw

Conversation

@guillaumeautran
Copy link
Copy Markdown

Improve performance of topic socket polling.

@guillaumeautran guillaumeautran force-pushed the CORE-7220-epoll_raw branch 4 times, most recently from 1ee218b to 91541a4 Compare July 25, 2017 17:41
#else
#include <cstring> // strerror
#include <fcntl.h> // for non-blocking configuration
#include <sys/epoll.h>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause trouble on POSIX platforms where epoll isn't a thing, especially OS X and BSD. I think you'll need to test for the header in CMake and then have an #ifdef that only optionally includes it.

You can do this easily with CMake's check_include_file macro, see: https://cmake.org/cmake/help/v3.5/module/CheckIncludeFile.html

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that also mean we need to support all three implementation? Windows / poll / epoll ?
If that's the case, I'd much rather have an implementation file per OS.
io_win.cpp / io_posix.cpp / io_epoll.cpp.
That way, implementation file can be neat and clean (could be all in one file as well though.)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summoning @dirk-thomas for an opinion on this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not opposed to any of the options. But I would think that maintaining the current pattern of keeping the implementations in the same file will likely be easier to review. If you decide to split the implementation please make sure to use separate commits for the following steps:

  • copy existing file to new files without any changes
  • remove redundant parts
  • make the code work again from the new files

This will ensure that the git history is easy to follow.

ROS_ERROR("Unable to add FD to epoll: %s", strerror(errno));
}
#endif
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully digested all of what's going on here, but this if/else stuff going on everywhere feels like it's going to be a big pain to maintain. Should we be contemplating supplying multiple distinct implementations (epoll, poll, windows), and using a single branch somewhere to select which one is used?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the existing model doing my best not to disrupt too much. But I agree, a single branch is likely best.

*/
int poll_sockets(socket_pollfd *fds, nfds_t nfds, int timeout) {
boost::shared_ptr<std::vector<socket_pollfd> > poll_sockets(int epfd, socket_pollfd *fds, nfds_t nfds, int timeout) {
#if defined(WIN32)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return type is gross enough that I'd consider supplying a typedef for it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, can do...

if (fd_cnt < 0)
{
ROS_ERROR("Error in epoll_wait! %s", strerror(errno));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what conditions would cause this, and if there's something else we need to be doing about them?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer is "no". don't know when this would happen. And the best I can do now is retry next time hoping the failure will go away.

I am suspecting that may be the computer going into sleep mode might cause the syscall to fail on return. But have not tested it thoroughly yet. There are some flags to control that part which I want to play with once the overall structure is solidified.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the one case I know (taken care of in latest push), is that the syscall can be interrupted. In this case, we just return and try again later.

@guillaumeautran
Copy link
Copy Markdown
Author

guillaumeautran commented Jul 26, 2017

Some performance numbers using a played bag with 317 topics (bags plays in a loop)

$ rostopic list | wc -l
317

Legacy rosbag recorder (Linux version using poll)

$ time rosbag record  --duration 2m --all
..snip..
real	2m0.648s
user	0m10.856s
sys	0m22.224s

This version of ros_comm using poll (#undef HAVE_EPOLL)

$ time rosbag record  --duration 2m --all
..snip..
real	2m0.622s
user	0m10.488s
sys	0m20.912s

This version of ros_comm using epoll (#define HAVE_EPOLL)

$ time rosbag record  --duration 2m --all
..snip..
real	2m0.943s
user	0m14.744s
sys	0m7.224s

I did not try the windows version so don't know the numbers there...

@dirk-thomas
Copy link
Copy Markdown

Please try to avoid any unrelated white space changes. I know that it is tempting to "clean things up" on the go but it makes reviewing the patch more effort than necessary and make potential backports more problematic.

@guillaumeautran
Copy link
Copy Markdown
Author

I'll scrub again for unrelated white-space changes. Sorry about that.

@guillaumeautran guillaumeautran force-pushed the CORE-7220-epoll_raw branch 2 times, most recently from 32d303d to 2856125 Compare July 27, 2017 20:16
Switching to using epoll system calls to improve performance of the topic polling code by a factor of 2.
@mikepurvis
Copy link
Copy Markdown

This looks sane to me. Let's give it a try.

@mikepurvis mikepurvis merged commit dc1a122 into lunar-devel Aug 17, 2017
@mikepurvis mikepurvis deleted the CORE-7220-epoll_raw branch August 17, 2017 00:59
@mikepurvis
Copy link
Copy Markdown

Pushed to upstream's lunar-edge branch: https://github.com/ros/ros_comm/tree/lunar-edge

mikepurvis pushed a commit that referenced this pull request Nov 3, 2017
Switching to using epoll system calls to improve performance of the topic polling code by a factor of 2.
mikepurvis pushed a commit that referenced this pull request Nov 15, 2017
Switching to using epoll system calls to improve performance of the topic polling code by a factor of 2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants