Skip to content
This repository was archived by the owner on May 31, 2025. It is now read-only.

Use HTTP1.1: we get persistent connections for free#371

Closed
RainCT wants to merge 1 commit intoros:hydro-develfrom
RainCT:param_sockets
Closed

Use HTTP1.1: we get persistent connections for free#371
RainCT wants to merge 1 commit intoros:hydro-develfrom
RainCT:param_sockets

Conversation

@RainCT
Copy link
Copy Markdown

@RainCT RainCT commented Mar 5, 2014

Hi,

Currently rospy creates a new socket for every single request (eg. getParam call). Since sockets stay around for a while in CLOSE_WAIT state, this can become a serious problem.

Python's xmlrpc actually supports persistent connections. All that's needed is configuring SimpleXMLRPCRequestHandler to use HTTP 1.1 (vs. 1.0) and it'll automatically keep connections open and re-use them for subsequent calls.

With this you go from:

$ sudo netstat -atpn | grep 11311 | wc -l
1434
$ python -c "from rosmaster import util; s = util.xmlrpcapi('http://localhost:11311'); [s.getParam('foo', 'bar') for i in range(1000)]"
$ sudo netstat -atpn | grep 11311 | wc -l
2449

to:

$ sudo netstat -atpn | grep 11311 | wc -l
1728
$ python -c "from rosmaster import util; s = util.xmlrpcapi('http://localhost:11311'); [s.getParam('foo', 'bar') for i in range(1000)]"
siegfriedgevatter@thailand:svn:
$ sudo netstat -atpn | grep 11311 | wc -l
1732

@RainCT
Copy link
Copy Markdown
Author

RainCT commented Mar 5, 2014

I've tested this change on a real robot (two in-build computers, Python and C++ nodes, RViz, tablet & web UIs using rosbridge), everything seems to be working fine.

@dirk-thomas
Copy link
Copy Markdown
Member

Thank you for contributing. That looks like a nice benefit.

I will merge this into the indigo-devel branch though since that is the upcoming ROS distro.

In the future we might consider it for backporting to older ROS distros if there is need for it and it has worked fine for indigo for a while.

@dirk-thomas
Copy link
Copy Markdown
Member

Merged into indigo-devel: 53a2213

@dirk-thomas dirk-thomas closed this Mar 5, 2014
@dirk-thomas
Copy link
Copy Markdown
Member

I rolled back the commit for now since this changes introduces a serious regression. Python scripts do not react to Ctrl+C anymore (under certain conditions). The following scenario should reproduce the problem:

  • roscore (in terminal 1)
  • rosrun rospy_tutorials talker (in terminal 2)
  • rosrun rospy_tutorials listener (in terminal 3)
  • Ctrl+C the talker (in terminal 2)
  • Ctrl+C the listener (in terminal 3) the script will not exit anymore with this patch applied
  • Ctrl+C the roscore (in terminal 1 and the previously Ctrl+C-ed listener will terminate)

@dirk-thomas dirk-thomas reopened this Mar 5, 2014
@RainCT
Copy link
Copy Markdown
Author

RainCT commented Mar 5, 2014

Weird, the only thing I noticed is roscore taking a couple seconds to die on Ctrl+C.

I'll look into it.

@dirk-thomas
Copy link
Copy Markdown
Member

It does not hang forever for you? I had to kill the process.

@RainCT
Copy link
Copy Markdown
Author

RainCT commented Mar 6, 2014

Nope, I can't reproduce the problem with the {talker, listener}, the only change I notice is that the roscore sometimes takes a few seconds to stop if there are nodes talking.

Tested with Hydro and Python 2.7.3 on Ubuntu Precise.

@dirk-thomas
Copy link
Copy Markdown
Member

Did you also Ctrl-C the roscore or only the talker and listener?

@RainCT
Copy link
Copy Markdown
Author

RainCT commented Mar 17, 2014

Yes, IIRC I tried killing all three (and in different orders).

@dirk-thomas
Copy link
Copy Markdown
Member

And what is the behavior for you if you are not killing the roscore at all - and the other ones in the exact order as mentioned above?

@dirk-thomas dirk-thomas added this to the untargeted milestone May 2, 2014
@dirk-thomas
Copy link
Copy Markdown
Member

@RainCT Is it likely that you continue the work on this PR or should it be closed for now?

@RainCT
Copy link
Copy Markdown
Author

RainCT commented Jun 9, 2014

I couldn't reproduce the problem you mentioned.

@dpinol
Copy link
Copy Markdown

dpinol commented Jul 22, 2014

Dirk, did you try this only in indigo, or only in hydro?
I guess Siegfried was trying only in hydro

@dpinol
Copy link
Copy Markdown

dpinol commented Jul 22, 2014

This is what I get when I break with CTRL-C. Sometimes it lags a few seconds, but it always eventually dies.
Would it make sense to check if socket is none at https://github.com/ros/ros_comm/blob/indigo-devel/clients/rospy/src/rospy/impl/tcpros_base.py#L806 (line 806)?

[ERROR] [WallTime: 1406043537.699333] Traceback (most recent call last):
  File "/opt/ros/hydro/lib/python2.7/dist-packages/rospy/topics.py", line 305, in close
    c.close()
  File "/opt/ros/hydro/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 778, in close
    self.socket.close()
AttributeError: 'NoneType' object has no attribute 'close'

@dirk-thomas
Copy link
Copy Markdown
Member

I just tried it again (this time on Trusty with Indigo) with the exact steps posted here: #371 (comment)

And the problem is still there: terminal 2 does not terminate after pressing Ctrl+C (only after Ctrl+C the first terminal).

@dpinol
Copy link
Copy Markdown

dpinol commented Jul 23, 2014

You're completely right. You're scenario hangs on my precise+hydro also.
I'm having a look at:
https://codereview.appspot.com/73041/show
http://zxywpower.wordpress.com/2008/07/14/xmlrpc-hang-issue-with-python-client-and-xmlrpc-cc-server/
In case I could find a fix, are we still in time to push it for indigo-devel, or is it too late?

thanks

@dirk-thomas
Copy link
Copy Markdown
Member

The Indigo release is done. But with a lot of testing it can still be added in a patch release. But that depends what is necessary to fix the problem.

@dpinol
Copy link
Copy Markdown

dpinol commented Aug 5, 2014

I think the problem is that the listener will not break until the 1 minute timer which monitors the connection detects that the talker is dead. That's why sometimes it breaks immediately, and some times it takes a long time to break. Dirk, did you ever wait more than 1 minute for the listener to die after pressing CTRL-C?
Reading the links above, it looks like in http 1.1 it's the client's responsibility to close the connection, and not the server's, as it was in 1.0; but I'm still not sure how to patch it.

@dirk-thomas
Copy link
Copy Markdown
Member

No, I never waited that long. Without the patch the program exits immediately and that's the behavior which must be preserved.

@dirk-thomas
Copy link
Copy Markdown
Member

This is superseded by #610.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants