Indigo devel fix rospy reconnect#851
Conversation
while roscpp reconnects on timeout or temporary 'No route to host' errors the rospy does it not. So on connection problems longer than 3 min the connection between subscriber and publisher goes. I added some exceptions if these occurs rospy does not close the socket and reconnect now.
since rospy reconnects on timeout now, the reconnects goes very fast in times where it easier to restart the ros node then wait for a reconnect. So I added a maximal backoff time.
on connection problems while node start the subscription to a publisher will be stopped after 3 tries. Now the reconnection is stopped on shutdown of the node.
|
Some comments for the different changes: First commit: Can you please elaborate what exceptions you expect and in which cases you want to continue retrying? Currently it looks like if the exception is not a Second commit: An upper limit for the exponential backoff is a good point. Why does the patch use two different limits instead of the same? Third commit: the current retry logic (3 times) is implemented in |
|
First commit:
Second commit: Third commit: |
|
Thank you for the explanation. Let me try to rephrase / clarify my questions / concerns: First commit:
Second commit: Third commit: |
|
ok, I see now my mistake in first commit ;-( the connection should be closed in three cases
my pull request should be changed to (+ the rational for the error codes in a comment): Second commit: Third commit: Currently and for next three weeks I am in holiday and I am not able to change the pull request. Perhaps it is easier if you make the needed changes addressed by this pull request. And then ignore this request. regards |
* added/fixed description of cases to continue retrying * set upper limit for the exponential backoff to 32 sec * moved the retrying logic from `_connect_topic_thread` to the existing place in `_connect_topic` and replaced the existing one with 3 retries
|
hi @dirk-thomas, I fixed/added the changes we discussed |
|
something goes wrong: I changed only python code and |
|
It could be a flaky test. Let me retrigger the jobs: @ros-pull-request-builder retest this please |
|
The patch looks good to me. But I would merge it into the kinetic-devel branch instead. @ros/ros_team Can you please review this PR. |
wjwwood
left a comment
There was a problem hiding this comment.
Other than some small stylistic or comment formatting related comments, lgtm.
I agree that this should be target at kinetic. Either only kinetic or at least kinetic first with an option to backport to Indigo if things go well and there is desire for it.
We should solicit testers for this change too, as I could easily imagine it changes behavior in a breaking way for complex systems or even do something like effect ctrl-c behavior of simple scripts.
| while not success: | ||
| tries += 1 | ||
| interval = 0.5 # seconds | ||
| # try to get the topic information until succes or the ROS node is shutdown. |
There was a problem hiding this comment.
"success" or probably "until successful".
There was a problem hiding this comment.
I will address this on cherry-pick.
| tries += 1 | ||
| interval = 0.5 # seconds | ||
| # try to get the topic information until succes or the ROS node is shutdown. | ||
| # -> on connections problems while ROS node start |
There was a problem hiding this comment.
What does -> mean in this case, return or exit or something else? I'd prefer an explicit comment, like "while ROS node is running try to connect, exit on connection problems" (not sure if that statement is accurate, but it is clearer than the current comment).
There was a problem hiding this comment.
I updated it to:
while the ROS node is not shutdown try to get the topic information
and retry on connections problems after some wait
|
|
||
| if self.socket is None: | ||
| # exponential backoff | ||
| if self.socket is None and interval < 30.: |
There was a problem hiding this comment.
nitpick: I know that 30. is still technically correct to force it as a float literal, but 30.0 is the style used elsewhere, so I'd appreciate updating this for consistency.
There was a problem hiding this comment.
Existing code just a few lines above uses 30. so I think its fine as is.
There was a problem hiding this comment.
Then should new lines like this:
https://github.com/ros/ros_comm/pull/851/files#diff-20362b9a78750b6859885bf4bb7e2364R452
https://github.com/ros/ros_comm/pull/851/files#diff-8582a4a4f1da6f58343838915b4a9b55R176
Use this syntax?
There was a problem hiding this comment.
I don't mind either or. The ROS 1 code base has pretty bad style and I don't see the way a float is written as important enough to spend more time on it.
|
Thank you for the patch and for iterating on it. I have cherry-picked the patch to the kinetic-devel branch: 14b5c93 |
errors the rospy does it not. So on connection problems longer than 3
min the connection between subscriber and publisher goes.
I added some exceptions if these occurs rospy does not close the socket
and reconnect now.
times where it easier to restart the ros node then wait for a reconnect.
So I added a maximal backoff time.
will be stopped after 3 tries. Now the reconnection is stopped on
shutdown of the node.