Make CLI more robust to discovery latency.#494
Conversation
fac0a49 to
02c8d4a
Compare
| ) | ||
| try: | ||
| list_nodes_client.wait_for_service() | ||
| if not list_nodes_client.wait_for_service(timeout_sec=5.0): |
There was a problem hiding this comment.
When used in a loop (as in component list) a blocking timeout of 5s can add up to a very long time for the user call.
There was a problem hiding this comment.
Fair enough. Timeout is a somewhat arbitrary, larger than zero quantity. What about 2.0 seconds?
There was a problem hiding this comment.
I don't mind the value. For this PR you can choose whatever.
The problem is that with the current API all waits happen sequentially. E.g. if the service of N nodes isn't available the CLI will hang for N * timeout. I would prefer that the CLI only blocks for timeout seconds (whatever that value is). But that is pretty much our of scope for this PR.
There was a problem hiding this comment.
Please create a follow up ticket for this.
ros2topic/ros2topic/verb/echo.py
Outdated
| qos_profile = qos_profile_from_short_keys( | ||
| args.qos_profile, reliability=args.qos_reliability, durability=args.qos_durability) | ||
| if args.message_type is None: | ||
| with NodeStrategy(args) as node: |
There was a problem hiding this comment.
This potentially creates a second direct node (beside the one created a few lines below) which doesn't seem like a good idea.
Same pattern below in multiple cases (even though they preexisted before).
There was a problem hiding this comment.
That is true, you might end up creating node, destroying it and creating another one. We do want to use the daemon if available though.
A solution could be to extend NodeStrategy's API to ensure you're dealing with a direct node in case you have to. Something like a force_direct() method (or a context manager?). WDYT?
There was a problem hiding this comment.
A solution could be to extend NodeStrategy's API to ensure you're dealing with a direct node in case you have to.
What is the difference to a DirectNode then?
There was a problem hiding this comment.
I think this should create direct node, and conditionally if a daemon is running query some information from that to not have to wait for "complete" discovery for the direct node.
There was a problem hiding this comment.
What is the difference to a
DirectNodethen?
That it'd use the daemon outside the scope where you actually need it to be direct, e.g.:
with NodeStrategy(args) as node:
# ...
with node.force_direct():
node.subscribe()I think this should create direct node, and conditionally if a daemon is running query some information from that to not have to wait for "complete" discovery for the direct node.
That sounds a lot like replicating NodeStrategy's implementation N times. I'd rather improve it in a single place.
There was a problem hiding this comment.
See #499 for an alternative.
This looks much better 👍
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
|
Just wanted to chime in and mention that this patch resolves the cli instability for me across multiple hosts that I've been seeing for a long time. I wonder if the same changes need to be made to the ros2bag package? |
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
|
Running CI once again, got linter test failures due to recent |
|
I might also suggest a full CI run on Linux, just to see what the system_tests stuff looks like with this in place. |
Which part of |
I'm honestly not sure, but this is a big enough change that it makes me wary of the overnights. I'd prefer to see what it does on a full run, as I'd like not to regress at this point (we are actually getting pretty close to green for Linux CI, for instance). |
A more complete run always makes sense. But also only for downstream packages which actually depend on something in the |
|
@clalancette @dirk-thomas all test failures here are unrelated to this patch. I've sent separate PRs to fix flake8 errors and |
dirk-thomas
left a comment
There was a problem hiding this comment.
I am fine with it as is.
It would be good to investigate why the test failure happened in the PR build.
It's a strange one, I cannot reproduce locally. It doesn't show on nightlies either. |
|
In dashing 0.7.11 version, 'Could not determine the type for the passed topic' also happens. |
|
@aitazhixin I don't think these changes were backported to Dashing. This patch (and connected ones) only extends and improves API, but I will say that anything bigger than a bugfix is less likely to be considered for a backport. @nuclearsandwich thoughts? |
|
@hidmic so, I need a version which is bigger than 0.9.3? |

Split from #421. This pull request builds on top of #493 and ensures the daemon gets used throughout all CLI tools. It also updates some tests which showed to be flakey nonetheless (with and without this patch).