End-to-end test coverage for CLI commands output#304
Conversation
There's still the open question of whether this is enough for end-to-end testing (let alone unit tests, which are not covered by this PR), or even whether the behavior currently shown is appropriate. We'll have to iterate on it as we move forward.
cbbaa69 to 07a0883 (force-push)
Ok, I'll stop here for the reasons laid out in this PR's description and because this is getting larger than I'm usually comfortable with.
dirk-thomas
left a comment
I haven't read the full patch yet - just a few high level comments:
- The size of the tests makes me wonder if we really can't write them more compactly. Is there anything we could do API- or feature-wise that would help reduce this?
- What is the reason that all the tests are bundled in the separate package test_ros2cli? What is preventing them from being implemented in the package whose command/verb they are testing?
test_ros2cli/package.xml
Outdated
<test_depend>action_tutorials</test_depend>
<test_depend>ament_lint_auto</test_depend>
<test_depend>ament_lint_common</test_depend>
<test_depend>demo_nodes_py</test_depend>
This will create a repository level cycle since packages in the repository containing demo_nodes_py depend on packages in this repo. Imo we shouldn't do that.
Ok, that is true for ros2run. It's another instance of an issue we have elsewhere, e.g. launch_ros, and that I was consciously sidestepping to avoid scope creep. IMO we shouldn't be using demo nodes for system or integration testing, nor duplicate ros2/system_tests talkers, listeners, servers, etc., and others. We need a package with test nodes that ros2/ros2cli, ros2/launch*, ros2/system_tests, etc. can all depend on. Probably in its own repo, like we did for test_interface_files.
I can put this on hold, do that and come back, or fix it afterwards. What would you rather do?
Imo a package/repo like test_nodes sounds like a good idea.
test_ros2cli/CMakeLists.txt
Outdated
if("${TEST_RMW_IMPLEMENTATION}" STREQUAL "rmw_connext_cpp")
  # Connext startup is too slow. It needs a few extra seconds till
  # discovery completes.
  set(TEST_SETUP_DELAY 5.0)
That looks problematic because it:
- is fragile in case Connext might just take a second longer
- is slowing down the tests in case Connext happens to be faster than the threshold
- needs to be done for potentially other RMW impl
Instead we should find a way to write the test to work without this delay.
It is problematic, I know. Truth be told, all these tests have a delay to deal with CPU load fluctuations, uneven discovery times, etc., which makes them all flaky. I had to remove publishing/echoing rate tests because they're completely inconsistent, time-wise.
Now, to fix this. Many tests don't need this delay at all e.g. ros2 msg list, so we can cut down time there. For those that do need it, I see two kinds: those that are bound to communication latency and those that are bound to discovery time.
For the first kind, we can wait for specific output or events e.g. for ros2 topic bw wait till we have any (non filtered) output to start testing it.
For the second kind, e.g. ros2 topic list, well, it gets harder. Assuming the daemon is there, we may repeatedly execute the command, though that isn't really less fragile than a delay (and I can't figure out how to pull that off using ros_testing as it is today). We may also wait for the ROS graph to show some nodes, but since the CLI command, the CLI daemon and the launch test are three separate processes, it may not work as intended.
Alternatively, I think we could make our CLI more predictable. Just like ros2 service call waits for the server to be available, ros2 service list could have an optional ROS graph condition to wait for, e.g. ros2 service list --when "'my_server' in nodes". The same goes for all discovery-time-bound commands. What do you think?
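For illustration, the repeated-execution idea discussed above can be sketched as a small polling helper. This is only a sketch: wait_for_output is a hypothetical name, not part of ros2cli, ros_testing or launch_testing.

```python
import subprocess
import time


def wait_for_output(cmd, predicate, timeout=10.0, period=0.5):
    """Re-run `cmd` until `predicate(stdout)` holds or `timeout` elapses.

    Hypothetical helper sketching the retry-until-expected-output idea
    discussed above, as an alternative to a fixed startup delay.
    """
    deadline = time.monotonic() + timeout
    while True:
        stdout = subprocess.run(cmd, capture_output=True, text=True).stdout
        if predicate(stdout):
            return stdout
        if time.monotonic() >= deadline:
            raise TimeoutError('no expected output from {}'.format(cmd))
        time.sleep(period)
```

A test could then call something like wait_for_output(['ros2', 'topic', 'list'], lambda out: '/chatter' in out) instead of sleeping for a delay tuned per RMW implementation.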
Assuming the daemon is there, we may repeatedly execute the command, though that isn't really less fragile than a delay.
Why would that not be less fragile? It will pass the test as soon as the command returns the desired output and only runs to a maximum timeout if the command doesn't do so.
Just like ros2 service call awaits for the server to be available, ros2 service list could have an optional ROS graph condition to wait for e.g. ros2 service list --when "'my_server' in nodes".
This is a good example of how not to add conditions imo. If you want to wait for X (services of a node being available) you can't use a condition Y (check if a node has been discovered), simply because there is still a race: the node might become visible shortly before its services are actually advertised.
Why would that not be less fragile? It will pass the test as soon as the command returns the desired output and only runs to a maximum timeout if the command doesn't do so.
I meant neither option makes it not fragile. You may not delay enough or you may not wait enough. Sure, we can increase the timeout a lot to increase the likelihood of discovery to find all peers for free, but that doesn't make it not fragile. If CI nodes are running with enough load, tests may still flake.
Anyways, even if we go down that route, we lack support in launch testing to do it so we have to go back to it first.
Simply because there is a still a race since the node might become available first shortly before the services are actually advertised.
Yes, it doesn't make it not fragile either; I should've said that. I'm not well versed enough in DDS discovery mechanisms: is it atomic in any respect, e.g. are some information chunks guaranteed to arrive in a single block?
Sure, we can increase the timeout a lot to increase the likelihood of discovery to find all peers for free, but that doesn't make it not fragile. If CI nodes are running with enough load, tests may still flake.
I don't think that is the definition of flaky. The fact that we have to define an upper bound how long we are willing to wait is fine.
What do you mean by size? These tests leverage the commonalities in the CLI commands to have a single parameterized test template. There's currently no way to set up launch tests in pure Python packages (see ros2/launch#237).
test_ros2cli/test/test_cli.py.in
Outdated
    OpaqueFunction(function=lambda context: ready_fn())
)
launch_description = LaunchDescription([
    # Always restart daemon to isolate tests.
Maybe instead of doing this it would be simpler to add an option --no-daemon to all the commands to not start the daemon on demand?
Hmm, that is a good idea, but won't it result in test instances affecting each other through daemon state?
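Wiring the suggested option into a verb's argument parser would be straightforward; the sketch below only illustrates the idea, since no --no-daemon flag existed in ros2cli at the time of this discussion.

```python
import argparse

# Hypothetical sketch of the suggested option; not an existing ros2cli flag.
parser = argparse.ArgumentParser(prog='ros2 topic list')
parser.add_argument(
    '--no-daemon', action='store_true',
    help='Query the ROS graph directly instead of (re)starting the daemon')

args = parser.parse_args(['--no-daemon'])
use_daemon = not args.no_daemon  # a verb would branch on this
```

Each test invocation could then pass --no-daemon so no daemon state leaks between test instances, at the cost of slower per-command discovery.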
test_ros2cli/CMakeLists.txt
Outdated
add_custom_cli_test(
  test_ros2interface
  CONFIG_MODULE ros2interface_test_configuration
  TIMEOUT 240)
The sum of all timeouts is 31 min. Is that really necessary?
It's insane. I was going to bring it up in tomorrow's meeting. And yes, it used to be half of that and tests would time out in CI. Cutting down delays we may be able to reduce it, but I wouldn't expect a large reduction. Connext is very slow.
The length of each actual test file is also super long and verbose. I don't see an obvious way to reduce that though.
I think it would be very important to make it possible to write a test function which can run a launch test. The fact that this PR moves all the tests into a separate package is a good indicator that not having that functionality results in new code being written which immediately introduces tech debt.
If we like this way of specifying tests we may standardize it, but no, it's not provided by
I agree with that, though it's not a blocker for these tests to land.
ivanpauno
left a comment
There was a problem hiding this comment.
I left some minimal comments.
    print(len(node_names))
elif node_names:
-    print(*[n.full_name for n in node_names], sep='\n')
+    print(*sorted(n.full_name for n in node_names), sep='\n')
Why sort them? Was the order random?
I believe they show in discovery order, which is non-deterministic, yes.
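Sorting decouples the printed list from discovery order, which is what makes the output matchable verbatim in a test. A quick illustration (the node names below are made up):

```python
# Two runs may discover the same nodes in different orders...
run_a = ['/talker', '/listener', '/_ros2cli_daemon']
run_b = ['/listener', '/_ros2cli_daemon', '/talker']

# ...but sorted output is identical, so a test can compare it line by line.
assert sorted(run_a) == sorted(run_b) == [
    '/_ros2cli_daemon', '/listener', '/talker']
```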
output.extend((' ' + yaml.dump({
    'partial_sequence': sequence
})).splitlines())
output.append('')

sys.path.append(os.path.dirname(__file__))

from cli_test_configuration import CLITestConfiguration  # noqa
Prefer noqa: I100 over a wildcard noqa (I know the tests were doing that, and that I wrote it 😄).
'Waiting for an action server to become available...',
'Sending goal:',
' order: {}'.format(order),
'',
' order: {}'.format(order),
'',
re.compile('Goal accepted with ID: [a-f0-9]+'),
'',
),
CLITestConfiguration(
    command='interface',
    arguments=['show', 'std_msgs/msg/String'],
show should be tested with an action and a srv too.
),
CLITestConfiguration(
    command='msg',
    arguments=['show', 'std_msgs/msg/NotAMessageType'],
I would add a test case doing:
arguments=['show', 'std_msgs/srv/NotAMessageType']
and another doing:
arguments=['show', 'std_msgs/action/NotAMessageType']
Actually, I'm curious if they work correctly.
Well, this is my result:
$ ros2 msg show std_msgs/action/String
string data
... it isn't working fine.
and this one is fun:
$ ros2 msg show std_msgs/asd/String
string data
@ivanpauno I can confirm interface introspection CLIs ignore the interface type namespace. I've commented out those tests, we should address that in a follow up PR.
test_ros2cli/test/test_cli.py.in
Outdated
def @TEST_NAME@(self, proc_info, command_under_test, test_configuration):
    """Test that the command under test finished in a finite amount of time."""
    success = proc_info.waitForShutdown(
        process=command_under_test, timeout=test_configuration.timeout
waitForShutdown could take an optional keyword argument, assert: bool = False. That way, we wouldn't need the extra if ...: assert ... here. I think it may be useful.
Note: we also have assertWaitForShutdown, but it's less versatile than an extra argument.
Note: I wouldn't delete the original waitForShutdown.
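The pattern being proposed, a single flag that turns a timeout into an assertion failure instead of requiring a separate assert* variant, looks roughly like the generic sketch below. This is not launch_testing's actual API; wait_for and its strict flag are illustrative names.

```python
import time


def wait_for(condition, timeout, strict=False):
    """Poll `condition` until it returns True or `timeout` elapses.

    With strict=True a timeout raises instead of returning False,
    mirroring the optional-assert keyword argument suggested above.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(0.01)
    if strict:
        raise AssertionError('condition not met within {}s'.format(timeout))
    return False
```

Callers that only want to branch keep the boolean return; test assertions pass strict=True and drop the extra if/assert.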
get_add_two_ints_server_action(service_name='_add_two_ints')],
expected_output=itertools.chain(
    # Cope with launch internal ROS 2 node.
    itertools.repeat(re.compile(r'/launch_ros/.*parameter.*'), 6),
We are checking here for an exact number of lines (6) of something that may easily change.
I don't know whether it's better to check just that some lines are in the output, or to check for exact output. This applies in many places, but I don't have a strong opinion about what's better.
Checking lines in output, pro: updating the test is needed less often.
Checking lines in output, con: maybe the test should have been updated (adding extra checks) after a change elsewhere, and it wasn't.
Check lines in output con: Maybe, the test should have been updated (adding extra checks) after a change in other place, and it wasn't.
That's the main reason why I went for a restrictive rather than a permissive test. If we trust CLI release testing on these tests, I'm more comfortable being as precise as possible regarding how they should behave.
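The restrictive matching described above, exact line count with literal lines compared verbatim and re.compile entries matched as patterns, can be expressed compactly. The helper below is a sketch of that idea, not the PR's actual implementation.

```python
import re


def match_output(expected, actual):
    """Strict output match: same number of lines, literal strings must
    compare equal, and compiled patterns must match their line."""
    if len(expected) != len(actual):
        return False
    for want, got in zip(expected, actual):
        if isinstance(want, re.Pattern):
            if not want.match(got):
                return False
        elif want != got:
            return False
    return True
```

For example, match_output(['Sending goal:', re.compile(r'Goal accepted with ID: [a-f0-9]+')], lines) accepts any goal ID while still pinning every other line exactly.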
),
CLITestConfiguration(
    command='topic',
    arguments=['pub', '/my_ns/chatter', 'std_msgs/msg/String', '{data: test}'],
Should the test also subscribe to the topic and check that the message is received?
I think that checking output is not enough in this case
That's why there's a listener, to check its output. Yes, we're trusting the listener in the process...
I think that most of the code is just test cases for parameterizing the test.
I agree with this point. It's probably a bit difficult to implement.
I did a comment about improving
That is true @dirk-thomas. Test parameterizations are as few and compact as they can be. We simply have way too many combinations that need to be tested. But the test itself could use an API simplification.
08d5088 to e75d98e (force-push)
e75d98e to 8a16abd (force-push)
a15df96 to 7c6ae38 (force-push)
@ivanpauno @dirk-thomas rebased and ready for re-review. bf24f0d is a major refactor of this PR, so anything before that commit is likely outdated. PTAL when you have time. Splitting is also an option if you find this PR hard to grok.
dirk-thomas
left a comment
There was a problem hiding this comment.
Overall looks good to me - I haven't reviewed every line though.
Besides the inline question about geometry_msgs, how about the TODOs? Can you comment on why they are there?
import sys

from geometry_msgs.msg import TwistStamped
Why use geometry_msgs for this test? Something from std_msgs would be a much lighter dependency.
ros2 topic delay works with message types that have a header (with a stamp), and std_msgs doesn't have any. There are some in visualization_msgs, diagnostic_msgs, stereo_msgs, sensor_msgs, trajectory_msgs and nav_msgs. Would you rather switch to one of those? I don't have a strong opinion.
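For context, what ros2 topic delay reports is essentially the difference between the receive time and the message's header stamp, which is why a stamped type is required. A simplified sketch (the field names mirror builtin_interfaces/msg/Time; the function itself is illustrative, not the tool's code):

```python
def message_delay(stamp_sec, stamp_nanosec, now_sec, now_nanosec):
    """Delay in seconds between a message's header stamp and its receive
    time, i.e. the quantity ros2 topic delay reports (simplified sketch)."""
    return (now_sec - stamp_sec) + (now_nanosec - stamp_nanosec) * 1e-9


# A message stamped at t=10.0s and received at t=10.25s is 0.25s late.
assert abs(message_delay(10, 0, 10, 250_000_000) - 0.25) < 1e-9
```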
if __name__ == '__main__':
    main()
Maybe, we could provide a package with test nodes, so they can be used from other packages.
Probably, this is out of the scope of this PR.
I entertained that idea for a bit, but besides basic talker / listeners, fixture nodes tend to be quite specific to a test and not so easy to reuse unless you start generalizing them -- which I purposefully didn't want to.
def get_interface_path(parts):
    prefix_path = has_resource('packages', parts[0])
    if not prefix_path:
        raise LookupError('Unknown package {}'.format(parts[0]))
Bug fixes should be done in separate PRs.
Yes! I thought I had removed them all, looks like I didn't.
Considering how CI looks atm, I don't think we'll get to merge this patch today. There are test fixture executables failing on specific platforms in a variety of ways, some even segfaulting on macOS, while nightlies are all green. That means this is going to require further investigation on each platform.
IMO, this can be merged after the feature freeze. It's just adding tests, no features.
The same applies to that one. Test failures are mostly unrelated to this PR, but to other packages that make use of
Alright, recent builds from CI also experienced some issues and thus the current failed builds. I'll re-trigger after solving some other failures related to
8f2f2d1 to 6dc5d0c (force-push)
Rebased to resolve conflicts.
CI (full, against Fast-RTPS, Connext and Opensplice)
- action command
- service command
- topic command
- interface command
- node command
- msg command
- srv command

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Improved test coverage for:
- ros2action
- ros2service
- ros2topic
- ros2msg
- ros2srv
- ros2interface
- ros2node
- ros2pkg

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
6dc5d0c to e6f5893 (force-push)
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Alright, @ivanpauno @mjcarroll may I ask for a final review? Test failures on Linux and macOS are all unrelated to this patch and I had to re-trigger Windows CI because it failed for unrelated reasons (yet again). I'd like to merge this ASAP to avoid further conflicts with any new changes to
ivanpauno
left a comment
There was a problem hiding this comment.
LGTM!
It would be good to immediately follow up on the TODOs (which are bugs unrelated to this PR that should be fixed).
Please, double check that all CI failures are unrelated.
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Alright, finally green! Though that means there's some flakiness that needs to be addressed. @mjcarroll may I proceed with this PR and any follow-up ones after it?
This pull request aims to add full end-to-end test coverage to ~~all~~ several ros2 CLI commands: ~~param~~ ~~run~~ ~~component~~ ~~lifecycle~~ ~~daemon~~ ~~launch~~. Note: this table will be updated as progress is made. Test coverage for commands that have been struck through, i.e. param, component, lifecycle, etc., will not be provided here but in follow-up PRs. These commands have side effects beyond their execution lifetime that have to be checked in ways that are specific to each command. Thus, these don't lend themselves well to being integrated with the "standardized" testing device used for other commands. Connected to ros2/ros2#739.
allseveralros2CLI commands:paramruncomponentlifecycledaemonlaunchNote: this table will be updated as progress is made.Test coverage for commands that have been struck through i.e.param,component,lifecycle, etc., will not be provided here but in follow-up PRs. These commands have side effects beyond their execution lifetime that have to be checked in ways that are specific to each command. Thus, these don't lend themselves well to be integrated with the "standardized" testing device used for other commands.Connected to ros2/ros2#739.