Conversation
This test was trying to talk to the fibonacci action server before it was ready to be talked to. Fix the test by having hte fibonacci action server write to stdout when it's ready and having the test wait for the stdout to start Signed-off-by: Pete Baughman <peter.c.baughman@gmail.com>
| strict=False | ||
| ) | ||
|
|
||
| @launch_testing.markers.retry_on_failure(times=5) |
There was a problem hiding this comment.
I believe this was a workaround for the race condition that I've fixed above and is no longer necessary
| # Spend up to 2 seconds doing all the work we need to have the fibonacci | ||
| # action server actually talking over DDS before signaling to the test it's | ||
| # OK to start | ||
| for _ in range(20): |
There was a problem hiding this comment.
There may be a way to spin_until_future_complete here, but I just don't know. Is there a future I can wait on that indicates the action server is up, ready, and can be communicated with by other processes before I print out 'fibonacci_action_server has started' ??
|
. . . Now i'm seeing failures locally with rmw_cyclonedds_cpp on my branch and also when I build/test using 'master'. . . It feels totally impossible to make any progress 😭 Test failures on my machine on 'master': |
|
@hidmic Side-note: I'm not sure if this is being tracked anywhere, but the pytest plugin for launch tests makes it really hard to see what has failed here. Instead of the results showing 9 test cases x however many DDS implementations there are, we just get one failing test called test_cli.py and you have to cat the logs to see what actually happened. You can't figure it out programmatically. |
|
Looks like rmw_connext_cpp is also failing on my machine when I try to build/test the master branch: Does mean anything to anybody? |
|
@sloretz Maybe we can run CI on this and see if we get lucky? I'm so sorry about all of the bad CI jobs we're spinning up for this issue. This package is extremely difficult for me to wrangle. The changes here depend on ros2/launch#386 which was just merged, so we might need to wait for those changes to propegate into CI |
I don't think this is the right way to fix this test. We couldn't recommend the same approach to a user trying to use actions in their own code. |
|
@dirk-thomas Can you suggest something different? The Fibonacci action server program was created specifically as a test fixture for this test, and must provide some mechanism to indicate that it has finished starting up or else the test will race and fail.
. . . but the current test starts the action server in a process and attempts to communicate with it before it's finished starting up with no feedback whatsoever and just fails with a cryptic error sometimes. Would you recommend that approach to a user? |
|
@dirk-thomas Would you accept a PR where we disable the test because it's broken? That would satisfy my need to get this PR unstuck |
|
@dirk-thomas Can you please further clarify your objection to this fix? The code in question is a test fixture meant to facilitate the testing of the ros2 actions cli tool. In order for the tests to start, the test fixture must be up and running. The mechanism I've used here is a message printed over stdout, which I don't think is very unreasonable. The test fixture in question is not meant as a demo or a suggestion to any user about how to create a ROS2 action server - it's a dummy process for the test to run against. I would be happy to use some other synchronization method suggested, but thus far no suggestion has been offered. I see that this issue is also causing CI failures in this PR as well as my other PR. If we can't agree on a way to improve the test, I suggest disabling it so we can make progress elsewhere. |
|
@pbaughman I spoke with @wjwwood offline about this issue and we agree that the correct thing to do in this situation (with the CLI daemon) is to retry the affected tests until discovery happens. I believe the bug in this case is that the |
|
As an alternative to this PR, I've updated all the tests involving the CLI daemon in #459. |
|
@jacobperron Sure. My only dissapointment is that the whole point of having a "Ready To Test" action as part of the launch description was that the test author would be forced to think about things like this and wouldn't have tests that try to run before all of their preconditions are met. This was a big problem with test actions in launch files using Having said that - let's do whatever we need to do to unstick this. |
|
@pbaughman I agree with the sentiment, but I think these CLI tests are an edge case because of the underlying daemon that they use. For example, in a different scenario where we have access to the node objects we are testing, it's more feasible to condition a launch event on the nodes discovering each other. |
Well, this is still an improvement. It just so happens that the thing we're testing for is the precondition. We could write it like this:
But instead we're writing it like this:
Other tests will be able to do:
|
|
Also, we could additionally (to what @jacobperron proposes in #459) wait for the server to be available (using an action client), but that would influence the output of AFAICT, the only reason this pull request helps is that it takes some time to complete your new code in the action server and that allows discovery to happen. If you artificially added a lag to the daemon so it did not report new discovery information until sometime in the future you'd still see this problem with this pull request (imo). |
|
@wjwwood just an aside:
I had hoped to write some launch_testing event handlers to make it easy to wait for a certain topic to be available or for data to appear on a certain topic - as these are common things you want to wait for before starting a test, but I never got the chance when I was contracting at Apex. I'm hopeful that they or someone else will want to make that improvement at some point. |
|
Closed in favor of #459 |
This test was trying to talk to the fibonacci action server before it
was ready to be talked to.
Fix the test by having the fibonacci action server write to stdout when
it's ready and having the test wait for the stdout to start
I believe this was the cause of the test failures in #376 but I've found the ros2cli tests are very fragile, so I'd like to try to merge this separately so there aren't confounding factors if the tests continue to fail.
How to reproduce the original race/failure:
If you delay the Fibonacci action server program's startup by adding
right before
mainis called here. . . and also remove the retry_on_failure here you could make
test_fibonacci_infofail every time.Signed-off-by: Pete Baughman peter.c.baughman@gmail.com