Skip to content

Fix the flaky dynamic params test thats been causing CI to fail#568

Merged
13 commits merged intomasterfrom
unknown repository
Mar 6, 2019
Merged

Fix the flaky dynamic params test thats been causing CI to fail#568
13 commits merged intomasterfrom
unknown repository

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Feb 15, 2019

NOTE: I recommend reviewing this commit by commit rather than as one big PR
This is quite a big PR, but I cleaned up the commit history, so you can review it commit by commit. Each commit is an independent change if you follow it from oldest to newest. The newer ones build on the older ones.


Basic Info

Info Please fill out this column
Ticket(s) this addresses partially addresses #556
Primary OS tested on Ubuntu
Robotic platform tested on NA

Description of contribution in a few bullet points

  • Fixes dynamic params test that has been failing in CI. The test was failing because the "helper" nodes were not ready yet at the time the test started. I fixed this by converting the "helper" nodes to lifecycle nodes and waiting in the test till they were activated.

  • This refactors and adds a bunch of support code, so this mechanism can be easily used on other tests that need it


@ghost ghost closed this Feb 15, 2019
Carl Delsey added 10 commits February 14, 2019 17:01
The ServiceClient class will be used Lifecycle helper functions that
will be put in nav2_util, so we need it there. It seemed more of utility
class than a Task anyhow.
Add some helper functions for generating node names. These will be used
to fix some potential problems with the ServiceClient and the upcoming
LifecycleServiceClient.
ServiceClient reuses the service name as the temporary node name.
However, service names frequently have / characters in them and
that is invalid for a node name. Also, it seems likely we'll have
multiple clients using the same service. This would result in a name
collision. This now adds a random suffix to the internal node name.
ServiceClient can now use a supplied node instead of always creating
it's own. This allows us to create a higher level service client that
interacts with multiple services using just one generated node.
The unit tests will be handled at a higher level by a subsequent checkin
I need to add another function to it, and it could stand some
restructuring before hand.
Adding a function to split a string at delimiters. The intent is to
use this later so tests can get ':' seperated string of lifecycle nodes
from and environment variable, and spin them all up.
@ghost ghost reopened this Feb 15, 2019
@ghost
Copy link
Copy Markdown
Author

ghost commented Feb 15, 2019

Sorry for all the noise on this PR. It seems Github doesn't present commits in the order they are in the branch. Instead it shows them according to "author date".

@ghost ghost requested review from mhpanah, mjeronimo and orduno February 15, 2019 01:05
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 15, 2019

Codecov Report

Merging #568 into master will increase coverage by 0.13%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #568      +/-   ##
==========================================
+ Coverage    31.9%   32.04%   +0.13%     
==========================================
  Files         220      230      +10     
  Lines        9070     9206     +136     
  Branches     3044     3112      +68     
==========================================
+ Hits         2894     2950      +56     
- Misses       4288     4310      +22     
- Partials     1888     1946      +58
Impacted Files Coverage Δ
src/navigation2/nav2_amcl/src/amcl_node.cpp 36.47% <0%> (-0.37%) ⬇️
...dynamic_params/test/test_dynamic_params_helper.cpp 16.66% <0%> (ø) ⬆️
...2/nav2_tasks/include/nav2_tasks/service_client.hpp
...vigation2/nav2_util/include/nav2_util/strutils.hpp
...n2/nav2_util/include/nav2_util/lifecycle_utils.hpp 50% <0%> (ø)
src/navigation2/nav2_util/src/lifecycle_utils.cpp 42.85% <0%> (ø)
...n2/nav2_util/src/lifecycle_bringup_commandline.cpp 0% <0%> (ø)
...on2/nav2_util/include/nav2_util/service_client.hpp 40% <0%> (ø)
...c/navigation2/nav2_util/test/test_string_utils.cpp 14.28% <0%> (ø)
...til/include/nav2_util/lifecycle_service_client.hpp 100% <0%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c36af6d...171159c. Read the comment docs.

@ghost ghost mentioned this pull request Feb 15, 2019
7 tasks
@ghost
Copy link
Copy Markdown
Author

ghost commented Feb 15, 2019

I'd also like feedback on whether breaking the large PR into individually reviewable commits like this worked to make the review more manageable.

#define NAV2_TASKS__COSTMAP_SERVICE_CLIENT_HPP_

#include "nav2_tasks/service_client.hpp"
#include "nav2_util/service_client.hpp"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should move this to util. Good idea.

{

/// Transition the given lifecycle nodes to the ACTIVATED state in order
void bringupLifecycleNodes(const std::vector<std::string> & node_names);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is here as a convenience to quickly get the lifecycle nodes to the ACTIVE state. However, it doesn't have the corresponding shutdownLifecycleNodes. Also, it seems a bit like test code rather than general code to be used with lifecycle nodes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I wanted to make it super easy to write a test that uses lifecycle nodes, because I expect we will want to do that a lot to improve test stability.

I also didn't want to invest too much time into managing lifecycle states until you'd had a chance to look at this. I figured you probably had a better way to do this in the work you'd been doing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was went through this to separate test code from production code, I found I didn't have any test code to put in the library, so I'm going to defer creating that library.

This bringupLifecycleNodes function ended up being used by the command line utility I created, which I consider production code as opposed to just test code.

explicit ServiceClient(const std::string & name)
explicit ServiceClient(
const std::string & name,
const rclcpp::Node::SharedPtr & provided_node = rclcpp::Node::SharedPtr())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that the ROS2 code uses SharedPtr and not SharedPtr&. Seems to be that if one is going to hold on to this value using a local node_ member, it should be reference counted.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that when we do the copy assignment of the reference to the member variable at node_ = provided_node;, that is when the reference will get incremented.

By passing in a reference, we only avoid adding a temporary reference in the function parameter itself.

In other words, by passing the shared ptr by value instead of by ref, we create the parameter provided_node and increment the reference count. Then assign it to the member variable and increment the reference count again. When the function returns the parameter is destroyed and the reference counted is decremented again.

We'd end up taking the shared ptr lock an extra two times for the sole purpose of the temporary parameter object.

LifecycleServiceClient sc(node_name);
sc.ChangeState(lifecycle_msgs::msg::Transition::TRANSITION_CONFIGURE);
sc.ChangeState(lifecycle_msgs::msg::Transition::TRANSITION_ACTIVATE);
while (sc.GetState() != lifecycle_msgs::msg::State::PRIMARY_STATE_ACTIVE) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Configure or Activate failed, this code will be stuck in the loop. Perhaps instead of looping until ACTIVE the code could simply ensure that each state transition was successful (and therefore wouldn't require the GetState at the end).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. You're right. I was approaching this from the test mindset, where CMake or the user will Ctrl-C this if it takes too long.

I had to add the check for the current state being ACTIVE instead of just counting on the successful call to activate, but I don't remember why exactly. I'll retest to figure out why that was necessary.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a timeout could help.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return sanitizeNodeName(prefix) + "_" + timeToString(8);
}

rclcpp::Node::SharedPtr generateInternalNode(const std::string & prefix)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be code generally available for the implementation to use or is it support code for testing (or both perhaps)? If it's testing-specific code, should we have a place for such code or is util the right place? I'm wondering if it's worth separating.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it could be used anywhere we need to create a node internal to a class without bothering the client about that detail. So it would get used by ServiceClient, maybe DynamicParamsClient, and a few other places. It wasn't meant to be just for test code.

With some of the talk on the logger exception thread about nodes have unique names, I was afraid that these classes could end up creating nodes with identical names. I figured the solution to that was to add a randomized string to the end of the name, and I was trying to encapsulate the algorithm for that in this function.

{

class globalLocalizationServiceClient : public ServiceClient<std_srvs::srv::Empty>
class globalLocalizationServiceClient : public nav2_util::ServiceClient<std_srvs::srv::Empty>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should global be uppercase?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed. As well as a bunch of other name violations in the nav2_utils package. I addressed all the functions in the main nav2_utils library, but left the libraries that came from AMCL and map server

Copy link
Copy Markdown
Contributor

@orduno orduno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good. Splitting into multiple commits did help with the review.

Carl Delsey added 3 commits February 26, 2019 10:50
@ghost
Copy link
Copy Markdown
Author

ghost commented Feb 26, 2019

This change is necessary to make the dynamic params test stable, but is not enough in itself. The dynamic params test still fails sometimes when run concurrently with other tests. This will be addressed in another PR

string node_name(potential_node_name);
// read this as `replace` characters in `node_name` `if` not alphanumeric.
// replace with '_'
replace_if(begin(node_name), end(node_name),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

{

void bringupLifecycleNode(const std::string & node_name)
#define RETRY(fn, retries) \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other places in the code besides life-cycle that could benefit from this retry method. For example for Auto Localization currently I'm using the BT retry node for this purpose but we might need a simple retry method for other tasks.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll do that as a separate PR. It seems like a big enough change that it should have a focused review and not just tacked on as an addendum to this one.

Copy link
Copy Markdown
Contributor

@mhpanah mhpanah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make the retry method in the utils to be available to not just the life-cycle node.

@ghost ghost mentioned this pull request Mar 6, 2019
@ghost ghost merged commit 4f03d9d into ros-navigation:master Mar 6, 2019
mini-1235 pushed a commit to mini-1235/navigation2 that referenced this pull request Feb 5, 2025
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants