Skip to content

[JTC] Fix time sources and wrong checks in tests#686

Merged
destogl merged 5 commits intoros-controls:jtc-featuresfrom
christophfroehlich:fix_jtc_tests
Jun 26, 2023
Merged

[JTC] Fix time sources and wrong checks in tests#686
destogl merged 5 commits intoros-controls:jtc-featuresfrom
christophfroehlich:fix_jtc_tests

Conversation

@christophfroehlich
Copy link
Copy Markdown
Member

@christophfroehlich christophfroehlich commented Jun 25, 2023

I found some weird time differences when running the tests:

4: [WARN] [1687297889.461844027] [test_joint_trajectory_controller]: Aborted due goal_time_tolerance exceeding by 1687208373.175506 seconds

So I tried RCL_STEADY_TIME for all clocks and new failures popped up.

  • JointTrajectoryController::update: Why would we use the node's clock instead of the time argument of the method?
  • test_goal_tolerances_fail -> It won't hold the initial position but the last one from the goal.
  • test_ignore_old_trajectory -> The new message isn't accepted, so the old message keeps being processed and it will reach its last point instead of the points from the discarded message.

Comment on lines +511 to +513
EXPECT_NEAR(4.0, joint_pos_[0], COMMON_THRESHOLD);
EXPECT_NEAR(5.0, joint_pos_[1], COMMON_THRESHOLD);
EXPECT_NEAR(6.0, joint_pos_[2], COMMON_THRESHOLD);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to use variables instead of the numbers.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right. with the next PR ;)

@destogl destogl merged commit 391ad84 into ros-controls:jtc-features Jun 26, 2023
bmagyar added a commit that referenced this pull request Jun 26, 2023
* Fix time sources and wrong checks in tests
* Use time from update-method instead of node clock
* Readd test of last command in test_goal_tolerances_fail

---------

Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com>
@christophfroehlich christophfroehlich deleted the fix_jtc_tests branch June 26, 2023 18:46
@gwalck
Copy link
Copy Markdown
Contributor

gwalck commented Jul 5, 2023

Hi @christophfroehlich I was not sure where to ask something regarding the feature branch, but since it is related to timing, and I should work from the feature branch I ask here

while writing a test for joint_limiter in JTC, I would like to use the method updateController of the test utils to run a test that does multiple updates for a certain duration.
However, it seems to me that the current code does not pass the correct period to the update.

traj_controller_->update(clock.now(), clock.now() - start_time);

instead of computing the period = time delta between 2 calls of update, the time period just gets bigger and reaches wait_time finally.
I don't think this is what an update loop should do.
I would more see something like

auto previous_time = start_time;
while (clock.now() < end_time)
{
    auto now = clock.now();
    traj_controller_->update(now,  now - previous_time);
    previous_time = now;
}

more generally I was not totally convinced if it is required to use a real clock for such loops. What we want for those update loop is to test steps and how the controller / hardware state updates, not really do it in real wall clock.

There is only one place in the jtc code where a node clock is used to get the "now" for validating the trajectory msg otherwise I suppose it relies on external caller of update to pass the current time and dt.

However, I suppose when using an integrated publisher and subscriber scheme in the test with the controller, for which they include wall clock timeouts (like wait_for_trajectory based on chrono) everything else might require to really loop in real wall time.

Maybe in most other cases a loop to fake multiple update calls with faked time delta, just to get the simulated hardware to reach a certain state is sufficient and would be way faster than doing all tests with wall clock.

@christophfroehlich
Copy link
Copy Markdown
Member Author

Hi @christophfroehlich I was not sure where to ask something regarding the feature branch, but since it is related to timing, and I should work from the feature branch I ask here

while writing a test for joint_limiter in JTC, I would like to use the method updateController of the test utils to run a test that does multiple updates for a certain duration. However, it seems to me that the current code does not pass the correct period to the update.

traj_controller_->update(clock.now(), clock.now() - start_time);

instead of computing the period = time delta between 2 calls of update, the time period just gets bigger and reaches wait_time finally. I don't think this is what an update loop should do. I would more see something like

auto previous_time = start_time;
while (clock.now() < end_time)
{
    auto now = clock.now();
    traj_controller_->update(now,  now - previous_time);
    previous_time = now;
}

you are completely right here, never asked myself about that.

more generally I was not totally convinced if it is required to use a real clock for such loops. What we want for those update loop is to test steps and how the controller / hardware state updates, not really do it in real wall clock.

There is only one place in the jtc code where a node clock is used to get the "now" for validating the trajectory msg otherwise I suppose it relies on external caller of update to pass the current time and dt.

Before this PR there was one clock call more inside the update loop, but I changed it to the time argument of the method. For the timeouts inside the update method we compare it with the time sent via the trajectory messages. If the time sources don't match, you get an c++ exception.

However, I suppose when using an integrated publisher and subscriber scheme in the test with the controller, for which they include wall clock timeouts (like wait_for_trajectory based on chrono) everything else might require to really loop in real wall time.

Maybe in most other cases a loop to fake multiple update calls with faked time delta, just to get the simulated hardware to reach a certain state is sufficient and would be way faster than doing all tests with wall clock.

I think we could change that for most of the tests (being careful of the clock source of update+trajectory), but you'll get clear errors about that). Feel free to add a PR against jtc-features branch!

@gwalck
Copy link
Copy Markdown
Contributor

gwalck commented Jul 5, 2023

I am working on it, I am currently testing my idea to see if one wins anytime

@gwalck
Copy link
Copy Markdown
Contributor

gwalck commented Jul 5, 2023

It does not look good. Fixing the period computation is fine, but trying to fake the period (for instance 1 or 10 ms) and then call update as fast as possible with faking this time progression leads to tests segfaulting (not exceptions, crash due to access out of bounds of a string vector)

    #0    Object "/work/dev/rolling-ctrl-ws/build/joint_trajectory_controller/test_trajectory_controller", at 0x55b413a701d2, in std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator[](unsigned long)
    Segmentation fault (Address not mapped to object [0x28])

UpdateController has nothing related to time tested after it was called, and uses the time and dt passed in arguments, hence I don't understand why this should be a problem. The test that crashes is the state_topic_consistency
I need to isolate it and maybe try debug what happens. This is weird

[EDIT] Well the segfault came from some getState() that returned a null state_msg because the spin_some did not get new data. If I ensured that spin_some got at least one message, I just received the first one, because then the publisher apparently did not send more than one in my too fast update. If in my fast update I add a spin_once it does not improve.

Conclusion, we have to let time pass, with clock.now(), to have thread execute some stuff behind the scene...[/EDIT]

@destogl
Copy link
Copy Markdown
Member

destogl commented Jul 6, 2023

It is true to use clock.now() for callbacks, but we can still fake time in the update method. This means it does not matter how long it takes to get the callback, we should anyway fake the time as we need. So I recommend to remove and real clock from the test as much as possible - otherwise they will end up being very flaky.

@gwalck
Copy link
Copy Markdown
Contributor

gwalck commented Jul 6, 2023

Currently a set of tests rely on the state_msg to read final or intermediate states of the controller. They will fail if the state_msg does not go through properly after an too fast update. Maybe other means of looking at the state (rather than through a publisher) would be better, then one could discard clocks in those updateController.

I will first provide a PR for the delta time.

christophfroehlich added a commit to christophfroehlich/ros2_controllers that referenced this pull request Jul 17, 2023
* Fix time sources and wrong checks in tests
* Use time from update-method instead of node clock
* Readd test of last command in test_goal_tolerances_fail

---------

Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com>
bmagyar added a commit that referenced this pull request Jul 17, 2023
* Fix time sources and wrong checks in tests
* Use time from update-method instead of node clock
* Readd test of last command in test_goal_tolerances_fail

---------

Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com>
mergify bot pushed a commit that referenced this pull request Jul 17, 2023
* Fix time sources and wrong checks in tests
* Use time from update-method instead of node clock
* Readd test of last command in test_goal_tolerances_fail

---------

Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com>
(cherry picked from commit 7e13d1d)

# Conflicts:
#	joint_trajectory_controller/test/test_trajectory_actions.cpp
mergify bot pushed a commit that referenced this pull request Jul 17, 2023
* Fix time sources and wrong checks in tests
* Use time from update-method instead of node clock
* Readd test of last command in test_goal_tolerances_fail

---------

Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com>
(cherry picked from commit 7e13d1d)

# Conflicts:
#	joint_trajectory_controller/test/test_trajectory_actions.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants