[JTC] Fix time sources and wrong checks in tests by christophfroehlich · Pull Request #686 · ros-controls/ros2_controllers

christophfroehlich · 2023-06-25T18:45:40Z

I found some weird time differences when running the tests:

4: [WARN] [1687297889.461844027] [test_joint_trajectory_controller]: Aborted due goal_time_tolerance exceeding by 1687208373.175506 seconds

So I tried RCL_STEADY_TIME for all clocks and new failures popped up.

JointTrajectoryController::update: Why would we use the node's clock instead of the time argument of the method?
test_goal_tolerances_fail -> It won't hold the initial position but the last one from the goal.
test_ignore_old_trajectory -> The new message isn't accepted, so the old message keeps being processed and it will reach its last point instead of the points from the discarded message.

destogl · 2023-06-26T17:35:19Z

joint_trajectory_controller/test/test_trajectory_actions.cpp

+    EXPECT_NEAR(4.0, joint_pos_[0], COMMON_THRESHOLD);
+    EXPECT_NEAR(5.0, joint_pos_[1], COMMON_THRESHOLD);
+    EXPECT_NEAR(6.0, joint_pos_[2], COMMON_THRESHOLD);


It would be great to use variables instead of the numbers.

you are right. with the next PR ;)

* Fix time sources and wrong checks in tests * Use time from update-method instead of node clock * Readd test of last command in test_goal_tolerances_fail --------- Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com>

gwalck · 2023-07-05T19:58:40Z

Hi @christophfroehlich I was not sure where to ask something regarding the feature branch, but since it is related to timing, and I should work from the feature branch I ask here

while writing a test for joint_limiter in JTC, I would like to use the method updateController of the test utils to run a test that does multiple updates for a certain duration.
However, it seems to me that the current code does not pass the correct period to the update.

ros2_controllers/joint_trajectory_controller/test/test_trajectory_controller_utils.hpp

Line 378 in 811ae4d

traj_controller_->update(clock.now(), clock.now() - start_time);

instead of computing the period = time delta between 2 calls of update, the time period just gets bigger and reaches wait_time finally.
I don't think this is what an update loop should do.
I would more see something like

auto previous_time = start_time;
while (clock.now() < end_time)
{
    auto now = clock.now();
    traj_controller_->update(now,  now - previous_time);
    previous_time = now;
}

more generally I was not totally convinced if it is required to use a real clock for such loops. What we want for those update loop is to test steps and how the controller / hardware state updates, not really do it in real wall clock.

There is only one place in the jtc code where a node clock is used to get the "now" for validating the trajectory msg otherwise I suppose it relies on external caller of update to pass the current time and dt.

However, I suppose when using an integrated publisher and subscriber scheme in the test with the controller, for which they include wall clock timeouts (like wait_for_trajectory based on chrono) everything else might require to really loop in real wall time.

Maybe in most other cases a loop to fake multiple update calls with faked time delta, just to get the simulated hardware to reach a certain state is sufficient and would be way faster than doing all tests with wall clock.

christophfroehlich · 2023-07-05T21:12:30Z

Hi @christophfroehlich I was not sure where to ask something regarding the feature branch, but since it is related to timing, and I should work from the feature branch I ask here

while writing a test for joint_limiter in JTC, I would like to use the method updateController of the test utils to run a test that does multiple updates for a certain duration. However, it seems to me that the current code does not pass the correct period to the update.

ros2_controllers/joint_trajectory_controller/test/test_trajectory_controller_utils.hpp

Line 378 in 811ae4d

traj_controller_->update(clock.now(), clock.now() - start_time);

instead of computing the period = time delta between 2 calls of update, the time period just gets bigger and reaches wait_time finally. I don't think this is what an update loop should do. I would more see something like
auto previous_time = start_time;
while (clock.now() < end_time)
{
    auto now = clock.now();
    traj_controller_->update(now,  now - previous_time);
    previous_time = now;
}

you are completely right here, never asked myself about that.

more generally I was not totally convinced if it is required to use a real clock for such loops. What we want for those update loop is to test steps and how the controller / hardware state updates, not really do it in real wall clock.

There is only one place in the jtc code where a node clock is used to get the "now" for validating the trajectory msg otherwise I suppose it relies on external caller of update to pass the current time and dt.

Before this PR there was one clock call more inside the update loop, but I changed it to the time argument of the method. For the timeouts inside the update method we compare it with the time sent via the trajectory messages. If the time sources don't match, you get an c++ exception.

However, I suppose when using an integrated publisher and subscriber scheme in the test with the controller, for which they include wall clock timeouts (like wait_for_trajectory based on chrono) everything else might require to really loop in real wall time.

Maybe in most other cases a loop to fake multiple update calls with faked time delta, just to get the simulated hardware to reach a certain state is sufficient and would be way faster than doing all tests with wall clock.

I think we could change that for most of the tests (being careful of the clock source of update+trajectory), but you'll get clear errors about that). Feel free to add a PR against jtc-features branch!

gwalck · 2023-07-05T21:15:35Z

I am working on it, I am currently testing my idea to see if one wins anytime

gwalck · 2023-07-05T23:23:34Z

It does not look good. Fixing the period computation is fine, but trying to fake the period (for instance 1 or 10 ms) and then call update as fast as possible with faking this time progression leads to tests segfaulting (not exceptions, crash due to access out of bounds of a string vector)

    #0    Object "/work/dev/rolling-ctrl-ws/build/joint_trajectory_controller/test_trajectory_controller", at 0x55b413a701d2, in std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator[](unsigned long)
    Segmentation fault (Address not mapped to object [0x28])

UpdateController has nothing related to time tested after it was called, and uses the time and dt passed in arguments, hence I don't understand why this should be a problem. The test that crashes is the state_topic_consistency
I need to isolate it and maybe try debug what happens. This is weird

[EDIT] Well the segfault came from some getState() that returned a null state_msg because the spin_some did not get new data. If I ensured that spin_some got at least one message, I just received the first one, because then the publisher apparently did not send more than one in my too fast update. If in my fast update I add a spin_once it does not improve.

Conclusion, we have to let time pass, with clock.now(), to have thread execute some stuff behind the scene...[/EDIT]

destogl · 2023-07-06T06:58:34Z

It is true to use clock.now() for callbacks, but we can still fake time in the update method. This means it does not matter how long it takes to get the callback, we should anyway fake the time as we need. So I recommend to remove and real clock from the test as much as possible - otherwise they will end up being very flaky.

gwalck · 2023-07-06T15:48:01Z

Currently a set of tests rely on the state_msg to read final or intermediate states of the controller. They will fail if the state_msg does not go through properly after an too fast update. Maybe other means of looking at the state (rather than through a publisher) would be better, then one could discard clocks in those updateController.

I will first provide a PR for the delta time.

* Fix time sources and wrong checks in tests * Use time from update-method instead of node clock * Readd test of last command in test_goal_tolerances_fail --------- Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com>

* Fix time sources and wrong checks in tests * Use time from update-method instead of node clock * Readd test of last command in test_goal_tolerances_fail --------- Co-authored-by: Bence Magyar <bence.magyar.robotics@gmail.com> (cherry picked from commit 7e13d1d) # Conflicts: # joint_trajectory_controller/test/test_trajectory_actions.cpp

christophfroehlich added 3 commits June 25, 2023 18:23

Fix timesources and IMHO wrong checks in tests

4c8897a

Use time from update-method instead of node clock

ef21801

Readd test of last command in test_goal_tolerances_fail

6254ad8

github-actions bot requested review from aprotyas, arne48, bmagyar, destogl, peterdavidfagan, qiayuanl and rosterloh June 25, 2023 18:45

christophfroehlich mentioned this pull request Jun 25, 2023

[JTC] Fix tests and set_hold_position #607

Closed

christophfroehlich and others added 2 commits June 25, 2023 18:49

Cleanup

43923d9

Merge branch 'jtc-features' into fix_jtc_tests

69eb804

destogl approved these changes Jun 26, 2023

View reviewed changes

destogl merged commit 391ad84 into ros-controls:jtc-features Jun 26, 2023

christophfroehlich deleted the fix_jtc_tests branch June 26, 2023 18:46

gwalck mentioned this pull request Jul 6, 2023

Fixed period computation in test #693

Merged

christophfroehlich mentioned this pull request Jul 17, 2023

[JTC] Re-enabling test, bugfixing and hardening. Adding a parameter to define when trajectories with non-zero velocity at the end are used. #705

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[JTC] Fix time sources and wrong checks in tests#686

[JTC] Fix time sources and wrong checks in tests#686
destogl merged 5 commits intoros-controls:jtc-featuresfrom
christophfroehlich:fix_jtc_tests

christophfroehlich commented Jun 25, 2023 •

edited

Loading

Uh oh!

destogl Jun 26, 2023

Uh oh!

christophfroehlich Jun 26, 2023

Uh oh!

gwalck commented Jul 5, 2023

Uh oh!

christophfroehlich commented Jul 5, 2023

Uh oh!

gwalck commented Jul 5, 2023

Uh oh!

gwalck commented Jul 5, 2023 •

edited

Loading

Uh oh!

destogl commented Jul 6, 2023

Uh oh!

gwalck commented Jul 6, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

christophfroehlich commented Jun 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

destogl Jun 26, 2023

Choose a reason for hiding this comment

Uh oh!

christophfroehlich Jun 26, 2023

Choose a reason for hiding this comment

Uh oh!

gwalck commented Jul 5, 2023

Uh oh!

christophfroehlich commented Jul 5, 2023

Uh oh!

gwalck commented Jul 5, 2023

Uh oh!

gwalck commented Jul 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

destogl commented Jul 6, 2023

Uh oh!

gwalck commented Jul 6, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

christophfroehlich commented Jun 25, 2023 •

edited

Loading

gwalck commented Jul 5, 2023 •

edited

Loading