Change the starting time of the goal expiration timeout#1121
Conversation
|
This patch doesn't introduce new interfaces. As a result, the current modification is not very clear for setting goal terminal timestamp. |
There was a problem hiding this comment.
How is goal_handle->impl->goal_terminal_timestamp set initially?
If my understanding of your code is correct (I don't live in rcl) when calling _recalculate_expire_timer, which I assume is periodically called during an action server's normal operations, we reset to now the clock ticking down when its expired.
I don't understand the goal_terminal_timestamp == INT64_MAX condition then --> shouldn't we always be resetting the clock if the action is still active? That would make the most sense to me; while the goal is still actively being processed, we stave off the timeout to remove it by another full timeout length. Then when the last update / sending of the result comes so we don't call it anymore, the full countdown is considered before removal.
fujitatomoya
left a comment
There was a problem hiding this comment.
i have a few comments to confirm.
First, let me explain when _recalculate_expire_timer() is called.
I use scenario 1 to set goal_terminal_timestamp.
rcl/rcl_action/src/rcl_action/action_server.c Line 447 in a17bc95
|
fujitatomoya
left a comment
There was a problem hiding this comment.
lgtm, i will try CI.
In this case, why is there any timer still set for expiring the goals? That seems like the root cause of the problem; our goals are being expired before they're done due to a terminal condition if they're long-running. In this design, that seems to me should be completely removed and this check in terminal states (success or cancel or abort) is all that should be required. This is where the timer to count down the expiration of the goal handle should be initiated. Alternatively, the timer for expiring the next goal should be adjusted during action server operations from |
|
@Barry-Xu-2018 can you rebase your development branch? it is failing build https://ci.ros2.org/job/ci_linux/19950/console since it does not include some changes in the mainline. Barry-Xu-2018/rcl@review/topic-use-goal-terminal-timestamp-to-expire-check...ros2:rcl:rolling |
62dd16f to
0a87c6d
Compare
Rebase was done. @fujitatomoya |
Sorry, I'm not quite clear about the scenarios you described. Could you please describe them again? #1121 (comment)
|
|
There is a failure in windows test. |
|
@sloretz @iuhilnehc-ynos either of you, can you take a look at this? i would like to have another review on this. |
|
Your two situations this method is called:
I'll shorthand this as achieving a terminal state
I'll shorthand the timer for goal expiring as "the ticking clock" My assertion is that you found a clever place to make modifications to the goal handle's state: at achieving the terminal state. We know that once we achieve this state, the goal is done processing and anytime before this point, the goal is still processing (or was dropped on the floor from a users' crappy application, which no matter what is going to cause problems). If we have a place (e.g. here) we can modify a ticking clock, that calls into question the need for the ticking clock prior to this point at all. The issue we want to solve is that ticking clock expiring still-processing goals -- thus why even have the ticking clock have the ability to expire still-processing goals? I'd argue that it shouldn't even be possible. We should be waiting until the terminal state to trigger the clock to start its countdown until the handle / result are deleted. Of which, the default of 10s or what-have-you is totally appropriate and fully solves the problem. It also comes with the added benefit of simplifying the action logic since there (1) doesn't need to be a ticking clock checked while the goal is processing and (2) the modification you require don't need to reset it, it merely needs to start it |
|
Thank you for the further explanation.
Your current question revolves around "the ticking clock", but I haven't fully caught your idea yet. For "the ticking clock", while the timer is triggered (At 2 in above diagram), the next expire time need to be calculated.
There are currently two time points where "the ticking clock" is checked. One is when a goal enters the termination state, and the other is when the ticking clock timer is triggered. When you mentioned "the goal is processing," in what situation does that occur? If expire timer has been run, a goal enters terminal state to call _recalculate_expire_timer(). At this case, calculating and setting the timer is an unnecessary operation. |
That's a question you're most suited to answer than me. With the current behavior before this PR, where was the goal being expired resulting in the bug that this is meant to address (which was expiring still-processing goals causing issues)? That was happening which triggered this discussion. My suggestion is removing entirely where-ever that is happening if we're able to start the count down to removing a goal at a terminal state. You've shown the part of the code where that terminal state is found to be able to reset a timer. Instead, we could just start it here and remove the previous problematic countdown that previously caused us issues. I ack that perhaps I'm missing something and that my suggestion isn't fully informed with the ins-and-outs of rcl. What I'm trying to suggest at its core is this, regardless of the specific details:
To my eye, this PR resets some timeouts, but still leaves the original timeout in place that caused the problem in the past - which could still cause a problem moving forward as well if you have a timeout of 10s but an action length of 10 minutes (which, we have). |
|
Thank you for detailed explanation. This PR ensures that the current time is recorded as the start time for expiration only when the goal enters the terminal state. The expire timer is set based on this time. Therefore, with the current modifications, it is impossible for the goal to expire while being processed. In rcl_action_expire_goals(), it will get time of goal termination by calling rcl_action_goal_handle_get_goal_terminal_timestamp() to check if expiration occurs. |
|
OK, got it. That makes sense, you can ignore my comments then :-) This is a very important fix for us in Nav2 so I wanted to make sure beyond the shadow of my ignorance that this was going to fully resolve the problem 😉 I very much appreciate your time here! |
|
@Barry-Xu-2018 since you added the fix from last CI, i started the full CI. @SteveMacenski thank you for taking your time for the review! |
|
What's the good word? |
|
we would like to have 2nd review on this. |
|
I will check failed test cases. |
|
Commit a07e119 is wrong. |
This reverts commit a07e119. Signed-off-by: Barry Xu <barry.xu@sony.com>
|
@Barry-Xu-2018 please consider adding some comments around that code that we thought was not needed but broke the tests when removed, so that in the future we will immediately know why we need it. |
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
3ac0079 to
959f39b
Compare
|
The all unstable cases are related to this commit ros2/message_filters@5551518 For Windows, it is related to this PR. |
|
All CI tests passed. |
|
Will this be backported to Jazzy/Humble? |
|
I believe adding new function in C does not break ABI compatibility, so i say we can backport this to jazzy, iron and humble. @clalancette @alsora what do you think? |
|
This is really great news, I know there's alot of interest in Nav2 users for the backport to remove that timeout! From my review as well, it looks backportable |
|
@Barry-Xu-2018 @fujitatomoya can this be backported to Jazzy? |
|
This fix is API/ABI compatible, i think we can backport this to jazzy. (technically for humble too, but maybe code base could be different.) |
|
@Mergifyio backport jazzy |
✅ Backports have been createdDetails
|
Signed-off-by: Barry Xu <barry.xu@sony.com> (cherry picked from commit 30a047a)

Address one issue mentioned #1103
The expire timeout is being calculated from the time the goal was accepted. According to the design https://design.ros2.org/articles/actions.html#result-caching, it should be calculated from the time the goal is completed.