fix: agent leaves interrupted operations in non-final state #3210
The test "Update tedge version from base to current using Cumulocity" failed for unrelated reasons, though it is not obviously flaky.
```rust
.load_pending_commands(pending_commands)
{
    // Make sure the latest state is visible over MQTT
    self.mqtt_publisher
```
Should we limit this republishing to commands in their terminal state? If not, there's a risk of even the intermediate command states like `scheduled` or `executing` getting resent, risking duplicate execution of the same logic by any external component listening on them.
There isn't any official contract saying these messages will be published exactly once, though. And generally such listeners are not doing control (as control logic should be in the workflow itself), so triggering twice on a status change should not be a problem.
Just using QoS 1 is enough to receive a duplicate message (without publishing the status change twice).
> Should we limit this republishing only for commands in their terminal state?

This would introduce the same issue on intermediate states (e.g. the "executing" state never being published).

> If not, there's the risk of even the intermediate command states like `scheduled` or `executing` getting resent, risking duplicate execution of the same logic by any external component listening on them.
The agent protects itself from reacting to messages of which it is the publisher.
Other components, such as the mappers, should be prepared to receive a status twice.
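The expectation above (listeners must tolerate receiving the same status twice) can be sketched as a small deduplicating listener. This is a hypothetical illustration, not the actual agent or mapper code: the struct and topic names are made up, and only the idempotency idea matters.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: a listener that tolerates duplicate command status
/// messages (e.g. QoS 1 redelivery or retained-message replay on restart)
/// by only reacting when the status actually changes for a given command.
struct DedupListener {
    last_status: HashMap<String, String>,
}

impl DedupListener {
    fn new() -> Self {
        Self { last_status: HashMap::new() }
    }

    /// Returns true if this is a genuinely new status for the command,
    /// false if it is a duplicate that should be ignored.
    fn on_status(&mut self, command_topic: &str, status: &str) -> bool {
        match self.last_status.get(command_topic) {
            Some(previous) if previous.as_str() == status => false,
            _ => {
                self.last_status
                    .insert(command_topic.to_string(), status.to_string());
                true
            }
        }
    }
}

fn main() {
    let mut listener = DedupListener::new();
    // Illustrative topic, loosely following the thin-edge topic scheme.
    let topic = "te/device/main///cmd/software_update/c8y-123";
    assert!(listener.on_status(topic, "executing"));  // first delivery: react
    assert!(!listener.on_status(topic, "executing")); // duplicate: ignore
    assert!(listener.on_status(topic, "successful")); // new state: react
    println!("duplicate statuses ignored");
}
```

With such a guard, republishing intermediate states on agent restart is harmless for the listener, whatever the QoS.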
> Just using QoS 1 is enough to receive a duplicate message (without publishing the status change twice).
It's also enough to restart, since the messages are retained. If the c8y mapper stops after sending an EXECUTING 501 message and restarts before the command is successful, a second EXECUTING 501 will be sent.
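The restart scenario can be made safe by persisting the last forwarded status per command, so that the retained-message replay after a restart does not trigger a second EXECUTING 501. This is a minimal sketch under assumed names; the file path and `should_forward` helper are illustrative, not the actual c8y mapper implementation.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical sketch: remember the last status forwarded to the cloud in a
/// small state file, so a mapper restart (which replays the retained MQTT
/// status) does not re-send the same SmartREST message.
fn should_forward(state_file: &Path, status: &str) -> std::io::Result<bool> {
    let last = fs::read_to_string(state_file).unwrap_or_default();
    if last.trim() == status {
        return Ok(false); // already forwarded before the restart
    }
    fs::write(state_file, status)?;
    Ok(true)
}

fn main() -> std::io::Result<()> {
    let state = std::env::temp_dir().join("c8y-123.status");
    let _ = fs::remove_file(&state);
    assert!(should_forward(&state, "executing")?);  // first run: send 501
    // ... mapper restarts; the retained "executing" message is replayed ...
    assert!(!should_forward(&state, "executing")?); // replay: skip duplicate
    assert!(should_forward(&state, "successful")?); // terminal state: forward
    fs::remove_file(&state)?;
    println!("restart-safe forwarding");
    Ok(())
}
```

Whether such persistence is worth the complexity depends on whether the cloud side already tolerates duplicate status updates, as discussed above.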
> Just using QoS 1 is enough to receive a duplicate message (without publishing the status change twice).
I have always assumed that users would send critical messages like commands only with QoS 2, as they might expect once-and-only-once execution for such things. But yeah, if the users are already expected to be resilient to duplicate delivery, especially since most cloud platforms don't support QoS 2 either, I guess this is okay.
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Force-pushed from ebe374d to 24f870f
Proposed changes
Types of changes
Paste Link to the issue
Reproduce #3149
Checklist
- `cargo fmt` as mentioned in CODING_GUIDELINES
- `cargo clippy` as mentioned in CODING_GUIDELINES

Further comments