tedge-agent should not block processing commands if one command is unacknowledged by the creator #3456

**Is your feature improvement request related to a problem? Please describe.**


Workflows processed by `tedge-agent` currently wait for the workflow result to be cleared by the user or component that created it.

New command requests for the same workflow are then queued until any previously in-use workflow (of the same type) has been cleared. This makes the `tedge-agent` fragile: a client that forgets to clear an existing command blocks other users' requests from being processed.

**Problem 1: Badly behaving client which fails to acknowledge the result**

1. Client 1: Create workflow for firmware_update (cmd_id=1)
1. tedge-agent: processes firmware_update (cmd_id=1), and sets the status to "successful" or "failed"
1. Client 2: Create workflow for firmware_update (cmd_id=2)
1. tedge-agent: waits until cmd_id=1 has been acknowledged (and cleared) by Client 1 before processing cmd_id=2. If Client 1 never acknowledges the result, the tedge-agent is blocked indefinitely
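
The sequence above can be sketched with a small toy model. This is only an illustration of the blocking behaviour described in this issue, not the actual `tedge-agent` implementation; the class `BlockingAgent` and its methods are hypothetical names, and the command is assumed to succeed immediately.

```python
from collections import deque


class BlockingAgent:
    """Toy model (hypothetical): one command per type, released only when cleared."""

    def __init__(self):
        self.active = {}   # command type -> (cmd_id, status) awaiting clearing
        self.pending = {}  # command type -> deque of queued cmd_ids

    def create(self, cmd_type, cmd_id):
        if cmd_type in self.active:
            # A previous command of this type has not been cleared yet.
            self.pending.setdefault(cmd_type, deque()).append(cmd_id)
            return "queued"
        # Assumption: the command runs and succeeds straight away.
        self.active[cmd_type] = (cmd_id, "successful")
        return "processed"

    def clear(self, cmd_type, cmd_id):
        """Creator clears its command; returns the next cmd_id that gets processed."""
        current = self.active.get(cmd_type)
        if current is None or current[0] != cmd_id:
            return None
        del self.active[cmd_type]
        queue = self.pending.get(cmd_type)
        if queue:
            next_id = queue.popleft()
            self.active[cmd_type] = (next_id, "successful")
            return next_id
        return None


agent = BlockingAgent()
first = agent.create("firmware_update", 1)      # Client 1
second = agent.create("firmware_update", 2)     # Client 2, stuck behind cmd_id=1
unblocked = agent.clear("firmware_update", 1)   # only now can cmd_id=2 proceed
```

If Client 1 never calls `clear`, `cmd_id=2` stays in `pending` forever, which is the indefinite block described in step 4.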

**Problem 2: tedge-agent fails to see acknowledgement**

If some messages are lost (which is currently the case with mosquitto > 2.0.11, <= 2.0.21), the tedge-agent may never see the clearing of the command. It then blocks any future commands until it is restarted (on startup it checks for any existing retained messages and reconciles any in-progress commands). Whilst this is mainly due to the existing [mosquitto bug #2618](https://github.com/eclipse-mosquitto/mosquitto/issues/2618), the problem would also exist if the MQTT server isn't configured with persistence.

Below shows the potentially problematic sequence:

1. Client 1: Create workflow for firmware_update (cmd_id=1)
1. tedge-agent: processes firmware_update (cmd_id=1), and sets the status to "successful" or "failed"
1. tedge-agent: Gets disconnected from the MQTT broker (possibly due to the device being restarted)
1. Client 1: Clears the cmd_id=1 message (tedge-agent is still disconnected)
1. tedge-agent: Reconnects to the MQTT broker but does not receive the clearing of the retained message, so it will not process any future command of the same type until the tedge-agent is restarted or another client re-sends the clearing message (which is unlikely, as it would be difficult to find the correct message id)
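
The lost-acknowledgement sequence can be reproduced with another toy model. It assumes that a command is cleared by publishing an empty retained payload (which is how retained messages are removed in MQTT); the classes `ToyBroker` and `ToyAgent`, the `reconcile` method, and the topic name are hypothetical and are not taken from the `tedge-agent` code base.

```python
class ToyBroker:
    """Toy retained-message store (hypothetical): topic -> payload."""

    def __init__(self):
        self.retained = {}
        self.subscribers = []

    def publish(self, topic, payload):
        if payload == "":
            # An empty retained payload removes the retained message.
            self.retained.pop(topic, None)
        else:
            self.retained[topic] = payload
        for subscriber in self.subscribers:
            subscriber.on_message(topic, payload)


class ToyAgent:
    def __init__(self, broker):
        self.broker = broker
        self.connected = True
        self.uncleared = set()  # topics the agent believes are still in use
        broker.subscribers.append(self)

    def on_message(self, topic, payload):
        if not self.connected:
            return  # models the message that is never delivered
        if payload == "":
            self.uncleared.discard(topic)
        else:
            self.uncleared.add(topic)

    def reconcile(self):
        # Models the startup check: keep only entries that still have a retained message.
        self.uncleared &= set(self.broker.retained)


broker = ToyBroker()
agent = ToyAgent(broker)
topic = "cmd/firmware_update/1"        # hypothetical topic name

broker.publish(topic, "successful")    # agent publishes the final status
agent.connected = False                # agent gets disconnected
broker.publish(topic, "")              # Client 1 clears the command
agent.connected = True                 # agent reconnects, but missed the clearing

stale = topic in agent.uncleared       # agent still thinks cmd_id=1 is in use
agent.reconcile()                      # what a restart effectively does
fixed = topic not in agent.uncleared
```

In this model the agent's local view only catches up with the broker once `reconcile` runs, mirroring the observation that a restart is currently needed to unblock the agent.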


**Describe the solution you'd like**


The solution is open for discussion, but below are some questions to think about:

* Why should the tedge-agent care if an operation is acknowledged by the owner of the command? The tedge-agent should not clear the commands itself (at least those that didn't originate from the tedge-agent itself). The tedge-agent is responsible for doing the work, not for checking whether the work has been observed
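
As one possible direction for the discussion, the sketch below frees the per-type slot as soon as a command reaches a terminal state and leaves the result in place for its creator to clear. This is only an assumption about how such behaviour could look, not a proposed implementation; `NonBlockingAgent` and its methods are hypothetical names.

```python
TERMINAL_STATES = {"successful", "failed"}


class NonBlockingAgent:
    """Toy model (hypothetical): a finished command no longer blocks its type."""

    def __init__(self):
        self.running = {}  # command type -> cmd_id currently executing
        self.results = {}  # (command type, cmd_id) -> final status, cleared by the creator

    def create(self, cmd_type, cmd_id):
        if cmd_type in self.running:
            return "queued"  # still one command per type at a time
        self.running[cmd_type] = cmd_id
        return "executing"

    def finish(self, cmd_type, status):
        if status not in TERMINAL_STATES:
            raise ValueError(f"not a terminal state: {status}")
        cmd_id = self.running.pop(cmd_type)
        # The result stays available, but the agent does not wait for it to be cleared.
        self.results[(cmd_type, cmd_id)] = status


agent = NonBlockingAgent()
agent.create("firmware_update", 1)             # Client 1
agent.finish("firmware_update", "successful")  # never acknowledged by Client 1
state = agent.create("firmware_update", 2)     # Client 2 is not blocked
```

Here the agent never clears `cmd_id=1` itself; it simply stops treating an uncleared result as a reason to hold back the next command of the same type.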

**Describe alternatives you've considered**


**Additional context**
