Preserve workflow failure reasons if set by a state #3582

@reubenmiller

Description

Is your feature improvement request related to a problem? Please describe.

When using workflows, if the operation is set to failed, the default failure reason is generally not very helpful: it only includes the exit code from the last state, and it overrides any more meaningful reason already set in previous states. The following is an example of the generic reason that overrides any existing reason.

/usr/bin/rugpi_workflow.sh returned exit code 1

In the case below, a more detailed failure reason was provided (set via the begin/end-tedge blocks), but it was overridden with the generic reason. The following snippet from the workflow log shows where the reason is set.

:::begin-tedge:::
{"reason":"Refusing to install update as the current (hot) partition is not the default partition. This indicates that you may be in the middle of an update. Please reboot to switch to the default partition"}
:::end-tedge:::
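
To make the format above concrete, here is a minimal sketch of how the JSON payload between the two markers could be extracted from a script's stdout. It is only an illustration written in Python, not the tedge-agent's actual code; the `parse_tedge_block` helper and the `TEDGE_BLOCK` pattern are hypothetical names introduced for this example.

```python
import json
import re

# Hypothetical helper, not part of thin-edge.io: matches the JSON payload
# found between the :::begin-tedge::: and :::end-tedge::: markers.
TEDGE_BLOCK = re.compile(
    r":::begin-tedge:::\s*(.*?)\s*:::end-tedge:::", re.DOTALL
)


def parse_tedge_block(stdout: str) -> dict:
    """Return the JSON object inside the first tedge block, or {} if none."""
    match = TEDGE_BLOCK.search(stdout)
    if match is None:
        return {}
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        # Assumption for this sketch: malformed payloads are ignored.
        return {}
    # Only a JSON object can carry a "reason" property.
    return payload if isinstance(payload, dict) else {}
```

For the snippet above, `parse_tedge_block(stdout).get("reason")` would return the "Refusing to install update..." message.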

Having a more descriptive failure reason helps users understand more quickly what the problem is without having to download the full workflow log (as the failure reason is generally visible in the Cumulocity Device Management Application).

Describe the solution you'd like

The tedge-agent should only use the generic "x returned exit code y" reason when an existing .reason property is not already set. This would allow the workflow to define its own more detailed failure reason.
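
The requested behaviour can be sketched roughly as follows. This is a Python illustration of the fallback rule only, not a proposal for the actual tedge-agent implementation; the `resolve_failure_reason` function and its parameters are hypothetical, and treating an empty or non-string reason as "not set" is an assumption of this sketch.

```python
def resolve_failure_reason(state: dict, script: str, exit_code: int) -> dict:
    """Mark a command state as failed, keeping any reason already set.

    Hypothetical helper: `state` is the command state (including anything
    the script reported via the begin/end-tedge block).
    """
    result = dict(state)  # copy so the caller's state is not mutated
    existing = result.get("reason")
    # Assumption: a missing, empty, or non-string reason counts as "not set".
    if not (isinstance(existing, str) and existing.strip()):
        result["reason"] = f"{script} returned exit code {exit_code}"
    result["status"] = "failed"
    return result
```

With this rule, a state that already carries the partition-related reason keeps it, while a state without a reason still falls back to "/usr/bin/rugpi_workflow.sh returned exit code 1".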

Describe alternatives you've considered

Additional context

The full workflow log below is taken from the Rugix firmware update workflow. It shows a non-trivial process where a failure reason was set by the failing workflow but was not propagated back to the UI because it was overridden.

==================================================================
Triggered firmware_update workflow
==================================================================

topic:     te/device/main///cmd/firmware_update/c8y-mapper-45569495
operation: firmware_update
cmd_id:    c8y-mapper-45569495
time:      2025-04-27T18:43:50.748997844Z

==================================================================

----------------------[ firmware_update @ init | time=2025-04-27T18:43:50.763384777Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-firmware_update-c8y-mapper-45569495.log","name":"tedge-raspios-arm64","remoteUrl":"https://repo-thinedgeio.s3.us-east-1.amazonaws.com/tedge-raspios-arm64/tedge-raspios-arm64_20250427.1105.img.xz","status":"init","tedgeUrl":"https://repo-thinedgeio.s3.us-east-1.amazonaws.com/tedge-raspios-arm64/tedge-raspios-arm64_20250427.1105.img.xz","version":"20250427.1105-img-aws"}

Action:   move to scheduled state

=> moving to firmware_update @ scheduled

----------------------[ firmware_update @ scheduled | time=2025-04-27T18:43:50.839291588Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-firmware_update-c8y-mapper-45569495.log","name":"tedge-raspios-arm64","remoteUrl":"https://repo-thinedgeio.s3.us-east-1.amazonaws.com/tedge-raspios-arm64/tedge-raspios-arm64_20250427.1105.img.xz","status":"scheduled","tedgeUrl":"https://repo-thinedgeio.s3.us-east-1.amazonaws.com/tedge-raspios-arm64/tedge-raspios-arm64_20250427.1105.img.xz","version":"20250427.1105-img-aws"}

Action:   move to executing state

=> moving to firmware_update @ executing

----------------------[ firmware_update @ executing | time=2025-04-27T18:43:50.90725353Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-firmware_update-c8y-mapper-45569495.log","name":"tedge-raspios-arm64","remoteUrl":"https://repo-thinedgeio.s3.us-east-1.amazonaws.com/tedge-raspios-arm64/tedge-raspios-arm64_20250427.1105.img.xz","status":"executing","tedgeUrl":"https://repo-thinedgeio.s3.us-east-1.amazonaws.com/tedge-raspios-arm64/tedge-raspios-arm64_20250427.1105.img.xz","version":"20250427.1105-img-aws"}

Action:   /usr/bin/rugpi_workflow.sh executing

Exit status: 1 (ERROR)

stderr (EMPTY)

stdout <<EOF
:::begin-tedge:::
{"reason":"Refusing to install update as the current (hot) partition is not the default partition. This indicates that you may be in the middle of an update. Please reboot to switch to the default partition"}
:::end-tedge:::
EOF
=> moving to firmware_update @ failed

----------------------[ firmware_update @ failed | time=2025-04-27T18:43:52.39318117Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-firmware_update-c8y-mapper-45569495.log","name":"tedge-raspios-arm64","reason":"/usr/bin/rugpi_workflow.sh returned exit code 1","remoteUrl":"https://repo-thinedgeio.s3.us-east-1.amazonaws.com/tedge-raspios-arm64/tedge-raspios-arm64_20250427.1105.img.xz","status":"failed","tedgeUrl":"https://repo-thinedgeio.s3.us-east-1.amazonaws.com/tedge-raspios-arm64/tedge-raspios-arm64_20250427.1105.img.xz","version":"20250427.1105-img-aws"}

Action:   wait for the requester to finalize the command
