feat: support reloading workflows at runtime#3180
feat: support reloading workflows at runtime#3180didier-wenzek merged 16 commits intothin-edge:mainfrom
Conversation
Codecov ReportAttention: Patch coverage is Additional details and impacted files📢 Thoughts on this report? Let us know! |
Robot Results
|
| ) -> Result<(OperationWorkflow, WorkflowVersion), anyhow::Error> { | ||
| let bytes = tokio::fs::read(path).await.context("Fail to read file")?; | ||
| let input = std::str::from_utf8(&bytes).context("Fail to extract UTF8 content")?; | ||
| let version = sha256::digest(input); |
There was a problem hiding this comment.
Using md5 should be enough here - as there is no crypto concerns.
eb59902 to
d0a5cde
Compare
d0a5cde to
b0454bc
Compare
tests/RobotFramework/tests/tedge_agent/workflows/long-running-command-v1.toml
Outdated
Show resolved
Hide resolved
| let Some(version) = &command_state.workflow_version() else { | ||
| return Err(WorkflowExecutionError::MissingVersion); | ||
| }; |
There was a problem hiding this comment.
I have to revert this change. Indeed, this might be a source of issue if the agent is updated whilst there is a pending operation. If the previous agent was not featuring command versions, then the resumed operation should not be rejected by the new agent: the current version of the workflow must be used.
| let dir_path = &self.custom_workflows_dir.clone(); | ||
| if let Err(err) = self | ||
| .load_operation_workflows(WorkflowSource::UserDefined, dir_path) | ||
| .load_operation_workflows(WorkflowSource::UserDefined(dir_path)) |
There was a problem hiding this comment.
Opinion: Although I understand that WorkflowSource being a generic type allows such generic usages, but this one with the directory path seems slightly overloaded. I agree that we can still defend it since the directory roots are also linked to the source type.
tests/RobotFramework/tests/tedge_agent/workflows/long-running-command-v1.toml
Show resolved
Hide resolved
| ... item="@version":"76e9afe834b4a7cadc9029670ba76745fcda73784f9e78c09f0c0416f7f58ad2" | ||
|
|
||
| Recover Builtin Operation | ||
| ThinEdgeIO.File Should Exist /etc/tedge/operations/software_list.toml |
There was a problem hiding this comment.
suggestion: the test fails if run by itself and not part of the test suite, because this file is created by the previous test case. Could we instead use Transfer to Device everywhere so that there are no dependencies between the test cases? I'd expect Transfer to Device to overwrite the file if it already exists, so it should be okay to use that?
There was a problem hiding this comment.
I have a mix feeling here. On one side, you are correct, it would be handy to have independent tests. But, on the other side, this test suite represents well a scenario where a user creates and iterate updating a workflow file.
Concretely, replacingFile Should Exist assertion by a Transfer to Device command would lead to a different test while running the suite vs the isolated test. Indeed, in the suite case, one checks that a user can update a workflow while, in the isolated case, one checks that the user can create a workflow (i.e. Update User-Defined Operation doing the same test as Create User-Defined Operation).
Bravo555
left a comment
There was a problem hiding this comment.
Some nits, but LGTM overall.
| ${workflow_log} Execute Command cat /var/log/tedge/agent/workflow-user-command-dyn-test-1.log | ||
| Should Contain | ||
| ... ${workflow_log} | ||
| ... item="@version":"37d0861e3038b34e8ab2ffe3257dd9372213ed5e17ba352e5028b0bf9762a089" |
There was a problem hiding this comment.
nit(non-blocking): actual SHA256 is an implementation detail, we don't need to compare full value, only that it changed between different versions of the workflow
Also if the toml file changes this value will have to be updated.
It's a bit of a nitpick, but a comment would help because it's not obvious that it's SHA256 hash of user-command-v1.toml and why we're comparing it
| use anyhow::Context; | ||
| use camino::Utf8Path; | ||
| use camino::Utf8PathBuf; | ||
| use log::error; |
There was a problem hiding this comment.
| use log::error; | |
| use tracing::error; |
There was a problem hiding this comment.
thought: this module has quite a bit of functionality but doesn't have any unit tests - codecov reports 210 missed lines and 34.2% patch coverage (from other workflow-related tests)
There was a problem hiding this comment.
I can only acknowledge that the unit test coverage is poor. However, this code is quit extensively tested by the system test suite added by this PR (Dynamic Workflow Reloading). I opted for system-tests instead of unit-tests because the features introduced by this module are heavily related to the file system and inotify as well as sequence of user actions (adding/updating/removing files while the agent is running/restarted. One place where unit tests can be improved is the tedge_api::workflow::supervisor module which provide the in-memory representation of the uploaded workflow definitions.
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
This is an intermediate step, the aim being to use the same directory to persist a copy of the workflows currently used (i.e. for which there is a running operation instance). Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
For this first step the behavior is unchanged: the workflows are only loaded on start Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
…r engine The WorkflowRepository acts as a facade to WorkflowSupervisor adding all disk related features: loading definitions from disk, caching definitions in-use, reloading definitions on changes. Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
A workflow source being always used with a complementary info: a file path or a workflow version, it makes sense to pack the complementary info within the WorkflowSource itself. This also highlights the corner case of the BuiltIn workflow for which there is no complementary info. Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Proposed changes
In order to support dynamic reloading of workflow, without breaking a running workflow,
the proposal is to:
Plan:
Types of changes
Paste Link to the issue
#3156
Checklist
cargo fmtas mentioned in CODING_GUIDELINEScargo clippyas mentioned in CODING_GUIDELINESFurther comments