fix: Don't fail if only some operations could not be loaded #3236
Conversation
This PR changes the logic of loading c8y operations. To ensure that only a single log is emitted for a given operation file, this PR originally aimed to address #3160 as well, by refactoring the operation registration flow so that the operation directory would be read only a single time. However, this optimal solution is made impossible by the inability to add an MQTT subscription after the actor is already spawned, so we're forced to scan the directory when creating …

As such, this PR will remove duplicate traversals where feasible, but will not completely address #3160, and as such the log output will still be a bit spammed by duplicate errors, which will be annoying for users. Solving this may require bigger changes to our MQTT actor, so it will be left for another PR.
```rust
self.operations.dedup();
let pos = self.operations.iter().position(|op| op.name == name);
pos.map(|pos| self.operations.remove(pos))
```
If the switch to a BTreeSet is very minimal, I'm even okay with you making that switch in this PR itself, to avoid complications like this. Or if you prefer a standalone PR, that's also fine.
Yes, better to switch to a BTreeSet right now.
I would really prefer to avoid big changes to the `Operations` struct in this PR because of the parallel PR #3225; after one of these merges we expect to see conflicts, which I'd like to minimise.
I'd also like to reorganize the module a bit in the refactor, so I'd rather do it as a separate PR.
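To make the `BTreeSet` suggestion concrete, here is a minimal sketch of what keyed storage could look like. The `Operation` fields and the `Operations` layout here are assumptions for illustration, not the actual thin-edge types; a `BTreeMap` keyed by name gives the same dedup-by-name behaviour while keeping a payload per entry:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the real Operation type.
#[derive(Debug, Clone, PartialEq)]
struct Operation {
    name: String,
    template: String,
}

// Keyed storage: inserting under the name deduplicates automatically,
// so neither `dedup()` nor the `position()`/`remove()` linear scan is needed.
#[derive(Default)]
struct Operations {
    operations: BTreeMap<String, Operation>,
}

impl Operations {
    fn add_operation(&mut self, op: Operation) {
        self.operations.insert(op.name.clone(), op);
    }

    // Direct lookup replaces the Vec-based position-then-remove dance.
    fn remove_operation(&mut self, name: &str) -> Option<Operation> {
        self.operations.remove(name)
    }
}

fn main() {
    let mut ops = Operations::default();
    let restart = Operation { name: "c8y_Restart".into(), template: "510".into() };
    ops.add_operation(restart.clone());
    ops.add_operation(restart); // duplicate is absorbed by the map
    assert_eq!(ops.operations.len(), 1);
    assert!(ops.remove_operation("c8y_Restart").is_some());
    assert!(ops.remove_operation("c8y_Restart").is_none());
    println!("ok");
}
```

A `BTreeMap` (rather than a `BTreeSet` of whole structs) sidesteps needing `Ord` on `Operation` itself, at the cost of storing the name twice.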
```diff
 let operation = c8y_api::smartrest::operations::get_operation(
     ops_file.as_std_path(),
     &self.config.bridge_config,
 )?;
 let operations = self
     .operations_for_device_mut(target)
     .expect("entity should've been checked before that it's not a service");

-let need_cloud_update = self.update_operations(ops_dir.as_std_path())?;
+let prev_operation = operations.remove_operation(&operation.name);
+// even if the body of the operation is different, as long as it has the same name, supported operations message
+// will be the same, so we don't need to resend
+let need_cloud_update = prev_operation.is_none();
```
Why not move this logic into the existing `update_operations` function, as that function already has the contract of returning a boolean flag indicating whether the cloud update is required or not?
Yeah, the logic of checking if operations changed is a bit duplicated, but also slightly different:

- `update_operations` reads all operations from the c8y operations dir and returns `true` if there were any updates in there. It compares two different instances of `Operations`.
- in `register_operation`, as we register only a single operation, we know the update is required when `prev_operation` is `None`. We only mutate a single `Operations` value.

So `update_operations` has a side effect of creating an entire new `Operations` value and it checks more than necessary, so I can't really reuse it as-is, but I agree that splitting the logic that decides whether we publish a 114 message is suboptimal.
The function changed a bit in a313601, and I ended up with two branches: in the case of a child device I just call `update_operations` outright. Let me know if this addresses it, or if you had something more particular in mind.
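The two-branch decision described above can be sketched as follows. The function name and boolean parameters are illustrative assumptions, not the actual converter code; they only show how the `need_cloud_update` flag differs between the child-device rescan path and the single-operation path:

```rust
// Hedged sketch of the post-a313601 decision logic (names assumed):
// whether a 114 (supported operations) message must be re-sent.
fn need_cloud_update(
    is_child_device: bool,
    prev_operation_existed: bool,
    rescan_found_changes: bool,
) -> bool {
    if is_child_device {
        // Child devices fall back to a full update_operations() rescan,
        // which reports whether anything in the directory changed.
        rescan_found_changes
    } else {
        // Main device: only one operation was registered, so an update
        // is needed exactly when that name was not registered before.
        !prev_operation_existed
    }
}

fn main() {
    // Re-registering an existing operation on the main device: no update.
    assert!(!need_cloud_update(false, true, false));
    // Registering a new operation name on the main device: update.
    assert!(need_cloud_update(false, false, false));
    // Child device: the rescan result decides.
    assert!(need_cloud_update(true, false, true));
    println!("ok");
}
```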
```rust
#[tokio::test]
// TODO: fix or remove test
#[ignore = "asserts that publishing a single operation capability message causes full rescan of c8y operations directory, which is undesirable behaviour"]
```
I understand that the rescan is not optimal. But we shouldn't be skipping the whole test for that reason. The dynamic ops update feature is still relevant and must be tested.
Yeah, this will need to be addressed before the merge.
Just that, from my perspective, it looks like the primary point of this test is to assert the exact behaviour I'm removing (registering any operation via MQTT causes a full rescan of the c8y operations directory), so I'm not really sure what behaviour we could be checking instead.
rina23q left a comment
As I confirmed that it no longer panics, I would approve it. There are still many small things to be improved regarding the `Operations` struct; however, I believe it's time to move on.
> As such, this PR will remove duplicate traversals where feasible, but will not completely address #3160, and as such the log output will still be a bit spammed by duplicate errors, which will be annoying for users. Solving this may require bigger changes to our MQTT actor, so it will be left for another PR.
I saw that the error messages are now much fewer at tedge-mapper-c8y's startup (now 2 vs. around 7 before, the number of operation files). Thanks for improving.
albinsuresh left a comment
Despite the shortcomings of the `Operations` structure, this is sufficient for the bug fix.
```rust
let operations = self
    .operations_for_device_mut(target)
    .expect("entity should've been checked before that it's not a service");
let need_cloud_update = match is_child_operation_path(ops_dir) {
```
Not a request for change, but just curious why you chose match statement here instead of a simple if-else.
```rust
} else {
    C8yTopic::upstream_topic(&c8y_prefix)
};
let prev_operation = current_operations.remove_operation(&operation.name);
```
In your refactoring PR, consider updating the signature of the `Operations::add_operation` function to return the previous operation, so that we can avoid this removal first and addition one step later.
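The suggestion above mirrors how `HashMap::insert` already behaves: it returns the displaced value. A minimal sketch of what that signature change could look like, with the `Operation` and `Operations` types simplified to hypothetical stand-ins:

```rust
use std::collections::HashMap;

// Hypothetical simplified Operation type for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Operation {
    name: String,
}

#[derive(Default)]
struct Operations {
    by_name: HashMap<String, Operation>,
}

impl Operations {
    // Suggested signature: insert and return the previous entry in one
    // step, so callers no longer need a separate remove_operation() first.
    fn add_operation(&mut self, op: Operation) -> Option<Operation> {
        self.by_name.insert(op.name.clone(), op)
    }
}

fn main() {
    let mut ops = Operations::default();
    let prev = ops.add_operation(Operation { name: "c8y_Restart".into() });
    // First registration: no previous entry, so a cloud update is needed.
    assert!(prev.is_none());
    let prev = ops.add_operation(Operation { name: "c8y_Restart".into() });
    // Re-registration under the same name: previous entry returned.
    assert!(prev.is_some());
    println!("ok");
}
```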
When loading operations from $CONFIG_DIR/operations/c8y, log an error for operations we were unable to load but return the other successfully loaded operations. Currently this spams the log output a bit, because we statically register some operations by name, but it causes a full directory scan and traversal of all operation files anyway. Will be fixed in the next commit.

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
… operation

[1] disabled dynamic operation reload for child devices because it needed to support nested child devices as well. For this reason, when receiving an MQTT command metadata message and registering that operation, we still need to read the operation directory and register and send all the new operations.

[1]: thin-edge#2614

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
TODO

- ~~read c8y operations directory only once~~ (fix: Don't fail if only some operations could not be loaded #3236 (comment))
- ~~clean up `Operations` struct and its usage in converter~~ (deferred to another PR)

Proposed changes
When loading operations from $CONFIG_DIR/operations/c8y, log an error for operations we were unable to load but return the other successfully loaded operations.
Because of #3160, this resulted in a lot of logs being emitted, so the PR also reworks operation registration to not load and re-register all the operations from disk when only a single operation is registered. As a result, we still log invalid operation files twice on startup, because of the inability to change subscriptions during runtime; see #3236 (comment).
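The partial-failure loading behaviour described above (log bad files, return the good ones) can be sketched like this. The function names and the use of plain `String` for an operation's content are assumptions for illustration; the real code parses operation files rather than just reading them:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-in for parsing a single operation file.
fn load_operation(path: &Path) -> io::Result<String> {
    fs::read_to_string(path)
}

// Scan the operations directory; a file that fails to load is logged
// and skipped instead of failing the whole load.
fn load_operations(dir: &Path) -> io::Result<Vec<String>> {
    let mut loaded = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        match load_operation(&path) {
            Ok(op) => loaded.push(op),
            Err(e) => eprintln!("failed to load operation {}: {e}", path.display()),
        }
    }
    Ok(loaded)
}

fn main() -> io::Result<()> {
    // Demo against a throwaway directory; one file is invalid UTF-8, so
    // reading it fails, yet the overall load still succeeds.
    let dir = std::env::temp_dir().join("c8y_ops_demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("c8y_Restart"), "[exec]")?;
    fs::write(dir.join("broken"), [0xFFu8, 0xFE])?;
    let ops = load_operations(&dir)?;
    assert_eq!(ops.len(), 1);
    println!("ok");
    Ok(())
}
```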
Types of changes

Paste Link to the issue

Checklist

- `cargo fmt` as mentioned in CODING_GUIDELINES
- `cargo clippy` as mentioned in CODING_GUIDELINES

Further comments