Skip to content

refactor: Simplify the operation handling code by replacing fragmented control flow with regular async/await#2904

Merged
Bravo555 merged 10 commits intothin-edge:mainfrom
Bravo555:improve/operation-handler-decoupling
Jul 12, 2024
Merged

refactor: Simplify the operation handling code by replacing fragmented control flow with regular async/await#2904
Bravo555 merged 10 commits intothin-edge:mainfrom
Bravo555:improve/operation-handler-decoupling

Conversation

@Bravo555
Copy link
Copy Markdown
Member

@Bravo555 Bravo555 commented May 25, 2024

TODO

  • add more tests to see which operations are blocked/executed concurrently added log_upload, no other operations downloading from the FTS (from the perspective of the mapper)
  • remove remaining mutexes/locks replaced with ordinary sender clones
  • clean up operation handling code will be done in next PR

Follow-up tasks in: #2880 (comment)

Proposed changes

This PR is a 2nd attempt at simplifying operation control flow. I found that the top-down approach in #2881 of spawning a task for every main actor message would be too tedious to get working, because to remove the lock on the converter would require to change convert method to &self, which at this moment is infeasible because of how many things it does/how many other functions it calls.

Instead, the approach taken here was a bottom-up one:

  1. It was identified that the operation handling and its properties do not fit into convert method's assumptions:
    • convert() needs to convert MQTT messages in both directions without blocking
    • but operation handling can block, and it can take some time before MQTT response is ready
  2. As such, some operations (not all) were pulled out of CumulocityConverter and placed in a newly created OperationHandler. The operation handler was later made to tokio::spawn the operation handlers. This separates the scope of operation handling from the converter, and makes it so the converter can immediately return.
  3. pending_upload_operations and pending_fts_download_operations hashmaps were removed.

In contrast to the previous attempt, the stated aims, i.e.: simplifying the control flow without impacting the ability to execute multiple operations concurrently, appear to have been achieved.

cont. in Further comments

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Caveats

Testing and correctness

There was a test created, but it only tests config_snapshot and log_upload command, and it only tests if it's concurrent by checking if the download requests were created. Ideally we'd spawn operations of all types and check if they're progressing equally, but the only view into their execution is messages they send to other actors, but some don't do this at all.

https://github.com/Bravo555/thin-edge.io/blob/4f53e990e160924aec0931ba1afe771f9301195f/crates/extensions/c8y_mapper_ext/src/tests.rs#L2421-L2469

@codecov
Copy link
Copy Markdown

codecov bot commented May 25, 2024

Codecov Report

Attention: Patch coverage is 76.00644% with 149 lines in your changes missing coverage. Please review.

Project coverage is 78.0%. Comparing base (70f761f) to head (fe3d76a).
Report is 16 commits behind head on main.

Additional details and impacted files
Files Coverage Δ
crates/core/tedge_api/src/mqtt_topics.rs 87.1% <ø> (ø)
crates/extensions/c8y_auth_proxy/src/url.rs 81.5% <ø> (ø)
crates/extensions/c8y_http_proxy/src/handle.rs 39.0% <ø> (ø)
crates/extensions/c8y_mapper_ext/src/actor.rs 80.1% <100.0%> (-0.6%) ⬇️
...ons/c8y_mapper_ext/src/operations/config_update.rs 93.2% <100.0%> (+0.2%) ⬆️
...s/c8y_mapper_ext/src/operations/firmware_update.rs 91.1% <100.0%> (+0.1%) ⬆️
...tedge_config_cli/models/c8y_software_management.rs 40.0% <0.0%> (ø)
...on/tedge_config/src/tedge_config_cli/models/mod.rs 54.5% <0.0%> (ø)
crates/extensions/c8y_mapper_ext/src/lib.rs 88.8% <0.0%> (ø)
crates/extensions/c8y_mapper_ext/src/config.rs 48.2% <0.0%> (ø)
... and 9 more

... and 8 files with indirect coverage changes

@Bravo555 Bravo555 force-pushed the improve/operation-handler-decoupling branch from 4f53e99 to df72f91 Compare May 25, 2024 14:52
@Bravo555 Bravo555 had a problem deploying to Test Pull Request May 25, 2024 14:52 — with GitHub Actions Failure
@Bravo555 Bravo555 force-pushed the improve/operation-handler-decoupling branch from df72f91 to faecc49 Compare May 25, 2024 14:58
@Bravo555 Bravo555 had a problem deploying to Test Pull Request May 25, 2024 14:58 — with GitHub Actions Failure
@Bravo555 Bravo555 changed the title Improve/operation handler decoupling Simplify the operation handling code by replacing fragmented control flow with regular async/await - attempt #2 May 26, 2024
@Bravo555 Bravo555 self-assigned this May 27, 2024
@Bravo555 Bravo555 force-pushed the improve/operation-handler-decoupling branch from faecc49 to b51176e Compare May 28, 2024 09:06
@Bravo555 Bravo555 had a problem deploying to Test Pull Request May 28, 2024 09:06 — with GitHub Actions Failure
@Bravo555 Bravo555 temporarily deployed to Test Pull Request May 28, 2024 15:40 — with GitHub Actions Inactive
@Bravo555 Bravo555 marked this pull request as ready for review May 28, 2024 15:41
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented May 29, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
476 0 3 476 100 1h18m25.890029s

@Bravo555
Copy link
Copy Markdown
Member Author

Bravo555 commented May 29, 2024

After rerunning the tests locally, I can observe that the mapper correctly sends 502 and 503 smartrest messages, but operations status asserts are hitting timeouts nonetheless. Could this be a problem with Cumulocity tenant?

Was a regression introduced by the PR, fixed.

@didier-wenzek
Copy link
Copy Markdown
Contributor

After rerunning the tests locally, I can observe that the mapper correctly sends 502 and 503 smartrest messages, but operations status asserts are hitting timeouts nonetheless. Could this be a problem with Cumulocity tenant?

I would be surprised by that. The tests are green on other PRs and the errors are related to this PR's changes.

@Bravo555 Bravo555 force-pushed the improve/operation-handler-decoupling branch from ee20ab7 to 80b918b Compare May 29, 2024 17:13
@Bravo555 Bravo555 temporarily deployed to Test Pull Request May 29, 2024 17:13 — with GitHub Actions Inactive
@Bravo555 Bravo555 temporarily deployed to Test Pull Request May 29, 2024 18:18 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great effort. It was nice to see the OperationHandler abstraction with only the stuff that really needs to be shared for operation handling. And I really liked that subtle trick you used with the Entity abstraction, eliminating the need for sharing the entity_store as well.

I'm still a bit confused to see the locking required around the proxy. I was under the impression that it wouldn't be needed anymore. But still, it's good to see that the locking is limited to the upload path and it doesn't influence the processing of any other parallel requests.

Copy link
Copy Markdown
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be really happy to merge this PR!

However, I see this work as half-done:

  • All the operations must be handled by the OperationHandler.
  • This is an opportunity to move the code for Restart, SoftwareList and SoftwareUpdate into their own modules under operations.

Copy link
Copy Markdown
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A giant leap in terms of streamlining the bloated responsibilities of the converter. The code looks really good and consider this my stamp of approval of the refactoring.

Yes, there are still some minor niggles in how the clear messages are a handled etc. But, to avoid blocking this PR for much longer, I'm happy to approve this PR in its current state, as the improvements made far outweighs such minor niggles. Will give the "formal" approval once Didier's comments are also addressed.

@didier-wenzek
Copy link
Copy Markdown
Contributor

A giant leap in terms of streamlining the bloated responsibilities of the converter. The code looks really good and consider this my stamp of approval of the refactoring.

Agree. The benefits are great and I will be happy to merge this PR despite being not fully happy with the design. See #2904 (comment).

Yes, there are still some minor niggles in how the clear messages are a handled etc. But, to avoid blocking this PR for much longer, I'm happy to approve this PR in its current state, as the improvements made far outweighs such minor niggles. Will give the "formal" approval once Didier's comments are also addressed.

I'm afraid my comments are not straightforward to address:

Hence, I'm happy to address these pending comments in a following PR.

Copy link
Copy Markdown
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Thank you for this sustained effort to untangle the c8y converter. A great leap forward.

Copy link
Copy Markdown
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some test failure, but looks unrelated.

@reubenmiller reubenmiller added the refactoring Developer value label Jul 12, 2024
Bravo555 added 10 commits July 12, 2024 09:33
To be more sure about future changes to the c8y mapper and the
converter, we need tests checking that operations happen concurrently.

This change introduces a test that only covers config_snapshot
operation, but it expectedly fails when regressions were introduced by
the previous commits in this PR.

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Creates a new `OperationHandler` struct with methods that handle
operations. This is done to both pull stuff out of
`CumulocityConverter`, which is too large and does too many things, and
to make it possible to handle operations concurrently.

The current design of the `CumulocityConverter` is that all the incoming
MQTT messages go into the converter's `convert()` method, which converts
in both directions between messages from the cloud and local MQTT
broker, and also needs to return a converted message without blocking.

In contrast, when handling operations, it can take some time before an
MQTT response is ready to be sent. As such, it makes sense to spawn a
task when starting the operation, and have the converter return no
messages, so other messages can be converted.

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
To make sure `OperationHandler` can be used concurrently, its handler
methods were made `&self`.

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Before the change, `pending_upload_operations` and
`pending_fts_download_operations` hashmaps lived inside
`CumulocityConverter`, so the main actor would need to access them via
the converter.

After the change, the hashmaps were put behind `Arc<Mutex<>>`, so the
main actors would be able to access them without going through the
converter.

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Created a new struct `OperationContext` from the fields of
`OperationHandler` that were used by operations tasks. Thanks to that:

- mutex on the `running_operations` field was removed
- `OperationContext` no longer needs to be behind `Arc` to be useful,
  but uses `Arc`s internally
- data flow in `RunningOperation::spawn` is clearer
- `OperationContext` is private, so constructing `OperationHandler` is
  easier, because we don't need to pass every field, instead we just
  take what we need from `C8yMapperConfig`

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
@Bravo555 Bravo555 force-pushed the improve/operation-handler-decoupling branch from d0674c1 to fe3d76a Compare July 12, 2024 09:49
@Bravo555 Bravo555 temporarily deployed to Test Pull Request July 12, 2024 09:49 — with GitHub Actions Inactive
@Bravo555 Bravo555 added this pull request to the merge queue Jul 12, 2024
Merged via the queue into thin-edge:main with commit fddf093 Jul 12, 2024
@Bravo555 Bravo555 removed their assignment Jul 12, 2024
@reubenmiller reubenmiller changed the title Simplify the operation handling code by replacing fragmented control flow with regular async/await - attempt #2 refactor: Simplify the operation handling code by replacing fragmented control flow with regular async/await - attempt #2 Aug 1, 2024
@reubenmiller reubenmiller changed the title refactor: Simplify the operation handling code by replacing fragmented control flow with regular async/await - attempt #2 refactor: Simplify the operation handling code by replacing fragmented control flow with regular async/await Aug 1, 2024
@reubenmiller reubenmiller added the theme:c8y Theme: Cumulocity related topics label Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring Developer value theme:c8y Theme: Cumulocity related topics

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants