Skip to content

feat: Change which download errors are retryable#3229

Merged
Bravo555 merged 2 commits intothin-edge:mainfrom
Bravo555:improve/download-retryable-errors
Nov 13, 2024
Merged

feat: Change which download errors are retryable#3229
Bravo555 merged 2 commits intothin-edge:mainfrom
Bravo555:improve/download-retryable-errors

Conversation

@Bravo555
Copy link
Copy Markdown
Member

@Bravo555 Bravo555 commented Nov 6, 2024

Proposed changes

Whereas previously we considered only some errors to be permanent and the rest to be retryable, that logic was inverted so we check if an error is one of a few that is transient, and the rest are considered permanent.

Biggest changes:

  • connect errors are considered transient
  • timeouts are considered transient

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Nov 6, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
529 0 2 529 100 1h31m55.289806999s

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files

📢 Thoughts on this report? Let us know!

Whereas previously we considered only some errors to be permanent and
the rest to be retryable, that logic was inverted so we check if an
error is one of a few that is transient, and the rest are considered
permanent.

Biggest changes:
- connect errors are considered transient
- timeouts are considered transient

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
@Bravo555 Bravo555 force-pushed the improve/download-retryable-errors branch from 5ee377a to 8e0ab51 Compare November 6, 2024 14:03
@Bravo555 Bravo555 temporarily deployed to Test Pull Request November 6, 2024 14:03 — with GitHub Actions Inactive
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
let mut downloader = Downloader::new(target_path, None, CloudRootCerts::from([]));
downloader.set_backoff(ExponentialBackoff {
current_interval: Duration::ZERO,
max_elapsed_time: Some(Duration::ZERO),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit puzzled by this change. As far as I understand, this disables the backoff since it will cause next_backoff to always return None, so we will never retry?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. As now the connect errors are considered transient, it causes DNS resolution failure to be considered transient, which caused the test to hang instead of immediately returning.
I think the intention behind setting current_interval to zero would be to disable retries in the test, but it wasn't actually doing anything, and only max_elapsed_time set to zero actually disables the retries.

But another question is should DNS resolution failure be considered transient. On one hand we can timeout during the query because the network is down, then we should retry, but if the network is up and the domain does not exist, then we should return ASAP.

I'll check if we can extract information from reqwest::Error so that we can distinguish between these 2 cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But another question is should DNS resolution failure be considered transient.

Yes DNS resolution errors are transient and should be retried, see #2321

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit puzzled by this change. As far as I understand, this disables the backoff since it will cause next_backoff to always return None, so we will never retry?

I was puzzled too, till I noticed this is only in the scope of a test.

@Bravo555 Bravo555 added this pull request to the merge queue Nov 12, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 12, 2024
@Bravo555 Bravo555 added this pull request to the merge queue Nov 13, 2024
Merged via the queue into thin-edge:main with commit 3641da2 Nov 13, 2024
@reubenmiller reubenmiller added this to the 1.4.0 milestone Dec 5, 2024
@reubenmiller reubenmiller changed the title Change which download errors are retryable feat: Change which download errors are retryable Dec 16, 2024
@reubenmiller reubenmiller added theme:software Theme: Software management theme:configuration Theme: Configuration management labels Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

theme:configuration Theme: Configuration management theme:software Theme: Software management

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants