Skip to content

fix: ensure concurrent server always responds to shutdown signal#3291

Merged
jarhodes314 merged 1 commit intothin-edge:mainfrom
jarhodes314:bug/tedge-mapper-shutdown
Dec 13, 2024
Merged

fix: ensure concurrent server always responds to shutdown signal#3291
jarhodes314 merged 1 commit intothin-edge:mainfrom
jarhodes314:bug/tedge-mapper-shutdown

Conversation

@jarhodes314
Copy link
Copy Markdown
Contributor

Proposed changes

This fixes an issue that was causing tedge-mapper to fail to shutdown properly in certain cases, namely when the HTTP actor was busy when the shutdown request was received.

The root cause is in these lines of code

loop {
tokio::select! {
Some(request) = self.requests.recv() => {
return Some(request);
}
Some(result) = self.running_request_handlers.next() => {
if let Err(err) = result {
log::error!("Fail to run a request to completion: {err}");
}
}
else => {
return None
}
}
}

When we receive a RuntimeRequest::Shutdown, self.requests.recv() returns None. Since this doesn't match Some(request), this branch then becomes inactive and we wait for self.running_requests_handlers.next(). If any request handlers are running, when this returns, the code will loop, running self.requests.recv() again, meaning we've completely ignored the shutdown request and will never receive another one.

A similar thing happens if the running requests do not finish in a timely manner after the shutdown request is received. We have a unit test for this, however this only tests the case where the actor is at capacity (i.e. we have a task waiting that cannot be scheduled due to the concurrency limit). Given in this case the actor stops running immediately upon receiving the shutdown request, I have replicated this behaviour for the case where the actor is not at capacity.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Ok(())
);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was slightly horrified to see how complicated the test case above is. I've tested the code more directly (i.e. without the actor actually running, just polling next_request manually) as a result. A better solution might be to try and abstract away some of the setup of the above test, and then run essentially the same test just with fewer in-flight requests to test the correct code path.

@codecov
Copy link
Copy Markdown

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...tes/core/tedge_actors/src/servers/message_boxes.rs 85.71% 0 Missing and 7 partials ⚠️
Additional details and impacted files

📢 Thoughts on this report? Let us know!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 11, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
541 0 2 541 100 1h22m30.741205s

);
}

#[tokio::test]
Copy link
Copy Markdown
Contributor

@didier-wenzek didier-wenzek Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to have these two new tests actually detecting the issue. However, it would be could to put them in a test module that doesn't require cfg(feature = "test-helpers").

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a fixup commit to fix that f092624

Copy link
Copy Markdown
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@jarhodes314 jarhodes314 force-pushed the bug/tedge-mapper-shutdown branch from f092624 to 8d1b650 Compare December 13, 2024 11:25
@jarhodes314 jarhodes314 added this pull request to the merge queue Dec 13, 2024
Merged via the queue into thin-edge:main with commit 44a56e3 Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants