fix: ensure concurrent server always responds to shutdown signal by jarhodes314 · Pull Request #3291 · thin-edge/thin-edge.io

jarhodes314 · 2024-12-11T16:06:26Z

Proposed changes

This fixes an issue that caused tedge-mapper to fail to shut down properly in certain cases, namely when the HTTP actor was busy when the shutdown request was received.

The root cause is in these lines of code

thin-edge.io/crates/core/tedge_actors/src/servers/message_boxes.rs

Lines 51 to 65 in 82cde40

    loop {
        tokio::select! {
            Some(request) = self.requests.recv() => {
                return Some(request);
            }
            Some(result) = self.running_request_handlers.next() => {
                if let Err(err) = result {
                    log::error!("Fail to run a request to completion: {err}");
                }
            }
            else => {
                return None
            }
        }
    }

When we receive a RuntimeRequest::Shutdown, self.requests.recv() returns None. Since None doesn't match the pattern Some(request), tokio::select! disables that branch and we wait on self.running_request_handlers.next() alone. If any request handlers are running, then once one of them completes the code loops and calls self.requests.recv() again. By that point the shutdown request has been consumed and discarded, so we've completely ignored it and will never receive another one.

A similar thing happens if the running requests do not finish in a timely manner after the shutdown request is received. We have a unit test for this, but it only covers the case where the actor is at capacity (i.e. we have a task waiting that cannot be scheduled due to the concurrency limit). Since in that case the actor stops running immediately upon receiving the shutdown request, I have replicated this behaviour for the case where the actor is not at capacity.
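To make the control flow described above easier to follow, here is a minimal synchronous model of the loop. It is a sketch only: it does not use tokio or the real tedge_actors types, and `Model`, `Outcome`, `recv_script` and the `stop_on_shutdown` flag are hypothetical names introduced for illustration. A scripted `None` stands for `recv()` reporting the shutdown, an exhausted script stands for `recv()` pending forever, and `stop_on_shutdown = true` models the "stop immediately" behaviour described in this PR rather than the actual patch.

```rust
use std::collections::VecDeque;

/// How one call to the modelled `next_request` ends.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// A request was returned to the caller.
    Request(u32),
    /// The shutdown was observed and the call returned `None`.
    Shutdown,
    /// Nothing can ever complete: the real code would wait forever.
    Hung,
}

/// Hypothetical stand-in for the server's message box.
struct Model {
    /// Results of successive `recv()` calls; `None` models the shutdown signal.
    /// Once the script is exhausted, `recv()` is treated as pending forever.
    recv_script: VecDeque<Option<u32>>,
    /// Number of request handlers still running.
    running_handlers: usize,
}

impl Model {
    fn new(script: &[Option<u32>], running_handlers: usize) -> Self {
        Model {
            recv_script: script.iter().copied().collect(),
            running_handlers,
        }
    }

    /// Each loop iteration models one `select!` call.
    fn next_request(&mut self, stop_on_shutdown: bool) -> Outcome {
        loop {
            let shutdown_seen = match self.recv_script.pop_front() {
                Some(Some(request)) => return Outcome::Request(request),
                // Modelled fix: react to the shutdown straight away.
                Some(None) if stop_on_shutdown => return Outcome::Shutdown,
                // Original behaviour: the pattern fails to match, so the
                // branch is disabled for this iteration only.
                Some(None) => true,
                // `recv()` never resolves.
                None => false,
            };
            if self.running_handlers > 0 {
                // A handler completes and the loop starts over; any shutdown
                // seen in this iteration is forgotten.
                self.running_handlers -= 1;
            } else if shutdown_seen {
                // Both branches are disabled: the `else` arm returns `None`.
                return Outcome::Shutdown;
            } else {
                return Outcome::Hung;
            }
        }
    }
}
```

In this model, `Model::new(&[None], 1).next_request(false)` ends in `Outcome::Hung`, which corresponds to the busy-actor case above, while the same input with `stop_on_shutdown` set to `true` ends in `Outcome::Shutdown`. With no running handlers the original loop already returns `Outcome::Shutdown`, which matches the observation that the bug only shows up when the actor is busy.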

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
Documentation Update (if none of the other choices apply)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

I have read the CONTRIBUTING doc
I have signed the CLA (in all commits with git commit -s)
I ran cargo fmt as mentioned in CODING_GUIDELINES
I used cargo clippy as mentioned in CODING_GUIDELINES
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Further comments

jarhodes314 · 2024-12-11T16:09:46Z

crates/core/tedge_actors/src/servers/message_boxes.rs

            Ok(())
        );
    }
+


I was slightly horrified to see how complicated the test case above is. I've tested the code more directly (i.e. without the actor actually running, just polling next_request manually) as a result. A better solution might be to try and abstract away some of the setup of the above test, and then run essentially the same test just with fewer in-flight requests to test the correct code path.
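For readers unfamiliar with the "poll it manually" style of test mentioned above, the following is a rough, standard-library-only sketch of the idea. It is not the test added in this PR: `NextRequest` is a hypothetical stand-in for the future returned by next_request, and `NoopWake` and `poll_once` are illustrative helpers. The point is only that a future can be polled once, with no actor or runtime running, and the result checked directly.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; sufficient when polling a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for the `next_request` future: pending until a
/// shutdown has been signalled, then ready with `None`.
struct NextRequest<'a> {
    shutdown: &'a Cell<bool>,
}

impl Future for NextRequest<'_> {
    type Output = Option<u32>;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.shutdown.get() {
            Poll::Ready(None)
        } else {
            Poll::Pending
        }
    }
}

/// Polls the future exactly once, without spawning anything.
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

A test in this style would create the future, assert that `poll_once` reports it as pending, signal the shutdown, and then assert that the next poll yields `Poll::Ready(None)`.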

codecov · 2024-12-11T16:18:06Z

Codecov Report

Attention: Patch coverage is 85.71429% with 7 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...tes/core/tedge_actors/src/servers/message_boxes.rs	85.71%	0 Missing and 7 partials ⚠️


github-actions · 2024-12-11T16:28:20Z

Robot Results

✅ Passed	❌ Failed	⏭️ Skipped	Total	Pass %	⏱️ Duration
541	0	2	541	100	1h22m30.741205s

didier-wenzek · 2024-12-11T20:35:07Z

crates/core/tedge_actors/src/servers/message_boxes.rs

        );
    }
+
+    #[tokio::test]


Good to have these two new tests actually detecting the issue. However, it would be good to put them in a test module that doesn't require cfg(feature = "test-helpers").

I pushed a fixup commit to fix that: f092624

didier-wenzek

Approved

jarhodes314 requested review from albinsuresh, didier-wenzek and rina23q as code owners December 11, 2024 16:06

jarhodes314 temporarily deployed to Test Pull Request December 11, 2024 16:06 — with GitHub Actions Inactive

jarhodes314 commented Dec 11, 2024

jarhodes314 temporarily deployed to Test Auto December 11, 2024 16:14 — with GitHub Actions Inactive

didier-wenzek reviewed Dec 11, 2024

didier-wenzek temporarily deployed to Test Pull Request December 13, 2024 10:50 — with GitHub Actions Inactive

didier-wenzek approved these changes Dec 13, 2024

didier-wenzek temporarily deployed to Test Auto December 13, 2024 10:58 — with GitHub Actions Inactive

fix: ensure concurrent server always responds to shutdown signal

8d1b650

jarhodes314 force-pushed the bug/tedge-mapper-shutdown branch from f092624 to 8d1b650 Compare December 13, 2024 11:25

jarhodes314 temporarily deployed to Test Pull Request December 13, 2024 11:25 — with GitHub Actions Inactive

jarhodes314 enabled auto-merge December 13, 2024 11:26

jarhodes314 temporarily deployed to Test Auto December 13, 2024 11:31 — with GitHub Actions Inactive

jarhodes314 added this pull request to the merge queue Dec 13, 2024

Merged via the queue into thin-edge:main with commit 44a56e3 Dec 13, 2024

didier-wenzek mentioned this pull request Dec 31, 2024

Fix the behavior of Concurrent Server Actor on shutdown #2026

Closed

4 tasks

jarhodes314 merged 1 commit into thin-edge:main from jarhodes314:bug/tedge-mapper-shutdown
