Avoid retrying work after the queue is cancelled#32306
Avoid retrying work after the queue is cancelled#32306sharwell merged 1 commit intodotnet:masterfrom
Conversation
|
Change looks good. But it would be useful to know how this fixes the linked issue. |
We have a check that runs at the end of every test verifying that all asynchronous operations scheduled in the context of a workspace have completed (successfully or cancelled) before moving from one test to the next. The This test was failing occasionally because the asynchronous operation started here never completed: In the past, this was caused by work processing queues having work added after the queue finished processing events. |
|
that CancellationToken is shutdown cancellation token. so no harm checking it before re-enqueue. but there might still be a race. if we know this work item async token is only token left not disposed, I think it might be good to have explicit Test only API such as DropAllPendingWorkItems which just loop through workitem in the queue and call dispose on the AsyncToken |
👍 There may be a race, but one less is good. 😄
The purpose of the test is to assert that this is not needed. The async tokens represent work which may execute in the future. The goal is to ensure operations started by test A do not impact the state of future test B, and disposing of the async tokens manually would defeat this isolation. |
|
but the queue is per test. so I am not sure how it can affect other test? or the issue is due to we moved all tests to share same MEF export which causing multiple tests to share same queue? then, that seems the root cause of the bug? |
jasonmalinowski
left a comment
There was a problem hiding this comment.
Please move the explanation into the commit message.
|
@heejaechang The isolation guarantee allows us to state that tests do not impact each other through asynchronous operations continuing after the test ends. We do not need to look at the operations themselves to see if they share state because there are no operations. It's a strong guarantee but only works when applied as a blanket policy. 😄 |
|
but I am still not sure. since solution crawler should have each workspace has its dedicated queues. it should never share queue with other workspaces. each test should have its own isolation. either that got broken at some point, or if it still has isolation, then having test only drop all pending workitem should be fine since the queue is being shutdown. we don't want drop all pending workitem thing in production code, since we don't want to run any clean up code on shutdown. that's just unnecessary work causing shutdown to take longer. |
|
I still don't get. each test should have its own Asynchronous operation queue. if it is shared between multiple tests due to MEF stuff, then that is a bug from the beginning. if its test has its own asynchronous operation queue, then I don't get why it is a bug when there is pending operation when test are done. it is only a bug if that pending async token is related to the test itself. |
|
@heejaechang The check is much wider than the asynchronous operation queues. It is a global assertion (to the degree possible) that work scheduled by a test will not start or continue executing after the end of the test. The benefits include, but are not limited, ensuring exceptions thrown by asynchronous work are correctly associated in time with the test that originally caused the scheduling of that operation. |
Work scheduled after queues are shutdown will never complete, which will hang the cleanup operations that run between unit tests. Fixes dotnet#28062
b164d9c to
b8caab1
Compare
|
alright, don't want to keep argue. I looked through the code and looks like what you did is enough. basically issue is, solution crawler has shutdown and it set shutdown cancellation token and cancel all running work. and then those running work got cancelled blindly re-enqueue itself to the queue since it didn't finish its work. in VS, since process is going away when it got shutdown so doesn't matter. but in the unit test, due to the your check, someone must either go through those items and explicitly drop or make sure not to enqueue the work after solution is shutdown. and you did the later. |
|
@jinujoseph for approval |
|
Does this needs to be in preview2 ? |
|
I am waiting to hear back from @jaredpar on whether the current flakiness level means this should be fixed in preview 2. |
I don't think the flakiness level should change where this bug is fixed. The test can be disabled / enabled independently of this fix. I'm fine with master here. |
|
@jinujoseph This is now targeting master |
Avoid retrying work after the queue is cancelled
Fixes #28062
Ask Mode
Customer scenario
No direct reproducer is known.
Bugs this fixes
#28062
Workarounds, if any
None.
Risk
Very low. This change avoids scheduling work items in three specific queues after global shutdown is requested.
Performance impact
This change may improve test performance, but is unlikely to have an observable impact elsewhere.
Is this a regression from a previous update?
No.
Root cause analysis
If project or document processing was cancelled because the workspace was shut down, items could be rescheduled for execution and never actually execute.
How was the bug found?
Caught by automated testing.
Test documentation updated?
N/A