Skip to content

Unwrap the cause (inner exception) of operation canceled exceptions when reporting faults#61787

Merged
ryzngard merged 7 commits intodotnet:mainfrom
ryzngard:issues/operationcanceled_fault
Sep 22, 2022
Merged

Unwrap the cause (inner exception) of operation canceled exceptions when reporting faults#61787
ryzngard merged 7 commits intodotnet:mainfrom
ryzngard:issues/operationcanceled_fault

Conversation

@ryzngard
Copy link
Copy Markdown
Contributor

@ryzngard ryzngard commented Jun 8, 2022

Fixes AB#1212299

@ryzngard ryzngard requested a review from a team as a code owner June 8, 2022 23:20
@ghost ghost added the Area-IDE label Jun 8, 2022
@ryzngard ryzngard requested a review from sharwell June 8, 2022 23:20
Copy link
Copy Markdown
Contributor

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OperationCanceledException can occur on both normal cancellation paths and error paths. Currently we only report them on error paths, where it's not required to omit them.

@sharwell sharwell changed the title Don't report operation canceled exceptions as faults Unwrap the cause (inner exception) of operation canceled exceptions when reporting faults Jun 9, 2022
Copy link
Copy Markdown
Contributor

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misinterpreted the original change and reworded the title to avoid confusion. One simplification given in code.

@tmat
Copy link
Copy Markdown
Member

tmat commented Jun 9, 2022

Won't we start reporting ConnectionLostExceptions (that are wrapped in OperationCanceledException) if this is merged? These are expected exceptions that occur on shutdown and they may get wrapped in OperationCanceledException and get reported if cancellation token is not signaled when ReportAndPropagateUnlessCanceled is called with cancellation token.

@sharwell
Copy link
Copy Markdown
Contributor

sharwell commented Jun 9, 2022

This is assuming all of the cancellation exceptions getting reported are wrapping a ConnectionLostException. Also, it'd be fine to skip reporting for ConnectionLostException when the current process is the ServiceHub process (ConnectionLostException is a more targeted exclusion than OperationCanceledException).

@tmat
Copy link
Copy Markdown
Member

tmat commented Jun 9, 2022

The motivation for adding OperationCanceledNotMatchingCancellationTokenException that is wrapping ConnectionLostException or ObjectDisposedException in #61792 is to distinguish OCEs that should not be reported in NFEs from regular OCEs.

Are there any other cases when we are wrapping exceptions to OCE?

Co-authored-by: Sam Harwell <sam@tunnelvisionlabs.com>
@ryzngard
Copy link
Copy Markdown
Contributor Author

ryzngard commented Jun 9, 2022

The motivation for adding OperationCanceledNotMatchingCancellationTokenException that is wrapping ConnectionLostException or ObjectDisposedException in #61792 is to distinguish OCEs that should not be reported in NFEs from regular OCEs.

Are there any other cases when we are wrapping exceptions to OCE?

If we want to avoid reporting specific exceptions as faults we should filter there instead of FatalError. Filtering at that level would also result in it being removed from ETW traces and other logs right?

The goal of this is specifically around fault reporting. OCE is resulting in millions of events of unactionable data and bucketed in ways we can't root cause. This will, at the very least, give us a clearer picture of root causing.

@tmat
Copy link
Copy Markdown
Member

tmat commented Jun 9, 2022

OCE is resulting in millions of events of unactionable data

Yes, some of these (maybe most?) because we are wrapping ConnectionLostException in OCE today and then reporting it form FatalError.ReportUnlessCancelled(Exception, CancellationToken) because the cancellation token is not signaled in this case.

If we just unwrap this OCE and do nothing else we will continue having these unactionable NFEs, just with ConnectionLostException instead of OCE

@tmat
Copy link
Copy Markdown
Member

tmat commented Jun 12, 2022

Once #61792 is merged I think we shouldn't be seeing any OCE wrapping another exception being reported via NFE.

@ryzngard
Copy link
Copy Markdown
Contributor Author

Once #61792 is merged I think we shouldn't be seeing any OCE wrapping another exception being reported via NFE.

I think we should still take this change though. Having the inner exception unwrapped for OCE is more useful than ever reporting OCE for faults

@tmat
Copy link
Copy Markdown
Member

tmat commented Jun 13, 2022

Maybe, but there are no other cases where we wrap an exception to OCE in Roslyn.

@ryzngard
Copy link
Copy Markdown
Contributor Author

Maybe, but there are no other cases where we wrap an exception to OCE in Roslyn.

yet :)

@ryzngard
Copy link
Copy Markdown
Contributor Author

/azp run

@ryzngard ryzngard enabled auto-merge (squash) July 27, 2022 21:04
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 4 pipeline(s).

@ryzngard ryzngard merged commit 620a86b into dotnet:main Sep 22, 2022
@ghost ghost added this to the Next milestone Sep 22, 2022
@Cosifne Cosifne modified the milestones: Next, 17.4 P3 Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants