Skip to content

Conversation

@lateralusX
Copy link
Member

Fixing issue #71143.

Happens in full AOT mode where methods have been compiled as direct calls (happens within an assembly). In that case, the method is not known by the runtime (not present in the JIT info table). When a new class is loading, the class_loading profiler callback is called. The EventPipe implementation of that callback will take a stackwalk (inline with CoreCLR behavior), but taking a stackwalk will look up each managed function on the callstack against the JIT info table, if its not there it will load it, that in turn could case additional recursive class_loading event to fire, that will again take the same callstack, ending up in an infinite recursion.

The issue is even a little more problematic, since it turns out you cannot do anything inside the class_loading callback that might end up triggering a class load, since that might lead up to recursively loading the same type, that in turn will assert when outer class_loading call returns and tries to add it into the loaded class cache.

Fix will harden the stackwalking logic inside EventPipe to detect recursive stackwalks (doing a stackwalk triggering another stackwalk) and if that is the case, the stackwalk will be done using the async_context stackwalking capabilities since that won't trigger any additional profiler events, breaking recursion.

Fix also adds logic to the class_loading profiler callback in EventPipe to prevent recursion making sure it won't trigger any additional profiler events, leading up to potential issues loading the same class twice, that will cause an assert inside the class loader logic.

For 1:1 mapping of old Mono profiler events issued into EventPipe, they have been aligned with old Mono log profiler events callstack behaviours, meaning it will only emit callstacks on a few limited profiler events, mitigating the issues seen in the runtime profiler provider.

Happens in full AOT mode where methods have been compiled as direct calls
(happens within an assembly). In that case, the method is not known by
the runtime (not present in the JIT info table). When a new class is
loading, the class_loading profiler callback is called. The EventPipe
implementation of that callback will take a stackwalk
(inline with CoreCLR behavior), but taking a stackwalk will look
up each managed function on the callstack against the JIT info table,
if its not there it will load it, that in turn could case additional
recursive class_loading event to fire, that will again take the same
callstack, ending up in an infinite recursion.

The issue is even a little more problematic, since it turns out you
cannot do anything inside the class_loading callback that might end up
triggering a class load, since that might lead up to recursively loading
the same type, that in turn will assert when outer class_loading call
returns and tries to add it into the loaded class cache.

Fix will harden the stackwalking logic inside EventPipe to detect
recursive stackwalks (doing a stackwalk triggering another stackwalk)
and if that is the case, the stackwalk will be done using the
async_context stackwalking capabilities since that won't trigger
any additional profiler events, breaking recursion.

Fix also adds logic to the class_loading profiler callback in EventPipe
to prevent recursion making sure it won't trigger any additional profiler
events, leading up to potential issues loading the same class twice, that
will cause an assert inside the class loader logic.

For 1:1 mapping of old Mono profiler events issued into EventPipe,
they have been aligned with old Mono log profiler events callstack
behaviours, meaning it will only emit callstacks on a few limited
profiler events, mitigating the issues seen in the runtime profiler
provider.
@lateralusX lateralusX merged commit 15f67a2 into dotnet:main Jul 4, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants