Improve Eventing Fundamentals #45518

@josalem

This issue is tracking investigation and potential work on improving eventing fundamentals in .NET.

User Statement: As a user, I should be able to easily enable and collect profiling and event data from my .NET application with minimal overhead.


Potential future work (none of this is committed work or in scope for .NET 6):

  • Fix deadlock in EventPipe when reader tries to take EventPipe config lock while tracing self (#1892)
  • Fix EventCounter events not filtered by EventSource/EventListener relationship (#31927)
  • Fix assert in gcstress-extra CI (Assertion failed in test 'Loader\binding\tracing\BinderTracingTest.Basic\BinderTracingTest.Basic.cmd' #47698)
  • Reduce or remove impact of Rundown when collecting traces via EventPipe
    • Currently, users must collect a sequence of events called Rundown that contains information like loaded modules and IP -> Method Name mappings.
    • Rundown requires indexing the code manager table to get symbols for all IPs. While that is happening, the JIT is blocked from compiling new methods.
    • Sufficiently large processes (by method/type count) can take a long time to send Rundown.
    • We have historically run into issues with self-tracing processes if the act of reading Rundown events causes something to JIT.
  • Quantify limits of EventPipe throughput in resource constrained environments
    • We don't currently document any limits or expected throughput of EventPipe. While we may choose not to document values like this, we should still have remedial advice for situations where events are being dropped.
  • Quantify overhead of CPU sample profiling via EventPipe
  • Determine if we can mitigate safe-point bias in SampleProfiler
    • Samples are currently collected by suspending the runtime using the same infrastructure as the GC. This means that suspension on each thread will defer to a "safe point" in the code for the GC; typically this will be on method return. This will bias sampling towards the caller of the leaf frame rather than the leaf frame itself. For example, if method A calls method B which calls method C in a tight loop and method C does not have any potential safe points in it, then samples will be biased to show stacks containing A->B. The suspend could have been requested while C was running, but the nearest safe point ends up being on its return.
  • Determine if it is possible to include mixed-mode stacks in SampleProfiler
    • Currently, SampleProfiler is only capable of sending managed stacks without any native frames.
  • Allow configuring SampleProfiler frequency
    • SampleProfiler defaults to sampling once every 1 ms. Some applications of profiling data don't require this level of resolution and could benefit from the reduced event volume and overhead of a lower sample frequency.
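
To make the safe-point bias item above concrete, here is a toy simulation in Python. It is only a sketch and does not use or model any real runtime API. The tick counts, the sampling period, and the function name `sample_leaf_frames` are all hypothetical. It assumes C has no safe point of its own, so every suspension requested inside C is deferred to C's return, and that B can be suspended wherever it is running.

```python
from collections import Counter


def sample_leaf_frames(c_ticks=9, b_ticks=1, iterations=1000, period=7):
    """Toy model of A -> B -> C, where B calls C in a tight loop.

    Each loop iteration spends `c_ticks` inside C and `b_ticks` inside B.
    A sample is requested every `period` ticks. Returns two Counters:
    the leaf frame that was actually executing at each request, and the
    leaf frame the profiler would report after deferring to a safe point.
    """
    iteration_len = c_ticks + b_ticks
    total_ticks = iteration_len * iterations
    true_leaf = Counter()
    reported_leaf = Counter()
    for t in range(0, total_ticks, period):
        in_c = (t % iteration_len) < c_ticks
        true_leaf["C" if in_c else "B"] += 1
        # Assumption: C contains no safe point, so a suspension requested
        # inside C only takes effect at C's return, when the stack is A->B.
        # A suspension requested inside B is assumed to take effect in B.
        reported_leaf["B"] += 1
    return true_leaf, reported_leaf


if __name__ == "__main__":
    true_leaf, reported_leaf = sample_leaf_frames()
    print("actually executing:", dict(true_leaf))
    print("reported by sampler:", dict(reported_leaf))
```

In this model most sample requests arrive while C is running, yet C never appears as the reported leaf frame, which is the A->B skew described in the bullet.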

CC @tommcdon @sywhang @noahfalk @shirhatti
