Improve Eventing Fundamentals #45518

This issue tracks investigation and potential work on improving eventing fundamentals in .NET.

**User Statement**: As a user, I should be able to easily enable and collect profiling and event data from my .NET application with minimal overhead.

- [X] Make C implementation of EventPipe the default (#46079) (#47665)
- [X] Aligned reader causes latency in dispatching events live (now tracked in https://github.com/microsoft/perfview/issues/1447)
  - [x] Turn DiagnosticPort tests back on (TraceEvent issue is breaking test) (#44072). A bandaid fix would be to turn on the sample profiler so that data continues to flow and we don't get broken by an unaligned block size.
  - [X] ~Fix issue in TraceEvent~ OOB fix in trace event, see: https://github.com/microsoft/perfview/issues/1447
- [x] Improve EventPipe high core count CPU scalability (dotnet/diagnostics#1412)
- [x] Fix libcoreclr.so!EventPipeInternal::GetNextEvent high CPU use (#43985)

---

Potential future work (none of this is committed work or in scope for .NET 6):

- Fix deadlock in EventPipe when the reader tries to take the EventPipe config lock while the process is tracing itself via EventPipe (#1892)
- Fix EventCounter events not filtered by EventSource/EventListener relationship (#31927)
- Fix assert in `gcstress-extra` CI (#47698)
- Reduce or remove impact of Rundown when collecting traces via EventPipe
  - Currently, users must collect a sequence of events called Rundown that contains information like loaded modules and IP -> Method Name mappings.
  - Rundown requires indexing the code manager table to get symbols for all IPs.  This prevents the JIT from compiling anything while it runs.
  - Sufficiently large processes (by method/type count) can take a long time to send Rundown.
  - We have historically run into issues with self-tracing processes if the act of reading Rundown events causes something to JIT.
- Quantify limits of EventPipe throughput in resource constrained environments
  - We don't currently document any limits or expected throughput of EventPipe.  While we may choose not to document values like this, we should still have remedial advice for situations where events are being dropped.
- Quantify overhead of CPU sample profiling via EventPipe
- Determine if we can mitigate safe-point bias in SampleProfiler
  - Samples are currently collected by suspending the runtime using the same infrastructure as the GC.  This means that suspension on each thread will defer to a "safe point" in the code for the GC--typically this will be on method return.  This biases sampling towards the caller of the true leaf frame, i.e. the second frame from the top of the stack.  For example, if method A calls method B, which calls method C in a tight loop, and method C does not have any potential safe points in it, then samples will be biased to show stacks ending in A->B.  The suspend request could have arrived while C was running, but the nearest safe point ends up being on its return.
- Determine if it is possible to include mixed-mode stacks in SampleProfiler
  - Currently, SampleProfiler is only capable of sending managed stacks without any native frames.
- Allow configuring SampleProfiler frequency
  - SampleProfiler defaults to a 1 ms sampling interval.  Some applications of profiling data don't require this level of resolution and could benefit from the reduced event volume of a lower sample frequency.
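
To make the safe-point bias item above concrete, here is a toy simulation of the A -> B -> C example. It is only a sketch under stated assumptions and does not reflect how the runtime actually implements suspension: time is modelled as discrete ticks, the only safe point is the tick right after C returns, and all names (`build_timeline`, `sample`, the tick counts) are hypothetical.

```python
from collections import Counter

# Stacks are tuples ordered from root to leaf.
STACK_IN_B = ("A", "B")
STACK_IN_C = ("A", "B", "C")


def build_timeline(iterations, b_ticks, c_ticks):
    """Return one (stack, is_safe_point) entry per simulated time tick."""
    timeline = []
    for _ in range(iterations):
        # B does a little of its own work (assumed not to be a safe point).
        timeline += [(STACK_IN_B, False)] * b_ticks
        # C runs with no safe points anywhere in its body.
        timeline += [(STACK_IN_C, False)] * c_ticks
        # The return from C is the safe point; the thread is back in B.
        timeline.append((STACK_IN_B, True))
    return timeline


def sample(timeline, period, defer_to_safe_point):
    """Count sampled stacks, taking one sample request every `period` ticks.

    If defer_to_safe_point is True, each request is recorded at the next
    safe point instead of at the tick where the request arrived.
    """
    counts = Counter()
    for requested in range(0, len(timeline), period):
        tick = requested
        if defer_to_safe_point:
            while tick < len(timeline) and not timeline[tick][1]:
                tick += 1
            if tick == len(timeline):
                break  # no safe point left before the timeline ends
        counts[timeline[tick][0]] += 1
    return counts


if __name__ == "__main__":
    # C accounts for 8 of every 10 ticks in this made-up workload.
    timeline = build_timeline(iterations=1000, b_ticks=1, c_ticks=8)
    print("exact sampling:     ", dict(sample(timeline, 7, False)))
    print("safe-point sampling:", dict(sample(timeline, 7, True)))
```

In this model, sampling at the exact request tick attributes most samples to A->B->C, while deferring to the safe point attributes every sample to A->B, even though C is where the time is spent.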


CC @tommcdon @sywhang @noahfalk @shirhatti 
