[EventPipe] Remove thread lock and cleanup ThreadSessionState lifecycle #118415
Conversation
EventPipe buffer_manager code had a complex synchronization scheme where some state was protected by a per-thread lock and other state was protected by the buffer_manager lock. Removing the thread lock introduces a little extra lock-free behavior, but overall I think it will be a simplification.

1. Previously the thread list in the buffer manager and the thread->session_state slots were updated under separate locks at different times, and it was possible for a TSS (ThreadSessionState) to never get added to the list. The function ep_session_remove_dangling_session_states was designed as a band-aid to clean that up, but it still didn't handle the case where the thread exits before the session is disabled, which appears to permanently leak the TSS. Now the session_state slots and the buffer_manager thread list are always updated together (see the sketch after this list), and ep_buffer_manager_deallocate_buffers cleans up all thread session state at session disable time at the latest.
2. Previously buffer_lists had different lifetimes than TSS and were leaked when ep_buffer_manager_write_all_buffers_to_file_v4 did cleanup. Now a buffer_list is allocated and freed at the same time as its TSS, so it doesn't require separate explicit cleanup.
3. Previously TSS->write_buffer was set under a different lock than buffer_list->tail_buffer, which created a window where tail_buffer was the only one updated. Now they are updated atomically and that window no longer exists. This simplifies try_convert_buffer_to_read_only because there is no longer a possibility of failure.
4. We've been interested in cleaning up TSS objects more eagerly to prevent unbounded leaks on long-running sessions. Doing this under the existing two-lock scheme would have been aggravating, because we need to test whether the buffers are empty under the buffer_manager lock, then clear the session_state slot and delete the TSS under the thread lock. The locks didn't support being held at the same time, and as soon as one thread exited the buffer_manager lock it could race with another thread wanting to do the same cleanup. Now that there is only one lock, the cleanup can happen atomically without needing to add yet another synchronization mechanism.
5. I removed buffer_manager_suspend_event rather than adjust its locking scheme. The method didn't do what its name implied; instead it converted writable buffers to read-only buffers without suspending anything. There is no need to convert those buffers eagerly because the existing event reading code converts them on demand.
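Below is a minimal sketch of the single-lock registration invariant from point 1, assuming simplified stand-in types and a pthread mutex in place of the real buffer_manager lock; the actual implementation lives in src/native/eventpipe/ep-buffer-manager.c and differs in detail.

```c
// Simplified sketch: not the real EventPipe code. Stand-in types and a
// pthread mutex model the single buffer_manager lock.
#include <pthread.h>
#include <stdlib.h>

typedef struct ThreadSessionState ThreadSessionState;
struct ThreadSessionState {
    ThreadSessionState *next;   // intrusive link in the buffer_manager's thread list
    void *write_buffer;         // current writable buffer (NULL until first write)
};

typedef struct {
    pthread_mutex_t lock;       // the single buffer_manager lock
    ThreadSessionState *thread_list_head;
} BufferManager;

typedef struct {
    ThreadSessionState *session_state[64];  // per-session slots on the thread
} Thread;

// Invariant: the slot and the list are updated together, under one lock,
// so a TSS can never be reachable from one place but not the other.
static ThreadSessionState *
register_thread_session_state (BufferManager *mgr, Thread *thread, int session_index)
{
    ThreadSessionState *tss = calloc (1, sizeof (*tss));
    if (!tss)
        return NULL;

    pthread_mutex_lock (&mgr->lock);
    thread->session_state[session_index] = tss;   // slot update
    tss->next = mgr->thread_list_head;            // list update, same critical section
    mgr->thread_list_head = tss;
    pthread_mutex_unlock (&mgr->lock);
    return tss;
}
```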
Pull Request Overview
This PR eliminates EventPipe's complex dual-lock synchronization scheme by removing the per-thread lock and implementing proper ThreadSessionState lifecycle cleanup with optimized SequencePoint tracking for better resource management.
Key changes:
- Removes EventPipe thread locks, replacing dual-lock scheme with single buffer_manager lock
- Implements proper ThreadSessionState cleanup to prevent memory leaks in long-running sessions
- Adds EventPipeThreadSequencePointTrackState enum for optimized SequencePoint tracking
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/native/eventpipe/ep.c | Adds session type check for process info logging |
| src/native/eventpipe/ep-types.h | Adds thread ID container type definitions |
| src/native/eventpipe/ep-types-forward.h | Defines EventPipeThreadSequencePointTrackState enum for thread lifecycle tracking |
| src/native/eventpipe/ep-thread.h | Updates thread synchronization model and session state management |
| src/native/eventpipe/ep-thread.c | Implements new thread session state lifecycle and removes thread locks |
| src/native/eventpipe/ep-session.h | Adds session_type getter |
| src/native/eventpipe/ep-session.c | Removes dangling session state cleanup logic |
| src/native/eventpipe/ep-event-instance.c | Removes thread reference counting from sequence points |
| src/native/eventpipe/ep-buffer.h | Updates lock requirements documentation |
| src/native/eventpipe/ep-buffer.c | Updates buffer conversion synchronization |
| src/native/eventpipe/ep-buffer-manager.h | Updates buffer manager structure and removes suspend functionality |
| src/native/eventpipe/ep-buffer-manager.c | Major refactoring of buffer management with unified lock model |
| src/native/eventpipe/ep-block.c | Updates sequence point block initialization for new thread ID handling |
Comments suppressed due to low confidence (1)
src/native/eventpipe/ep-buffer-manager.c:577
- The buffer_manager_requires_lock_held assertion is checking the wrong lock. This function is called from buffer_manager_allocate_buffer_for_thread which holds the buffer manager lock, but the assertion appears to be in a context where the lock may not be held.
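For context, a lock-ownership assertion of the kind this comment refers to generally looks something like the sketch below. The OwnedLock type and helpers are hypothetical stand-ins, not eventpipe's actual macros; the point is that the check is only meaningful if every caller really holds that specific lock.

```c
// Hypothetical sketch of a lock-ownership assertion; the real eventpipe
// helpers differ in detail.
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_t owner;        // valid only while owned
    bool owned;
} OwnedLock;

static void
owned_lock_acquire (OwnedLock *lock)
{
    pthread_mutex_lock (&lock->mutex);
    lock->owner = pthread_self ();
    lock->owned = true;
}

static void
owned_lock_release (OwnedLock *lock)
{
    lock->owned = false;
    pthread_mutex_unlock (&lock->mutex);
}

// Asserting on the wrong lock gives false confidence: the check passes or
// fails based on a lock the caller was never required to hold.
static void
requires_lock_held (OwnedLock *lock)
{
    assert (lock->owned && pthread_equal (lock->owner, pthread_self ()));
}
```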
By moving from two locks to one, did you analyze any potential additional lock contention and its effect on EventPipe performance? The buffer manager lock is a central lock for a session, meaning that a lot of parallel activity will queue up on this lock more frequently due to the increased need to lock the buffer manager. Just looking at buffer_manager_advance_to_non_empty_buffer, we reach for the buffer manager lock multiple times where the old code only took it when a buffer needed to be deleted after being fully consumed. I know a lot of optimizations have taken place over the years to reduce contention on the buffer manager lock and improve EventPipe performance. Just wanted to make sure we don't introduce any major performance regression by increasing the use of the buffer manager lock in some critical code paths.
If the writer thread isn't currently writing an event, we should still convert the buffer to read-only
- Set the thread's writing_event_in_progress flag for rundown. ep_buffer_manager_write_event now asserts that the thread's writing_event_in_progress flag is set.
- Remove the buffer free-state assertion. When sessions are disabled or abruptly disconnected, not all buffers may have been converted to read-only; log_process_info_event never being read by the in-proc listener session and an ep_buffer_alloc failure are both examples of a buffer being freed without having been converted. Now that converting buffers to read-only yields for in-progress event writing to complete, we shouldn't be deleting a buffer that a writer thread is writing into.
- Allow a thread to set a NULL session state. As part of the thread_session_state removal logic, a NULL pointer can be written into a thread's session_state slot.
- Allow the reader thread to check a thread session state's write_buffer. As part of converting buffers to read-only, the reader thread yields for writer threads to finish writing their event, so the assertion that only writing threads will access a thread session state's write buffer no longer holds.
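A rough sketch of the "yield for the writer" behavior described above, using C11 atomics and simplified stand-in types rather than eventpipe's own state and thread helpers: the reader marks the buffer read-only and then waits for the writer's writing_event_in_progress flag to clear before touching the buffer's contents.

```c
// Simplified sketch, not the real eventpipe conversion path.
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef enum { BUFFER_STATE_WRITABLE, BUFFER_STATE_READ_ONLY } BufferState;

typedef struct {
    atomic_int state;  // holds a BufferState value
} Buffer;

typedef struct {
    atomic_bool writing_event_in_progress;  // set by the writer around each event write
} WriterThread;

static void
convert_buffer_to_read_only (Buffer *buffer, WriterThread *writer)
{
    // Mark the buffer read-only, then wait out any event write already in flight.
    atomic_store (&buffer->state, BUFFER_STATE_READ_ONLY);
    while (atomic_load (&writer->writing_event_in_progress))
        sched_yield ();
    // From here on the reader may safely walk the buffer's contents.
}
```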
session_state was switched to volatile in an earlier commit to remove the thread_lock
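A minimal sketch of lock-free access to the per-session session_state slot, assuming C11 atomics in place of eventpipe's volatile load/store helpers and a simplified Thread type: the writer thread publishes or clears its ThreadSessionState pointer without holding a thread lock, and readers observe either the old or the new pointer, never a torn value.

```c
// Simplified sketch; the real code uses eventpipe's volatile accessors.
#include <stdatomic.h>
#include <stddef.h>

typedef struct ThreadSessionState ThreadSessionState;

typedef struct {
    _Atomic(ThreadSessionState *) session_state[64];  // one slot per session index
} Thread;

static void
thread_set_session_state (Thread *thread, int session_index, ThreadSessionState *tss)
{
    // tss may be NULL: clearing the slot is part of TSS removal.
    atomic_store (&thread->session_state[session_index], tss);
}

static ThreadSessionState *
thread_get_session_state (Thread *thread, int session_index)
{
    return atomic_load (&thread->session_state[session_index]);
}
```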
- Remove the unnecessary buffer_manager field from ThreadSessionState. The buffer_manager can be obtained through the session field.
- Remove the unnecessary thread_holder field from BufferList. The owning ThreadSessionState already holds a reference to the thread, and now that the BufferList's lifetime is tied to the ThreadSessionState, the BufferList's thread_holder is no longer needed.
- Reuse ep_buffer_get_volatile_state instead of a manual state load.
When the buffer_manager iterates over events across threads, it tracks the current EventPipeEventInstance, EventPipeBuffer, and EventPipeBufferList. In preparation for removing ThreadSessionStates from the buffer_manager's thread_session_state_list, tracking the current ThreadSessionState instead of the current EventPipeBufferList allows for direct removal, as they are 1:1. ep_buffer_manager_write_all_buffers_to_file_v4 adjusts sequence points' tss-to-sequence_number mappings whenever events have been read from the tss. Instead of having the tss's underlying buffer_list track the last read sequence number, move the counter to the tss itself.
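A sketch of the data-layout change described above, with simplified stand-in structs rather than the real eventpipe definitions: the last-read sequence number moves off the per-thread buffer list and onto the ThreadSessionState, which is 1:1 with it and now owns its lifetime.

```c
// Simplified stand-in structs, not the actual eventpipe definitions.
#include <stdint.h>

typedef struct EventPipeBuffer EventPipeBuffer;

typedef struct {
    EventPipeBuffer *head_buffer;
    EventPipeBuffer *tail_buffer;
    // previously the last-read sequence number lived here
} EventPipeBufferList;

typedef struct {
    EventPipeBufferList *buffer_list;     // allocated and freed together with the TSS
    uint32_t last_read_sequence_number;   // now tracked directly on the TSS
} ThreadSessionState;
```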
In preparation for cleaning up ThreadSessionStates, adjust the SequencePoint's map of thread sequence numbers to key on the EventPipeThread directly instead of the ThreadSessionState.
The former ThreadSessionState cleanup only accounted for select scenarios, such as session disablement or write_all_buffers_to_file_v4. This commit integrates ThreadSessionState cleanup into the core logic for advancing to events across threads so that all call sites benefit. As ThreadSessionStates hold the last read sequence number for their EventPipeThread, it is important to update current and future SequencePoints that have that thread as an entry before deleting the ThreadSessionState.
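A simplified sketch of that cleanup order, using hypothetical helpers and stand-in types (the real code uses eventpipe's own sequence point map and runs inside the buffer manager's event-advancing logic): before the ThreadSessionState is freed, every pending SequencePoint that references its thread records the thread's last read sequence number.

```c
// Simplified sketch; stand-in types and a fixed-size entry array replace
// eventpipe's sequence point hash map.
#include <stdint.h>
#include <stdlib.h>

typedef struct EventPipeThread EventPipeThread;

typedef struct {
    EventPipeThread *thread;
    uint32_t sequence_number;
} SequencePointEntry;

typedef struct SequencePoint SequencePoint;
struct SequencePoint {
    SequencePoint *next;
    SequencePointEntry entries[64];
    int entry_count;
};

typedef struct {
    EventPipeThread *thread;
    uint32_t last_read_sequence_number;
} ThreadSessionState;

// Record the thread's final read position in one pending sequence point.
static void
sequence_point_set_thread_sequence_number (SequencePoint *sp, EventPipeThread *thread, uint32_t seq)
{
    for (int i = 0; i < sp->entry_count; i++) {
        if (sp->entries[i].thread == thread) {
            sp->entries[i].sequence_number = seq;
            return;
        }
    }
}

// Runs under the single buffer_manager lock, once the TSS's buffers are empty.
static void
remove_thread_session_state (SequencePoint *pending_sequence_points, ThreadSessionState *tss)
{
    for (SequencePoint *sp = pending_sequence_points; sp != NULL; sp = sp->next)
        sequence_point_set_thread_sequence_number (sp, tss->thread, tss->last_read_sequence_number);
    free (tss);  // the buffer_list shares the TSS lifetime and is freed with it (not shown)
}
```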
@mdh1418 I know you have been doing some performance regression testing on this PR indicating that this change regresses events/s and increases the number of dropped events, so we should follow up on whether we need to optimize this further or change the implementation to reduce the performance regression.
noahfalk
left a comment
This looks pretty nice overall. I did notice a couple potential use-after-free race conditions that we should certainly shore up and we might need to make some perf adjustments but I think this is close.
Event processing may be invoked even when no events are available for reading, e.g. event flushing when a session is being disabled. Iterating over an empty snapshot list results in the same behavior.
noahfalk
left a comment
LGTM
lateralusX
left a comment
I mainly looked at the threading scenarios around the use of the new locking patterns, plus the dependencies around waiting on writer threads to finish up work before terminating sessions or cleaning up thread state. Made some comments around that, but nothing popped up as a real issue after reviewing the code. It needs a lot of mileage over time to build full confidence, but what we have at least appears to be correct.
I didn't look in any depth at the changes around sequence point handling and ordering, but I assume we have test coverage that would at least trigger errors in case sequence point handling regressed as part of this change.
I made some smaller comments. Great work!
Curious why we needed this in the past; converting buffers to read-only happens when we flush buffers in ep_session_write_all_buffers_to_file, and that is called after ep_session_suspend_write_event. I believe it has been like this for a very long time (maybe from the original implementation), so it might be something that was originally needed but then got carried along without a specific reason.
In the past, it was responsible for waiting for all threads to finish writing to the session's buffer_manager: 289d2cf#diff-23d22a915fe109f2546833cdf6e695b5f1d2d0def61e5ec4dc389614a3f7ff32
Today, the method no longer suspends anything, and we no longer need to eagerly convert buffers to read-only since that is done as part of the event processing loop.
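A rough sketch of that on-demand conversion, with stand-in types and a pthread mutex standing in for the buffer_manager lock: when the reader's advance loop reaches a buffer that is still writable, it converts it right there, so no up-front pass like the removed buffer_manager_suspend_event is needed.

```c
// Simplified sketch, not the real advance logic in ep-buffer-manager.c.
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { BUFFER_STATE_WRITABLE, BUFFER_STATE_READ_ONLY } BufferState;

typedef struct EventPipeBuffer EventPipeBuffer;
struct EventPipeBuffer {
    EventPipeBuffer *next;
    BufferState state;
    bool empty;
};

static void
convert_buffer_to_read_only (EventPipeBuffer *buffer)
{
    // In the real code this also yields for an in-progress event write to finish.
    buffer->state = BUFFER_STATE_READ_ONLY;
}

static EventPipeBuffer *
advance_to_non_empty_buffer (pthread_mutex_t *buffer_manager_lock, EventPipeBuffer *current)
{
    pthread_mutex_lock (buffer_manager_lock);
    EventPipeBuffer *buffer = current;
    while (buffer != NULL && buffer->empty)
        buffer = buffer->next;
    if (buffer != NULL && buffer->state == BUFFER_STATE_WRITABLE)
        convert_buffer_to_read_only (buffer);  // converted lazily, on demand
    pthread_mutex_unlock (buffer_manager_lock);
    return buffer;
}
```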
Fixes #111368
This PR removes EventPipe's complex dual-lock synchronization scheme and integrates ThreadSessionState cleanup as part of event reading logic.
Thread Lock Removal
ThreadSessionState Lifecycle Cleanup
Slight Event Reading Optimization
Testing
Using the original repro
Performance
Adapted https://github.com/dotnet/diagnostics/tree/main/src/tests/EventPipeStress to also stress the native in-proc event listener EventPipeSession.
Testing a matrix of Session Type × Thread Count × Exceptions/Thread:
Session Type: IPC Streaming, Native In-Proc
Thread Count: 1, 8, 16, 50, 100, 1000
Exceptions/Thread: 1, 2, 100, 1000
Note: The local stress test I put together primarily tests an immediate burst of events being written to EventPipe, which doesn't reflect steady-state behavior.
Native InProc
Baseline vs PR

Overall this PR performs better, dropping fewer events, especially in the high-thread × high-exceptions/thread scenarios.
IPC Streaming
Baseline vs PR

This PR performs similarly, with a slight drop (0.8%) in events read for the high-thread × high-exceptions/thread scenarios.