[Runtime Metrics] Support for OTLP Runtime Metrics#8457
Conversation
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8457) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8457) - mean (73ms) : 69, 77
master - mean (72ms) : 70, 75
section Bailout
This PR (8457) - mean (77ms) : 75, 79
master - mean (77ms) : 75, 79
section CallTarget+Inlining+NGEN
This PR (8457) - mean (1,086ms) : 1033, 1139
master - mean (1,081ms) : 1031, 1132
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8457) - mean (114ms) : 108, 121
master - mean (113ms) : 109, 118
section Bailout
This PR (8457) - mean (115ms) : 111, 119
master - mean (119ms) : 112, 126
section CallTarget+Inlining+NGEN
This PR (8457) - mean (785ms) : 763, 808
master - mean (783ms) : 749, 816
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8457) - mean (103ms) : 96, 109
master - mean (104ms) : 98, 110
section Bailout
This PR (8457) - mean (105ms) : 100, 110
master - mean (104ms) : 99, 110
section CallTarget+Inlining+NGEN
This PR (8457) - mean (942ms) : 902, 983
master - mean (942ms) : 905, 980
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8457) - mean (100ms) : 97, 103
master - mean (100ms) : 97, 103
section Bailout
This PR (8457) - mean (102ms) : 99, 105
master - mean (104ms) : 98, 111
section CallTarget+Inlining+NGEN
This PR (8457) - mean (832ms) : 794, 870
master - mean (824ms) : 788, 860
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8457) - mean (206ms) : 194, 218
master - mean (209ms) : 197, 221
section Bailout
This PR (8457) - mean (211ms) : 199, 224
master - mean (214ms) : 202, 226
section CallTarget+Inlining+NGEN
This PR (8457) - mean (1,231ms) : 1182, 1280
master - mean (1,255ms) : 1179, 1332
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8457) - mean (303ms) : 284, 321
master - mean (298ms) : 277, 319
section Bailout
This PR (8457) - mean (305ms) : 282, 328
master - mean (298ms) : 281, 315
section CallTarget+Inlining+NGEN
This PR (8457) - mean (996ms) : 967, 1025
master - mean (983ms) : 944, 1022
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8457) - mean (298ms) : 279, 317
master - mean (299ms) : 273, 325
section Bailout
This PR (8457) - mean (301ms) : 276, 326
master - mean (300ms) : 274, 326
section CallTarget+Inlining+NGEN
This PR (8457) - mean (1,177ms) : 1130, 1223
master - mean (1,193ms) : 1123, 1263
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8457) - mean (296ms) : 272, 320
master - mean (309ms) : 271, 347
section Bailout
This PR (8457) - mean (295ms) : 278, 312
master - mean (312ms) : 268, 356
section CallTarget+Inlining+NGEN
This PR (8457) - mean (1,062ms) : 987, 1136
master - mean (1,121ms) : 998, 1244
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
BenchmarksBenchmark execution time: 2026-05-05 21:50:33 Comparing candidate commit 31c69cd in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 27 metrics, 0 unstable metrics, 57 known flaky benchmarks, 30 flaky benchmarks without significant changes.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b582a23e4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…race-dotnet into maximo/otlp-runtime-metrics
bouwkast
left a comment
There was a problem hiding this comment.
Haven't gone through everything yet but will pick it up again tomorrow
andrewlock
left a comment
There was a problem hiding this comment.
LGTM in general, some relatively minor suggestions and questions in general.
While reviewing this I had a realization - OpenTelemetry themselves add polyfills for many of these... e.g.
What happens if you have the same pollyfilled values, with the same meter and instrument names? 🤔 Do we need to handle that explicitly, or does the runtime handled it? And have you compared our implementation to theirs?
| // indicates a counter reset (e.g. process restart). Report currentValue | ||
| // as the delta, as if the previous cumulative was 0. | ||
| // ObservableUpDownCounter is non-monotonic so negative deltas are expected. | ||
| if (delta < 0 && InstrumentType is InstrumentType.ObservableCounter) |
There was a problem hiding this comment.
Just a thought, and not strictly related to this PR: Do we need a similar guard somewhere for negative values in InstrumentTrype.Counter too? 🤔 I don't think it needs to be right here (because this method is all about observable counters), but I believe we could have the same wrapping case to handle.
I think we may also need to handle this in our statsd case too - I'll look into it!
| // ObservableUpDownCounter is non-monotonic so negative deltas are expected. | ||
| if (delta < 0 && InstrumentType is InstrumentType.ObservableCounter) | ||
| { | ||
| delta = _runningDoubleValue; |
There was a problem hiding this comment.
Had to work this one through to convince myself it's correct, so included here so others don't need to 😅
In an overflow scenario - given that you would have (for example)
_runningDoubleValueis small (due to overflow) e.g. 10_lastObservedCumulativeis huge e.g. 1,000,000,000,000
Then
_delta = _runningDoubleValue - previousCumulative~-1,000,000,000,000
So we hit this branch and set
_delta = _runningDoubleValue=~10
so LGTM!
There was a problem hiding this comment.
Aay thank you for adding this investigation here!
zacharycmontoya
left a comment
There was a problem hiding this comment.
I have one confusion that needs clarifying, otherwise this looks good and the polyfill to implement the metrics for pre-.NET 9 is nice 👌🏼
bouwkast
left a comment
There was a problem hiding this comment.
Sorry for the delay!
Just one comment from me but I think everything else looks good and has been covered by others 👍
zacharycmontoya
left a comment
There was a problem hiding this comment.
LGTM, thanks for all the updates!
…8609) ## Summary of changes Adds explicit handling of `OutOfMemoryException` to runtime metrics polyfill added in #8457 ## Reason for change Doing anything when you have an `OutOfMemoryException` is liable to cause allocation, and trigger a full-on crash. We have guards for this today in `RuntimeMetricsWriter`, so we should probably do something similar with our pollyfill too. ## Implementation details - Add an explicit guard to the handler for `OutOfMemoryException` - Use try-finally to ensure we definitely reset `_handlingFirstChanceException` ## Test coverage Not easy to test in practice, and relatively self explanatory so meh
Summary of changes
Add support for exporting .NET runtime metrics via OTLP, using OpenTelemetry semantic convention names. When both
DD_RUNTIME_METRICS_ENABLED=trueandDD_METRICS_OTEL_ENABLED=true, runtime metrics (System.Runtime) are collected through the existing OTLP metrics pipeline instead of DogStatsD.Related One Pager: Sending Runtime Metrics via OTLP from dd-trace-*
Reason for change
Enables runtime visibility for users on OTLP-native pipelines or without a local Datadog Agent. Aligns .NET runtime metric names with OTel semantic conventions (
dotnet.gc.collections,dotnet.process.cpu.time, etc.).Implementation details
TracerManagerFactory: Disables the StatsDRuntimeMetricsWriterwhen OTLP runtime metrics are active to prevent duplicate reporting.MetricReaderHandler: Force-enables theSystem.Runtimemeter whenOtlpRuntimeMetricsEnabledis true, even if not listed inDD_METRICS_OTEL_METER_NAMES.RuntimeMetricsPolyfill(new): Provides the .NET 9 nativeSystem.Runtimeinstrument set on .NET 6–8 hosts. Names, units, descriptions, and tag keys mirror dotnet/runtimev9.0.0.MeterObservableUpDownCounterReflection(new): ResolvesMeter.CreateObservableUpDownCounter<T>at runtime so .NET 7/8 hosts (withSystem.Diagnostics.DiagnosticSource ≥ 7.0via roll-forward) emitObservableUpDownCounterfor the 7 point-in-time metrics that match native .NET 9. .NET 6 hosts without DiagnosticSource ≥ 7 fall back toObservableGauge.Datadog.Trace.Trimming.xml: Preserves theSystem.Diagnostics.Metricstypes the polyfill creates (Counter1,ObservableCounter1,ObservableGauge1,ObservableUpDownCounter1) so they survive aggressive trimming/AOT in customer apps.MetricsRuntime: Starts the polyfill on .NET < 9 with thread-safe initialization.MetricPoint: Detects counter resets onObservableCounter(treats negative deltas as resets and reports the current value as the delta).Test coverage
OpenTelemetrySdkTests.SubmitsOtlpRuntimeMetrics(new): end-to-end integration test verifying the OTLP runtime metrics payload via a Verify snapshot. A single snapshot covers all TFMs because the test sample transitively referencesOpenTelemetry1.13.x →System.Diagnostics.DiagnosticSource9.0, so the polyfill produces identical wire output on .NET 6/7/8 (and .NET 9+ uses native runtime metrics with the same wire format). Also asserts that StatsD is not emitted when OTLP runtime metrics are active.TracerSettingsTests.OpenTelemetryMetricsEnabled: parameterized unit test confirming theDD_TRACE_OPEN_TELEMETRY_METRICS_ENABLED(or whatever the actual config key name resolves to) boolean parses correctly with defaultfalse.DD_RUNTIME_METRICS_ENABLED=falseto existingSubmitsOtlpMetricstest to prevent runtime metrics from interfering with custom meter assertions.Verification of
.NET 6in-box behavior (out-of-band)MeterObservableUpDownCounterReflection.TryRegisterwas verified directly against each in-boxSystem.Diagnostics.DiagnosticSource(no OTel package installed), since the integration test masks the .NET 6 in-box scenario by transitively pulling inDiagnosticSource9.0:CreateObservableUpDownCounteroverloadsTryRegisterreturnsfalse→ omit branchtrue→ UpDownCountertrue→ UpDownCountertrue→ UpDownCounterLength == 4pin inFindOverloadcorrectly disambiguates on .NET 8/9 where 5-param overloads (withtags) also exist.Other details
system-testsrepo) , added needDD_RUNTIME_METRICS_ENABLED=falseadded to theirDEFAULT_ENVVARSto avoid unexpected runtime metrics in OTel metric assertions and created new test based on language expectations.