Fix flaky test metrics tag values#8099
Conversation
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8099) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8099) - mean (78ms) : 75, 81
master - mean (75ms) : 72, 79
section Bailout
This PR (8099) - mean (83ms) : 81, 85
master - mean (79ms) : 77, 82
section CallTarget+Inlining+NGEN
This PR (8099) - mean (1,109ms) : 1058, 1160
master - mean (1,075ms) : 1008, 1142
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8099) - mean (125ms) : 121, 130
master - mean (118ms) : 114, 122
section Bailout
This PR (8099) - mean (126ms) : 122, 129
master - mean (119ms) : 117, 122
section CallTarget+Inlining+NGEN
This PR (8099) - mean (809ms) : 763, 855
master - mean (783ms) : 723, 843
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8099) - mean (110ms) : 106, 114
master - mean (105ms) : 101, 108
section Bailout
This PR (8099) - mean (111ms) : 106, 115
master - mean (106ms) : 104, 109
section CallTarget+Inlining+NGEN
This PR (8099) - mean (778ms) : 701, 854
master - mean (758ms) : 683, 833
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8099) - mean (107ms) : 104, 111
master - mean (103ms) : 100, 107
section Bailout
This PR (8099) - mean (108ms) : 105, 112
master - mean (105ms) : 102, 109
section CallTarget+Inlining+NGEN
This PR (8099) - mean (721ms) : 696, 746
master - mean (693ms) : 666, 719
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8099) - mean (206ms) : 196, 216
master - mean (197ms) : 192, 203
section Bailout
This PR (8099) - mean (210ms) : 204, 216
master - mean (200ms) : 197, 204
section CallTarget+Inlining+NGEN
This PR (8099) - mean (1,209ms) : 1129, 1290
master - mean (1,159ms) : 1097, 1222
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8099) - mean (307ms) : 282, 333
master - mean (283ms) : 276, 291
section Bailout
This PR (8099) - mean (308ms) : crit, 283, 334
master - mean (282ms) : 277, 287
section CallTarget+Inlining+NGEN
This PR (8099) - mean (1,019ms) : 903, 1135
master - mean (962ms) : 919, 1005
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8099) - mean (300ms) : 272, 328
master - mean (277ms) : 270, 284
section Bailout
This PR (8099) - mean (299ms) : crit, 269, 330
master - mean (277ms) : 272, 282
section CallTarget+Inlining+NGEN
This PR (8099) - mean (969ms) : 892, 1045
master - mean (941ms) : 878, 1004
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8099) - mean (297ms) : 275, 319
master - mean (280ms) : 269, 291
section Bailout
This PR (8099) - mean (291ms) : 276, 306
master - mean (279ms) : 271, 288
section CallTarget+Inlining+NGEN
This PR (8099) - mean (919ms) : crit, 806, 1033
master - mean (860ms) : 833, 887
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
BenchmarksBenchmark execution time: 2026-02-02 14:22:11 Comparing candidate commit dafc31b in PR branch Found 6 performance improvements and 15 performance regressions! Performance is the same for 157 metrics, 14 unstable metrics. scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net472
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net6.0
scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net6.0
scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark netcoreapp3.1
scenario:Benchmarks.Trace.NLogBenchmark.EnrichedLog net472
scenario:Benchmarks.Trace.RedisBenchmark.SendReceive net6.0
scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore netcoreapp3.1
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan net472
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0
scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin netcoreapp3.1
|
| testCase.DisplayName : | ||
| $"{TestMethod.TestClass.Class.Name}.{testCase.DisplayName}").Trim(); | ||
|
|
||
| if (testFullName.Length > 200) |
There was a problem hiding this comment.
For readability in the future can we just extract 200 to some variable to note that it is the maximum name length? var maxTestNameLength = 200
There was a problem hiding this comment.
Good idea! Done!
Co-authored-by: Steven Bouwkamp <steven.bouwkamp@datadoghq.com>
andrewlock
left a comment
There was a problem hiding this comment.
Thanks! I think we have very-similar code copy-pasted in a bunch of places, we should probably fix those cases too?
e.g.:
- https://github.com/DataDog/dd-trace-dotnet/blob/master/tracer/test/Datadog.Trace.TestHelpers.AutoInstrumentation/ErrorHelpers.cs#L82
- https://github.com/DataDog/dd-trace-dotnet/blob/master/tracer/build/_build/MetricHelper.cs
Thanks!
…/dd-trace-dotnet into nacho/FlakyFullNameTests
Nice catch, thanks! |
Summary of changes
We currently send telemetry metrcis related to flaky tests. These metrics, though, had some issues.
In some tests, the test name already contained the class name. The resulting test name would include repeated class names. For instance: "Datadog.Trace.Tests.Logging.DirectSubmission.Sink.PeriodicBatching.BatchingSinkTests.Datadog.Trace.Tests.Logging.DirectSubmission.Sink.PeriodicBatching.BatchingSinkTests.WhenRunning_AndAnEventIsQueued_ItIsWrittenToABatch"
We are sanitizing tag values by calling TryNormalizeTagName() which is useful for value normalization but adds some unneccessary restrictions. For instance, framewrok values were not being sent because they started with a number, which is fine for tag values but not for tag names. Therefore, TryNormalizeTagName will not be used for framework values.
Sometimes, the test name is not sent because we validate tags before sending them and we don't send tags with a length > 200. Test name has been trucated to 200 to avoid having empty test name tags.
Reason for change
Implementation details
Test coverage
Other details