Automatic retry ASM integration tests under failure #8011
NachoEchevarria merged 5 commits into master
Execution-Time Benchmarks Report ⏱️
Execution-time results for samples comparing This PR (8011) and master.
✅ No regressions detected - check the details below
Full Metrics Comparison
FakeDbCommand
HttpMessageHandler
Comparison explanation
Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
Duration charts
Execution time (ms), mean and plotted interval per chart:
| Sample | Runtime | Scenario | This PR (8011) mean | This PR (8011) interval | master mean | master interval |
| --- | --- | --- | --- | --- | --- | --- |
| FakeDbCommand | .NET Framework 4.8 | Baseline | 69 | 67-70 | 68 | 67-70 |
| FakeDbCommand | .NET Framework 4.8 | Bailout | 72 | 71-73 | 72 | 71-73 |
| FakeDbCommand | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,014 | 960-1067 | 1,007 | 967-1048 |
| FakeDbCommand | .NET Core 3.1 | Baseline | 106 | 103-109 | 106 | 103-109 |
| FakeDbCommand | .NET Core 3.1 | Bailout | 107 | 106-108 | 107 | 106-108 |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 737 | 660-814 | 726 | 661-791 |
| FakeDbCommand | .NET 6 | Baseline | 93 | 91-96 | 93 | 91-96 |
| FakeDbCommand | .NET 6 | Bailout | 94 | 93-95 | 94 | 93-96 |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 705 | 654-755 | 709 | 675-744 |
| FakeDbCommand | .NET 8 | Baseline | 92 | 90-94 | 92 | 90-94 |
| FakeDbCommand | .NET 8 | Bailout | 93 | 92-95 | 93 | 92-94 |
| FakeDbCommand | .NET 8 | CallTarget+Inlining+NGEN | 635 | 614-657 | 632 | 618-646 |
| HttpMessageHandler | .NET Framework 4.8 | Baseline | 194 | 188-200 | 193 | 190-197 |
| HttpMessageHandler | .NET Framework 4.8 | Bailout | 198 | 194-201 | 197 | 195-199 |
| HttpMessageHandler | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,110 | 1069-1150 | 1,119 | 1057-1180 |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 277 | 272-283 | 276 | 270-282 |
| HttpMessageHandler | .NET Core 3.1 | Bailout | 277 | 273-281 | 278 | 272-283 |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 925 | 837-1013 | 928 | 885-972 |
| HttpMessageHandler | .NET 6 | Baseline | 271 | 265-277 | 271 | 265-277 |
| HttpMessageHandler | .NET 6 | Bailout | 270 | 267-274 | 270 | 267-274 |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 919 | 867-972 | 918 | 869-967 |
| HttpMessageHandler | .NET 8 | Baseline | 270 | 265-275 | 270 | 265-275 |
| HttpMessageHandler | .NET 8 | Bailout | 271 | 266-276 | 269 | 265-273 |
| HttpMessageHandler | .NET 8 | CallTarget+Inlining+NGEN | 825 | 801-849 | 825 | 809-842 |
Benchmarks
Benchmark execution time: 2026-01-07 11:36:08
Comparing candidate commit fe1020a in PR branch
Found 3 performance improvements and 14 performance regressions! Performance is the same for 157 metrics, 12 unstable metrics.
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeArgs net6.0
scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs netcoreapp3.1
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net6.0
scenario:Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery net472
scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net6.0
scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog net472
scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog net472
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope net6.0
- script: tracer\build.cmd RunIntegrationTests RunWindowsRegressionTests -Framework $(framework) --code-coverage-enabled $(CodeCoverageEnabled)
  displayName: Run integration tests (ASM)
  condition: eq(variables['area'], 'ASM')
  retryCountOnTaskFailure: 2
Gah, it's so ugly that we have to split these jobs, but I think that's the best we can do 😅
## Summary of changes
As a continuation of the work done in #8011, and seeing the same flakes in the ASM tests in integration_tests_windows_iis, we should apply the same principle to these integration tests.
Summary of changes
The following jobs have been found to be flaky, mostly due to an already reported crash condition:
The complete flakiness report can be found here:
https://docs.google.com/spreadsheets/d/1Gftmhb-66Dag4qFEXw9tyXp7U0fOQsdCE2gI-1voWyA/edit?gid=1708590243#gid=1708590243
The affected jobs will automatically be retried once to avoid CI flakiness.
For other teams, the flaky tests have been marked as flaky, which causes them to be retried automatically. In the case of ASM, the failure does not occur in specific tests, so marking individual tests would not solve the issue. This change can be reverted once the jobs are more stable.
This PR is part of an initiative to mark the flakiest tests or jobs across all teams.
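As a rough illustration of the job-level retry described above, an Azure Pipelines step can carry a `retryCountOnTaskFailure` setting, which reruns the whole step when it fails rather than retrying individual tests. This is a minimal sketch: the script and display name are hypothetical placeholders, and the condition and retry value mirror the step shown in the review diff rather than a verified copy of the final pipeline.

```yaml
steps:
  # Hypothetical placeholder step; the real pipeline invokes the tracer build script.
  - script: echo "run the ASM integration tests here"
    displayName: Run integration tests (ASM, illustrative)
    # Only run this step for the ASM area, as in the reviewed diff.
    condition: eq(variables['area'], 'ASM')
    # Number of times the step is retried if it fails (value taken from the diff).
    retryCountOnTaskFailure: 2
```

Because the retry applies to the whole step, it covers crashes that are not tied to any single test, at the cost of rerunning every test in that step.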
Reason for change
Implementation details
Test coverage
Other details