Improve CI resiliency against connectivity errors #7974
NachoEchevarria merged 9 commits into master
Execution-Time Benchmarks Report ⏱️
Execution-time results for samples comparing This PR (7974) and master.
✅ No regressions detected - check the details below
Full Metrics Comparison
FakeDbCommand
HttpMessageHandler
Comparison explanation
Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
Duration charts
Execution time (ms): each cell gives the mean, followed by the charted interval in parentheses.

| Sample | Runtime | Scenario | This PR (7974) | master |
|---|---|---|---|---|
| FakeDbCommand | .NET Framework 4.8 | Baseline | 68 (67-70) | 68 (67-70) |
| FakeDbCommand | .NET Framework 4.8 | Bailout | 72 (71-74) | 72 (71-73) |
| FakeDbCommand | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,006 (965-1046) | 1,007 (964-1050) |
| FakeDbCommand | .NET Core 3.1 | Baseline | 106 (104-108) | 105 (103-108) |
| FakeDbCommand | .NET Core 3.1 | Bailout | 107 (106-108) | 107 (106-108) |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 711 (679-742) | 712 (677-747) |
| FakeDbCommand | .NET 6 | Baseline | 94 (91-97) | 93 (92-95) |
| FakeDbCommand | .NET 6 | Bailout | 94 (93-95) | 94 (93-95) |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 667 (646-687) | 671 (624-719) |
| FakeDbCommand | .NET 8 | Baseline | 92 (90-94) | 92 (89-95) |
| FakeDbCommand | .NET 8 | Bailout | 93 (92-95) | 93 (91-94) |
| FakeDbCommand | .NET 8 | CallTarget+Inlining+NGEN | 629 (614-645) | 631 (617-644) |
| HttpMessageHandler | .NET Framework 4.8 | Baseline | 195 (189-200) | 193 (189-198) |
| HttpMessageHandler | .NET Framework 4.8 | Bailout | 197 (194-200) | 196 (194-199) |
| HttpMessageHandler | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,124 (1051-1197) | 1,111 (1061-1162) |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 277 (271-283) | 277 (272-281) |
| HttpMessageHandler | .NET Core 3.1 | Bailout | 278 (274-281) | 277 (274-281) |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 906 (853-958) | 907 (867-948) |
| HttpMessageHandler | .NET 6 | Baseline | 271 (264-278) | 269 (265-274) |
| HttpMessageHandler | .NET 6 | Bailout | 271 (265-276) | 270 (266-273) |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 897 (841-952) | 888 (850-926) |
| HttpMessageHandler | .NET 8 | Baseline | 278 (267-289) | 269 (264-274) |
| HttpMessageHandler | .NET 8 | Bailout | 273 (266-279) | 269 (265-273) |
| HttpMessageHandler | .NET 8 | CallTarget+Inlining+NGEN | 834 (806-862) | 826 (802-849) |
Benchmarks
Benchmark execution time: 2025-12-23 15:26:14
Comparing candidate commit 5a87c5b in PR branch
Found 7 performance improvements and 6 performance regressions! Performance is the same for 157 metrics, 16 unstable metrics.
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net472
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1
scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs netcoreapp3.1
scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice netcoreapp3.1
scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net472
scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net6.0
scenario:Benchmarks.Trace.RedisBenchmark.SendReceive net472
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net472
scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin net6.0
# GitHub now requires TLS1.2. In PowerShell, run the following
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Invoke-WebRequest "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-windows-x86_64.exe" -OutFile "${{ parameters.dockerComposePath }}"
Using the command below would avoid all these lines, but it is only supported in newer versions via pwsh, which some agents do not support, so it makes the CI fail.
Invoke-WebRequest -Uri <Uri> -OutFile dotnet-install.ps1 -MaximumRetryCount $(PWSH_MAX_RETRY_COUNT) -RetryIntervalSec $(PWSH_RETRY_INTERVAL_SEC) -TimeoutSec $(PWSH_TIMEOUT_SEC)
Maybe we should look at ensuring we have pwsh installed on the agents in that case 😄
I think that we should probably run this script when creating the Windows VMs.
if (-not (Get-Command pwsh -ErrorAction SilentlyContinue)) {
    Write-Host "Installing PowerShell 7..."
    Invoke-WebRequest https://github.com/PowerShell/PowerShell/releases/download/v7.4.6/PowerShell-7.4.6-win-x64.msi -OutFile pwsh.msi
    Start-Process msiexec -ArgumentList "/i pwsh.msi /quiet /norestart" -Wait
    Remove-Item pwsh.msi
} else {
    Write-Host "PowerShell 7 already installed"
}
This would enable the pwsh feature. I can add these lines to the VM scripts in the doc so that we pick them up the next time. In the meantime, we can use this and simplify the call after the machines are regenerated. WDYT?
andrewlock left a comment
Thanks, good idea 👍
I think the description is maybe not right though, because AFAICT, we're not using pwsh are we?
Right, thanks! I have updated the description
Summary of changes
Add retry logic to all external network operations in Azure Pipelines to improve CI reliability and reduce flaky build failures caused by transient network errors. These errors are intermittent: while some jobs failed, very similar ones passed.
GH lately seems to have connection issues like this or this:
This PR improves our pipeline resiliency against this type of error.
Reason for change
Recent CI failures show transient network errors causing build failures:
- `Unable to connect to the remote server` when downloading docker-compose
- `Failed to connect to github.com port 443` when cloning repositories

These errors are typically transient and succeed on retry. Adding automatic retry logic reduces manual re-runs and improves CI reliability.
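The "succeed on retry" idea can be sketched as a small shell wrapper. This is only an illustration, not code from this PR: the `retry` function and the `MAX_ATTEMPTS` and `RETRY_DELAY` variables are hypothetical names, and the defaults (5 attempts, 2 seconds) are assumed from the flag values used elsewhere in this description.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): run a command until it succeeds,
# up to MAX_ATTEMPTS times, sleeping RETRY_DELAY seconds between failures.
retry() {
  local max_attempts="${MAX_ATTEMPTS:-5}"   # assumed default
  local delay="${RETRY_DELAY:-2}"           # assumed default, in seconds
  local attempt=1
  while true; do
    # Success on any attempt ends the loop immediately.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example usage (REPO_URL is a placeholder, not a value from this PR):
#   retry git clone "$REPO_URL"
```

A wrapper like this retries on any non-zero exit code, so it would also repeat non-transient failures; tool-specific retry flags are usually more selective.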
Implementation details
Git Operations
- Added `git -c http.retry=5 -c http.retryDelay=2` to all `git clone` commands
- `clone-repo.yml` for both Linux and Windows checkout operations
- `ultimate-pipeline.yml` (system-tests clones, master branch fetch)
- `steps/clone-repo.yml` (repository checkout operations)

Download Operations
- `steps/install-docker-compose-v1.yml`
- `--retry 5 --retry-delay 2 --retry-connrefused --connect-timeout 30 --max-time 120`
- `Invoke-WebRequest` (`-MaximumRetryCount 5 -RetryIntervalSec 2 -TimeoutSec 120`)

API Calls
- GitHub API (`ultimate-pipeline.yml`)
- GitHub Status Updates (`steps/update-github-status.yml`)

Package Installation
- `ultimate-pipeline.yml`, line ~6291
- Added `--retries 5 --timeout 120` to `pip install ddapm-test-agent`

Retry Configuration
All retry configurations use consistent parameters:
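One way to keep such parameters consistent is to define them once and build each tool's flags from them. The sketch below is an assumption for illustration, not how the pipeline YAML in this PR is written: the variable and function names are hypothetical, and the values are copied from the curl and pip flags quoted above.

```shell
#!/usr/bin/env bash
# Assumed shared values, taken from the flags quoted in this description.
RETRY_COUNT=5
RETRY_DELAY_SEC=2
CONNECT_TIMEOUT_SEC=30
MAX_TIME_SEC=120

# Hypothetical helper: print the curl options built from the shared values.
curl_retry_flags() {
  echo "--retry $RETRY_COUNT --retry-delay $RETRY_DELAY_SEC --retry-connrefused --connect-timeout $CONNECT_TIMEOUT_SEC --max-time $MAX_TIME_SEC"
}

# Hypothetical helper: print the pip options built from the same values.
pip_retry_flags() {
  echo "--retries $RETRY_COUNT --timeout $MAX_TIME_SEC"
}

# Example usage (not executed here; OUT and URL are placeholders):
#   curl -fsSL $(curl_retry_flags) -o "$OUT" "$URL"
#   pip install $(pip_retry_flags) ddapm-test-agent
```

Changing `RETRY_COUNT` in one place would then update every command that uses these helpers.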
Test coverage
- `pwsh` (PowerShell 7+) which is available on all Azure Pipelines agents

Other details