Conversation
tracer/src/Datadog.Trace/Agent/DiscoveryService/DiscoveryService.cs
Outdated
Show resolved
Hide resolved
tracer/src/Datadog.Trace/Agent/DiscoveryService/DiscoveryService.cs
Outdated
Show resolved
Hide resolved
| /// <summary> | ||
| /// Gets or sets the container tags hash received from the agent, used by DBM/DSM | ||
| /// </summary> | ||
| public static string ContainerTagsHash { get; set; } |
There was a problem hiding this comment.
AFAICT, this isn't actually used anywhere? 🤔
There was a problem hiding this comment.
yet 😏
I could use it in the same PR, but I wanted to do baby steps. It's gonna be used in DSM and DBM injection
This comment has been minimized.
This comment has been minimized.
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (7893) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7893) - mean (74ms) : 72, 76
master - mean (73ms) : 70, 77
section Bailout
This PR (7893) - mean (79ms) : 77, 82
master - mean (78ms) : 75, 80
section CallTarget+Inlining+NGEN
This PR (7893) - mean (1,083ms) : 1025, 1141
master - mean (1,065ms) : 1022, 1109
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7893) - mean (119ms) : 114, 123
master - mean (117ms) : 113, 121
section Bailout
This PR (7893) - mean (120ms) : 116, 123
master - mean (118ms) : 114, 123
section CallTarget+Inlining+NGEN
This PR (7893) - mean (797ms) : 728, 866
master - mean (780ms) : 725, 836
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7893) - mean (106ms) : 103, 109
master - mean (102ms) : 98, 106
section Bailout
This PR (7893) - mean (106ms) : 103, 109
master - mean (103ms) : 100, 105
section CallTarget+Inlining+NGEN
This PR (7893) - mean (762ms) : 693, 831
master - mean (741ms) : 662, 820
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7893) - mean (104ms) : 101, 107
master - mean (100ms) : 97, 103
section Bailout
This PR (7893) - mean (105ms) : 102, 109
master - mean (101ms) : 100, 103
section CallTarget+Inlining+NGEN
This PR (7893) - mean (705ms) : 686, 724
master - mean (689ms) : 670, 708
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7893) - mean (199ms) : 193, 206
master - mean (198ms) : 193, 204
section Bailout
This PR (7893) - mean (203ms) : 199, 207
master - mean (201ms) : 198, 205
section CallTarget+Inlining+NGEN
This PR (7893) - mean (1,165ms) : 1118, 1212
master - mean (1,165ms) : 1110, 1219
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7893) - mean (285ms) : 276, 295
master - mean (283ms) : 278, 289
section Bailout
This PR (7893) - mean (286ms) : 280, 292
master - mean (284ms) : 278, 289
section CallTarget+Inlining+NGEN
This PR (7893) - mean (952ms) : 897, 1008
master - mean (966ms) : 917, 1016
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7893) - mean (278ms) : 270, 286
master - mean (276ms) : 269, 282
section Bailout
This PR (7893) - mean (278ms) : 272, 283
master - mean (277ms) : 272, 282
section CallTarget+Inlining+NGEN
This PR (7893) - mean (958ms) : 907, 1008
master - mean (949ms) : 887, 1011
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7893) - mean (277ms) : 269, 285
master - mean (278ms) : 269, 286
section Bailout
This PR (7893) - mean (277ms) : 272, 282
master - mean (277ms) : 272, 282
section CallTarget+Inlining+NGEN
This PR (7893) - mean (865ms) : 829, 901
master - mean (860ms) : 825, 896
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
## Summary of changes
Replaced custom mutex guard with `std::lock_guard`, using
`std::recursive_mutex` instead of `CRITICAL_SECTION` in windows and
`std::mutex` with railings in Linux
## Reason for change
Some locks have been spotted in smoke test wich could be cause by the
lack of thread recursive lock in the `std::mutex`
## Implementation details
## Test coverage
## Other details
<!-- Fixes #{issue} -->
<!-- ⚠️ Note:
Where possible, please obtain 2 approvals prior to merging. Unless
CODEOWNERS specifies otherwise, for external teams it is typically best
to have one review from a team member, and one review from apm-dotnet.
Trivial changes do not require 2 reviews.
MergeQueue is NOT enabled in this repository. If you have write access
to the repo, the PR has 1-2 approvals (see above), and all of the
required checks have passed, you can use the Squash and Merge button to
merge the PR. If you don't have write access, or you need help, reach
out in the #apm-dotnet channel in Slack.
-->
BenchmarksBenchmark execution time: 2026-02-03 13:02:27 Comparing candidate commit 7527eb7 in PR branch Found 10 performance improvements and 11 performance regressions! Performance is the same for 156 metrics, 15 unstable metrics. scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody net6.0
scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs net6.0
scenario:Benchmarks.Trace.Asm.AppSecWafBenchmark.RunWafRealisticBenchmarkWithAttack net6.0
scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0
scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest netcoreapp3.1
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net472
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net6.0
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool netcoreapp3.1
scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync netcoreapp3.1
scenario:Benchmarks.Trace.SerilogBenchmark.EnrichedLog netcoreapp3.1
scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net6.0
scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore netcoreapp3.1
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishScope netcoreapp3.1
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes netcoreapp3.1
scenario:Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin net6.0
|
| AgentHttpHeaderNames.MinimalHeaders, | ||
| BuildHeaders(containerId), | ||
| () => new MinimalAgentHeaderHelper(), |
There was a problem hiding this comment.
If you change one of these, you need to change both of them. Different ones are use for different transports. Which means you likely need a custom MinimalAgentHeaderHelper() here, something like this:
containerId is null
? () => new MinimalAgentHeaderHelper()
: () => new MinimalWithContainerIdHelper()I know, it sucks 😅
There was a problem hiding this comment.
hmm, yeah, not great, I have to do a different type because we use the instance to access a static field :/
btw, do you know why we don't use a normal Lazy in that helper ?
There was a problem hiding this comment.
I really didn't like the idea of duplicating the whole class, so i went another route, and made MinimalAgentHeaderHelper return a different value depending on whether it was given a container ID in the ctor.
tracer/src/Datadog.Trace/Agent/DiscoveryService/DiscoveryService.cs
Outdated
Show resolved
Hide resolved
| /// </summary> | ||
| internal static KeyValuePair<string, string>[] BuildHeaders(string? containerId) | ||
| { | ||
| if (containerId != null) |
There was a problem hiding this comment.
We never "discover" this later, right? i.e. if containerId is non null the first time this is called, it will be non-null for ever more (and never change) right?
There was a problem hiding this comment.
yes correct. Are you asking because if yes, then you'd have this code written differently, or just out of curiosity ?
There was a problem hiding this comment.
Yeah, if the answer had been "yes" we would have to write this differently, because we're building this collection once and never re-evaluating. Which is fine if this always gives the same result, but not otherwise 🙂
tracer/src/Datadog.Trace/PlatformHelpers/ContainerMetadata.NetFramework.cs
Outdated
Show resolved
Hide resolved
127aa99 to
4ca4fdf
Compare
| private async Task ProcessDiscoveryResponse(IApiResponse response) | ||
| { | ||
| // Extract and store container tags hash from response headers | ||
| var containerTagsHash = response.GetHeader(AgentHttpHeaderNames.ContainerTagsHash); |
There was a problem hiding this comment.
I was wondering about this code, as we're going to do the work to extract and set this hash with every response, but I think that's fine, given that we shouldn't call the discovery service so much any more since #7979 (and also given the amount of work we do later in this method)
There was a problem hiding this comment.
yeah, since it's not supposed to change, we could check if we already have it, and only query it then. But if you say it's fine :)
| _metadataHeaders = string.Concat(headers); | ||
| } | ||
|
|
||
| if (_containerId != null) |
There was a problem hiding this comment.
I think we should probably refactor the MinimalAgentHeaderHelper code in general, there's some perf opportunities, and it feels like this is overall more complex than it needs to be, but I'll do that in a subsequent PR after you merge this 🙂
There was a problem hiding this comment.
ok thanks. Yeah that code felt a bit... dated
| // This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2017 Datadog, Inc. | ||
| // </copyright> | ||
|
|
||
| #nullable enable |
tracer/test/Datadog.Trace.TestHelpers/TransportHelpers/TestApiResponse.cs
Outdated
Show resolved
Hide resolved
| private const int MaxRetryDelayMs = 50; | ||
| private const int RecheckIntervalMs = 300_000; | ||
|
|
||
| private static readonly ContainerMetadata NullContainerMetadata = new(containerId: null, entityId: null); |
There was a problem hiding this comment.
nit: should prob do this in the TraceExporterTests too
There was a problem hiding this comment.
I did it here because it's used in many tests, but it's only used once in TraceExporterTests, so it feels less relevant to extract a property ?
|
|
||
| var containerMetadata = NullContainerMetadata; | ||
|
|
||
| var ds = new DiscoveryService(factory, containerMetadata, InitialRetryDelayMs, MaxRetryDelayMs, RecheckIntervalMs); |
There was a problem hiding this comment.
RecheckIntervalMs = 300s - are you sure that's correct 🤔 If you "mis" the first check (which happens only 50ms (InitialRetryDelayMs) after calling new()), then this test will flake I think. It's rare, but looks like a definite possibility. If you set RecheckIntervalMs such that it is much smaller than the timeout you use in the mutex, then that should reduce the chance of flake I think?
EDIT: actually, this should be fine 😄 There's no real race condition here: the hash can be updated before subcribetochanges is called, but the mutex is guaranteed to be called in that case. Ignore me 😄
Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
Summary of changes
same as DataDog/dd-trace-java#9156
This hash is meant to be matched in the backend with the actual container tags, with the advantage that :
related RFC: https://docs.google.com/document/d/15GtNOKGBCt6Dc-HsDNnMmCdZwhewFQx8yUlI9in5n3M
Implementation details
Test coverage
Other details