[tracer] Create EventBridge Instrumentation and Inject Trace Context#6096
Datadog Report
Branch report: ✅ 0 Failed, 362288 Passed, 2084 Skipped, 15h 32m 8.13s Total Time
New Flaky Tests (1)
⌛ Performance Regressions vs Default Branch (1)
Execution-Time Benchmarks Report ⏱️
Execution-time results for samples comparing this PR (6096) with master. Execution-time benchmarks measure the whole time it takes to execute a program and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red.
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Each row gives the mean value of the run and the p99 interval based on the mean and StdDev of the test run.
| Sample | Runtime | Scenario | This PR (6096) mean (ms) | This PR (6096) interval (ms) | master mean (ms) | master interval (ms) |
|---|---|---|---|---|---|---|
| FakeDbCommand | .NET Framework 4.6.2 | Baseline | 71 | 68-73 | 70 | 68-72 |
| FakeDbCommand | .NET Framework 4.6.2 | CallTarget+Inlining+NGEN | 1,140 | 1115-1164 | 1,111 | 1087-1135 |
| FakeDbCommand | .NET Core 3.1 | Baseline | 113 | 109-116 | 109 | 106-111 |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 791 | 765-817 | 772 | 756-789 |
| FakeDbCommand | .NET 6 | Baseline | 93 | 89-97 | 92 | 89-95 |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 729 | 715-743 | 730 | 714-745 |
| HttpMessageHandler | .NET Framework 4.6.2 | Baseline | 190 | 187-193 | 189 | 186-192 |
| HttpMessageHandler | .NET Framework 4.6.2 | CallTarget+Inlining+NGEN | 1,215 | 1141-1288 | 1,197 | 1170-1224 |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 275 | 270-280 | 273 | 269-278 |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 948 | 903-994 | 939 | 919-958 |
| HttpMessageHandler | .NET 6 | Baseline | 264 | 259-270 | 263 | 259-266 |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 925 | 904-947 | 926 | 906-947 |
Throughput/Crank Report ⚡
Throughput results for AspNetCoreSimpleController comparing the following branches/commits: Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red. Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!
| Platform | Scenario | This PR (6096) (total requests) | master (total requests) | benchmarks/2.9.0 (total requests) |
|---|---|---|---|---|
| Linux x64 | Baseline | 11008821 | 11168199 | 11080577 |
| Linux x64 | Automatic | 7413507 | 7377465 | 7732233 |
| Linux x64 | Trace stats | | 7725756 | |
| Linux x64 | Manual | | 11062026 | |
| Linux x64 | Manual + Automatic | 6783817 | 6901659 | |
| Linux x64 | DD_TRACE_ENABLED=0 | | 10194503 | |
| Linux arm64 | Baseline | 9623498 | 9583681 | 9798067 |
| Linux arm64 | Automatic | 6413888 | 6629382 | |
| Linux arm64 | Trace stats | | 6850733 | |
| Linux arm64 | Manual | | 9454208 | |
| Linux arm64 | Manual + Automatic | 6040346 | 6165611 | |
| Linux arm64 | DD_TRACE_ENABLED=0 | | 8858385 | |
| Windows x64 | Baseline | 9988253 | 10307557 | 10067315 |
| Windows x64 | Automatic | 6708195 | 6543815 | 7552193 |
| Windows x64 | Trace stats | | 7343919 | |
| Windows x64 | Manual | | 10029915 | |
| Windows x64 | Manual + Automatic | 6106327 | 6178568 | |
| Windows x64 | DD_TRACE_ENABLED=0 | | 9389459 | |
Benchmarks Report for tracer 🐌Benchmarks for #6096 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored.
Benchmark details
Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.SpanBenchmark - Faster 🎉 Same allocations ✔️
| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 | 1.183 | 477.21 | 403.35 | |
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan | net6.0 | 477ns | 0.236ns | 0.913ns | 0.00816 | 0 | 0 | 576 B |
| master | StartFinishSpan | netcoreapp3.1 | 615ns | 0.275ns | 1.07ns | 0.00776 | 0 | 0 | 576 B |
| master | StartFinishSpan | net472 | 710ns | 0.75ns | 2.91ns | 0.0917 | 0 | 0 | 578 B |
| master | StartFinishScope | net6.0 | 483ns | 0.286ns | 1.11ns | 0.00972 | 0 | 0 | 696 B |
| master | StartFinishScope | netcoreapp3.1 | 780ns | 1.52ns | 5.89ns | 0.00934 | 0 | 0 | 696 B |
| master | StartFinishScope | net472 | 927ns | 1.44ns | 5.59ns | 0.104 | 0 | 0 | 658 B |
| #6096 | StartFinishSpan | net6.0 | 403ns | 0.568ns | 2.2ns | 0.00803 | 0 | 0 | 576 B |
| #6096 | StartFinishSpan | netcoreapp3.1 | 595ns | 0.45ns | 1.74ns | 0.00774 | 0 | 0 | 576 B |
| #6096 | StartFinishSpan | net472 | 728ns | 0.465ns | 1.74ns | 0.0916 | 0 | 0 | 578 B |
| #6096 | StartFinishScope | net6.0 | 480ns | 0.27ns | 1.05ns | 0.00982 | 0 | 0 | 696 B |
| #6096 | StartFinishScope | netcoreapp3.1 | 762ns | 0.819ns | 3.17ns | 0.00945 | 0 | 0 | 696 B |
| #6096 | StartFinishScope | net472 | 934ns | 2.53ns | 9.48ns | 0.104 | 0 | 0 | 658 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin | net6.0 | 616ns | 0.426ns | 1.59ns | 0.0096 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | netcoreapp3.1 | 1.01μs | 0.453ns | 1.75ns | 0.00922 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | net472 | 1.14μs | 1.32ns | 4.76ns | 0.105 | 0 | 0 | 658 B |
| #6096 | RunOnMethodBegin | net6.0 | 642ns | 0.341ns | 1.27ns | 0.00976 | 0 | 0 | 696 B |
| #6096 | RunOnMethodBegin | netcoreapp3.1 | 993ns | 0.436ns | 1.69ns | 0.00943 | 0 | 0 | 696 B |
| #6096 | RunOnMethodBegin | net472 | 1.13μs | 0.791ns | 3.06ns | 0.105 | 0 | 0 | 658 B |
bouwkast
left a comment
I haven't gone through much yet but it is looking really good.
There are some test failures where CI can't find the sample application.
tracer/src/Datadog.Trace.Trimming/build/Datadog.Trace.Trimming.xml (outdated, resolved)
tracer/src/Datadog.Trace.Trimming/build/Datadog.Trace.Trimming.xml (outdated, resolved)
tracer/test/Datadog.Trace.ClrProfiler.IntegrationTests/AWS/AwsEventBridgeTests.cs (resolved)
And thanks for the PR description! Very helpful :)
...r/test/test-applications/integrations/Samples.AWS.EventBridge/Samples.AWS.EventBridge.csproj (outdated, resolved)
...er/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/EventBridge/AwsEventBridgeCommon.cs (outdated, resolved)
tracer/test/Datadog.Trace.ClrProfiler.IntegrationTests/AWS/AwsEventBridgeTests.cs (resolved)
bouwkast
left a comment
Nice work!
Left some comments, but looks good to me overall
tracer/test/Datadog.Trace.ClrProfiler.IntegrationTests/AWS/AwsEventBridgeTests.cs (resolved)
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/EventBridge/ContextPropagation.cs (resolved)
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/EventBridge/ContextPropagation.cs (outdated, resolved)
…ect-trace-context' into nicholas.hulston/eventbridge-inject-trace-context
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/EventBridge/ContextPropagation.cs (resolved)
* Use the standard logger instead of `Console.WriteLine()`
* Don't pass in redundant parameter.
* Remove unused imports
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/EventBridge/ContextPropagation.cs (resolved)
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/EventBridge/ContextPropagation.cs (outdated, resolved)
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/EventBridge/ContextPropagation.cs (outdated, resolved)
...c/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/EventBridge/PutEventsAsyncIntegration.cs (resolved)
…ect-trace-context' into nicholas.hulston/eventbridge-inject-trace-context
## Summary of changes
When the EventBridge instrumentation is disabled, we should not inject headers / context into the object. This was the case initially in #6096, but appears to have been accidentally removed in a refactoring in #6157.
## Reason for change
It appears that we do not entirely support certain message types here.
## Implementation details
Essentially restored the `if (scope?.Span.Context is { } context)` check before injecting.
## Test coverage
Our test coverage _passes_ with or without this change, as we correctly don't create a span when the instrumentation is disabled, but we don't really have the means to assert that we do not modify the underlying objects in some way. This is something we should potentially look into implementing, to catch similar issues in the future.
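A minimal sketch of the guard described above may help reviewers see what the restored check protects against. The `SketchScope`, `SketchSpan`, and `SketchSpanContext` types and the `TryInject` helper are hypothetical stand-ins written for this illustration; they are not the tracer's real classes. The header names are Datadog's usual propagation headers, and only the `if` condition is taken from the description.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Stand-in types for illustration only (assumption); the tracer's real types differ.
public sealed class SketchSpanContext
{
    public ulong TraceId { get; set; }
    public ulong SpanId { get; set; }
}

public sealed class SketchSpan
{
    public SketchSpanContext Context { get; set; }
}

public sealed class SketchScope
{
    public SketchSpan Span { get; set; }
}

public static class InjectionGuardSketch
{
    // Writes trace headers into the carrier only when a span context exists.
    // Returns true when something was injected.
    public static bool TryInject(SketchScope scope, IDictionary<string, string> carrier)
    {
        // When the integration is disabled no span is created, so scope is null,
        // the pattern match fails, and the carrier is left untouched.
        if (scope?.Span.Context is { } context)
        {
            carrier["x-datadog-trace-id"] = context.TraceId.ToString(CultureInfo.InvariantCulture);
            carrier["x-datadog-parent-id"] = context.SpanId.ToString(CultureInfo.InvariantCulture);
            return true;
        }

        return false;
    }
}
```

Returning a flag (an addition of this sketch, not something the PR describes) is one way a test could assert that a disabled integration leaves the carrier unmodified, which is the gap the test coverage note points out.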

Summary of changes
This creates a new instrumentation for EventBridge and intercepts `PutEvents` and `PutEventsAsync` to inject trace context. This allows the agent to combine spans from a distributed (serverless) architecture into a single trace.
This PR only injects trace context. I'm working on PR 1 and PR 2 to update the Lambda extension to use this trace context to create EventBridge spans.
I am also working on similar PRs in dd-trace-java and dd-trace-go.
Reason for change
SNS and SQS are already supported, and the tracer currently injects trace context into their message attributes fields. EventBridge was not supported, and this PR adds that support.
Implementation details
I followed the documentation to create an instrumentation. Much of the logic was mirrored from the existing implementation of SNS, since EventBridge and SNS are extremely similar.
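As a rough illustration of the injection idea (not the code in this PR), the sketch below adds a `_datadog` object to an entry's JSON `Detail` string, since EventBridge entries have no message attributes field like SNS/SQS. The `EventBridgeDetailSketch` class and `InjectIntoDetail` helper are hypothetical names, and the millisecond unit for `x-datadog-start-time` and the use of the bus name for `x-datadog-resource-name` are assumptions made for the example. It uses `System.Text.Json.Nodes`, which requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical helper for illustration; not the tracer's ContextPropagation class.
public static class EventBridgeDetailSketch
{
    private const string DatadogKey = "_datadog";

    // Returns the Detail JSON with trace headers added under "_datadog".
    // Returns the input unchanged when it is empty or not a JSON object.
    public static string InjectIntoDetail(
        string detail,
        IReadOnlyDictionary<string, string> traceHeaders,
        DateTimeOffset startTime,
        string busName)
    {
        if (string.IsNullOrWhiteSpace(detail))
        {
            return detail;
        }

        JsonNode root;
        try
        {
            root = JsonNode.Parse(detail);
        }
        catch (JsonException)
        {
            return detail; // not valid JSON: leave the entry untouched
        }

        if (root is not JsonObject detailObject)
        {
            return detail; // arrays, scalars, and JSON null cannot carry the extra key
        }

        var carrier = new JsonObject();
        foreach (var header in traceHeaders)
        {
            carrier[header.Key] = header.Value;
        }

        // Assumption: start time as Unix milliseconds, resource name as the bus name.
        carrier["x-datadog-start-time"] =
            startTime.ToUnixTimeMilliseconds().ToString(CultureInfo.InvariantCulture);
        if (!string.IsNullOrEmpty(busName))
        {
            carrier["x-datadog-resource-name"] = busName;
        }

        detailObject[DatadogKey] = carrier;
        return detailObject.ToJsonString();
    }
}
```

Leaving the input untouched on any parse failure is a deliberate choice in this sketch: instrumentation should not change or break the user's request when the payload is not what it expects.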
Overall, AWS's EventBridge API is lacking some features, so we have to resort to some hacky workarounds:
* … `detail[_datadog]` under the header `x-datadog-start-time`
* … `detail[_datadog]` under the header `x-datadog-resource-name`
Test coverage
I added system tests for SNS/SQS: DataDog/system-tests#3204
I added unit tests and integration tests.
Unit tests can be run with:
Integration tests can be run with these commands:
I also did manual testing:

Other details
There are a lot of diffs and changed files, so I recommend reviewing the PR commit by commit. All the autogenerated files were added in a single commit, which should make the review process less overwhelming.