Support adding kafka_cluster_id for Confluent.Kafka by robcarlan-datadog · Pull Request #7702 · DataDog/dd-trace-dotnet

robcarlan-datadog · 2025-10-23T18:01:57Z

Summary of changes

Adds kafka_cluster_id as a tag for APM Spans and DSM checkpoints/offsets. (Note that DSM is enabled by default). Without this, we cannot differentiate between reported offsets on the same topic but across different clusters, which leads us to wildly incorrect metric values. (For example, a prod cluster with consume/produce offsets at 1000,1001 should have an offset lag of 1. If there is also a matching dev cluster with offsets at 0 and 1, then we can calculate the offset lag as 1000).
In most tracer libraries (python, java, node) we support tracking kafka_cluster_id. Without this, metrics become incorrect when there are the same topics across different clusters (i.e. dev/prod environments).
It's available for Confluent.Kafka 2.3.0 and above only - roughly half of orgs that use this library according to telemetry. (For DSM where this is most useful, we can add a recommendation to our docs to use this version or above.)

Note: I drafted an alternative approach calling librdkafka directly here. That gives the cluster id without any external API call or version dependency (It's available and unchanged from librdkafka 1.0.0), but calls into unmanaged code

Reason for change

This functionality exists in other tracers:

Java (doesn't block, intercepts the metadata response to enrich cluster id going forwards, see here and here)
Node (blocks, see here and here)
Python (blocks with a 1 second timeout, see here and here)

Implementation details

Constructs the Kafka Admin API client and queries for the cluster id directly on consumer/producer startup.
We cache this by bootstrap servers, so we expect to make this API call once.

Test coverage

Other details

These DSM metrics will now tag by kafka_cluster_id

Added to spans (I checked this on produce too, I just don't have a screenshot of it)

dd-trace-dotnet-ci-bot · 2025-10-23T19:16:22Z

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (7702) and master.

✅ No regressions detected - check the details below

Full Metrics Comparison

FakeDbCommand

Metric	Master (Mean ± 95% CI)	Current (Mean ± 95% CI)	Change	Status
.NET Framework 4.8 - Baseline
duration	69.31 ± (69.54 - 69.99) ms	68.91 ± (68.86 - 69.13) ms	-0.6%	✅
.NET Framework 4.8 - Bailout
duration	73.09 ± (73.02 - 73.28) ms	73.13 ± (72.98 - 73.37) ms	+0.1%	✅⬆️
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration	1044.03 ± (1046.70 - 1054.47) ms	1039.94 ± (1043.59 - 1051.21) ms	-0.4%	✅
.NET Core 3.1 - Baseline
process.internal_duration_ms	21.90 ± (21.87 - 21.94) ms	21.92 ± (21.88 - 21.95) ms	+0.1%	✅⬆️
process.time_to_main_ms	79.85 ± (79.68 - 80.03) ms	79.88 ± (79.73 - 80.03) ms	+0.0%	✅⬆️
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	10.89 ± (10.89 - 10.89) MB	10.91 ± (10.91 - 10.91) MB	+0.2%	✅⬆️
runtime.dotnet.threads.count	12 ± (12 - 12)	12 ± (12 - 12)	+0.0%	✅
.NET Core 3.1 - Bailout
process.internal_duration_ms	21.88 ± (21.85 - 21.91) ms	21.82 ± (21.80 - 21.85) ms	-0.3%	✅
process.time_to_main_ms	81.31 ± (81.16 - 81.45) ms	81.21 ± (81.06 - 81.35) ms	-0.1%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	10.94 ± (10.93 - 10.94) MB	10.94 ± (10.94 - 10.94) MB	+0.0%	✅⬆️
runtime.dotnet.threads.count	13 ± (13 - 13)	13 ± (13 - 13)	+0.0%	✅
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms	220.38 ± (219.28 - 221.48) ms	218.23 ± (217.04 - 219.41) ms	-1.0%	✅
process.time_to_main_ms	472.68 ± (472.05 - 473.32) ms	472.80 ± (472.23 - 473.37) ms	+0.0%	✅⬆️
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	47.24 ± (47.21 - 47.27) MB	47.26 ± (47.23 - 47.28) MB	+0.0%	✅⬆️
runtime.dotnet.threads.count	28 ± (28 - 28)	28 ± (28 - 28)	-0.3%	✅
.NET 6 - Baseline
process.internal_duration_ms	20.78 ± (20.75 - 20.81) ms	20.77 ± (20.74 - 20.80) ms	-0.0%	✅
process.time_to_main_ms	69.91 ± (69.76 - 70.07) ms	69.42 ± (69.28 - 69.56) ms	-0.7%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	10.62 ± (10.62 - 10.62) MB	10.62 ± (10.61 - 10.62) MB	-0.0%	✅
runtime.dotnet.threads.count	10 ± (10 - 10)	10 ± (10 - 10)	+0.0%	✅
.NET 6 - Bailout
process.internal_duration_ms	20.52 ± (20.50 - 20.54) ms	20.88 ± (20.85 - 20.91) ms	+1.7%	✅⬆️
process.time_to_main_ms	70.16 ± (70.07 - 70.26) ms	70.73 ± (70.60 - 70.85) ms	+0.8%	✅⬆️
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	10.73 ± (10.73 - 10.74) MB	10.73 ± (10.73 - 10.73) MB	-0.0%	✅
runtime.dotnet.threads.count	11 ± (11 - 11)	11 ± (11 - 11)	+0.0%	✅
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms	206.33 ± (204.11 - 208.56) ms	201.86 ± (199.77 - 203.95) ms	-2.2%	✅
process.time_to_main_ms	473.56 ± (472.71 - 474.41) ms	472.33 ± (471.58 - 473.08) ms	-0.3%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	49.03 ± (48.97 - 49.10) MB	48.96 ± (48.90 - 49.03) MB	-0.1%	✅
runtime.dotnet.threads.count	29 ± (29 - 29)	29 ± (29 - 29)	-0.0%	✅
.NET 8 - Baseline
process.internal_duration_ms	18.85 ± (18.82 - 18.88) ms	18.86 ± (18.83 - 18.88) ms	+0.0%	✅⬆️
process.time_to_main_ms	68.63 ± (68.48 - 68.77) ms	68.57 ± (68.45 - 68.70) ms	-0.1%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	7.69 ± (7.68 - 7.69) MB	7.66 ± (7.65 - 7.66) MB	-0.4%	✅
runtime.dotnet.threads.count	10 ± (10 - 10)	10 ± (10 - 10)	+0.0%	✅
.NET 8 - Bailout
process.internal_duration_ms	18.88 ± (18.86 - 18.91) ms	18.94 ± (18.91 - 18.97) ms	+0.3%	✅⬆️
process.time_to_main_ms	69.67 ± (69.55 - 69.79) ms	69.79 ± (69.66 - 69.91) ms	+0.2%	✅⬆️
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	7.74 ± (7.73 - 7.75) MB	7.72 ± (7.71 - 7.73) MB	-0.3%	✅
runtime.dotnet.threads.count	11 ± (11 - 11)	11 ± (11 - 11)	+0.0%	✅
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms	157.63 ± (156.77 - 158.48) ms	156.24 ± (155.26 - 157.22) ms	-0.9%	✅
process.time_to_main_ms	451.64 ± (451.12 - 452.16) ms	450.27 ± (449.57 - 450.97) ms	-0.3%	✅
runtime.dotnet.exceptions.count	0 ± (0 - 0)	0 ± (0 - 0)	+0.0%	✅
runtime.dotnet.mem.committed	36.53 ± (36.51 - 36.55) MB	36.51 ± (36.48 - 36.53) MB	-0.1%	✅
runtime.dotnet.threads.count	28 ± (28 - 28)	28 ± (28 - 28)	+0.0%	✅⬆️

HttpMessageHandler

Metric	Master (Mean ± 95% CI)	Current (Mean ± 95% CI)	Change	Status
.NET Framework 4.8 - Baseline
duration	195.58 ± (195.53 - 196.34) ms	194.06 ± (194.00 - 194.86) ms	-0.8%	✅
.NET Framework 4.8 - Bailout
duration	197.34 ± (197.14 - 197.92) ms	197.48 ± (197.22 - 197.73) ms	+0.1%	✅⬆️
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration	1143.98 ± (1144.38 - 1151.56) ms	1149.86 ± (1150.60 - 1157.67) ms	+0.5%	✅⬆️
.NET Core 3.1 - Baseline
process.internal_duration_ms	188.27 ± (187.94 - 188.59) ms	189.22 ± (188.81 - 189.63) ms	+0.5%	✅⬆️
process.time_to_main_ms	80.90 ± (80.73 - 81.08) ms	81.58 ± (81.32 - 81.84) ms	+0.8%	✅⬆️
runtime.dotnet.exceptions.count	3 ± (3 - 3)	3 ± (3 - 3)	+0.0%	✅
runtime.dotnet.mem.committed	16.14 ± (16.10 - 16.17) MB	16.15 ± (16.13 - 16.18) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	20 ± (19 - 20)	20 ± (19 - 20)	-0.0%	✅
.NET Core 3.1 - Bailout
process.internal_duration_ms	188.05 ± (187.62 - 188.48) ms	188.40 ± (187.96 - 188.84) ms	+0.2%	✅⬆️
process.time_to_main_ms	82.62 ± (82.44 - 82.80) ms	82.69 ± (82.52 - 82.86) ms	+0.1%	✅⬆️
runtime.dotnet.exceptions.count	3 ± (3 - 3)	3 ± (3 - 3)	+0.0%	✅
runtime.dotnet.mem.committed	16.09 ± (16.07 - 16.12) MB	16.11 ± (16.09 - 16.14) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	21 ± (20 - 21)	21 ± (21 - 21)	+0.4%	✅⬆️
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms	397.34 ± (395.60 - 399.08) ms	394.74 ± (392.70 - 396.79) ms	-0.7%	✅
process.time_to_main_ms	474.27 ± (473.58 - 474.95) ms	475.90 ± (475.26 - 476.53) ms	+0.3%	✅⬆️
runtime.dotnet.exceptions.count	3 ± (3 - 3)	3 ± (3 - 3)	+0.0%	✅
runtime.dotnet.mem.committed	57.93 ± (57.78 - 58.08) MB	57.78 ± (57.61 - 57.96) MB	-0.2%	✅
runtime.dotnet.threads.count	30 ± (30 - 30)	30 ± (30 - 30)	-0.1%	✅
.NET 6 - Baseline
process.internal_duration_ms	192.60 ± (192.19 - 193.00) ms	192.95 ± (192.58 - 193.31) ms	+0.2%	✅⬆️
process.time_to_main_ms	70.45 ± (70.24 - 70.66) ms	70.81 ± (70.62 - 71.00) ms	+0.5%	✅⬆️
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	16.15 ± (16.02 - 16.28) MB	16.17 ± (16.04 - 16.30) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	18 ± (18 - 19)	18 ± (18 - 19)	+0.0%	✅⬆️
.NET 6 - Bailout
process.internal_duration_ms	191.69 ± (191.36 - 192.03) ms	192.59 ± (192.26 - 192.92) ms	+0.5%	✅⬆️
process.time_to_main_ms	71.18 ± (71.06 - 71.30) ms	71.60 ± (71.48 - 71.73) ms	+0.6%	✅⬆️
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	16.17 ± (16.04 - 16.31) MB	16.03 ± (15.88 - 16.19) MB	-0.9%	✅
runtime.dotnet.threads.count	20 ± (20 - 20)	19 ± (19 - 20)	-1.1%	✅
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms	428.23 ± (426.42 - 430.04) ms	428.71 ± (426.88 - 430.55) ms	+0.1%	✅⬆️
process.time_to_main_ms	476.18 ± (475.42 - 476.94) ms	478.91 ± (478.05 - 479.77) ms	+0.6%	✅⬆️
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	60.22 ± (60.15 - 60.29) MB	60.32 ± (60.26 - 60.38) MB	+0.2%	✅⬆️
runtime.dotnet.threads.count	30 ± (30 - 30)	30 ± (30 - 30)	+0.0%	✅⬆️
.NET 8 - Baseline
process.internal_duration_ms	190.84 ± (190.41 - 191.27) ms	190.84 ± (190.47 - 191.21) ms	-0.0%	✅
process.time_to_main_ms	70.21 ± (70.00 - 70.43) ms	70.31 ± (70.11 - 70.51) ms	+0.1%	✅⬆️
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	11.77 ± (11.74 - 11.79) MB	11.78 ± (11.76 - 11.80) MB	+0.1%	✅⬆️
runtime.dotnet.threads.count	18 ± (18 - 18)	18 ± (18 - 18)	-0.4%	✅
.NET 8 - Bailout
process.internal_duration_ms	190.29 ± (189.98 - 190.61) ms	190.98 ± (190.60 - 191.36) ms	+0.4%	✅⬆️
process.time_to_main_ms	70.94 ± (70.81 - 71.06) ms	71.35 ± (71.22 - 71.48) ms	+0.6%	✅⬆️
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	11.85 ± (11.82 - 11.88) MB	11.84 ± (11.81 - 11.87) MB	-0.1%	✅
runtime.dotnet.threads.count	19 ± (19 - 19)	19 ± (19 - 19)	-0.6%	✅
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms	351.91 ± (350.56 - 353.26) ms	354.79 ± (353.54 - 356.05) ms	+0.8%	✅⬆️
process.time_to_main_ms	452.69 ± (452.09 - 453.29) ms	455.80 ± (455.21 - 456.39) ms	+0.7%	✅⬆️
runtime.dotnet.exceptions.count	4 ± (4 - 4)	4 ± (4 - 4)	+0.0%	✅
runtime.dotnet.mem.committed	48.41 ± (48.36 - 48.45) MB	48.38 ± (48.34 - 48.42) MB	-0.1%	✅
runtime.dotnet.threads.count	30 ± (30 - 30)	30 ± (30 - 30)	-0.3%	✅

Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:

Welch test with statistical test for significance of 5%
Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Duration charts

FakeDbCommand (.NET Framework 4.8)

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7702) - mean (69ms)  : 67, 71
    master - mean (70ms)  : 66, 73

    section Bailout
    This PR (7702) - mean (73ms)  : 71, 75
    master - mean (73ms)  : 72, 74

    section CallTarget+Inlining+NGEN
    This PR (7702) - mean (1,047ms)  : 993, 1102
    master - mean (1,051ms)  : 995, 1106

FakeDbCommand (.NET Core 3.1)

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7702) - mean (108ms)  : 104, 111
    master - mean (107ms)  : 104, 111

    section Bailout
    This PR (7702) - mean (108ms)  : 107, 110
    master - mean (109ms)  : 107, 110

    section CallTarget+Inlining+NGEN
    This PR (7702) - mean (727ms)  : 708, 746
    master - mean (730ms)  : 711, 750

FakeDbCommand (.NET 6)

gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7702) - mean (95ms)  : 93, 98
    master - mean (96ms)  : 93, 99

    section Bailout
    This PR (7702) - mean (97ms)  : 95, 98
    master - mean (96ms)  : 94, 98

    section CallTarget+Inlining+NGEN
    This PR (7702) - mean (703ms)  : 654, 752
    master - mean (709ms)  : 668, 751

FakeDbCommand (.NET 8)

gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7702) - mean (94ms)  : 91, 97
    master - mean (94ms)  : 92, 97

    section Bailout
    This PR (7702) - mean (95ms)  : 93, 98
    master - mean (95ms)  : 93, 97

    section CallTarget+Inlining+NGEN
    This PR (7702) - mean (636ms)  : 620, 651
    master - mean (641ms)  : 615, 667

HttpMessageHandler (.NET Framework 4.8)

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7702) - mean (194ms)  : 190, 199
    master - mean (196ms)  : 191, 201

    section Bailout
    This PR (7702) - mean (197ms)  : 195, 200
    master - mean (198ms)  : 194, 201

    section CallTarget+Inlining+NGEN
    This PR (7702) - mean (1,154ms)  : 1103, 1205
    master - mean (1,148ms)  : 1097, 1199

HttpMessageHandler (.NET Core 3.1)

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7702) - mean (279ms)  : 274, 285
    master - mean (277ms)  : 273, 282

    section Bailout
    This PR (7702) - mean (279ms)  : 273, 285
    master - mean (279ms)  : 274, 284

    section CallTarget+Inlining+NGEN
    This PR (7702) - mean (902ms)  : 878, 927
    master - mean (903ms)  : 876, 929

HttpMessageHandler (.NET 6)

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7702) - mean (272ms)  : 268, 276
    master - mean (272ms)  : 266, 277

    section Bailout
    This PR (7702) - mean (273ms)  : 269, 277
    master - mean (271ms)  : 267, 275

    section CallTarget+Inlining+NGEN
    This PR (7702) - mean (941ms)  : 912, 970
    master - mean (936ms)  : 909, 963

HttpMessageHandler (.NET 8)

gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7702) - mean (271ms)  : 263, 279
    master - mean (271ms)  : 263, 278

    section Bailout
    This PR (7702) - mean (272ms)  : 266, 278
    master - mean (271ms)  : 266, 275

    section CallTarget+Inlining+NGEN
    This PR (7702) - mean (843ms)  : 814, 873
    master - mean (835ms)  : 814, 855

pr-commenter · 2025-10-23T19:28:02Z

Benchmarks

Benchmark execution time: 2026-03-20 14:34:16

Comparing candidate commit d527467 in PR branch rob.carlan/kafka-cluster-id-tag-reflection with baseline commit b0dc7b1 in branch master.

Found 5 performance improvements and 10 performance regressions! Performance is the same for 251 metrics, 22 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

🟩 = significantly better candidate vs. baseline
🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild netcoreapp3.1

🟥 execution_time [+111.029ms; +112.823ms] or [+128.054%; +130.123%]

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

🟩 execution_time [-103.461ms; -103.374ms] or [-50.207%; -50.165%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody netcoreapp3.1

🟥 execution_time [+10.973ms; +15.840ms] or [+5.538%; +7.994%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1

🟩 execution_time [-18.777ms; -12.801ms] or [-8.712%; -5.939%]
🟩 throughput [+139554.415op/s; +210777.762op/s] or [+6.223%; +9.399%]

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs netcoreapp3.1

🟩 execution_time [-21.797ms; -20.994ms] or [-10.849%; -10.450%]

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest netcoreapp3.1

🟥 execution_time [+68.466ms; +77.876ms] or [+59.385%; +67.547%]

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472

🟩 throughput [+119.976op/s; +143.472op/s] or [+11.601%; +13.873%]

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync netcoreapp3.1

🟥 execution_time [+10.423ms; +13.835ms] or [+5.286%; +7.017%]

scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net6.0

🟥 execution_time [+15.377ms; +19.336ms] or [+7.829%; +9.844%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net6.0

🟥 allocated_mem [+14.625KB; +14.654KB] or [+5.671%; +5.682%]

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark netcoreapp3.1

🟥 allocated_mem [+20.193KB; +20.220KB] or [+7.855%; +7.866%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net6.0

🟥 execution_time [+100.199ms; +101.308ms] or [+100.444%; +101.556%]

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore netcoreapp3.1

🟥 throughput [-15715213.516op/s; -14556485.880op/s] or [-6.520%; -6.039%]

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0

🟥 execution_time [+15.486ms; +20.834ms] or [+7.733%; +10.404%]

Extract Kafka cluster_id via AdminClient.DescribeClusterAsync using duck typing and add it to span tags, DSM edge tags, and backlog tracking tags. Changes: - Add KafkaHelper.GetClusterId() using duck typed Confluent.Kafka AdminClient API with re-entrancy guard (AdminClient internally creates a Producer) - Extend ConsumerCache/ProducerCache to store cluster_id alongside existing metadata - Add ClusterId property to KafkaTags (tag: messaging.kafka.cluster_id) - Include kafka_cluster_id in DSM edge tags for produce/consume checkpoints - Include kafka_cluster_id in commit/produce backlog tracking tags - Add duck type interfaces: IAdminClient, IAdminClientBuilder, IAdminClientConfig, IDescribeClusterResult with IDuckTypeTask for async handling - Add unit tests for cache classes, KafkaTags, and GetClusterId edge cases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix DSM edge tags alphabetical sort: direction before kafka_cluster_id - Guard kafka_cluster_id in OffsetsCommittedCallbacks with IsNullOrEmpty check, consistent with all other callsites - Rename misleading test name to HandlesConnectionFailureGracefully Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The span_attribute_schema markdown files had formatting-only changes (table alignment) that were accidentally included from WIP commits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verify the duck typing chain works for IAdminClientConfig, IAdminClientBuilder, and check DescribeClusterAsync availability (requires Confluent.Kafka 2.0+). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…st coverage Pass the producer instance to TryInjectHeaders so it can query ProducerCache directly for cluster_id, instead of casting span.Tags back to KafkaTags. This matches how the consumer side uses ConsumerCache. Also add cluster_id assertions to integration tests and register the tag as optional in span metadata rules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tead of sender Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

robcarlan-datadog · 2026-02-25T17:54:08Z

+            {
+                _isGettingClusterId = true;
+
+                var configType = Type.GetType("Confluent.Kafka.AdminClientConfig, Confluent.Kafka");


afaik this is the only way to get this object, and the builder below

…lusterAsync Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…or cluster_id Instead of creating a standalone AdminClient with a new connection to the broker, use DependentAdminClientBuilder which reuses the producer/consumer's existing connection. Move cluster_id resolution from OnMethodBegin to OnMethodEnd so the client handle is available after construction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… fallback Remove standalone AdminClient fallback and IAdminClientConfig since DependentAdminClientBuilder is available in all supported Confluent.Kafka versions (1.4.0+). Clean up constructor OnMethodEnd to use early returns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

andrewlock · 2026-03-11T11:07:00Z

                    x => x.Tags[Tags.KafkaConsumerGroup] == "Samples.Kafka.AutoCommitConsumer1"
                      || x.Tags[Tags.KafkaConsumerGroup] == "Samples.Kafka.ManualCommitConsumer2");

+            // cluster_id is only available with Confluent.Kafka 2.0+ (DescribeClusterAsync)


Right... I forgot this 🤔 I think we should potentially split our integrations to account for this, because otherwise we're going to be doing expensive work which we know is going to fail when we try to describe the cluster.

There's two ways to do that, depending on how reliably we can detect what version we're in from the KafkaHelper:

Extract the common code out and create a duplicate of the integration

One integration targets 1.x.x only, and does the existing stuff

The other integration targets 2.x.x+, and includes the describe cluster work

Pro-actively try to grab the DescribeClusterOptions type - if it doesn't exist, there's no point trying to build and duck type an AdminClient, so we can (and should) just bail at that point. What's more, we know that can never work, so we should probably cache a null/"" value for the bootstrap servers (and potentially bypass that cache entirely - just always return null if that lookup of the Type has failed...)

I like the second approach in terms of simplicity, I'll do that. Flagging an early null/"" result in that case seems like a good thing to do

(My original comment was wrong and the version requirement is actually > 2.3.0)

zacharycmontoya

I've added a couple of comments but I think Andrew already did a very comprehensive review. Looking forward to changes based on PR feedback, but in general I think this is looking pretty promising

- Use inline ternary for backlog tags to avoid extra string allocations - Replace string.IsNullOrEmpty with StringUtil.IsNullOrEmpty throughout - Simplify GetClusterId with direct DuckCast and using statement - Accept nullable clusterId in ConsumerCache.SetConsumerGroup - Use typed Config class instead of string[] in consumer constructor - Add nullability annotations to duck typing interfaces Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…duck typing Anchor reflection type lookups to the Confluent.Kafka assembly obtained from the actual client instance, avoiding fragile global assembly searches. Replace raw reflection property access on DescribeClusterOptions with a duck type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…duck typing Anchor reflection type lookups to the Confluent.Kafka assembly obtained from the actual client instance, avoiding fragile global assembly searches. Replace raw reflection property access on DescribeClusterOptions with a duck type. Early-return and cache empty result for Confluent.Kafka < 2.3.0 where DescribeClusterAsync is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

andrewlock

LGTM overall - just a couple of final tweaks around logging and caching, that should probably be addressed, but otherwise looking good!

… paths

zacharycmontoya

Aside from the small comments, LGTM!

This comment has been minimized.

Sign in to view

bouwkast force-pushed the rob.carlan/kafka-cluster-id-tag-reflection branch from f83a435 to 3d16b25 Compare November 4, 2025 19:34

robcarlan-datadog force-pushed the rob.carlan/kafka-cluster-id-tag-reflection branch from e3c11d3 to 04181fd Compare February 19, 2026 22:20

robcarlan-datadog and others added 4 commits February 20, 2026 13:01

Revert unrelated docs formatting changes

1abfae6

The span_attribute_schema markdown files had formatting-only changes (table alignment) that were accidentally included from WIP commits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove unnecessary comments that restate what the code does

30ed9a4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add duck typing validation tests for cluster_id feature

7ba9829

Verify the duck typing chain works for IAdminClientConfig, IAdminClientBuilder, and check DescribeClusterAsync availability (requires Confluent.Kafka 2.0+). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

robcarlan-datadog added area:data-streams-monitoring AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos labels Feb 23, 2026

robcarlan-datadog changed the title ~~Draft: .NET cluster id via reflection~~ Support adding kafka_cluster_id for Confluent.Kafka Feb 24, 2026

robcarlan-datadog and others added 5 commits February 24, 2026 14:35

Remove KafkaClusterIdTests in favor of integration test coverage

6aa0d44

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add missing XML doc param tag for producer parameter

0fed7eb

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix cluster_id lookup in OffsetsCommittedCallbacks using consumer ins…

830c068

…tead of sender Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

remove comments from DuckTyped classes

2519973

robcarlan-datadog commented Feb 25, 2026

View reviewed changes

robcarlan-datadog marked this pull request as ready for review February 25, 2026 17:54

robcarlan-datadog requested review from a team as code owners February 25, 2026 17:54

robcarlan-datadog and others added 7 commits February 25, 2026 12:54

Merge branch 'master' into rob.carlan/kafka-cluster-id-tag-reflection

40d0159

Cache cluster_id by bootstrap servers and add 2s timeout to DescribeC…

8ab7ede

…lusterAsync Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove UpdateClusterId, move cache writes entirely to OnMethodEnd

f489cf0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Trim doc comments and restore RemoveProducer cleanup on error

b66ef67

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove reentrancy guard and unnecessary RemoveProducer call

5b5591e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

robcarlan-datadog commented Feb 25, 2026

View reviewed changes

Comment thread tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Kafka/KafkaHelper.cs Outdated

robcarlan-datadog requested review from andrewlock and zacharycmontoya March 9, 2026 14:28

andrewlock reviewed Mar 11, 2026

View reviewed changes

zacharycmontoya reviewed Mar 11, 2026

View reviewed changes

Comment thread tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Kafka/KafkaHelper.cs Outdated

zacharycmontoya reviewed Mar 11, 2026

View reviewed changes

Comment thread tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Kafka/KafkaHelper.cs Outdated

zacharycmontoya reviewed Mar 11, 2026

View reviewed changes

robcarlan-datadog and others added 3 commits March 11, 2026 16:27

Merge branch 'master' into rob.carlan/kafka-cluster-id-tag-reflection

9b19c6a

robcarlan-datadog force-pushed the rob.carlan/kafka-cluster-id-tag-reflection branch 4 times, most recently from 79d465e to 3962123 Compare March 13, 2026 14:11

robcarlan-datadog force-pushed the rob.carlan/kafka-cluster-id-tag-reflection branch from 3962123 to c6531c5 Compare March 13, 2026 14:14

robcarlan-datadog requested review from andrewlock and zacharycmontoya March 13, 2026 14:15

andrewlock approved these changes Mar 13, 2026

View reviewed changes

PR comments; pass through DescribeClusterOptions type and cache error…

2625eec

… paths

zacharycmontoya approved these changes Mar 16, 2026

View reviewed changes

robcarlan-datadog added 5 commits March 17, 2026 11:58

Merge branch 'master' into rob.carlan/kafka-cluster-id-tag-reflection

83d2c11

Merge branch 'master' into rob.carlan/kafka-cluster-id-tag-reflection

be1369b

downgrade cluster id exception to warning for transient cases

cc793b7

Merge branch 'master' into rob.carlan/kafka-cluster-id-tag-reflection

4fed097

Merge branch 'master' into rob.carlan/kafka-cluster-id-tag-reflection

d527467

robcarlan-datadog merged commit 6f4edac into master Mar 20, 2026
140 checks passed

robcarlan-datadog deleted the rob.carlan/kafka-cluster-id-tag-reflection branch March 20, 2026 16:33

github-actions Bot added this to the vNext-v3 milestone Mar 20, 2026

andrewlock mentioned this pull request Mar 25, 2026

Fix exception in Kafka on .NET Framework #8366

Merged

Conversation

robcarlan-datadog commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary of changes

Reason for change

Implementation details

Test coverage

Other details

Uh oh!

This comment has been minimized.

dd-trace-dotnet-ci-bot Bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Execution-Time Benchmarks Report ⏱️

FakeDbCommand

HttpMessageHandler

Uh oh!

pr-commenter Bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks

Explanation

More details about the CI and significant changes

scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild netcoreapp3.1

scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.ObjectExtractorSimpleBody netcoreapp3.1

scenario:Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs netcoreapp3.1

scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest netcoreapp3.1

scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472

scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync netcoreapp3.1

scenario:Benchmarks.Trace.ILoggerBenchmark.EnrichedLog net6.0

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark net6.0

scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark netcoreapp3.1

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net6.0

scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore netcoreapp3.1

scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0

Uh oh!

robcarlan-datadog Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

andrewlock Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

robcarlan-datadog Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

robcarlan-datadog Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

zacharycmontoya left a comment

Choose a reason for hiding this comment

Uh oh!

andrewlock left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zacharycmontoya left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

robcarlan-datadog commented Oct 23, 2025 •

edited

Loading

dd-trace-dotnet-ci-bot Bot commented Oct 23, 2025 •

edited

Loading

pr-commenter Bot commented Oct 23, 2025 •

edited

Loading