Instrumentation test error visibility by deejgregor · Pull Request #9979 · DataDog/dd-trace-java

deejgregor · 2025-11-16T07:36:11Z

What Does This Do

This improves visibility of errors in instrumentation tests built on top of InstrumentationSpecification by failing the test with details of the errors, in particular these cases:

Instrumentation blocked due to MuzzleCheck errors.
Throwables caught from instrumentation code and reported to InstrumentationErrors by the Byte Buddy ExceptionHandlers class.
Byte Buddy instrumentation errors reported to InstrumentationSpecification's AgentBuilder.Listener.onError.

Also, ListWriterAssert.assertTraces will now fail fast when the errors from the first two cases above are reported.

Lastly, the code comment in ExceptionHandlers has been updated to reflect the current implementation, and InstrumentationContext exceptions are more explicit about the likely reasons the methods might not have been rewritten.

Motivation

These changes should reduce discovery and investigation time during development when there are instrumentation errors. I have run into every one of these issues while developing instrumentations, most of them multiple times. ;-) These changes have helped me enough (particularly for the first two cases above) that I realized I should contribute them back to help others: especially those new to developing instrumentation, but also those who have been doing it a while, when the inevitable error slips through.

Previously, the first error case above was not checked in tests using InstrumentationSpecification and was only visible in log output, while the other two error cases only reported error counts. In all of these cases, the developer had to dig through log output to find the existence and/or details of errors.

Previously, when ListWriterAssert.assertTraces was used and instrumentation errors led to traces not being properly generated, tests would often fail on unrelated-looking assertions (missing spans, incorrect tags, etc.), and the developer might spend significant time chasing those failures before learning that the cause was an instrumentation error. With these changes, ListWriterAssert.assertTraces will fail fast for the first two cases above and clearly indicate that there was an instrumentation error.

These changes give better context when there are failures, which helps not only our human developers but also AI coding assistants.

Additional Notes

One possible improvement: The fail fast change could easily apply to all of the use cases above including the last use case if InstrumentationSpecification's AgentBuilder.Listener.onError recorded the error to InstrumentationErrors.

Best reviewed commit-by-commit:

Example output

InstrumentationErrors and InstrumentationContext exception

I commented out the contextStore method on HikariPoolInstrumentation.

Condition not satisfied:

instrumentationErrorCount == 0
|                         |
1                         false

1 instrumentation errors were seen:
java.lang.RuntimeException: Calls to this method will be rewritten by Instrumentation Context Provider (e.g. FieldBackedProvider). If you get this exception, this method has not been rewritten. Ensure instrumentation class has a contextStore method and the call to InstrumentationContext.get happens directly in an instrumentation Advice class. See how_instrumentations_work.md for details.
	at datadog.trace.bootstrap.InstrumentationContext.get(InstrumentationContext.java:26)
	at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:151)
	at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:73)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:72)

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any useful labels
Don't use close, fix or any linking keywords when referencing an issue.
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, move, or deletion
Update the public documentation in case of new configuration flag or behavior

Jira ticket: [PROJ-IDENT]

AlexeyKuznetsov-DD · 2025-11-17T21:39:22Z


 public class InstrumentationErrors {
-  private static final AtomicLong COUNTER = new AtomicLong();
+  private static final List<String> ERRORS = Collections.synchronizedList(new ArrayList<>());


❓ (For my own understanding): Why do we need a synchronized list here (and in the similar place in the other class)?
Maybe we can use a concurrent data structure here instead?

Short answer (not a great answer): because I'm an old-school Java programmer and still learning some modern (Java 8, heh) features. ;-)

Longer answer: these few places all had an AtomicLong, and when I replaced it with a List, Collections.synchronizedList(new ArrayList<>()) was my muscle-memory choice. It looks like CopyOnWriteArrayList is a decent alternative in this case, so that's what I'll be doing.

Updated:

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-82dd6ac77ecfcdc8b0ff7590d7370d00c33f7ac78ca37f78cbf841c0ced625b3R10

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-2328d587d0382c573151b6c14fa9bab635d7482f5075206e22f0bf7114910834R16

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-39d1612c29708df2c2a24a9d0888e933768aaecd8461e0691f6f6e28edf0050eR169
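For readers following this thread, here is a minimal sketch of the trade-off being discussed. The class and method names (ErrorListDemo, joinSynchronized, joinCopyOnWrite) are hypothetical and are not taken from the PR; only the two JDK list types come from the conversation. A synchronized list needs an explicit lock while it is iterated, whereas CopyOnWriteArrayList iterates over a snapshot, which suits a list that is written rarely and read when a test reports failures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the error holders discussed above; not the PR's code.
final class ErrorListDemo {
  // Option 1: individual calls are synchronized by the wrapper, but the
  // caller must hold the list's lock for the whole duration of an iteration.
  static final List<String> SYNCHRONIZED =
      Collections.synchronizedList(new ArrayList<>());

  // Option 2: each write copies the backing array; iterators work on a
  // snapshot and never throw ConcurrentModificationException.
  static final List<String> COPY_ON_WRITE = new CopyOnWriteArrayList<>();

  static String joinSynchronized() {
    synchronized (SYNCHRONIZED) { // required: String.join iterates the list
      return String.join("\n---\n", SYNCHRONIZED);
    }
  }

  static String joinCopyOnWrite() {
    // No lock needed: the iteration sees a consistent snapshot.
    return String.join("\n---\n", COPY_ON_WRITE);
  }
}
```

Copy-on-write makes every add proportional to the list size, which is presumably acceptable here because errors are expected to be rare.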

AlexeyKuznetsov-DD · 2025-11-17T21:42:54Z

+    def instrumentationErrorCount = InstrumentationErrors.getErrors().size()
+    assert instrumentationErrorCount == 0, instrumentationErrorCount + " instrumentation errors were seen:\n" + InstrumentationErrors.getErrors().join("\n---\n")
+    def muzzleErrorCount = MuzzleCheck.getErrors().size()
+    assert muzzleErrorCount == 0, muzzleErrorCount + " muzzle errors were seen:\n" + MuzzleCheck.getErrors().join("\n---\n")


nitpick: I think Groovy string interpolation can be used here?

Yup! Thank you for the suggestion.

Updated:

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-39d1612c29708df2c2a24a9d0888e933768aaecd8461e0691f6f6e28edf0050eR510

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-39d1612c29708df2c2a24a9d0888e933768aaecd8461e0691f6f6e28edf0050eR512

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-39d1612c29708df2c2a24a9d0888e933768aaecd8461e0691f6f6e28edf0050eR555

AlexeyKuznetsov-DD · 2025-11-17T21:44:00Z

@@ -47,7 +49,21 @@ class ListWriterAssert {
    @ClosureParams(value = SimpleType, options = ['datadog.trace.agent.test.asserts.ListWriterAssert'])
    @DelegatesTo(value = ListWriterAssert, strategy = Closure.DELEGATE_FIRST) Closure spec) {
    try {
-      writer.waitForTraces(expectedSize)
+      // Fail fast if we see any muzzle errors or instrumentation errors
+      for (int timeoutRemaining = 20; timeoutRemaining > 0; timeoutRemaining--) {


nitpick: it probably makes sense to declare a named constant for the magic number 20?

Good suggestion--I introduced a constant and used it in two places:

Updates:

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-0d9fec4a568cdb6980879f651c3a51870a04b096ba70ebcbfc6cd4b72a4ee32dR20

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-0d9fec4a568cdb6980879f651c3a51870a04b096ba70ebcbfc6cd4b72a4ee32dR88

https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-130cc10c95eb5a319363779ba4f67115eebd9f12b06593a8625c75d47f6df67dR53

deejgregor · 2025-11-18T08:30:46Z

@AlexeyKuznetsov-DD thank you for the feedback. I'll work on the updates in the next day or two.

This puts the details front and center in the test summary and is much easier than hunting back through log messages. Note: In InstrumentationErrors, errors are only recorded when enableRecordingAndReset has been called. This is only done from the InstrumentationSpecification test specification, so we don't need to worry about the ArrayList growing without bounds in production. I considered adding a limit to the ArrayList, but opted not to for simplicity. I also wanted to avoid silently discarding some errors.
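The gating described above could look roughly like the following sketch. Only enableRecordingAndReset and getErrors are names that appear in this PR; the class name GatedErrorRecorder, the record method, and the volatile flag are assumptions made for illustration and may not match the real InstrumentationErrors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical recorder illustrating "record only when a test enabled it".
final class GatedErrorRecorder {
  private static final List<String> ERRORS = new CopyOnWriteArrayList<>();

  // Off by default, so nothing accumulates outside of tests.
  private static volatile boolean recording = false;

  // Called by the test specification before each run (name from the PR).
  static void enableRecordingAndReset() {
    ERRORS.clear();
    recording = true;
  }

  // Hypothetical entry point for reporting a caught throwable.
  static void record(Throwable t) {
    if (recording) {
      ERRORS.add(t.toString());
    }
  }

  // Returns a copy so callers cannot mutate the recorder's state.
  static List<String> getErrors() {
    return new ArrayList<>(ERRORS);
  }
}
```

Because the flag is only ever set from test code in this sketch, the list stays empty when recording was never enabled, which is the property the comment above relies on.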

If there is a blocked instrumentation, the test will fail and the failure message will include the same information that is logged from MuzzleCheck.

deejgregor · 2025-11-19T04:01:33Z

@AlexeyKuznetsov-DD I made the recommended changes, rebased on latest master and force pushed.

I made one additional update: I realized there could be a case where the assertions in ListWriterAssert would not be checked, namely when the expected number of traces was received within 1 second. I refactored the asserts out into a new method, assertNoErrors, and call it both inside the for loop and once after the loop to make sure it runs at least once.
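A rough sketch of that control flow, written in Java rather than the Groovy used by ListWriterAssert. The TraceWaiter interface, the FailFastWait class, the TIMEOUT_SECONDS constant name, and the one-second slice are assumptions for illustration; only assertNoErrors, the 20-iteration loop, and the error message format are taken from the PR.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of "check inside the loop and once after it".
final class FailFastWait {
  static final int TIMEOUT_SECONDS = 20; // hypothetical name for the constant

  // Hypothetical abstraction: wait up to timeoutMillis, return true if the
  // expected traces have arrived.
  interface TraceWaiter {
    boolean await(long timeoutMillis);
  }

  static void assertNoErrors(Supplier<List<String>> errors) {
    List<String> seen = errors.get();
    if (!seen.isEmpty()) {
      throw new AssertionError(
          seen.size() + " instrumentation errors were seen:\n"
              + String.join("\n---\n", seen));
    }
  }

  static boolean waitForTraces(TraceWaiter waiter, Supplier<List<String>> errors) {
    boolean arrived = false;
    for (int remaining = TIMEOUT_SECONDS; remaining > 0 && !arrived; remaining--) {
      arrived = waiter.await(1000); // wait one slice of the total timeout
      if (!arrived) {
        assertNoErrors(errors); // fail fast instead of waiting out the timeout
      }
    }
    // Runs even when the traces arrived in the first slice, so the check
    // is never skipped.
    assertNoErrors(errors);
    return arrived;
  }
}
```

The final call is the point of the refactoring: without it, a test whose traces arrive immediately would leave the loop before any error check ran.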

AlexeyKuznetsov-DD

The PR in general LGTM, but it would be better to get one more approval from someone more familiar with the codebase.

AlexeyKuznetsov-DD

LGTM, but better to have one more approval from code owners.

mcculls

Hi @deejgregor - thanks for the contribution. There are some CI failures that I need to look into, and some cleanup is needed around ExceptionHandlers to reduce the impact in production (since we can rely on telemetry, and therefore don't need the additional argument stack allocations).

I'll add comments here in the next week or so

github-actions · 2026-05-01T04:20:37Z

This pull request has been marked as stale because it has not had activity over the past quarter. It will be closed in 7 days if no further activity occurs. Feel free to reopen the PR if you are still working on it.

deejgregor requested a review from a team as a code owner November 16, 2025 07:36

deejgregor requested a review from smola November 16, 2025 07:36

AlexeyKuznetsov-DD reviewed Nov 17, 2025


mcculls added the tag: community Community contribution label Nov 18, 2025

deejgregor added 5 commits November 18, 2025 19:52

ExceptionHandlers: Fix code in bytecode comment

9e598ac

InstrumentationContext: be more explicit about the reasons for failure

2a05f63

Check for blocked instrumentation in tests

a87cc73

If there is a blocked instrumentation, the test will fail and the failure message will include the same information that is logged from MuzzleCheck.

Fail fast on muzzle or instrumentation errors

547e34d

deejgregor force-pushed the feature-instrumentation-test-error-visibility branch from 974b975 to 547e34d Compare November 19, 2025 03:53

deejgregor requested a review from AlexeyKuznetsov-DD November 19, 2025 04:01

AlexeyKuznetsov-DD reviewed Nov 19, 2025


Comment thread dd-java-agent/agent-bootstrap/src/main/java/datadog/trace/bootstrap/InstrumentationErrors.java

Comment thread dd-java-agent/agent-tooling/src/main/java/datadog/trace/agent/tooling/muzzle/MuzzleCheck.java

AlexeyKuznetsov-DD approved these changes Nov 20, 2025


mcculls mentioned this pull request Jan 7, 2026

🪞 9979 - Record detailed instrumentation errors during tests #10300

Merged

mcculls self-requested a review January 7, 2026 10:39

mcculls requested changes Jan 7, 2026


github-actions Bot added the tag: stale Stale pull requests label May 1, 2026

gh-worker-dd-mergequeue-cf854d Bot closed this in #10300 May 22, 2026
