Instrumentation test error visibility#9979
Conversation
|
|
||
| public class InstrumentationErrors { | ||
| private static final AtomicLong COUNTER = new AtomicLong(); | ||
| private static final List<String> ERRORS = Collections.synchronizedList(new ArrayList<>()); |
There was a problem hiding this comment.
❓ (For my own understanding): Why we need synchronized list here (and similar place in other class)?
Maybe we can use some concurrent data structure here instead?
There was a problem hiding this comment.
Short answer (not a great answer): because I'm an old-school Java programmer and still learning some modern (Java 8, heh) features. ;-)
Longer answer: these few places all had an AtomicLong and when I replaced it with a List Collections.synchronizedList(new ArrayList<>()) was my muscle-memory replacement. It looks like CopyOnWriteArrayList is a decent alternative in this case, so that's what I'll be doing.
Updated:
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-82dd6ac77ecfcdc8b0ff7590d7370d00c33f7ac78ca37f78cbf841c0ced625b3R10
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-2328d587d0382c573151b6c14fa9bab635d7482f5075206e22f0bf7114910834R16
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-39d1612c29708df2c2a24a9d0888e933768aaecd8461e0691f6f6e28edf0050eR169
| def instrumentationErrorCount = InstrumentationErrors.getErrors().size() | ||
| assert instrumentationErrorCount == 0, instrumentationErrorCount + " instrumentation errors were seen:\n" + InstrumentationErrors.getErrors().join("\n---\n") | ||
| def muzzleErrorCount = MuzzleCheck.getErrors().size() | ||
| assert muzzleErrorCount == 0, muzzleErrorCount + " muzzle errors were seen:\n" + MuzzleCheck.getErrors().join("\n---\n") |
There was a problem hiding this comment.
nitpick: I think Groovy string interpolation can be used here?
There was a problem hiding this comment.
Yup! Thank you for the suggestion.
Updated:
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-39d1612c29708df2c2a24a9d0888e933768aaecd8461e0691f6f6e28edf0050eR510
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-39d1612c29708df2c2a24a9d0888e933768aaecd8461e0691f6f6e28edf0050eR512
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-39d1612c29708df2c2a24a9d0888e933768aaecd8461e0691f6f6e28edf0050eR555
| @@ -47,7 +49,21 @@ class ListWriterAssert { | |||
| @ClosureParams(value = SimpleType, options = ['datadog.trace.agent.test.asserts.ListWriterAssert']) | |||
| @DelegatesTo(value = ListWriterAssert, strategy = Closure.DELEGATE_FIRST) Closure spec) { | |||
| try { | |||
| writer.waitForTraces(expectedSize) | |||
| // Fail fast if we see any muzzle errors or instrumentation errors | |||
| for (int timeoutRemaining = 20; timeoutRemaining > 0; timeoutRemaining--) { | |||
There was a problem hiding this comment.
nitpick: probably it make sense to declare named const for magic number 20?
There was a problem hiding this comment.
Good suggestion--I introduced a constant and used it in two places:
Updates:
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-0d9fec4a568cdb6980879f651c3a51870a04b096ba70ebcbfc6cd4b72a4ee32dR20
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-0d9fec4a568cdb6980879f651c3a51870a04b096ba70ebcbfc6cd4b72a4ee32dR88
- https://github.com/DataDog/dd-trace-java/pull/9979/files#diff-130cc10c95eb5a319363779ba4f67115eebd9f12b06593a8625c75d47f6df67dR53
|
@AlexeyKuznetsov-DD thank you for the feedback. I'll work on the updates in the next day or two. |
This puts the details front and center in the test summary and is much easier than hunting back through log messages. Note: In InstrumentationErrors, errors are only recorded when enableRecordingAndReset has been called. This is only done from the InstrumentationSpecification test specification, so we don't need to worry about the ArrayList growing without bounds in production. I considered adding a limit to the ArrayList, but opted not to for simplicity. I also wanted to avoid silently discarding some errors.
If there is a blocked instrumentation, the test will fail and the failure message will include the same information that is logged from MuzzleCheck.
974b975 to
547e34d
Compare
|
@AlexeyKuznetsov-DD I made the recommended changes, rebased on latest master and force pushed. I made one additional update: I realized there could be a case where the assertions in |
AlexeyKuznetsov-DD
left a comment
There was a problem hiding this comment.
PR in general LGTM, but better to get one more approve from someone more familiar with codebase.
AlexeyKuznetsov-DD
left a comment
There was a problem hiding this comment.
LGTM, but better to have one more approval from code owners.
mcculls
left a comment
There was a problem hiding this comment.
Hi @deejgregor - thanks for the contribution, there are some CI failures that I need to look into - and some cleanup around ExceptionHandlers to reduce the impact in production (since we can rely on telemetry, and therefore don't need the additional argument stack allocations)
I'll add comments here in the next week or so
|
This pull request has been marked as stale because it has not had activity over the past quarter. It will be closed in 7 days if no further activity occurs. Feel free to reopen the PR if you are still working on it. |
What Does This Do
This improves visibility of errors in instrumentation tests built on top of
InstrumentationSpecificationby failing the test with details of the errors, in particular these cases:InstrumentationErrorsby the Byte BuddyExceptionHandlersclass.InstrumentationSpecification'sAgentBuilder.Listener.onError.Also,
ListWriterAssert.assertTraceswill now fail fast when the errors from the first two cases above are reported.Lastly, the code comment in
ExceptionHandlershas been updated to reflect the current implementation andInstrumentationContextexceptions are more explicit about the likely causes of why the methods might not have been rewritten.Motivation
These changes should reduce discovery and investigation time during development when there are instrumentation errors. Every single one of these issues are ones that I've run while developing instrumentations (most of them I've run into multiple times). ;-) These changes have helped me out enough (particularly for the first two use cases above) that I realized I really should see about contributing this back to help others--especially for those new to developing instrumentation, but also to assist those that have been doing it awhile when the inevitable error slips through.
Previously, the first error case above was not checked in tests using
InstrumentationSpecification, and was only visible in log output, and the other two error cases just included error counts. In all of these cases, the developer would have to go digging in log output to find existence and/or details of errors.When
ListWriterAssert.assertTraceswas used previously, if there were instrumentation errors that led to traces not being properly generated, tests will often fail due to failed assertions (missing spans, incorrect tags, etc.) and the developer might spend significant time chasing down the cause of the assertion failures to learn it was an instrumentation error. With these changes, if the cause was an underlying instrumentation error,ListWriterAssert.assertTraceswill fail fast for the first two use cases above and clearly indicate there was an instrumentation error.These changes will not only help our human developers by giving better context when there are failures, but also when using AI coding assistants.
Additional Notes
One possible improvement: The fail fast change could easily apply to all of the use cases above including the last use case if
InstrumentationSpecification'sAgentBuilder.Listener.onErrorrecorded the error toInstrumentationErrors.Best reviewed commit-by-commit:
Example output
InstrumentationErrors and InstrumentationContext exception
I commented-out the
contextStoremethod onHikariPoolInstrumentation.Contributor Checklist
type:and (comp:orinst:) labels in addition to any useful labelsclose,fixor any linking keywords when referencing an issue.Use
solvesinstead, and assign the PR milestone to the issueJira ticket: [PROJ-IDENT]