Skip to content

Add span links to messaging system tracing#2610

Merged
eyalkoren merged 21 commits intoelastic:mainfrom
eyalkoren:span-links
Jun 6, 2022
Merged

Add span links to messaging system tracing#2610
eyalkoren merged 21 commits intoelastic:mainfrom
eyalkoren:span-links

Conversation

@eyalkoren
Copy link
Copy Markdown
Contributor

@eyalkoren eyalkoren commented May 3, 2022

What does this PR do?

Closes #2489:

  • Adding basic span links capabilities
  • Add span link to polling spans in all messaging plugins
  • Kafka: when there is an active span before the beginning of the iteration - add span links to it for each of the polled records. Otherwise- create a transaction per record as a child of the sending span.
  • SpringAmqpBatchMessageListenerInstrumentation - same as Kafka (if makes sense)
  • AWS Lambda functions triggered by SNS/SQS: create one transaction always and add span links
  • Adjust documentation
  • Tested end-to-end - see screenshots below
  • Add to CHANGELOG, also as potentially breaking change regarding SQS/SNS (losing message context data and not rendered in service maps)

@ghost
Copy link
Copy Markdown

ghost commented May 3, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-06-06T12:41:16.166+0000

  • Duration: 53 min 47 sec

Test stats 🧪

Test Results
Failed 0
Passed 2939
Skipped 22
Total 2961

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark tests.

  • run jdk compatibility tests : Run the JDK Compatibility tests.

  • run integration tests : Run the Agent Integration tests.

  • run end-to-end tests : Run the APM-ITs.

  • run windows tests : Build & tests on windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@eyalkoren
Copy link
Copy Markdown
Contributor Author

run integration tests

@eyalkoren eyalkoren changed the title Add basic span links capabilities Add span links to messaging system tracing May 4, 2022
@eyalkoren
Copy link
Copy Markdown
Contributor Author

run integration tests

Copy link
Copy Markdown
Member

@AlexanderWert AlexanderWert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eyalkoren eyalkoren marked this pull request as ready for review May 30, 2022 13:57
@eyalkoren eyalkoren requested a review from SylvainJuge May 30, 2022 13:57
@github-actions
Copy link
Copy Markdown

/test

Copy link
Copy Markdown
Member

@SylvainJuge SylvainJuge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, mostly minor comments & questions.

errorPool.recycle(error);
}

public void recycle(TraceContext traceContext) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] maybe rename this method to recycleSpanLink to avoid any confusion with recycling another TraceContext.

Suggested change
public void recycle(TraceContext traceContext) {
public void recycleSpanLink(TraceContext traceContext) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it doesn't need to be specific for span links, it's for any potential use of pooled TraceContext, so I'll leave it just as is, similarly to all other recycle methods that differ by their argument type.

*/
public <H, C> boolean addSpanLink(TraceContext.ChildContextCreatorTwoArg<C, HeaderGetter<H, C>> childContextCreator,
HeaderGetter<H, C> headerGetter, @Nullable C carrier) {
if (spanLinks.size() == MAX_ALLOWED_SPAN_LINKS) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] we might reach the limit and still be fine if the span link is already in the spanLinks collection, we should probably issue this warning only when we are unable to add to the collection, which also means the limit has to be added to UniqueSpanLinkArrayList.

Copy link
Copy Markdown
Contributor Author

@eyalkoren eyalkoren May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started moving it, but losing the ability to log which span has the problem seems like a big compromise only for this extreme corner case where there are EXACTLY 1000 span links and repetitions in addition to that. Even if we encounter this somehow, this single logging is not so bad anyway.

Comment on lines +293 to +294
assertThat(spans.size()).isGreaterThanOrEqualTo(1);
verifyPollSpanContents(spans);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] I'm not sure to understand here why we now have 1+ instead of exactly one previously.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't without looking at the other changes - see https://github.com/elastic/apm-agent-java/pull/2610/files#diff-12e2c27883cbfcbc62d039b9bf284b58b2a743637f207e7309a5f9ce101ba544R372-R388 - for the new test I had to change the polling method so that it polls until all records are consumed, so it is non-deterministic how many polls actually occur. The other alternative to make it robust is wait for quite long before polling, which I didn't want.
Since I verify that these spans have the expected contents, and they all combined contain the expected span links, I think this is fine and properly reflects real scenarios.

List<Message> processedBatch = messageBatch;
Transaction batchTransaction = null;

if (tracer.isRunning() && messageBatch != null && !messageBatch.isEmpty()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] maybe return early in case where tracer is not running or batch is empty to prevent some if nesting

active);
}

active = tracer.getActive();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the active variable already set with the active span/transaction on line 80 ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but line 88 potentially activates a transaction. This basically says: "if there was an active span before, or if we just created one and activated it, I want to add span links in both cases".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the flow of this method is a bit complicated and because the return value is complex one that may be affected by multiple decision branches, I prefer one return in such methods (related to the comment above), but it's definitely a mater of taste

eyalkoren and others added 4 commits May 31, 2022 16:26
…ain/java/co/elastic/apm/agent/kafka/KafkaConsumerInstrumentation.java

Co-authored-by: SylvainJuge <syl20j@gmail.com>
…c/main/java/co/elastic/apm/agent/kafka/NewKafkaPollExitInstrumentation.java

Co-authored-by: SylvainJuge <syl20j@gmail.com>
@eyalkoren
Copy link
Copy Markdown
Contributor Author

Below are screenshots reflecting the following scenario: a transaction sends two records to a Kafka "Request-Topic", waits for two corresponding reply records on a "Reply-Topic" and then iterates over the replies. Asynchronously, a "Batch-processing transaction" polls records from the "Request-Topic" and processes the record batch by sending a reply record for each request record.

The test transaction:
Screen Shot 2022-06-06 at 15 07 36
Each send span has a link to the batch-processing transaction:
Screen Shot 2022-06-06 at 15 25 26
Each poll span either contains no link if it returned without finding records or contains link/s to the corresponding span that reflects the send to the "Reply-Topic":
Screen Shot 2022-06-06 at 15 27 04
The transaction contains two links reflecting the two reply-records processing (linking to the corresponding send span):
Screen Shot 2022-06-06 at 15 16 26

On the batch processing side:
Screen Shot 2022-06-06 at 15 06 26
The transaction contains a link for each request record in the batch, corresponding the sending span:
Screen Shot 2022-06-06 at 15 06 42
Each reply record send span contains two links - one for the polling span and one for the processing span (the test transaction):
Screen Shot 2022-06-06 at 15 34 34

@eyalkoren eyalkoren enabled auto-merge (squash) June 6, 2022 12:41
@eyalkoren eyalkoren merged commit 37e66e0 into elastic:main Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[META 605] Add span links for message tracing

4 participants