Skip to content

Wrap code in new tracing contexts where required#88920

Merged
pugnascotia merged 9 commits intoelastic:mainfrom
pugnascotia:use-more-tracing-contexts
Aug 3, 2022
Merged

Wrap code in new tracing contexts where required#88920
pugnascotia merged 9 commits intoelastic:mainfrom
pugnascotia:use-more-tracing-contexts

Conversation

@pugnascotia
Copy link
Copy Markdown
Contributor

@pugnascotia pugnascotia commented Jul 28, 2022

Part of #84369. Split out from #88443. This PR wraps parts of the code
in a new tracing context. This is necessary so that a tracing
implementation can use the thread context to propagate tracing headers,
but without the code attempting to set the same key twice in the thread
context, which is illegal.

Note that in some places we actually clear the tracing context
completely. This is done where the operation to be performed should have
no association with the current trace context. For example, when
creating a new index via a REST request, the resulting background tasks
for the index should not be associated with the REST request in
perpetuity.

Part of elastic#84369. Split out from elastic#88443. This PR wraps parts of the code
either in a new tracing context. This is necessary so that a tracing
implementation can use the thread context to propagate tracing headers,
but without the code attempting to set the same key twice in the thread
context, which is illegal.

Note that in some places we actually clear the tracing context
completely. This is done where the operation to be performed should have
no association with the current trace context. For example, when
creating a new index via a REST request, the resulting background tasks
for the index should not be associated with the REST request in
perpetuity.
@pugnascotia pugnascotia added :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >refactoring v8.5.0 labels Jul 28, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Jul 28, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@pugnascotia pugnascotia added the :Core/Infra/Core Core issues without another label label Jul 28, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Jul 28, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Copy Markdown
Contributor

@grcevski grcevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pugnascotia
Copy link
Copy Markdown
Contributor Author

I'll hold on merging this until @original-brownbear has had a change to review.

Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented inline, we have to do this a little differently.

public void onAfter() {
request.decRef();
}
public void onFailure(Exception e) {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not ok, the executor might directly call onFailure when it rejects the runnable because its queue is full. We have to go through the normal error handling mechanism here in this callback if we are using AbstractRunnable.

public void onFailure(Exception e) {
handleSendToLocalException(channel, e, action);
}
public void onFailure(Exception e) {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can't do this. We need to use the callback when using the exeuctor with an AbstractRunnable


@Override
public void onAfter() {
request.decRef();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this go, regardless of the fact that we need to do this differently, it seems we lost the decRef here?

Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer less magic in terms of how we create the ContextPreservingAbstractRunnable and just do it outright as commented on inline. But other than that, this looks like it should work fine.

/**
* Should the runnable start a new tracing context before it executes?
*/
public boolean useNewTraceContext() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes more sense to just create a utility method on ContextPreservingAbstractRunnable that wraps an AbstractRunnable in a ContextPreservingAbstractRunnable with new trace context true flag set to true and use that?
That way we don't have to leak the high level tracing into low level AbstractRunnable. It's also probably less verbose than overriding to return true in multiple spots.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m struggling to see how this would work. ContextPreservingAbstractRunnable is private and only created in ThreadContext.preserveContext(Runnable), but I need to communicate the need for a new trace context from where the runnable is created in the first place, for example in InboundHandler.

I thought about creating a named sub-class of AbstractRunnable e.g. TracingAbstractRunnable or something like that, but I couldn't see how that would avoid the need to changing AbstractRunnable, at least not without duplicating much of ContextPreservingAbstractRunnable.

I don't think there's a simpler answer for where to insert this logic, because there's a lot of parts that are collaborating here. In the case of InboundHandler, we see:

  • InboundHandler creates an AbstractRunnable instance and schedules it
  • The scheduler asks the thread context to wrap the runnable
  • The thread context potentially wraps the runnable
  • The scheduler executes the runnable
  • AbstractRunnable#run orchestrates doRun, onFailure etc

We could change the visibility of ContextPreservingAbstractRunnable maybe, and sometimes create one explicitly? e.g. ContextPreservingAbstractRunnable.preservingContext(AbstractRunnable)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could just add a method to ThreadContext similar to org.elasticsearch.common.util.concurrent.ThreadContext#preserveContext that takes an AbstractRunnable and does the wrapping for you without even having to expose ContextPreservingAbstractRunnable?

You have the thread context available via the threadPool and the threadpool will just pass through a ContextPreservingAbstractRunnable unchanged when executing => less code in the callsites and no need to change anything about AbstractRunnable right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@original-brownbear ah, you're right. I had another go, please take another look.

public void onResponse(NodesStatsResponse nodesStatsResponse) {
logger.trace("received node stats response");
try (var ignored = threadPool.getThreadContext().clearTraceContext()) {
client.admin().cluster().nodesStats(nodesStatsRequest, ActionListener.runAfter(new ActionListener<>() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is fairly extreme in terms of how noisy it is line wise.
Maybe, also in the sense of future readability of this code, could we extract what's inside the try-with-resources into a new method here and in the other spots that got indented so extremely now? That way the change-set stays small because the code doesn't move in terms of indent level and things don't get much more complicated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored most of these sites with the extra wrapping to extract methods.

Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for the extra iterations Rory!

@pugnascotia pugnascotia merged commit 9285249 into elastic:main Aug 3, 2022
@pugnascotia pugnascotia deleted the use-more-tracing-contexts branch August 3, 2022 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Core/Infra/Core Core issues without another label :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >refactoring Team:Core/Infra Meta label for core/infra team Team:Distributed Meta label for distributed team. v8.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants