Wrap code in new tracing contexts where required by pugnascotia · Pull Request #88920 · elastic/elasticsearch

pugnascotia · 2022-07-28T21:35:14Z

Part of #84369. Split out from #88443. This PR wraps parts of the code
in a new tracing context. This is necessary so that a tracing
implementation can use the thread context to propagate tracing headers,
but without the code attempting to set the same key twice in the thread
context, which is illegal.

Note that in some places we actually clear the tracing context
completely. This is done where the operation to be performed should have
no association with the current trace context. For example, when
creating a new index via a REST request, the resulting background tasks
for the index should not be associated with the REST request in
perpetuity.
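
A minimal sketch of the two scopes described above, under stated assumptions. `SimpleThreadContext`, `Scope`, `newTraceContext`, and the header names `traceparent` and `parent_traceparent` are illustrative stand-ins rather than the Elasticsearch implementation; only `clearTraceContext` mirrors a name that appears in this PR's diff. The sketch shows why a fresh scope is needed before a second trace header can be set, and how a cleared scope drops the link to the current trace.

```java
import java.util.HashMap;
import java.util.Map;

public class TraceContextSketch {

    /** A scope that restores the previous headers when closed. */
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    /** Hypothetical, simplified stand-in for a thread context; not the Elasticsearch class. */
    static final class SimpleThreadContext {
        private Map<String, String> headers = new HashMap<>();

        /** Setting a key that is already present is rejected. */
        void putHeader(String key, String value) {
            if (headers.containsKey(key)) {
                throw new IllegalArgumentException("value for key [" + key + "] already present");
            }
            headers.put(key, value);
        }

        String getHeader(String key) {
            return headers.get(key);
        }

        /** New trace context: the old trace ID survives only as the parent (an assumption of this sketch). */
        Scope newTraceContext() {
            Map<String, String> previous = headers;
            headers = new HashMap<>(previous);
            String parent = headers.remove("traceparent");
            if (parent != null) {
                headers.put("parent_traceparent", parent);
            }
            return () -> headers = previous;
        }

        /** Cleared trace context: no link to the current trace remains inside the scope. */
        Scope clearTraceContext() {
            Map<String, String> previous = headers;
            headers = new HashMap<>(previous);
            headers.remove("traceparent");
            headers.remove("parent_traceparent");
            return () -> headers = previous;
        }
    }

    public static void main(String[] args) {
        SimpleThreadContext ctx = new SimpleThreadContext();
        ctx.putHeader("traceparent", "rest-request-trace");

        try (Scope ignored = ctx.newTraceContext()) {
            // Outside the new scope, this second put would throw.
            ctx.putHeader("traceparent", "child-task-trace");
            System.out.println(ctx.getHeader("parent_traceparent")); // rest-request-trace
        }

        try (Scope ignored = ctx.clearTraceContext()) {
            System.out.println(ctx.getHeader("traceparent")); // null
        }

        // Closing either scope restores the original headers.
        System.out.println(ctx.getHeader("traceparent")); // rest-request-trace
    }
}
```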


elasticsearchmachine · 2022-07-28T21:35:37Z

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine · 2022-07-28T21:36:06Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

grcevski

LGTM!

pugnascotia · 2022-07-29T15:30:08Z

I'll hold off on merging this until @original-brownbear has had a chance to review.

original-brownbear

Commented inline, we have to do this a little differently.

original-brownbear · 2022-08-01T10:20:48Z

server/src/main/java/org/elasticsearch/transport/InboundHandler.java

-                                    public void onAfter() {
-                                        request.decRef();
-                                    }
+                                    public void onFailure(Exception e) {}


This is not ok, the executor might directly call onFailure when it rejects the runnable because its queue is full. We have to go through the normal error handling mechanism here in this callback if we are using AbstractRunnable.
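
To make the concern concrete, here is a minimal sketch, assuming a simplified runnable whose `onRejection` delegates to `onFailure`. `SketchRunnable` and `executeOnFullQueue` are hypothetical stand-ins, not the Elasticsearch classes. With an empty `onFailure`, a rejected task disappears without the caller ever hearing about it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RejectedExecutionException;

public class RejectionSketch {

    /** Simplified stand-in modelled on the doRun/onFailure contract discussed in this review. */
    abstract static class SketchRunnable implements Runnable {
        protected abstract void doRun() throws Exception;

        public abstract void onFailure(Exception e);

        /** Assumption of this sketch: a rejection is reported through onFailure. */
        public void onRejection(Exception e) {
            onFailure(e);
        }

        @Override
        public final void run() {
            try {
                doRun();
            } catch (Exception e) {
                onFailure(e);
            }
        }
    }

    /** Hypothetical executor whose queue is always full: it never runs the task, it only reports the rejection. */
    static void executeOnFullQueue(SketchRunnable task) {
        task.onRejection(new RejectedExecutionException("queue is full"));
    }

    public static void main(String[] args) {
        List<String> responses = new ArrayList<>();

        // Empty onFailure: the rejection vanishes and no response is ever produced.
        executeOnFullQueue(new SketchRunnable() {
            @Override
            protected void doRun() {
                responses.add("handled request");
            }

            @Override
            public void onFailure(Exception e) {}
        });
        System.out.println(responses); // []

        // Real error handling in onFailure: the rejection reaches the caller.
        executeOnFullQueue(new SketchRunnable() {
            @Override
            protected void doRun() {
                responses.add("handled request");
            }

            @Override
            public void onFailure(Exception e) {
                responses.add("error sent to caller: " + e.getMessage());
            }
        });
        System.out.println(responses); // [error sent to caller: queue is full]
    }
}
```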

original-brownbear · 2022-08-01T10:21:13Z

server/src/main/java/org/elasticsearch/transport/TransportService.java

-                        public void onFailure(Exception e) {
-                            handleSendToLocalException(channel, e, action);
-                        }
+                        public void onFailure(Exception e) {}


Same here, can't do this. We need to use the callback when using the executor with an AbstractRunnable.

original-brownbear · 2022-08-01T10:22:40Z

server/src/main/java/org/elasticsearch/transport/TransportService.java

-
-                        @Override
-                        public void onAfter() {
-                            request.decRef();


Where did this go? Regardless of the fact that we need to do this differently, it seems we lost the decRef here.
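
The reason the `onAfter` hook matters can be sketched as follows. This is an illustration under assumptions: `CountedRequest` and `SketchRunnable` are hypothetical stand-ins, and the extra `incRef` before scheduling is assumed for the example. Because `onAfter` runs in a `finally` block, the reference is released whether the handler succeeds or fails; dropping the override leaks it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DecRefSketch {

    /** Hypothetical reference-counted request; real requests have richer semantics. */
    static final class CountedRequest {
        private final AtomicInteger refs = new AtomicInteger(1);

        void incRef() {
            refs.incrementAndGet();
        }

        void decRef() {
            refs.decrementAndGet();
        }

        int refCount() {
            return refs.get();
        }
    }

    /** Simplified runnable: onAfter always runs, after either doRun or onFailure. */
    abstract static class SketchRunnable implements Runnable {
        protected abstract void doRun() throws Exception;

        public abstract void onFailure(Exception e);

        public void onAfter() {}

        @Override
        public final void run() {
            try {
                doRun();
            } catch (Exception e) {
                onFailure(e);
            } finally {
                onAfter();
            }
        }
    }

    /** Runs a handler that may fail and returns the reference count left afterwards. */
    static int refsAfterHandling(boolean handlerFails) {
        CountedRequest request = new CountedRequest();
        request.incRef(); // held for the duration of the asynchronous handling
        new SketchRunnable() {
            @Override
            protected void doRun() {
                if (handlerFails) {
                    throw new IllegalStateException("handler failed");
                }
            }

            @Override
            public void onFailure(Exception e) {
                // a real handler would report the error to the channel here
            }

            @Override
            public void onAfter() {
                request.decRef(); // released on success and on failure
            }
        }.run();
        return request.refCount();
    }

    public static void main(String[] args) {
        System.out.println(refsAfterHandling(false)); // 1
        System.out.println(refsAfterHandling(true));  // 1
    }
}
```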

original-brownbear

I think I'd prefer less magic in terms of how we create the ContextPreservingAbstractRunnable and just do it outright as commented on inline. But other than that, this looks like it should work fine.

original-brownbear · 2022-08-02T14:32:01Z

server/src/main/java/org/elasticsearch/common/util/concurrent/AbstractRunnable.java

+    /**
+     * Should the runnable start a new tracing context before it executes?
+     */
+    public boolean useNewTraceContext() {


Maybe it makes more sense to just create a utility method on ContextPreservingAbstractRunnable that wraps an AbstractRunnable in a ContextPreservingAbstractRunnable with the new-trace-context flag set to true, and use that?
That way we don't have to leak the high-level tracing into the low-level AbstractRunnable. It's also probably less verbose than overriding to return true in multiple spots.

I’m struggling to see how this would work. ContextPreservingAbstractRunnable is private and only created in ThreadContext.preserveContext(Runnable), but I need to communicate the need for a new trace context from where the runnable is created in the first place, for example in InboundHandler.

I thought about creating a named sub-class of AbstractRunnable, e.g. TracingAbstractRunnable or something like that, but I couldn't see how that would avoid the need to change AbstractRunnable, at least not without duplicating much of ContextPreservingAbstractRunnable.

I don't think there's a simpler answer for where to insert this logic, because there are a lot of parts collaborating here. In the case of InboundHandler, we see:

1. InboundHandler creates an AbstractRunnable instance and schedules it.
2. The scheduler asks the thread context to wrap the runnable.
3. The thread context potentially wraps the runnable.
4. The scheduler executes the runnable.
5. AbstractRunnable#run orchestrates doRun, onFailure, etc.

We could change the visibility of ContextPreservingAbstractRunnable maybe, and sometimes create one explicitly? e.g. ContextPreservingAbstractRunnable.preservingContext(AbstractRunnable)?

I think you could just add a method to ThreadContext similar to org.elasticsearch.common.util.concurrent.ThreadContext#preserveContext that takes an AbstractRunnable and does the wrapping for you without even having to expose ContextPreservingAbstractRunnable?

You have the thread context available via the threadPool, and the thread pool will just pass through a ContextPreservingAbstractRunnable unchanged when executing => less code in the call sites and no need to change anything about AbstractRunnable, right?
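
A rough sketch of this suggestion, under stated assumptions. `SimpleThreadContext`, `Wrapped`, `wrapWithNewTraceContext`, and `execute` are hypothetical names, and plain `Runnable` stands in for AbstractRunnable to keep the example short. The wrapper type stays private, the call site asks the thread context for a pre-wrapped task, and the executor's usual `preserveContext` step passes an already-wrapped task through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapSketch {

    /** Hypothetical stand-in for a thread context that hands out pre-wrapped runnables. */
    static final class SimpleThreadContext {
        final List<String> events = new ArrayList<>();

        /** Private wrapper type: call sites only ever see Runnable. */
        private final class Wrapped implements Runnable {
            private final Runnable delegate;
            private final boolean newTraceContext;

            Wrapped(Runnable delegate, boolean newTraceContext) {
                this.delegate = delegate;
                this.newTraceContext = newTraceContext;
            }

            @Override
            public void run() {
                events.add(newTraceContext ? "restore context, start new trace context" : "restore context");
                delegate.run();
            }
        }

        /** Called by the executor for every task; already-wrapped tasks pass through unchanged. */
        Runnable preserveContext(Runnable task) {
            return task instanceof Wrapped ? task : new Wrapped(task, false);
        }

        /** Hypothetical helper for call sites that need a fresh trace context. */
        Runnable wrapWithNewTraceContext(Runnable task) {
            return new Wrapped(task, true);
        }
    }

    /** Hypothetical executor: wraps via the thread context, then runs the task inline. */
    static void execute(SimpleThreadContext ctx, Runnable task) {
        ctx.preserveContext(task).run();
    }

    public static void main(String[] args) {
        SimpleThreadContext ctx = new SimpleThreadContext();
        execute(ctx, () -> ctx.events.add("plain task"));
        execute(ctx, ctx.wrapWithNewTraceContext(() -> ctx.events.add("inbound handler task")));
        ctx.events.forEach(System.out::println);
    }
}
```

The design point is that the decision to start a new trace context is made where the task is created, while the base runnable type needs no tracing-related method.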

@original-brownbear ah, you're right. I had another go, please take another look.

original-brownbear · 2022-08-02T14:33:48Z

server/src/main/java/org/elasticsearch/cluster/InternalClusterInfoService.java

-                public void onResponse(NodesStatsResponse nodesStatsResponse) {
-                    logger.trace("received node stats response");
+            try (var ignored = threadPool.getThreadContext().clearTraceContext()) {
+                client.admin().cluster().nodesStats(nodesStatsRequest, ActionListener.runAfter(new ActionListener<>() {


This change is fairly extreme in terms of how noisy it is line-wise.
Maybe, also for the sake of future readability of this code, we could extract what's inside the try-with-resources into a new method here and in the other spots that got indented so extremely now? That way the change-set stays small because the code doesn't move in terms of indent level and things don't get much more complicated.
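
The shape being suggested can be sketched like this. All names here (`Scope`, `clearTraceContext(StringBuilder)`, `refresh`, `fetchNodeStats`) are hypothetical and only illustrate the refactoring pattern, not the actual InternalClusterInfoService code. The try-with-resources stays a thin wrapper, and the existing body moves into its own method at its original indent level.

```java
public class ExtractMethodSketch {

    /** A scope that restores the previous state when closed. */
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    /** Hypothetical stand-in for clearing the trace context; records what happens in the log. */
    static Scope clearTraceContext(StringBuilder log) {
        log.append("[clear]");
        return () -> log.append("[restore]");
    }

    /** Thin wrapper: the only new lines in the diff are the try-with-resources and the call. */
    static String refresh() {
        StringBuilder log = new StringBuilder();
        try (Scope ignored = clearTraceContext(log)) {
            fetchNodeStats(log);
        }
        return log.toString();
    }

    /** The extracted body keeps its original indentation, so its lines do not show up as changed. */
    static void fetchNodeStats(StringBuilder log) {
        log.append("[fetch node stats]");
    }

    public static void main(String[] args) {
        System.out.println(refresh()); // [clear][fetch node stats][restore]
    }
}
```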

I refactored most of these sites with the extra wrapping to extract methods.

original-brownbear

LGTM thanks for the extra iterations Rory!

pugnascotia added the :Distributed/Distributed, >refactoring, and v8.5.0 labels Jul 28, 2022

elasticsearchmachine added the Team:Distributed label Jul 28, 2022

pugnascotia added the :Core/Infra/Core label Jul 28, 2022

pugnascotia requested review from grcevski and original-brownbear July 28, 2022 21:35

elasticsearchmachine added the Team:Core/Infra label Jul 28, 2022

pugnascotia added 2 commits July 29, 2022 10:12

Fix compile issue

25d5633

Fix test

5281f1a

grcevski approved these changes Jul 29, 2022

original-brownbear suggested changes Aug 1, 2022

pugnascotia added 2 commits August 1, 2022 12:17

Take a different approach to tracing and AbstractRunnable

d94a2d5

Formatting

d07da2d

original-brownbear reviewed Aug 2, 2022

pugnascotia added 4 commits August 2, 2022 15:54

Refactor InternalClusterInfoService, put implementation into methods

c759377

Extract more trace context wrapped code into methods

055d080

Improve ContextPreservingAbstractRunnable abstraction

1edcc23

Merge remote-tracking branch 'upstream/main' into use-more-tracing-co…

fab9088

…ntexts

original-brownbear approved these changes Aug 3, 2022

pugnascotia merged commit 9285249 into elastic:main Aug 3, 2022

pugnascotia deleted the use-more-tracing-contexts branch August 3, 2022 10:15
