Introduce tracing interfaces by pugnascotia · Pull Request #87921 · elastic/elasticsearch

pugnascotia · 2022-06-22T14:36:13Z

Part of #84369. Split out from #87696. Introduce tracing interfaces in advance of adding APM support to Elasticsearch. The only implementation at this point is a no-op class.

elasticmachine · 2022-06-22T14:36:19Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

pugnascotia · 2022-06-22T15:37:08Z

@rjernst this is actually a draft since it builds on #87917.

pugnascotia · 2022-07-08T14:53:27Z

@rjernst I think this is ready for review. We can continue the discussion from the PR #87696 here.

pugnascotia · 2022-07-14T09:51:43Z

@original-brownbear I'd appreciate your thoughts on this PR, it's a precursor to #88443, which I asked you about yesterday.

original-brownbear

I gave this a quick read and added a few questions that would help with giving better feedback here.

server/src/main/java/org/elasticsearch/http/DefaultRestChannel.java

original-brownbear · 2022-07-14T10:14:22Z

server/src/main/java/org/elasticsearch/rest/RestChannel.java

 * A channel used to construct bytes / builder based outputs, and send responses.
 */
-public interface RestChannel {
+public interface RestChannel extends Traceable {


I'm not sure I understand the need for making this interface more complex.
Can't we simply compute these things inline where we currently pass the channel to the tracer? What's the value of making this interface more complex and associating more state (looking at the new empty methods added here) with the channel potentially?

I've tried to simplify this by moving the default implementations to AbstractRestChannel, which is a better home for them anyway. Does that help?

Not better by much I'm afraid. My main concern is simply with the fact that the RestChannel as an abstraction is already kind of broken and on the list of things to refactor away.

With this change we get a lot less flexible in fixing the abstractions here. If you just inline the logic from here to where we call the tracer on the other hand, the change has zero negative impact on the flexibility we have in refactoring the REST layer.
Looking at this from a different perspective: There really isn't a clear definition of the lifecycle of a RestChannel today so either this doesn't need to live on the rest channel and is just needless complexity or it introduces new subtle requirements on its lifecycle. Just inlining the code and making it completely independent of the channel instances avoids trouble either way.

original-brownbear · 2022-07-14T10:28:27Z

server/src/main/java/org/elasticsearch/rest/RestChannel.java

+        return attributes;
+    }
+
+    void setTracePath(String path);


What is this going to be used for? Unless we have to, we shouldn't start associating more state with the implementations of this interface. They are already hard to maintain and on the list of things to refactor, due to the difficulty behind the releasing of the x-content builders that come out of them.

When tracing REST requests, APM typically names spans and transactions using a concatenation of the HTTP method and request path. However, it's better for the name to be low cardinality, so if you're using path variables, you want to use paths that still have the /{var} placeholders. This trace path stuff is all about getting access to the path without variables.

original-brownbear · 2022-07-14T10:35:48Z

server/src/main/java/org/elasticsearch/tracing/Traceable.java

+/**
+ * Something which maps onto a <i>span</i> in a distributed trace.
+ */
+public interface Traceable {


I'm not sure I understand the motivation or need for this interface.
It appears a Traceable is just a tuple of two strings and a map and each call-site of a Tracer will only ever receive a single type anyway.
Why can't we just pass two strings and a map instead and put the logic for stuff like the fact that the span id is the task id as a string to the call sites? That way we avoid the added complexity of blowing up various of interfaces and having to implement rather redundant things like the overrides in the delegate rest response channel.
Unless we have call sites where we would e.g. pass either a channel or a task, this seems like the wrong level of abstraction to me.

The abstraction was introduced in the original SpaceTime project. TBH I never thought about changing it.

We have this abstraction because we won't stop with tracing tasks. For example, in pugnascotia@06d6105 (and bear in mind that is just a quickly hacked-together example, hence it does some questionable things), we're tracing FetchSubPhaseProcessor instances, and there's quite a few implementations of that interface. It's not difficult to imagine that each span might want to add different attributes to the span, and it would be better for that logic to live on the individual implementations that rather in the FetchPhase class.

It's also important the the logic for the span ID is the same when opened and closing spans, so having it in once place via the interface ensures this.

Having said all that, I would be particularly interested for us to add tracing to e.g. repository API requests (S3), so I would be very interested to know whether you think this API could work for that, or if we need to make some adjustments. Would you be able to do that?

It's also important the the logic for the span ID is the same when opened and closing spans, so having it in once place via the interface ensures this.

But it's not really "one place". Now we're forcing new implicit life-cycle requirements on things like the rest-channel (which to me seems the worst one from this perspective) and making our whole codebase less flexible because any change to functionality that implements this interface now has to account for the tracing logic.

If you build the three objects here in-line where they're actually used you avoid complicating the codebase like that and we retain all flexibility. For the specific case of the fetch phase we could still add an abstraction to cover different implementations maybe but for all the other spots forcing this interface on top of random existing classes will increase the cost of future changes needlessly as far as I can tell.
If you think it's easiest to piggy back onto existing classes at this point, a better solution to this may be to just have RestChannel or Task return a Traceable via a getter without introducing a common abstraction across these very different classes. Then if we want to change or remove any of these we can just pull the Traceable out of them and pass it around some other way pretty easily? (also doing this Traceable could just become a record and doesn't even need to be an interface maybe because we don't have any state associated with an instance anyway as its getters only?)

original-brownbear · 2022-07-14T10:36:36Z

server/src/main/java/org/elasticsearch/rest/AbstractRestChannel.java


    private BytesStream bytesOut;

+    private String tracePath;


What is the trace path?

original-brownbear · 2022-07-14T10:41:18Z

server/src/main/java/org/elasticsearch/tracing/Tracer.java

+
+    void onTraceException(Traceable traceable, Throwable throwable);
+
+    void setAttribute(Traceable traceable, String key, boolean value);


Can you add some javadoc on what these are supposed to do in a real implementation here? Would this be recorded to somewhere like a log line or would we create state associated with a specific Traceable here that we'd have to take care of at some point?

I've added more class-level Javadoc to this class, which I hope covers this.

Instead of messing around with supporting tracing methods directly on a `RestChannel`, construct a `RestController` with a `Tracer` and trace requests the usual way. As a result, strip out added code from `RestChannel` and `HttpTracer`. Also add Javadoc to `Tracer`.

…art-2

original-brownbear

Thanks for the responses Rory, I think I understand the ideas here better now. I tried to come up with an alternative suggestion here that retains the current level of flexibility in the code. Let me know what you think :)

original-brownbear · 2022-07-20T10:20:39Z

server/src/main/java/org/elasticsearch/rest/RestChannel.java

 * A channel used to construct bytes / builder based outputs, and send responses.
 */
-public interface RestChannel {
+public interface RestChannel extends Traceable {


Not better by much I'm afraid. My main concern is simply with the fact that the RestChannel as an abstraction is already kind of broken and on the list of things to refactor away.

With this change we get a lot less flexible in fixing the abstractions here. If you just inline the logic from here to where we call the tracer on the other hand, the change has zero negative impact on the flexibility we have in refactoring the REST layer.
Looking at this from a different perspective: There really isn't a clear definition of the lifecycle of a RestChannel today so either this doesn't need to live on the rest channel and is just needless complexity or it introduces new subtle requirements on its lifecycle. Just inlining the code and making it completely independent of the channel instances avoids trouble either way.

original-brownbear · 2022-07-20T10:33:08Z

server/src/main/java/org/elasticsearch/tracing/Traceable.java

+/**
+ * Something which maps onto a <i>span</i> in a distributed trace.
+ */
+public interface Traceable {


It's also important the the logic for the span ID is the same when opened and closing spans, so having it in once place via the interface ensures this.

But it's not really "one place". Now we're forcing new implicit life-cycle requirements on things like the rest-channel (which to me seems the worst one from this perspective) and making our whole codebase less flexible because any change to functionality that implements this interface now has to account for the tracing logic.

If you build the three objects here in-line where they're actually used you avoid complicating the codebase like that and we retain all flexibility. For the specific case of the fetch phase we could still add an abstraction to cover different implementations maybe but for all the other spots forcing this interface on top of random existing classes will increase the cost of future changes needlessly as far as I can tell.
If you think it's easiest to piggy back onto existing classes at this point, a better solution to this may be to just have RestChannel or Task return a Traceable via a getter without introducing a common abstraction across these very different classes. Then if we want to change or remove any of these we can just pull the Traceable out of them and pass it around some other way pretty easily? (also doing this Traceable could just become a record and doesn't even need to be an interface maybe because we don't have any state associated with an instance anyway as its getters only?)

…art-2

pugnascotia · 2022-07-20T14:50:53Z

@original-brownbear I experimented with dropping Traceable entirely in f0f0d77, please take a look and see what you think.

pugnascotia · 2022-07-20T14:51:54Z

Argh, I fumbled the commit, please see 00d4b50 instead.

original-brownbear

Thanks Rory this is pretty much exactly what I had in mind! The approach now LGTM :)

original-brownbear · 2022-07-20T16:32:54Z

server/src/main/java/org/elasticsearch/http/DefaultRestChannel.java

        String contentLength = null;
+        final Runnable onFinish = () -> {
+            Releasables.close(toClose);
+            tracer.onTraceStopped(traceId);


why not just add () -> tracer.onTraceStopped(traceId) to toClose?

I have no idea, it's so obvious now you've pointed it out 🤷‍♂️

original-brownbear

Some comments on code details and one on docs, looks good otherwise!

original-brownbear · 2022-07-21T12:12:21Z

server/src/main/java/org/elasticsearch/rest/RestController.java

+            name = restPath;
+        }
+
+        Map<String, Object> attributes = new HashMap<>();


Maybe pre-size this correctly e.g. using Maps. utilities? This isn't on the super hot path but sill it's on a network thread :)

original-brownbear · 2022-07-21T12:25:32Z

server/src/test/java/org/elasticsearch/rest/RestControllerTests.java

+     * and the exception is captured in the span.
+     */
+    public void testDispatchUnsupportedHttpMethodTracesException() {
+        final RestRequest request = RestRequest.request(parserConfig(), new HttpRequest() {


Can't we override FakeHttpRequest here? Seems we only want the method() override right?

original-brownbear · 2022-07-21T12:27:17Z

server/src/test/java/org/elasticsearch/tasks/TaskManagerTests.java

+                }
+            },
+            null,
+            new ActionListener<>() {


If you want this shorter, you can use ActionTestUtils.assertNoFailure(r -> {})

original-brownbear · 2022-07-21T12:30:50Z

server/src/main/java/org/elasticsearch/tasks/TaskManager.java


+    private void startTrace(ThreadContext threadContext, Task task) {
+        TaskId parentTask = task.getParentTaskId();
+        Map<String, Object> attributes = new HashMap<>();


We could pre-size the map correctly here or just use Map.of( (with a slight bit of repetition for the task id key but whatever) to save some memory and make this compile nicer since its on the hot path.

original-brownbear · 2022-07-21T12:34:15Z

server/src/main/java/org/elasticsearch/http/DefaultRestChannel.java


+        final String traceId = "rest-" + this.request.getRequestId();
+
        final ArrayList<Releasable> toClose = new ArrayList<>(3);


can make this a 4 now :)

original-brownbear · 2022-07-21T12:39:21Z

server/src/main/java/org/elasticsearch/rest/RestController.java

+        startTrace(threadContext, channel, null);
+    }
+
+    private void startTrace(ThreadContext threadContext, RestChannel channel, String restPath) {


We can pass the RestRequest here, we don't need the channel I think (and should have the request available in a variable at most callsites already anyway.
Mainly asking for this since I noticed we already have some strange redundancy in caller methods where we pass both the channel and the request (which we also have on the channel) which is confusing.

original-brownbear · 2022-07-21T12:42:57Z

server/src/main/java/org/elasticsearch/tracing/Tracer.java

+ * methods, e.g.  {@link #setAttribute(String, String, String)}. This allows you
+ * to attach data that is not available when the span is opened.
+ */
+public interface Tracer {


I think we should add a comment here that the methods must have performance characteristics similar to logging since they're all used on hot transport threads.
If we implement this interface to send out network requests on any method we need to make sure that it's all non-blocking and with appropriate queuing to avoid slowing down transport threads.

original-brownbear

LGTM if CI is happy, though Ryan should probably have a look as well to make sure he's good with the overall approach also.
Thanks for the extra iterations here!

rjernst

Thanks for splitting this out from the larger PR. I left a few thoughts.

rjernst · 2022-07-19T16:58:56Z

server/src/main/java/org/elasticsearch/http/AbstractHttpServerTransport.java

        }

-        final HttpTracer trace = tracer.maybeTraceRequest(restRequest, exception);
+        final HttpTracer maybeHttpTracer = httpTracer.maybeTraceRequest(restRequest, exception);


It's confusing to any reader why there is both Tracer and HttpTracer in this class. Perhaps at least a comment somewhere explaining the difference? Is this something that can eventually be subsumed by the new tracer? I see it also bleeds into other classes, maybe also add the explanation there, or reference to where the explanation exists.

I agree, it is confusing, and for a while I have HttpTracer implementing Tracer, but I ended up taking it as part of simplifying some stuff with Armin. The fundamental problem is that HttpTracer is actually just a logger, but we reference it in our docs so renaming the class isn't 100% trivial. However I've renamed many of the uses / instances of HttpTracer so make it clearer that it's about logging.

Thanks for the rename of the member, it does help in the reading. However it is still confusing since the class is still called HttpTracer.

we reference it in our docs so renaming the class isn't 100% trivial.

I think this could be handled through keeping the HttpTracer class, but with no impl, so that whatever the class is renamed to can still have LogManager.getLogger(HttpTracer.class). We do need a way deprecate such logging; it is an existing problem that our class and package names leak into logger configuration since it restricts us from easily changing, or we accidentally make a configuration breaking change without realizing it.

The rename to HttpLogger I suggest above isn't crucial for this PR, but I do think it should be done as a followup.

rjernst · 2022-07-25T05:18:44Z

server/src/main/java/org/elasticsearch/plugins/Plugin.java

+    /**
+     * Called with a Tracer so that each plugin has a chance to work with it.
+     */
+    public void onTracer(Tracer tracer) {}


"onXXX" doesn't really fit with other plugin methods (the one exception is onIndexModule, but that is a holdover from the older style push based model, which index specific plugin extensions still use). The purpose of this seems to be to give plugins this global Tracer object, so that they can stash it to use later.

If the tracer object needs to be passed around the entire system, then it doesn't seem any better than just having actual global state for this purpose. Otherwise every time we realize something else needs access to the tracer, we have to plumb it through, which it seems like could eventually be the entire system? If I've misread the purpose of this object, then it would seem better as an argument to createComponents, where other "common" systems are passed to plugins so they can be utilized in plugin service construction.

rjernst · 2022-07-25T05:39:27Z

server/src/main/java/org/elasticsearch/node/Node.java

    }

+    private Tracer getTracer(Collection<Object> pluginComponents) {
+        final List<Tracer> tracers = pluginComponents.stream().map(c -> c instanceof Tracer t ? t : null).filter(Objects::nonNull).toList();


Thanks to java modules, we have a newer way of achieving this type of extensibility that we only intend a single implementation. See RestInterceptorActionPlugin. Basically you would have a specific interface for creating the tracer, and that package would only be exported to the module implementing it. Since this implementation is coming in a followup PR, it could be in a new unqualified export in this PR, then make it qualified in the followup.

server/src/main/java/org/elasticsearch/tracing/Tracer.java

The class `HttpTracer` is really a logging class, which is unfortunate because we're introducing the `Tracer` interface for actual distributed tracing. Change how the class is used to make it clearer that its purpose is logging. Note that we can't rename the class without also affecting how the HTTP logging is configured and documented, so that is left for another time.

pugnascotia · 2022-07-25T12:50:12Z

@rjernst I'm not sure if this is exactly what you intended, but: I introduced a TracerPlugin interface, which is a marker for plugins that can create a Tracer. This allows me to create a tracer and pass it into createComponents(), instead of waiting until createComponents() is called on all the plugins and then pulling it out of the components.

The TracerPlugin class could later have its visibility restricted. Does that work?

rjernst · 2022-07-25T17:09:34Z

The TracerPlugin class could later have its visibility restricted. Does that work?

Yes, the TracerPlugin + createComponents changes are what I was thinking there.

rjernst

LGTM assuming you agree with "startTrace" and "stopTrace"

pugnascotia · 2022-07-25T19:04:59Z

LGTM assuming you agree with "startTrace" and "stopTrace"

Oh man, I meant to rename them that way before. 🤦‍♂️

Split out from elastic#88443. Part of elastic#84369. Use the tracing API that was added in elastic#87921 in TaskManager. This won't actually do anything until we provide a tracer with an actual implemenation.

Split out from #88443. Part of #84369. Use the tracing API that was added in #87921 in TaskManager. This won't actually do anything until we provide a tracer with an actual implemenation.

pugnascotia added >feature >non-issue :Core/Infra/Core Core issues without another label v8.4.0 labels Jun 22, 2022

pugnascotia requested a review from rjernst June 22, 2022 14:36

elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jun 22, 2022

pugnascotia marked this pull request as draft June 22, 2022 15:36

Introduce tracer interfaces

8f3f5f1

pugnascotia force-pushed the apm-integration-part-2 branch from c913b8b to 8f3f5f1 Compare July 7, 2022 12:43

Tweaks

c99f313

pugnascotia marked this pull request as ready for review July 8, 2022 14:52

pugnascotia mentioned this pull request Jul 11, 2022

Instrument Elasticsearch with APM #84369

Closed

21 tasks

pugnascotia requested a review from original-brownbear July 14, 2022 09:50

original-brownbear reviewed Jul 14, 2022

View reviewed changes

pugnascotia added 6 commits July 14, 2022 13:01

Remove withScope from DefaultRestChannel

c0f9eeb

Merge remote-tracking branch 'upstream/master' into apm-integration-p…

e384c71

…art-2

Formatting

585f200

Revert changes to HttpTracer

eb55854

More Javadoc

d6ec348

original-brownbear reviewed Jul 20, 2022

View reviewed changes

pugnascotia added 2 commits July 20, 2022 12:36

Merge remote-tracking branch 'upstream/master' into apm-integration-p…

a677db8

…art-2

Drop Traceable interface, shift work to call sites

f0f0d77

Actually shift work to call sites

00d4b50

pugnascotia added 3 commits July 20, 2022 16:25

Imports

c1ad5ea

Fix test

adbc7c6

Damn you, spotless

98099e9

original-brownbear reviewed Jul 20, 2022

View reviewed changes

Cleanup up DefaultRestChannel

fbfa576

original-brownbear reviewed Jul 21, 2022

View reviewed changes

pugnascotia added 2 commits July 21, 2022 14:44

Address review feedback

d3e4b40

Spotless

d1e2b89

original-brownbear approved these changes Jul 21, 2022

View reviewed changes

elasticsearchmachine changed the base branch from master to main July 22, 2022 23:06

rjernst reviewed Jul 25, 2022

View reviewed changes

pugnascotia added 6 commits July 25, 2022 10:11

Merge remote-tracking branch 'upstream/main' into apm-integration-part-2

69c1774

Rename tracer methods to remove 'on' prefix

083be2a

Introduce TracerPlugin interface

46a86ab

Add Tracer to createComponents() signature

ad00db8

Remove onTracer() method

1dfb9c4

pugnascotia self-assigned this Jul 25, 2022

rjernst approved these changes Jul 25, 2022

View reviewed changes

Rename Tracer methods properly this time 🤦

1f51c63

pugnascotia added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jul 25, 2022

elasticsearchmachine merged commit 5c5981d into elastic:main Jul 25, 2022

pugnascotia deleted the apm-integration-part-2 branch July 25, 2022 20:02

pugnascotia mentioned this pull request Jul 28, 2022

Use tracing API in TaskManager #88885

Merged


		void onTraceException(Traceable traceable, Throwable throwable);

		void setAttribute(Traceable traceable, String key, boolean value);


		final String traceId = "rest-" + this.request.getRequestId();

		final ArrayList<Releasable> toClose = new ArrayList<>(3);

Conversation

pugnascotia commented Jun 22, 2022

Uh oh!

elasticmachine commented Jun 22, 2022

Uh oh!

pugnascotia commented Jun 22, 2022

Uh oh!

pugnascotia commented Jul 8, 2022

Uh oh!

pugnascotia commented Jul 14, 2022

Uh oh!

original-brownbear left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

original-brownbear left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pugnascotia commented Jul 20, 2022

Uh oh!

pugnascotia commented Jul 20, 2022

Uh oh!

original-brownbear left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

original-brownbear left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

original-brownbear left a comment

Choose a reason for hiding this comment

Uh oh!

rjernst left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!