feat(node-experimental): Use native OTEL Spans by mydea · Pull Request #9161 · getsentry/sentry-javascript

mydea · 2023-10-03T07:39:13Z

This PR changes the performance handling of the node-experimental package fundamentally, aligning it even more with the OpenTelemetry model.

### Tasks
- [x] Store breadcrumbs as `TimedEvent` on otel spans
- [x] Update `startSpan` / `startInactiveSpan` to create & expose OTEL spans (instead of sentry spans)
- [x] Update span processing to avoid global map & use exporter
- [x] Add unit tests
- [x] Add E2E test

The core changes are:

- `startSpan` / `startInactiveSpan` now create OTEL spans, not Sentry spans/transactions.
- Generally, Sentry spans/transactions should be completely gone from a user's POV.
- The functionality that was previously in the SpanProcessor is split into a new SpanProcessor + SpanExporter.
- The global map for state is replaced by some WeakMaps, which means we no longer need to clean up our references manually.
- Breadcrumbs are stored as events on the OTEL spans.

### How does transaction/span creation work now

In the old model, we would start transactions/spans in the span processor's `onStart` and `onEnd` hooks. This required us to keep track of the parent Sentry span, as we need it to call `parentSpan.startSpan()` in `onStart`. Since it can be tricky to know when a span is no longer needed as a parent, this made garbage collection harder and messier, and also required us to still sprinkle Sentry spans/transactions throughout our code.

In the new model, only minimal processing is done in the span processor, and importantly, we do not create any Sentry spans yet. We store some additional data we need later in a WeakMap keyed by the (OTEL) span.
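A minimal sketch of the WeakMap idea, assuming a simplified span shape. `SpanLike`, `SpanExtras`, `setSpanExtras`, and `getSpanExtras` are hypothetical names used for illustration only; they are not taken from the package.

```typescript
// Hypothetical stand-in for an OTEL span; real code would use the OTEL span type.
interface SpanLike {
  name: string;
}

// Hypothetical extra data that is needed later, when the span is exported.
interface SpanExtras {
  source?: string;
  metadata?: Record<string, unknown>;
}

// Keys are held weakly: once a span is unreachable, its entry can be
// garbage-collected without any manual cleanup.
const spanExtras = new WeakMap<SpanLike, SpanExtras>();

// Merge new extras into whatever is already stored for this span.
function setSpanExtras(span: SpanLike, extras: SpanExtras): void {
  spanExtras.set(span, { ...spanExtras.get(span), ...extras });
}

function getSpanExtras(span: SpanLike): SpanExtras | undefined {
  return spanExtras.get(span);
}

const span: SpanLike = { name: 'GET /users' };
setSpanExtras(span, { source: 'route' });
setSpanExtras(span, { metadata: { requestPath: '/users' } });
```

Because nothing else holds on to the entry, dropping the last reference to `span` is enough for the stored extras to become collectable.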

Then we leverage the underlying BatchSpanProcessor from OTEL, which collects spans together and sends them to a SpanExporter for processing. So only finished spans end up in our span exporter. Our custom span exporter does the following:

- Builds a tree hierarchy of the spans
- Picks the root spans that are found, and builds transactions from there, adding children down the tree
- Every span/transaction is immediately finished (with the correct end time) and then sent

Note that we store the current scope when the transaction was created and apply this scope.
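The exporter steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `FinishedSpan`, `SpanNode`, `buildSpanTrees`, and `collectDescendants` are hypothetical names, and the span shape is reduced to the fields needed to show the tree building.

```typescript
// Hypothetical, simplified shape of a finished span as seen by an exporter.
interface FinishedSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

interface SpanNode {
  span: FinishedSpan;
  children: SpanNode[];
}

// Build trees from a flat list and return the root nodes (spans without a parent).
function buildSpanTrees(spans: FinishedSpan[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  spans.forEach(span => nodes.set(span.spanId, { span, children: [] }));

  const roots: SpanNode[] = [];
  spans.forEach(span => {
    const node = nodes.get(span.spanId) as SpanNode;
    if (span.parentSpanId === undefined) {
      roots.push(node);
      return;
    }
    // If the parent is not in this batch, the span stays unattached for now.
    const parent = nodes.get(span.parentSpanId);
    if (parent) parent.children.push(node);
  });
  return roots;
}

// Flatten one tree depth-first: the order in which children would be attached
// to the transaction that is built from the root.
function collectDescendants(node: SpanNode): FinishedSpan[] {
  const result: FinishedSpan[] = [];
  node.children.forEach(child => {
    result.push(child.span);
    collectDescendants(child).forEach(s => result.push(s));
  });
  return result;
}

const roots = buildSpanTrees([
  { spanId: 'b', parentSpanId: 'a', name: 'db query' },
  { spanId: 'a', name: 'GET /users' },
  { spanId: 'c', parentSpanId: 'b', name: 'serialize' },
]);
```

Each entry in `roots` would correspond to one transaction, with `collectDescendants` giving the child spans to add to it.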

For now I copied most of the stuff from opentelemetry-node over; eventually we can probably merge most of this together and export the parts from opentelemetry-node.

### How do breadcrumbs work

We now pick all events added to spans and add them as breadcrumbs.
For this, we walk up the tree of spans to the root and collect all breadcrumbs together. We use a special JSON field for now to actually store the breadcrumbs data (TODO: Maybe there is a better way to do this...).
When we add a breadcrumb, we actually always add it to the root span for now, not the active span. The reason is that this works better with our mental model of breadcrumbs, where anything that happens in this root span is relevant. But we'll also pick up any other events added by otel instrumentation along the way.
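A rough sketch of the breadcrumb handling described above, assuming each span keeps a direct reference to its parent. All names here (`SpanWithEvents`, `TimedEventLike`, `BreadcrumbLike`, `addBreadcrumb`, `collectBreadcrumbs`) are hypothetical, and the sketch leaves out the special JSON field that carries the full breadcrumb data.

```typescript
// Hypothetical, simplified timed event and span shapes.
interface TimedEventLike {
  name: string;
  time: number; // milliseconds
}

interface SpanWithEvents {
  name: string;
  parent?: SpanWithEvents;
  events: TimedEventLike[];
}

interface BreadcrumbLike {
  message: string;
  timestamp: number; // seconds
}

function getRootSpan(span: SpanWithEvents): SpanWithEvents {
  let current = span;
  while (current.parent) current = current.parent;
  return current;
}

// Breadcrumbs are always added to the root span, not the active span.
function addBreadcrumb(activeSpan: SpanWithEvents, message: string, time: number): void {
  getRootSpan(activeSpan).events.push({ name: message, time });
}

// Walk from the span up to the root and collect the events of every span on the way,
// which also picks up events added by other instrumentation.
function collectBreadcrumbs(span: SpanWithEvents): BreadcrumbLike[] {
  const breadcrumbs: BreadcrumbLike[] = [];
  let current: SpanWithEvents | undefined = span;
  while (current) {
    current.events.forEach(e => breadcrumbs.push({ message: e.name, timestamp: e.time / 1000 }));
    current = current.parent;
  }
  return breadcrumbs.sort((a, b) => a.timestamp - b.timestamp);
}

const rootSpan: SpanWithEvents = { name: 'GET /users', events: [] };
const childSpan: SpanWithEvents = {
  name: 'db query',
  parent: rootSpan,
  events: [{ name: 'query started', time: 2000 }],
};
addBreadcrumb(childSpan, 'user clicked', 1000);
const crumbs = collectBreadcrumbs(childSpan);
```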

### Open questions

Should we apply the scope when a span is started, or when it is finished? OTEL uses the context when it was started, and the model makes this easier to implement, but if we prefer we can probably also pick the current context/scope in onEnd. But not 100% sure how this would work with parallel spans, needs to be tested I guess.
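To make the trade-off concrete, here is a small sketch of the two options with parallel spans. It assumes a single mutable "current scope", which is a deliberate simplification and not how context propagation works in the SDK; `ScopeLike`, `SpanRef`, `captureScopeOnStart`, and `resolveScopes` are hypothetical names.

```typescript
interface ScopeLike {
  tags: Record<string, string>;
}

interface SpanRef {
  name: string;
}

// Simplification: one global scope that is mutated over time.
const currentScope: ScopeLike = { tags: {} };
const scopeAtStart = new WeakMap<SpanRef, ScopeLike>();

// Option 1: snapshot the scope when the span starts.
function captureScopeOnStart(span: SpanRef): void {
  // Clone so that later mutations do not leak into this span's snapshot.
  scopeAtStart.set(span, { tags: { ...currentScope.tags } });
}

// Option 2: read whatever the scope is when the span ends.
function resolveScopes(span: SpanRef): { atStart: ScopeLike | undefined; atEnd: ScopeLike } {
  return { atStart: scopeAtStart.get(span), atEnd: { tags: { ...currentScope.tags } } };
}

const spanA: SpanRef = { name: 'request A' };
const spanB: SpanRef = { name: 'request B' };

currentScope.tags['user'] = 'alice';
captureScopeOnStart(spanA);
currentScope.tags['user'] = 'bob';
captureScopeOnStart(spanB);

// Span A ends after span B has changed the scope.
const scopesForA = resolveScopes(spanA);
```

In this sketch the start-time snapshot for span A still says `alice`, while reading the scope at end time yields `bob`, which is the kind of cross-talk between parallel spans that would need testing.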

packages/e2e-tests/test-applications/node-experimental-fastify-app/event-proxy-server.ts

+      const sentryRequest = https.request(
+        sentryIngestUrl,
+        { headers: proxyRequest.headers, method: proxyRequest.method },
+        sentryResponse => {
+          sentryResponse.addListener('data', (chunk: Buffer) => {
+            proxyResponse.write(chunk, 'binary');
+            sentryResponseChunks.push(chunk);
+          });
+
+          sentryResponse.addListener('end', () => {
+            eventCallbackListeners.forEach(listener => {
+              const rawSentryResponseBody = Buffer.concat(sentryResponseChunks).toString();
+
+              const data: SentryRequestCallbackData = {
+                envelope: parseEnvelope(proxyRequestBody, new TextEncoder(), new TextDecoder()),
+                rawProxyRequestBody: proxyRequestBody,
+                rawSentryResponseBody,
+                sentryResponseStatusCode: sentryResponse.statusCode,
+              };
+
+              listener(Buffer.from(JSON.stringify(data)).toString('base64'));
+            });
+            proxyResponse.end();
+          });
+
+          sentryResponse.addListener('error', err => {
+            throw err;
+          });
+
+          proxyResponse.writeHead(sentryResponse.statusCode || 500, sentryResponse.headers);
+        },
+      );


github-actions · 2023-10-03T07:47:27Z

size-limit report 📦

| Path | Size |
| --- | --- |
| @sentry/browser (incl. Tracing, Replay) - Webpack (gzipped) | 84.24 KB (0%) |
| @sentry/browser (incl. Tracing) - Webpack (gzipped) | 31.41 KB (0%) |
| @sentry/browser - Webpack (gzipped) | 22 KB (0%) |
| @sentry/browser (incl. Tracing, Replay) - ES6 CDN Bundle (gzipped) | 78.76 KB (-0.01% 🔽) |
| @sentry/browser (incl. Tracing) - ES6 CDN Bundle (gzipped) | 28.59 KB (-0.01% 🔽) |
| @sentry/browser - ES6 CDN Bundle (gzipped) | 21 KB (-0.01% 🔽) |
| @sentry/browser (incl. Tracing, Replay) - ES6 CDN Bundle (minified & uncompressed) | 254.38 KB (0%) |
| @sentry/browser (incl. Tracing) - ES6 CDN Bundle (minified & uncompressed) | 86.66 KB (0%) |
| @sentry/browser - ES6 CDN Bundle (minified & uncompressed) | 62.35 KB (0%) |
| @sentry/browser (incl. Tracing) - ES5 CDN Bundle (gzipped) | 31.45 KB (-0.01% 🔽) |
| @sentry/react (incl. Tracing, Replay) - Webpack (gzipped) | 84.27 KB (0%) |
| @sentry/react - Webpack (gzipped) | 22.05 KB (0%) |
| @sentry/nextjs Client (incl. Tracing, Replay) - Webpack (gzipped) | 102.23 KB (0%) |
| @sentry/nextjs Client - Webpack (gzipped) | 50.99 KB (0%) |

Lms24

Ok, so I took a look at the PR and while I didn't review every single detail (frankly, I think I lack the context for a lot of the more otel-specific APIs), the general concept makes sense to me. Had some questions around the exporter and types but nothing blocking.

> Should we apply the scope when a span is started, or when it is finished?

I think it's fine (for now) to use the scope from when the span was started. Especially because it's how Otel does it and I think we generally want to stick with Otel in this package.

Another only tangentially related thought while reviewing: We definitely need to add proper docs for the package (however it is called) we release/maintain alongside @sentry/node during v8.

packages/node-experimental/src/opentelemetry/spanExporter.ts

Lms24 · 2023-10-06T13:03:24Z

packages/node-experimental/src/sdk/initOtel.ts

    diag.setLogger(otelLogger, DiagLogLevel.DEBUG);
  }

+  if (client) {


Just wondering: If client is undefined, should we even do anything in this function?

good question 😅 I would say right now that is more of a theoretical question, as that is just called by init() right after we called initNode(), so there should always be a client. But once we eventually split this up into more easily consumable parts, we'll need to handle these cases better.

Makes sense!

packages/node-experimental/src/sdk/scope.ts

Lms24 · 2023-10-06T13:10:54Z

packages/node-experimental/src/sdk/trace.ts

      span.end();
    }

+    _initSpan(span as OtelSpan, spanContext);


q: I'm seeing quite a lot of type casting when dealing with otel spans. Is this because Otel types are somehow wrong/too broad?

this is because @opentelemetry/api and everything related to that passes a basic Span type around, while we often need the spans that are actually generated by @opentelemetry/sdk-trace-base for certain things (because that has some more fields with things we need). But it's something we should look into when we stabilize this/split this up into better reusable parts, ideally we can avoid as much of this as possible 😬

actually here specifically I'll update it and just pass the regular 'spans' around. that's safer and prob. more correct anyhow!

Lms24 · 2023-10-06T13:13:06Z

packages/node-experimental/src/sdk/trace.ts


-function isTransaction(span: Span): span is Transaction {
-  return span instanceof Transaction;
+function _initSpan(span: OtelSpan, spanContext: NodeExperimentalSpanContext): void {


l: wdyt about calling this something around applySentryAttributesToSpan or similar?

sounds good to me! 👍

Lms24 · 2023-10-06T13:16:18Z

packages/node-experimental/src/sdk/transaction.ts

+/**
+ * This is a fork of the base Transaction with OTEL specific stuff added.
+ */
+export class NodeExperimentalTransaction extends Transaction {


l: iiuc, Transaction::finish shouldn't do anything anymore, right? Should we override it here to print a warning that it noops if it's called? I guess chances are low that users would call finish given that they'll only work with spans, so feel free to disregard.

actually users should not get a hold of a transaction ever 😅 startTransaction is not exported, and only used in the span exporter.

Lms24 · 2023-10-06T13:22:35Z

packages/node-experimental/src/opentelemetry/spanExporter.ts

+
+    this._finishedSpans.push(...spans);
+
+    const remainingSpans = maybeSend(this._finishedSpans);


q: When would there be remaining spans and what happens with them? Is it when we have multiple concurrent root spans?

There are two scenarios, one is "normal" and will happen, and one should not happen but may happen (who knows):

1. When a root span is not finished yet, all of its child spans will remain in there. E.g. one http request in a transaction may be finished and go to the exporter while the overall transaction is still running. In this case, the http span will remain here until the root span is completed.

2. Somewhere along the way some span was dropped (for whatever reason), so the parent span of this span never gets to the exporter. This should not happen, but 🤷 So in this case we'll eventually clean this up and just discard the span.

I also added a comment to explain this a bit better!

thanks for explaining!
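The two scenarios can be sketched like this. It is not the `maybeSend` implementation from the PR: `ExportedSpan`, `splitSendable`, and the five-minute `MAX_SPAN_AGE_MS` cutoff are assumptions made for the example.

```typescript
// Hypothetical, simplified finished span.
interface ExportedSpan {
  spanId: string;
  parentSpanId?: string;
  endTime: number; // milliseconds
}

// Assumed cutoff after which spans without a finished root are discarded.
const MAX_SPAN_AGE_MS = 5 * 60 * 1000;

function splitSendable(
  spans: ExportedSpan[],
  now: number,
): { sent: ExportedSpan[]; remaining: ExportedSpan[] } {
  const byId = new Map<string, ExportedSpan>();
  spans.forEach(s => byId.set(s.spanId, s));

  // A span can be sent once its chain of parents reaches a finished root span.
  const hasFinishedRoot = (s: ExportedSpan): boolean => {
    const seen = new Set<string>();
    let current: ExportedSpan | undefined = s;
    while (current && !seen.has(current.spanId)) {
      if (current.parentSpanId === undefined) return true;
      seen.add(current.spanId);
      current = byId.get(current.parentSpanId);
    }
    return false;
  };

  const sent: ExportedSpan[] = [];
  const remaining: ExportedSpan[] = [];
  spans.forEach(s => {
    if (hasFinishedRoot(s)) {
      sent.push(s);
    } else if (now - s.endTime < MAX_SPAN_AGE_MS) {
      // Scenario 1: the root is probably still running, so keep waiting.
      remaining.push(s);
    }
    // Scenario 2: the parent never arrived in time, so the span is dropped.
  });
  return { sent, remaining };
}
```

The `remaining` list would be carried over to the next export call, where newly finished spans are appended before the check runs again.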

packages/node-experimental/src/opentelemetry/spanExporter.ts

mydea · 2023-10-09T08:50:37Z

I've updated this based on feedback from @Lms24 , thanks a lot.
I also went through and actually aligned the Span type used. Wherever possible, I try to avoid type casting it and just use import { Span } from '@opentelemetry/api', so the most generic span type. However, in some places this is not possible, as we expect/need more fields 😢 But I try to narrow this down as much as possible and use instance checks where possible to actually ensure this works as robustly as possible.
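As an illustration of narrowing with an instance check instead of a cast, here is a self-contained sketch. `ApiSpan` and `SdkSpan` are hypothetical stand-ins for the generic span interface and the concrete SDK span class; they are defined locally and are not the real OTEL types.

```typescript
// Hypothetical stand-in for the generic span interface that gets passed around.
interface ApiSpan {
  end(): void;
}

// Hypothetical stand-in for the concrete span class that carries extra fields.
class SdkSpan implements ApiSpan {
  public endTime: number | undefined;

  constructor(public readonly kind: number, public readonly startTime: number) {}

  end(): void {
    this.endTime = Date.now();
  }
}

// Type guard: narrows via an instance check rather than an unchecked `as` cast.
function isSdkSpan(span: ApiSpan): span is SdkSpan {
  return span instanceof SdkSpan;
}

// Only reads the extra field when the span really is the concrete class.
function getSpanStartTime(span: ApiSpan): number | undefined {
  return isSdkSpan(span) ? span.startTime : undefined;
}

const sdkSpan = new SdkSpan(0, 123);
const plainSpan: ApiSpan = { end() {} };
```

With a guard like this, a span of an unexpected type degrades to `undefined` instead of failing at runtime on a missing field.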

Use regular `Span` type from `@opentelemetry/api` wherever possible.

Lms24

Thanks for applying my feedback and answering my questions!

mydea requested review from AbhiPrasad and Lms24 October 3, 2023 07:39

mydea self-assigned this Oct 3, 2023

github-advanced-security bot found potential problems Oct 3, 2023

Lms24 approved these changes Oct 6, 2023

Lms24 reviewed Oct 6, 2023

mydea force-pushed the fn/potel-native-spans branch from 1531c77 to 5e984c2 Compare October 6, 2023 14:53

feat(node-experimental): Use native OTEL Spans

613269d

mydea force-pushed the fn/potel-native-spans branch from 5e984c2 to 613269d Compare October 9, 2023 07:41

align span & function names

385636a

Use regular `Span` type from `@opentelemetry/api` wherever possible.

mydea force-pushed the fn/potel-native-spans branch from 7c7d4e6 to 385636a Compare October 9, 2023 09:07

Lms24 approved these changes Oct 9, 2023

ref: More lenient checks for span types

9286ec8

mydea merged commit aeb4462 into develop Oct 9, 2023

mydea deleted the fn/potel-native-spans branch October 9, 2023 10:19

