🐛 Bug Report: OpenTelemetry setup is not working properly when JavaScript is bundled.
📜 Description
The Backstage OpenTelemetry tutorial works in development, but when the backend is bundled the order of the code changes, so certain instrumentations are not applied.
👍 Expected behavior
All instrumentations from the getNodeAutoInstrumentations() metapackage should be applied after building the backend.
👎 Actual Behavior with Screenshots
Only certain instrumentations get applied, causing metrics to not be tracked.
Below is the bundled code. It shows how the imports are grouped together, which pushes the instrumentation code below all of the other imports. This is what breaks the instrumentations. Per the documentation, the SDK initialization code needs to be at the top, before all other imports.
'use strict';
var autoInstrumentationsNode = require('@opentelemetry/auto-instrumentations-node');
-----
other imports
-----
function _interopDefaultLegacy (e) { return e && typeof e === 'object' && 'default' in e ? e : { 'default': e }; }
function _interopNamespace(e) {
if (e && e.__esModule) return e;
var n = Object.create(null);
if (e) {
Object.keys(e).forEach(function (k) {
if (k !== 'default') {
var d = Object.getOwnPropertyDescriptor(e, k);
Object.defineProperty(n, k, d.get ? d : {
enumerable: true,
get: function () { return e[k]; }
});
}
});
}
n["default"] = e;
return Object.freeze(n);
}
var Router__default = /*#__PURE__*/_interopDefaultLegacy(Router);
var Docker__default = /*#__PURE__*/_interopDefaultLegacy(Docker);
var esb__default = /*#__PURE__*/_interopDefaultLegacy(esb);
var winston__namespace = /*#__PURE__*/_interopNamespace(winston);
var fetch__default = /*#__PURE__*/_interopDefaultLegacy(fetch);
const sdk = new sdkNode.NodeSDK({
metricReader: new exporterPrometheus.PrometheusExporter({}, () => {
console.log(
`OpenTelemetry prometheus scrape endpoint: http://localhost:${exporterPrometheus.PrometheusExporter.DEFAULT_OPTIONS.port}${exporterPrometheus.PrometheusExporter.DEFAULT_OPTIONS.endpoint}`
);
}),
instrumentations: [autoInstrumentationsNode.getNodeAutoInstrumentations()]
});
sdk.start();
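To illustrate why the order matters, here is a simplified model, assuming that instrumentations patch modules at the moment they are loaded. It is not the real OpenTelemetry or Node.js loader; `load`, `startSdk`, and the module names are hypothetical stand-ins.

```javascript
// Simplified model of a hook-based module loader. This is NOT the real
// OpenTelemetry or Node.js implementation; all names here are hypothetical.
const cache = new Map();
const hooks = [];

// Load a "module": run every hook registered so far, then cache the result.
function load(name) {
  if (!cache.has(name)) {
    let exportsObj = { name, patched: false };
    for (const hook of hooks) exportsObj = hook(exportsObj);
    cache.set(name, exportsObj);
  }
  return cache.get(name);
}

// Stand-in for sdk.start(): registers a hook that patches modules on load.
function startSdk() {
  hooks.push(mod => ({ ...mod, patched: true }));
}

// Bundled order: the require calls are hoisted above the SDK initialization.
const pg = load('pg');
startSdk();
const express = load('express'); // loaded after the hook exists

console.log(pg.patched);         // false: loaded before the hook was registered
console.log(express.patched);    // true
console.log(load('pg').patched); // false: served from the cache, never re-patched
```

In this model, anything loaded before the hook is registered stays unpatched, which matches the symptom of only some instrumentations being applied.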
👟 Reproduction steps
- Follow the tutorial at https://backstage.io/docs/tutorials/setup-opentelemetry. You might need to switch from the console metrics exporter to something else, like Prometheus, since the default one produces a lot of log output.
- Add OTEL_LOG_LEVEL=DEBUG environment variable
- Start application in development mode and observe different instrumentations being applied. One of them is @opentelemetry/instrumentation-http
- Build your application and create a container from the built image.
- Add the OTEL_LOG_LEVEL=DEBUG environment variable to the container and watch the logs.
- Observe that the @opentelemetry/instrumentation-http instrumentation and lots of other ones are not being applied.
- You can also check your bundled backend inside app/packages/backend/dist/*.cjs.js file which will confirm the problem with the order
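The last step can be roughly automated. The sketch below uses a hypothetical helper, `sdkStartsAfterOtherRequires`, and a made-up sample string in place of the real bundle. It assumes the SDK is started with a literal `sdk.start()` call, as in the bundled code above.

```javascript
// Hypothetical helper: reports whether `sdk.start()` appears after the
// first require() of a module other than the OpenTelemetry packages.
function sdkStartsAfterOtherRequires(source) {
  const startIndex = source.indexOf('sdk.start()');
  if (startIndex === -1) return false; // no SDK initialization found
  const requirePattern = /require\((['"])([^'"]+)\1\)/g;
  for (const match of source.matchAll(requirePattern)) {
    const isOtel = match[2].startsWith('@opentelemetry/');
    if (!isOtel && match.index < startIndex) return true;
  }
  return false;
}

// Made-up sample that mimics the shape of the bundled output.
const bundled = [
  "'use strict';",
  "var autoInstrumentationsNode = require('@opentelemetry/auto-instrumentations-node');",
  "var Router = require('express-promise-router');",
  'const sdk = {};',
  'sdk.start();',
].join('\n');

console.log(sdkStartsAfterOtherRequires(bundled)); // true: wrong order
```

To check a real build, the contents of the bundled file could be read with `fs.readFileSync` and passed to the same function.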
📃 Provide the context for the Bug.
I played around with this a bit and couldn't find a solution. The fix would be to somehow inject the instrumentation code at the beginning of the bundled backend application. I have tested this by directly editing the bundled backend, placing the configuration at the beginning, and restarting the application. After that, the instrumentations get applied correctly. Another option is to copy the instrumentation.ts file in the Dockerfile so that it is available after everything is built. We can then run the node application with the --require option pointing to the instrumentation file. I have not managed to get this working, but it could be possible.
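As a minimal sketch of what the --require option does, the script below creates two throwaway files and runs them through Node. The file names and their contents are hypothetical placeholders, not the real instrumentation or backend.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create two throwaway files; both are hypothetical stand-ins for the
// instrumentation file and the bundled backend entry point.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'preload-demo-'));
const preloadFile = path.join(dir, 'instrumentation.js');
const appFile = path.join(dir, 'app.js');
fs.writeFileSync(preloadFile, "console.log('instrumentation loaded');\n");
fs.writeFileSync(appFile, "console.log('app started');\n");

// Equivalent to: node --require ./instrumentation.js app.js
const preloadOutput = execFileSync(
  process.execPath,
  ['--require', preloadFile, appFile],
  { encoding: 'utf8' },
);

console.log(preloadOutput); // the preload line is printed before the app line
fs.rmSync(dir, { recursive: true, force: true });
```

Because the preloaded file runs before the entry point, its position inside the bundle no longer matters.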
🖥️ Your Environment
No response
👀 Have you spent some time to check if this bug has been raised before?
- [X] I checked and didn't find similar issue
🏢 Have you read the Code of Conduct?
- [X] I have read the Code of Conduct
Are you willing to submit PR?
No, but I'm happy to collaborate on a PR with someone else
@SonilPro, are you using new backend system or the legacy?
Are you making sure to put this code in the packages/backend/src and importing it as early as possible to avoid this issue?
We've not run into this timing issue per se, so quite interested to work out what's going on here. Thinking that it could be something to do with different versions of otel being used - not sure.
@benjdlambert I am still using the legacy backend. The initialization is placed first in the index.ts (same as in the tutorial). The issue probably happens because the initialization code for the SDK gets pushed below all the imports after it has been bundled. You can test this by running yarn build:backend, letting it run for a second or two, and canceling; the index.cjs.js file will then be located under packages/backend/dist. Open it and you will see the wrong order, same as in the code I provided above. When I manually go into the application files, move the initialization code back to the top of the file (before the imports), and rerun the application, it starts applying the correct instrumentation.
I have not tried it with the new backend system, but if the bundler for the backend is the same (I believe this is rollup), then it will be the same outcome.
The otel versions are in the picture below.
@SonilPro can you try moving around the imports somewhat.
Put the initialization code in another file, and require that file first in backend/src/index.ts rather than requiring other imports?
Does that help and fix the issue? I wonder if we should update the docs if it does.
Would also be good to add some documentation for the new backend system and how this all works there, as I don't think we run into this internally with the new backend system with the Catalog.
@benjdlambert Yes. The initialization code is inside the instrumentation.ts file. It is then imported at the top of the index.ts file. I tried putting the initialization code directly inside the index.ts (at the top), but no luck either. Just want to clear things up a bit. The built in metrics that Backstage created do work and are shown. The thing that doesn't work is instrumentation.
Oh right, that makes more sense.
I wonder if we can use something like this: https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/metapackages/auto-instrumentations-node#usage-auto-instrumentation to require the auto-instrumentations first before the init.
Try passing in NODE_OPTIONS and see if it requires first?
I have tried this, I believe it would work, but the problem is I don't see an option to specify a metrics reader or any of the other configuration options.
Have you tried doing both?
Yes. It registers the instrumentation but doesn't show them in the configured metrics reader.
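For reference, NODE_OPTIONS can also point at your own preload file rather than the metapackage's register script, which would leave room for a custom metrics reader. Here is a minimal sketch; the file names and contents are hypothetical placeholders.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Throwaway files standing in for a custom instrumentation file and the app.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'node-options-demo-'));
const customPreload = path.join(tmpDir, 'instrumentation.js');
const entryPoint = path.join(tmpDir, 'app.js');
fs.writeFileSync(customPreload, "console.log('custom SDK setup ran');\n");
fs.writeFileSync(entryPoint, "console.log('app started');\n");

// The preload is passed through the environment instead of the command line.
// JSON.stringify wraps the path in double quotes so spaces do not split it.
const envOutput = execFileSync(process.execPath, [entryPoint], {
  encoding: 'utf8',
  env: { ...process.env, NODE_OPTIONS: `--require ${JSON.stringify(customPreload)}` },
});

console.log(envOutput); // the preload line is printed before the app line
fs.rmSync(tmpDir, { recursive: true, force: true });
```

This only shows the loading order. Whether the metrics then reach the configured reader is the open question in this thread.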
We can however do something like this node --require './instrumentation.ts' app.js. I tried copying the file into the docker image and adding a require as well. But it didn't work. I will investigate a bit more.
Managed to get it to work, but it requires changing the docker image and having the instrumentation.ts file outside backend /src folder
I can do a quick POC PR
@HenrikStanley paging here as this may very well be why we don't see the scaffolder metrics but the others.
@SonilPro what exactly did you do to make it work?
@CasperGN the scaffolder metrics aren't on OpenTelemetry I think yet - They're still using prom-client I believe.
https://github.com/backstage/backstage/blob/685b693fc0e9ce5d59f5883e1c4a7e43bd51d111/plugins/scaffolder-backend/src/util/metrics.ts#L29-L73
This will all need updating to use opentelemetry/metrics instead.
@CasperGN Opened a PR. Works for me, but maybe someone could try to test as well. Or if someone has a better solution or concerns. https://github.com/backstage/backstage/pull/22648
@SonilPro awesome. We'll try it out and give feedback.
@CasperGN Did you manage to get it going?
Paging @HenrikStanley as he's the one doing the testing
I have tested and done a review on the PR: https://github.com/backstage/backstage/pull/22648/files#
The TL;DR is that moving the instrumentation.ts file outside of the src/ folder seems to break imports.
@HenrikStanley Yeah, you would have to replace imports with require.
Also, I think you haven't finished the review. Can't see any comments
Right you are.. Review added now. It is true I have not changed my import statements to require, we might want to update the docs to reflect this.
In the example, require is being used, but it might also be good to add a note as well.
So if I follow the current guide and put import './instrumentation' at the top of packages/backend/src/index.ts as line 1, then I get instrumentation like http.server.duration and http.client.duration. What other instrumentation is missing from this list that gets fixed when doing it a different way?
@benjdlambert I think only a few instrumentations get applied properly with the previous setup. Now, with this new setup, every instrumentation from the list at the bottom of this page should work. I am also observing this when I configure OTEL_LOG_LEVEL=debug: in the logs there are a lot more instrumentations being applied. Before, there would only be a few.
However, for the metrics, only http.server.duration and http.client.duration are additionally added. I am not sure if the other instrumentations even provide metrics. For example, we can see that instrumentation-pg is applied, but I cannot see any metrics for it.
It seems to break the metrics when using the requires method. When using the imports method, I got the following metrics using the OTEL Exporter:
backend_tasks.task.runs.count
backend_tasks.task.runs.duration
catalog.processed.entities.count
catalog.processing.duration
catalog.processing.queue.delay
catalog.processors.duration
catalog.stitched.entities.count
catalog.stitching.duration
catalog.stitching.queue.length
catalog_entities_count
catalog_registered_locations_count
catalog_relations_count
http.client.duration
http.server.duration
Using the requires method I only get the following metrics:
backend_tasks.task.runs.count
backend_tasks.task.runs.duration
catalog.stitching.queue.length
catalog_registered_locations_count
catalog_relations_count
http.client.duration
http.server.duration
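To make the difference explicit, the two lists above can be compared directly. This is just a set difference over the metric names reported in this comment.

```javascript
// Metric names as reported above for each method.
const withImports = [
  'backend_tasks.task.runs.count',
  'backend_tasks.task.runs.duration',
  'catalog.processed.entities.count',
  'catalog.processing.duration',
  'catalog.processing.queue.delay',
  'catalog.processors.duration',
  'catalog.stitched.entities.count',
  'catalog.stitching.duration',
  'catalog.stitching.queue.length',
  'catalog_entities_count',
  'catalog_registered_locations_count',
  'catalog_relations_count',
  'http.client.duration',
  'http.server.duration',
];
const withRequire = new Set([
  'backend_tasks.task.runs.count',
  'backend_tasks.task.runs.duration',
  'catalog.stitching.queue.length',
  'catalog_registered_locations_count',
  'catalog_relations_count',
  'http.client.duration',
  'http.server.duration',
]);

// Metrics seen with the imports method but not with the requires method.
const missingWithRequire = withImports.filter(name => !withRequire.has(name));
console.log(missingWithRequire);
```

The seven missing names are all catalog metrics: the six catalog.processing, catalog.processors, catalog.processed, catalog.stitched and catalog.stitching.duration entries, plus catalog_entities_count.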
For our use case we have primarily looked at the metrics and not at the traces yet. Neither method seems to output every metric as described in: https://github.com/backstage/backstage/blob/master/contrib/docs/tutorials/prometheus-metrics.md
However some of those, especially the plugin related ones like the scaffolder ones, are likely due to a lack of OTEL support in the plugin itself.
Did some digging. It seems the instrumentation for pg only exports traces. I am not familiar with tracing and have no app to visualize traces (we only use metrics). Maybe someone else could check that.
@HenrikStanley Hmm, for me they are all there. Did you try waiting a bit? Metrics will not appear if they have not been triggered at least once, which in this case means a Backstage background job has not run yet.
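A toy model of that behavior, following the comment above: an instrument only shows up in the export once it has recorded a value. `ToyRegistry` is a hypothetical stand-in, not the OpenTelemetry SDK.

```javascript
// Toy metrics registry, NOT the OpenTelemetry SDK: an instrument only shows
// up in the export once it has recorded at least one value.
class ToyRegistry {
  constructor() {
    this.points = new Map();
  }
  createCounter(name) {
    return {
      add: value => this.points.set(name, (this.points.get(name) ?? 0) + value),
    };
  }
  exportNames() {
    return [...this.points.keys()];
  }
}

const registry = new ToyRegistry();
const processedEntities = registry.createCounter('catalog.processed.entities.count');
const taskRuns = registry.createCounter('backend_tasks.task.runs.count');

processedEntities.add(1); // catalog processing has run once
const beforeTaskRun = registry.exportNames();
console.log(beforeTaskRun); // only the catalog counter is present

taskRuns.add(1); // the background job finally ran
const afterTaskRun = registry.exportNames();
console.log(afterTaskRun); // both counters are present
```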
I waited roughly 10 minutes. However, for the requires method I did switch to the console exporter, as that was the only way to do a quick test without deploying an entire observability stack on my local test k8s cluster. On the local dev machine I used the OTEL Exporter and just ran the OTEL collector from a docker container. Which exporter I use should not matter though, as they should both export the same data.
I will do some digging and see if I can replicate your results and get the same amount of metrics.
It turns out that I only get fewer metrics in our own Backstage backend when using the ConsoleExporter. If I vend a clean new backend with the ConsoleExporter, I see the full set. We have made very minimal changes to our backend, and I cannot see how they would impact this. However, I will keep digging and see if I can get some more standardized results. I will report back when I know more.
Are you testing in dev or when you bundle the application? Also, setting OTEL_LOG_LEVEL=debug might show something useful.
I think it would be easier to test in production when you have the bundled javascript. You will then be able to see if the code was injected correctly at the top of the backend index.ts file.
It might also be a dependency version issue. You might be using some old Backstage code that didn't have all the metrics yet.
@HenrikStanley Any updates on this?