[DRAFT] NOT MERGING POC: Send Node.js runtime metrics via OTLP using OTel-native naming#7869
Closed
link04 wants to merge 19 commits into
Closed
[DRAFT] NOT MERGING POC: Send Node.js runtime metrics via OTLP using OTel-native naming#7869link04 wants to merge 19 commits into
link04 wants to merge 19 commits into
Conversation
Contributor
Overall package sizeSelf size: 5.79 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | import-in-the-middle | 3.0.1 | 82.56 kB | 817.39 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: f0615c5 | Docs | Datadog PR Page | Give us feedback! |
BenchmarksBenchmark execution time: 2026-05-07 21:33:58 Comparing candidate commit f0615c5 in PR branch Found 0 performance improvements and 1 performance regressions! Performance is the same for 1748 metrics, 95 unstable metrics. scenario:shimmer-runtime-asyncdeclared-wrapfn-20
|
link04
added a commit
to DataDog/system-tests
that referenced
this pull request
Apr 9, 2026
New scenario OTLP_RUNTIME_METRICS that sets DD_METRICS_OTEL_ENABLED=true alongside DD_RUNTIME_METRICS_ENABLED=true. Tests verify OTel-native metric names (dotnet.*, jvm.*, go.*, v8js.*) appear in OTLP payloads and that DD-proprietary names (runtime.dotnet.*, runtime.go.*) do not. All languages marked as missing_feature in manifests until POC PRs are merged: - .NET: DataDog/dd-trace-dotnet#8299 - Go: DataDog/dd-trace-go#4611 - Node.js: DataDog/dd-trace-js#7869 - Java: DataDog/dd-trace-java#10985 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 tasks
3 tasks
Adds v8js.memory.heap.*, process.cpu.utilization, process.memory.usage, nodejs.eventloop.delay.* as OTel instruments on the existing OTLP metrics pipeline. Routes to OTLP when config.otelMetricsEnabled is true, skipping DogStatsD to avoid double-reporting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused sinon import - Remove unused DEFAULT_INTERVAL constant - Use ternary in router per lint rule - Break long lines under 120 char limit - Add trailing commas per style Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the mapped metric (nodejs.eventloop.utilization) and higher percentiles. Fixes lint errors. Total: 13 OTel metrics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add assertions for nodejs.eventloop.delay.p90, p99, and nodejs.eventloop.utilization. Verify disabled state for new event loop metrics. Matches existing runtime_metrics.spec.js pattern of verifying every metric by name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rest of the tracer (agent writer) calls increment(), gauge(), etc. on the runtime metrics module. When OTLP replaces DogStatsD, these methods must exist as noops to avoid TypeError crashes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Assert exactly 13 instruments - Assert no DD-proprietary names (runtime.node.*) - Matches .NET PR #8457 test pattern Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dd-trace-js custom MeterProvider does not implement addBatchObservableCallback. Switch to per-instrument addCallback which is the pattern used by all existing OTel metrics in the repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirrors the .NET PR's SystemRuntimeMetricsFlowThroughOtlpPipeline test: builds a real MeterProvider + PeriodicMetricReader, registers our 13 runtime instruments via otlp_runtime_metrics.start(), forces a flush, and asserts the captured export contains all expected metric names, the correct scope name, and positive observed values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Audited each instrument against the upstream Node-specific OTel semantic convention YAMLs (model/v8js, model/nodejs, model/process) and the @opentelemetry/instrumentation-runtime-node community implementation. Findings + fixes: - 5 memory metrics had wrong instrument type. v8js.memory.heap.used, v8js.memory.heap.limit, v8js.memory.heap.space.available_size, v8js.memory.heap.space.physical_size, and process.memory.usage are updowncounters per spec (state metrics that can go up or down). Switched from createObservableGauge to createObservableUpDownCounter. This matches the spec; OTel contrib still uses gauge here, mirroring the same drift the .NET PR addressed. - nodejs.eventloop.delay.* used unit 'ns' and multiplied by 1e6. perf_hooks.monitorEventLoopDelay returns nanoseconds, so the old conversion inflated values by 1e6. Switched to unit 's' and divide by 1e9, matching both the spec and OTel contrib's EventLoopDelayCollector. - nodejs.eventloop.utilization reported the cumulative-since-startup ratio, which converges to an average rather than reflecting recent utilization. Now computes a delta vs the last sample, mirroring OTel contrib's EventLoopUtilizationCollector. Tests now cover instrument types per metric, the unit fixes, and the sum-vs-gauge wire shape after the OtlpTransformer. End-to-end flow test is unchanged in shape but now passes a fourth assertion that exercises the OTLP serialization path.
The Node.js OTel semantic conventions list 7 event loop delay metrics (min, max, mean, stddev, p50, p90, p99). We were emitting 6, missing stddev. The histogram already exposes the value, so this is a single ObservableGauge in seconds, matching the unit and instrument type the spec defines and the @opentelemetry/instrumentation-runtime-node contrib package uses. Bumps unit-test and pipeline-flow expected counts to 14. Co-authored-by: Cursor <cursoragent@cursor.com>
…NERS The previous layout put new test files under test/runtime_metrics/, a new subdirectory that fell outside both the test:trace:core glob and any CODEOWNERS rule. Existing convention is flat: src/runtime_metrics/foo.js is tested by test/foo.spec.js, mirroring how runtime_metrics.spec.js sits next to runtime_metrics.js's test root, not in a subdir. - Move both spec files up one level (path adjusts in proxyquire/require). - CODEOWNERS gets two specific entries paralleling the existing runtime_metrics.spec.js line, owned by apm-sdk-capabilities-js. - Picks up packages/dd-trace/test/*.spec.js glob in test:trace:core automatically, so verify-exercised-tests stops flagging.
Replaces the per-file entry for runtime_metrics.spec.js (and would-be extra lines for otlp_runtime_metrics.spec.js and .flow.spec.js) with a single glob /packages/dd-trace/test/*runtime_metrics*.spec.js owned by apm-sdk-capabilities-js. Net: -2 lines, same coverage.
…ec.js
Three things at once, all keyed off the existing repo trend rather than
adding new infrastructure:
1. Move the OTLP unit + flow tests into runtime_metrics.spec.js (the
existing test file for the runtime_metrics module) instead of the
separate test/runtime_metrics/*.spec.js files I had under a new
subdirectory. Drops the new directory, no CODEOWNERS / package.json
glob update needed — the existing entry covers everything.
2. Trim from 13 OTLP tests to 4 with the same coverage:
- registers all 14 OTel-native metrics with spec-correct types,
units, and callbacks (locks in the spec partition)
- observes positive values + emits required attributes
(v8js.heap.space.name, process.cpu.state)
- skips event loop metrics when disabled and is restartable
- flushes all 14 metrics with correct wire shape and scope name
end-to-end through MeterProvider + PeriodicMetricReader +
OtlpTransformer
Each one consolidates a previously per-behavior assertion. Same
spec coverage, ~210 fewer lines.
3. Add JSDoc to start(config) / stop() in otlp_runtime_metrics.js per
CLAUDE.md.
Source directory was never owned (legacy runtime_metrics.js included). Mergegate flagged the new otlp_runtime_metrics.js as unowned. One directory entry covers both the legacy file and the new OTLP module, parallels the existing test-side ownership rule, owned by the same team that owns the corresponding test file.
Master invokes runtimeMetrics.start before initializeOpenTelemetryMetrics, so when our OTLP runtime metrics module calls metrics.getMeterProvider(), it gets the noop provider — instruments register against it and never export. Move the OTel logs/metrics init blocks to fire before runtimeMetrics.start.
b037585 to
78a5949
Compare
- Drop verbose JSDoc and section markers; keep only comments that explain non-obvious behavior (ns→s conversion, noop method contract, ELU baseline capture). - Drop cross-language references in test descriptions. - Extract three multi-callsite helpers (defineHeapStat, defineHeapSpaceStat, defineEventLoopDelay) — the event-loop helper collapses 7 near-identical blocks. Single-callsite cases stay inline.
track, boolean, histogram, count, gauge each had zero production callers across the whole codebase. They were mirroring the legacy runtime_metrics.js interface for safety — but per the no-fallbacks-for- scenarios-that-cant-happen rule, leaving them in is just dead weight. Only increment and decrement remain (17 + 6 real callers in opentracing/span.js and opentracing/tracer.js).
- Fix dispatcher reading config.otelMetricsEnabled (renamed to config.DD_METRICS_OTEL_ENABLED in master 7209b4f). - Cache os.cpus().length once at start instead of per-callback. - Drop unreachable null guard on getMeterProvider() (always returns at least the noop provider). - Tighten increment/decrement noop comment. - Add JSDoc with v8/perf_hooks types on the three file-private helpers per CLAUDE.md "all new methods should receive a full JSDoc comment".
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Sends v8js.memory.heap., process.cpu.utilization, process.memory.usage, nodejs.eventloop.delay. via OTLP. Related: DataDog/dd-trace-dotnet#8299
🤖 Generated with Claude Code