feat(telemetry): support custom resource attributes and add metric cardinality controls #4365

@doudouOUC

What would you like to be added?

Two related telemetry capabilities that today are both missing:

  1. Custom resource attribute support — allow operators to attach
    arbitrary resource attributes (e.g. user_id, team, env,
    cost_center) to every span / log / metric emitted by Qwen Code,
    via standard OTel mechanisms (OTEL_RESOURCE_ATTRIBUTES,
    OTEL_SERVICE_NAME) and a .qwen/settings.json equivalent.

  2. Metric cardinality controls — give operators include/exclude
    toggles for high-cardinality attributes on metric time series
    (session.id, user.account_uuid, app.version), so metric
    backends (Prometheus, ARMS Metric, etc.) are not blown up by
    per-session series fan-out.

These are bundled into one sub-issue because they are coupled: adding
custom resource attribute support without cardinality controls makes
it easy for an operator to accidentally explode metric storage.

This sub-issue covers the P3 line "Add resource attribute policy
and cardinality controls" in #3731. It does not cover the
separate OTLP request-header work (P1 "Add OTLP headers support in
.qwen/settings.json" and Foundation "Add explicit static OTLP
header support and tests") — those are about HTTP request headers
to the backend, not attributes on the telemetry data itself.

Design doc: docs/design/telemetry-resource-attributes-design.md
(full design: layering, merge precedence, reserved keys, config
schema, file-by-file changes, phased PR split, test plan, migration
notes, and a comparison with the Claude Code implementation).

Why is this needed?

Custom resource attribute support — currently zero

packages/core/src/telemetry/sdk.ts:156-161 constructs the OTel
Resource manually with only three attributes:

const resource = resourceFromAttributes({
  [SemanticResourceAttributes.SERVICE_NAME]: SERVICE_NAME,
  [SemanticResourceAttributes.SERVICE_VERSION]:
    config.getCliVersion() || 'unknown',
  'session.id': config.getSessionId(),
});

And sdk.ts:274-278 explicitly disables the OTel SDK's automatic
resource detectors:

sdk = new NodeSDK({
  resource,
  autoDetectResources: false,
  ...
});

This also disables the standard envDetector, which would normally
read OTEL_RESOURCE_ATTRIBUTES and OTEL_SERVICE_NAME. As a result,
neither standard OTel environment variable is honored today, and
there is no settings.json equivalent either.

Compare with Claude Code, which honors OTEL_RESOURCE_ATTRIBUTES
out of the box (per its monitoring docs) so operators can do:

export OTEL_RESOURCE_ATTRIBUTES="team=platform,env=prod,cost_center=eng-123"

Note: simply flipping autoDetectResources: true is not the fix
— the comment in sdk.ts:275-277 explains the existing detectors
are async and trigger diag.error from HttpInstrumentation span
creation before they settle. The fix is to parse
OTEL_RESOURCE_ATTRIBUTES synchronously and merge it into the manual
resourceFromAttributes call.
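
A minimal sketch of what a synchronous parser could look like. The function name comes from the suggested implementation in this issue; the body, the handling of malformed entries, and the fallback on bad percent-encoding are assumptions, not the design doc's final behavior. The raw string is taken as a parameter so the helper stays pure and easy to unit test.

```typescript
// Illustrative sketch only: parses "key1=val1,key2=val2" with percent-decoded
// values, synchronously, without any OTel resource detector.
function parseOtelResourceAttributes(
  raw: string | undefined,
): Record<string, string> {
  const attributes: Record<string, string> = {};
  if (!raw) return attributes;

  for (const pair of raw.split(',')) {
    const separator = pair.indexOf('=');
    // Assumption: entries with no '=' or an empty key are silently skipped.
    if (separator <= 0) continue;

    const key = pair.slice(0, separator).trim();
    const encoded = pair.slice(separator + 1).trim();
    if (!key) continue;

    try {
      attributes[key] = decodeURIComponent(encoded);
    } catch {
      // Assumption: on malformed percent-encoding, keep the raw value
      // rather than dropping the attribute.
      attributes[key] = encoded;
    }
  }
  return attributes;
}
```

In sdk.ts this would presumably be called as parseOtelResourceAttributes(process.env.OTEL_RESOURCE_ATTRIBUTES) before the resourceFromAttributes call, which keeps resource construction fully synchronous.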

Metric cardinality controls — currently zero

Two compounding issues:

  1. session.id is set as a resource attribute, so it
    automatically attaches to every signal — including metrics.
    Every CLI session produces a new value, so any metric backend will
    see unbounded time-series fan-out.
  2. There is no include/exclude toggle for any high-cardinality
    attribute on metrics.

This means qwen-code metrics are not safe to scrape into
Prometheus / ARMS Metric / VictoriaMetrics today without a
cardinality guard at the collector / write side. Operators have no
in-product knob.

Claude Code's precedent: per-attribute include toggles, for metrics
only (spans and logs are per-event and not affected by cardinality
in the same way):

  • OTEL_METRICS_INCLUDE_SESSION_ID
  • OTEL_METRICS_INCLUDE_ACCOUNT_UUID
  • OTEL_METRICS_INCLUDE_VERSION

Suggested implementation

Part 1 — custom resource attributes

  • Add parseOtelResourceAttributes() helper that synchronously parses
    process.env.OTEL_RESOURCE_ATTRIBUTES (key1=val1,key2=val2,
    percent-decoded), no async detector
  • Add .qwen/settings.json key
    telemetry.resourceAttributes: Record<string, string>
  • Honor OTEL_SERVICE_NAME for service.name
  • Merge order (lowest → highest priority):
    OTEL_RESOURCE_ATTRIBUTES → settings.telemetry.resourceAttributes
    → built-in (service.name, service.version, session.id)
  • Keep autoDetectResources: false — do not regress the existing
    diag.error fix
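
The merge order above can be sketched with plain object spreads, where later spreads win. The names mergeResourceAttributes and ResourceInputs are hypothetical. The sketch assumes OTEL_SERVICE_NAME feeds the built-in service.name value, which is one reading of "honor OTEL_SERVICE_NAME" alongside "built-in wins"; the design doc may resolve this differently. It follows the Part 1 list as written, so session.id is still among the built-ins here.

```typescript
type ResourceAttrs = Record<string, string>;

// Hypothetical input shape, for illustration only.
interface ResourceInputs {
  envAttributes: ResourceAttrs;      // parsed from OTEL_RESOURCE_ATTRIBUTES
  settingsAttributes: ResourceAttrs; // telemetry.resourceAttributes in settings.json
  envServiceName?: string;           // OTEL_SERVICE_NAME, if set
  defaultServiceName: string;        // the existing SERVICE_NAME constant
  serviceVersion: string;
  sessionId: string;
}

function mergeResourceAttributes(inputs: ResourceInputs): ResourceAttrs {
  return {
    // Lowest priority: standard OTel env var.
    ...inputs.envAttributes,
    // Middle priority: settings.json overrides env-provided keys.
    ...inputs.settingsAttributes,
    // Highest priority: built-in keys cannot be overridden by either source.
    // Assumption: OTEL_SERVICE_NAME, when non-empty, supplies service.name.
    'service.name': inputs.envServiceName || inputs.defaultServiceName,
    'service.version': inputs.serviceVersion,
    'session.id': inputs.sessionId,
  };
}
```

The result would be passed to the existing resourceFromAttributes call in place of the three hard-coded attributes, with autoDetectResources left at false.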

Part 2 — metric cardinality controls

  • Move session.id out of the Resource and into per-span /
    per-log attribute injection (it is already set on spans via
    session-tracing.ts; the resource-level duplication is what leaks
    into metrics)
  • Add config (settings.json + env var) for include toggles, default
    false:
    • telemetry.metrics.includeSessionId
      / QWEN_TELEMETRY_METRICS_INCLUDE_SESSION_ID
    • telemetry.metrics.includeVersion
      / QWEN_TELEMETRY_METRICS_INCLUDE_VERSION
    • (extend list as needed for any future high-card attribute)
  • Apply the filter at metric instrument call sites in metrics.ts:
    pass counter.add() / histogram.record() attribute payloads
    through a filterMetricAttributes(config, attrs) helper
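
A rough sketch of the filter helper named above. The helper name and argument order come from this issue; the MetricCardinalityConfig shape and the key-to-toggle table are assumptions for illustration (the real helper would likely read the toggles from the existing Config object). Attribute keys session.id and app.version are the ones listed earlier in this issue.

```typescript
// Hypothetical config shape mirroring telemetry.metrics.* in settings.json.
interface MetricCardinalityConfig {
  includeSessionId: boolean;
  includeVersion: boolean;
}

type MetricAttributes = Record<string, string | number | boolean>;

// Maps each gated attribute key to the toggle that re-enables it.
// A Map avoids accidental matches on Object.prototype property names.
const GATED_METRIC_KEYS = new Map<string, keyof MetricCardinalityConfig>([
  ['session.id', 'includeSessionId'],
  ['app.version', 'includeVersion'],
]);

function filterMetricAttributes(
  config: MetricCardinalityConfig,
  attrs: MetricAttributes,
): MetricAttributes {
  const filtered: MetricAttributes = {};
  for (const [key, value] of Object.entries(attrs)) {
    const toggle = GATED_METRIC_KEYS.get(key);
    // Drop gated keys unless their toggle is explicitly enabled;
    // ungated keys always pass through unchanged.
    if (toggle !== undefined && !config[toggle]) continue;
    filtered[key] = value;
  }
  return filtered;
}
```

A call site in metrics.ts would then look something like counter.add(1, filterMetricAttributes(cardinalityConfig, attrs)), so the default (both toggles false) strips session.id and app.version from every metric data point while leaving span and log attributes untouched.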

Acceptance criteria

  • OTEL_RESOURCE_ATTRIBUTES and OTEL_SERVICE_NAME are honored and
    appear on emitted spans, logs, and metrics
  • .qwen/settings.json telemetry.resourceAttributes works and is
    merged with env-var values per the documented precedence
  • Span attributes still carry session.id for trace correlation
  • Metric time series do not carry session.id by default;
    setting the include toggle re-enables it explicitly
  • autoDetectResources remains false and no new
    diag.error regressions appear in debug logs
  • Docs (docs/developers/development/telemetry.md) document both
    custom resource attributes and cardinality toggles
  • Unit tests cover: env var parsing (including percent-encoding),
    settings + env var merge precedence, metric attribute filtering on
    / off, session.id not present on metrics by default
