
Fix "Accessing resource attributes before async attributes settled" telemetry error #256880

Merged
mbondyra merged 5 commits into elastic:main from mbondyra:fix_phoenix_traces
Mar 11, 2026

Conversation

@mbondyra
Contributor

This was developed with Cursor; please check carefully.

Summary

  • Fixes a race condition in initTelemetry where initTracing and initMetrics were called with a Resource whose async detectors (host, OS, env, process) hadn't resolved yet, causing repeated [ERROR][telemetry] Accessing resource attributes before async attributes settled errors at startup
  • Defers tracer/metrics provider initialization until resource.waitForAsyncAttributes() resolves, while keeping auto-instrumentations registered synchronously so HTTP context propagation isn't missed

Details

resources.detectResources() uses async detectors that return promises. The resulting Resource has asyncAttributesPending = true until those promises settle. The previous code passed this resource directly to initTracing(), which meant:

  1. BaseInferenceSpanProcessor.onEnd and PhoenixSpanProcessor.processInferenceSpan accessed span.resource.attributes while promises were still pending
  2. OpenTelemetry's Resource.attributes getter logged the error and silently skipped unsettled attribute values, producing spans with incomplete resource data
  3. This could cause exported spans to not appear correctly in Phoenix/Langfuse

The fix awaits resource.waitForAsyncAttributes() before creating the tracer provider and span processors. Auto-instrumentations (maybeInitAutoInstrumentations) are still registered synchronously, since they need to monkey-patch HTTP modules before any requests are made and they don't depend on resource attributes. The deferral is safe because inference tracing spans only occur well after Kibana startup (on the order of seconds), while async resource detection resolves in under 100 ms.
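
The race and the deferral described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR and does not import the OpenTelemetry SDK: `SketchResource`, `detectLater`, `registerAutoInstrumentations`, `initTracing`, and `initTelemetrySketch` are hypothetical stand-ins that only mimic the `asyncAttributesPending` / `waitForAsyncAttributes()` behaviour mentioned in the description.

```typescript
type Attributes = Record<string, string>;

// Hypothetical stand-in for a Resource with async detectors (not the real SDK class).
class SketchResource {
  private settled: Attributes;
  private pending: Promise<void> | undefined;
  public asyncAttributesPending: boolean;
  public readonly errors: string[] = [];

  constructor(syncAttrs: Attributes, asyncAttrs?: Promise<Attributes>) {
    this.settled = { ...syncAttrs };
    this.asyncAttributesPending = asyncAttrs !== undefined;
    this.pending = asyncAttrs?.then((attrs) => {
      this.settled = { ...this.settled, ...attrs };
      this.asyncAttributesPending = false;
    });
  }

  // Mimics the getter: records an error and returns only what has settled so far.
  get attributes(): Attributes {
    if (this.asyncAttributesPending) {
      this.errors.push('Accessing resource attributes before async attributes settled');
    }
    return { ...this.settled };
  }

  async waitForAsyncAttributes(): Promise<void> {
    await this.pending;
  }
}

// Assumed async detector: resolves a host attribute after a short delay.
const detectLater = (): Promise<Attributes> =>
  new Promise<Attributes>((resolve) =>
    setTimeout(() => resolve({ 'host.name': 'example-host' }), 10)
  );

const events: string[] = [];

function registerAutoInstrumentations(): void {
  events.push('auto-instrumentations'); // must happen before any request is made
}

function initTracing(resource: SketchResource): Attributes {
  events.push('tracing');
  return resource.attributes; // span processors would read this
}

// Deferred variant: instrumentations stay synchronous, providers wait for the resource.
function initTelemetrySketch(resource: SketchResource): Promise<Attributes> {
  registerAutoInstrumentations();
  return resource.waitForAsyncAttributes().then(() => initTracing(resource));
}

// Race: reading attributes immediately misses the async values and records the error.
const racy = new SketchResource({ 'service.name': 'kibana' }, detectLater());
const early = racy.attributes;

// Fix: the same read, deferred until the async attributes have settled.
const resource = new SketchResource({ 'service.name': 'kibana' }, detectLater());
const ready = initTelemetrySketch(resource);
```

In this sketch `early` contains only `service.name` and `racy.errors` holds one entry, whereas the attributes resolved by `ready` include `host.name` and no error is recorded. The instrumentation step still runs before the first `await`, which is the property the description relies on for HTTP context propagation.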

Test plan

  • Start Kibana with telemetry.tracing.enabled: true and a Phoenix exporter configured
  • Verify no Accessing resource attributes before async attributes settled errors in logs
  • Trigger an inference call (e.g. via Agent Builder) and verify the trace appears in Phoenix
  • Existing unit test (init_telemetry.test.ts) continues to pass — it validates resourceFromAttributes is called with correct attributes, which still happens synchronously

@mbondyra added the release_note:skip (Skip the PR/issue when compiling release notes) and backport:skip (This PR does not require backporting) labels Mar 10, 2026
@mbondyra mbondyra marked this pull request as ready for review March 10, 2026 11:33
@mbondyra mbondyra requested review from a team as code owners March 10, 2026 11:33
Member

@afharo afharo left a comment


Tested locally. LGTM

@elasticmachine
Contributor

⏳ Build in-progress, with failures

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #7 / apiKeysManagementApp renders application and sets breadcrumbs
  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store CCS logs extraction (test against local instance) - Should run CCS extraction for generic and write to updates then latest index
  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store CCS logs extraction (test against local instance) - Should run CCS extraction for generic and write to updates then latest index
  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store CCS logs extraction (test against local instance) - Should run CCS extraction for host and write to updates then latest index
  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store CCS logs extraction (test against local instance) - Should run CCS extraction for host and write to updates then latest index
  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store CCS logs extraction (test against local instance) - Should run CCS extraction for service and write to updates then latest index
  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store CCS logs extraction (test against local instance) - Should run CCS extraction for service and write to updates then latest index
  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store CCS logs extraction (test against local instance) - Should run CCS extraction for user and write to updates then latest index
  • [job] [logs] Scout: [ security / entity_store ] plugin / local-stateful-classic - Entity Store CCS logs extraction (test against local instance) - Should run CCS extraction for user and write to updates then latest index


@mbondyra mbondyra merged commit c4c5747 into elastic:main Mar 11, 2026
18 checks passed
mbondyra added a commit to mbondyra/kibana that referenced this pull request Mar 11, 2026
…e_fix

* commit '565f7545c422192218b803874fbdf93e8d8f08ee': (27 commits)
  [Lens API] ESQL schema for XY separately for Agent and some small token optimizations (elastic#256885)
  Fix "Accessing resource attributes before async attributes settled" telemetry error (elastic#256880)
  [Security Solution][Attacks/Alerts][Attacks page][Table section] Preserver "Sort by" state on Attacks page (elastic#256717) (elastic#256795)
  [APM] Improve redirect with default date range guard (elastic#256887)
  [Security Solution][Attacks/Alerts][Attacks page][Table section] Add assignees avatars to the group component (elastic#250126) (elastic#256901)
  [Docs] add xpack.alerting.rules.maxScheduledPerMinute setting description (elastic#257041)
  [SO] Fix non-deterministic ordering in nested find API integration tests (elastic#256447)
  [Write-restricted dashboards] Update user profile retrieval for getShouldAddAccessControl (elastic#255065)
  [One Workflow] Add Scout API test scaffold and execution tests (elastic#256300)
  [Fleet] add use_apm if dynamic_signal_types are enabled (elastic#256429)
  [Fleet] ignore data streams starting with `.` in Fleet API (elastic#256625)
  [ES|QL] METRICS_INFO support: columns_after & summary (elastic#256758)
  [Agent Builder] Agent plugins: initial installation support (elastic#256478)
  [Streams] Add field descriptions and documentation-only field overrides (elastic#255136)
  [api-docs] 2026-03-11 Daily api_docs build (elastic#257023)
  [Security Solution] fix alerts page infinite loading state due to data view error (elastic#256983)
  [Logging] Add `service.*` global fields (elastic#256878)
  [Canvas] Apply embeddable transforms to embeddable elements (elastic#252191)
  [table_list_view_table] stabilize jest test (elastic#254991)
  [Obs AI] get_index_info: add unit tests (elastic#256802)
  ...
sorenlouv pushed a commit that referenced this pull request Mar 17, 2026
…elemetry error (#256880)

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

Labels

backport:skip (This PR does not require backporting), release_note:skip (Skip the PR/issue when compiling release notes), v9.4.0


4 participants