Skip to content

feat: Added host.name to OTel traces#2307

Merged
nimrod-teich merged 1 commit into
mainfrom
feat/added-hostname-to-traces
Jun 2, 2026
Merged

feat: Added host.name to OTel traces#2307
nimrod-teich merged 1 commit into
mainfrom
feat/added-hostname-to-traces

Conversation

@sotskov-do

@sotskov-do sotskov-do commented May 31, 2026

Copy link
Copy Markdown
Contributor

What

Attach the host's name as a resource attribute (host.name) to every OTel span emitted by lavap (consumer, provider, and smart router). This lets us map collected traces back to
the region a consumer/provider runs in.

How

  • protocol/tracing/trace_manager.go: factored resource construction into newResource() and added OTel's built-in host detector (resource.WithHost()), which sets the standard
    host.name attribute from os.Hostname(). Enabled unconditionally — no flag, no opt-in.
  • Ordered WithFromEnv() last so operator config wins on conflicts: OTEL_SERVICE_NAME overrides the in-code service name, and OTEL_RESOURCE_ATTRIBUTES=host.name=...
    overrides the detected hostname (e.g. to label a clean region instead of a pod hostname).

Override example

OTEL_SERVICE_NAME=lava-consumer
OTEL_RESOURCE_ATTRIBUTES=host.name=eu-west-1-consumer-3,deployment.environment=production

Tests

  • TestNewResource_HostName — default resource carries os.Hostname().
  • TestNewResource_HostNameOverridableViaEnvOTEL_RESOURCE_ATTRIBUTES overrides the detected hostname.
  • Added OTEL_RESOURCE_ATTRIBUTES to clearTracingEnv so the suite stays hermetic.

No protocol-surface or config changes; standard OTel SDK behavior only.

Example

image

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@codecov

codecov Bot commented May 31, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
protocol/tracing/trace_manager.go 85.71% 1 Missing ⚠️
Flag Coverage Δ
consensus 8.96% <ø> (ø)
protocol 35.57% <85.71%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
protocol/tracing/trace_manager.go 74.34% <85.71%> (+2.53%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions

github-actions Bot commented May 31, 2026

Copy link
Copy Markdown

Test Results

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
7 files   ±0   0 ❌ ±0 

Results for commit 15d25a6. ± Comparison against base commit b8cd292.

♻️ This comment has been updated with latest results.

@avitenzer avitenzer left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat: Added host.name to OTel traces

Reviewed the change end-to-end and verified the two load-bearing OTel-SDK behaviors (detector merge precedence and schema-URL handling) against the go.opentelemetry.io/otel/sdk v1.38.0 source. The full protocol/tracing suite passes, and go vet / gofmt are clean.

Verdict: clean, well-scoped, well-documented. The mechanism is correct. One robustness edge case is worth considering plus a few trivial notes — none blocking.

What's done well

  • The precedence comment is accurate and earns its place. Detector merge-ordering in resource.New is genuinely non-obvious. Confirmed: detect folds each detector via Merge(res, r) and Merge(a, b) lets b win on key collisions, so later detectors win — WithFromEnv() being last is what gives operators the override. TestNewResource_HostNameOverridableViaEnv exercises this.
  • No schema-URL conflict — checked, because it's the classic footgun here. Only WithHost() contributes a non-empty schema URL (semconv.SchemaURL); WithAttributes and WithFromEnv produce schemaless resources, and merging empty-with-non-empty never triggers ErrSchemaURLConflict. Verified empirically (newResource returns no error in the tests).
  • The hermeticity fix is correct and important. Adding OTEL_RESOURCE_ATTRIBUTES to clearTracingEnv stops the override test from leaking host.name=eu-west-1-consumer into sibling tests. Easy to forget.
  • The newResource extraction is the right move — it makes resource construction unit-testable without standing up the full exporter/SDK pipeline.

Potential issue (worth considering, non-blocking)

Adding WithHost() introduces a new, environmental trigger for a fatal startup path.

resource.WithHost() calls os.Hostname(). On error, the SDK's stringDetector.Detect returns a plain error (not ErrPartialResource); detect then joins it and continues with the error still set, so resource.New returns a usable partial resource plus a non-nil error. That error flows up through newResourcetracing.New, and all three call sites abort startup:

// rpcconsumer.go:684, rpcprovider.go:900, rpcsmartrouter.go:1589
traceManager, err := tracing.New(ctx, traceCfg)
if err != nil {
    return err   // returned from cobra RunE → the lavap binary fails to start
}

The point isn't probability — os.Hostname() essentially never fails on Linux (realistically only ENAMETOOLONG, a >64-byte hostname, which normal pod/host names won't hit). The point is asymmetry: before this PR, newResource could only fail on operator-malformed OTEL_RESOURCE_ATTRIBUTES (a config error). This adds the first environmental, non-config way for it to fail — and the consequence is a node that won't boot, with a failed to create OTel resource message that gives no hint the culprit is a hostname quirk. Since the PR makes this unconditional, the blast radius is every binary, all to protect an observability label that should never be able to take a node down.

Suggested fix (preferred): log-and-continue. This also matches OTel's guidance to treat resource.New errors as non-fatal. You already get a usable partial resource back, so in New():

res, err := newResource(ctx, cfg)
if err != nil {
    // resource.New returns a usable partial resource even on error (e.g. if
    // os.Hostname() fails). Warn and proceed rather than aborting node startup
    // over a missing observability attribute.
    utils.LavaFormatWarning("OTel resource detection incomplete; continuing", err)
}

(res is always non-nil here — resource.New allocates it before detection and mutates in place.) Keeps WithHost(), no new imports.

Narrower alternative: detect the hostname yourself with os.Hostname() and add it via WithAttributes only on success, dropping WithHost() — a failed lookup is then silently skipped and newResource keeps its old config-error-only failure profile. Touches only the new path, but the log-and-continue version is more robust since it also covers any future detector.

Minor / informational

  • Schema-URL side-effect (benign): before this PR the resource's schema URL was empty; after, it becomes the SDK's bundled semconv version (from WithHost()). Harmless — the attribute keys are stable across semconv versions — but backends keying on schema URL will now see one. Worth a note, not a change.
  • semconv version mix (cosmetic): the code pins semconv/v1.21.0 for ServiceName while WithHost() uses the SDK's newer internal semconv for the host attribute. Not a bug, just slightly inconsistent.
  • Test DRY: both new tests repeat the same for … range res.Attributes() scan; a small getAttr(res, "host.name") (string, bool) helper would tighten them. TestNewResource_HostNameOverridableViaEnv also asserts only the value, not presence — a dropped attr would give a slightly less clear "" != "eu-west-1-consumer" failure.
  • Coverage gap (optional): no test asserts that service.name and host.name coexist, nor that OTEL_SERVICE_NAME override still works now that a second detector is in the chain. Low value if service-name override is covered elsewhere.

Summary

Area Assessment
Correctness ✅ Precedence + schema-URL behavior verified against SDK source & tests
Conventions ✅ Matches the file's env-driven, well-commented style
Tests ✅ Pass; hermetic; minor DRY/coverage polish possible
Risk ⚠️ One new (low-probability, high-asymmetry) fatal startup path via os.Hostname()

I'd suggest the log-and-continue tweak before merge, but it's a judgment call — the change is otherwise solid.

Comment thread protocol/tracing/trace_manager.go
@sotskov-do sotskov-do force-pushed the feat/added-hostname-to-traces branch from 70189ee to 15d25a6 Compare June 2, 2026 07:54
@nimrod-teich nimrod-teich merged commit d566f83 into main Jun 2, 2026
30 checks passed
@nimrod-teich nimrod-teich deleted the feat/added-hostname-to-traces branch June 2, 2026 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants