feat(onboard): enable OpenClaw OTEL diagnostics#4686
Conversation
Add opt-in OpenClaw OTEL diagnostics wiring, policy presets, and build-time plugin support so NemoClaw environments can export local OTLP traces when requested.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds opt-in OpenTelemetry diagnostics for OpenClaw conversation traces: new env vars and docs, local OTEL policy preset, OTEL-aware preset merging and initial policy wiring, config generation/validation, diagnostics plugin install/version enforcement, Docker build-time transport patching, staged Dockerfile ARG injection, gateway lifecycle refinements, and tests. ChangesOpenClaw OTEL Conversation Diagnostics
Sequence Diagram(s) sequenceDiagram
participant CLI
participant PolicySelector
participant PresetMerger
participant ConfigGenerator
CLI->>PolicySelector: start setup (agent, env)
PolicySelector->>PresetMerger: compute suggestions + required OTEL presets
PresetMerger->>PolicySelector: return merged preset list
PolicySelector->>ConfigGenerator: generate OpenClaw config (env, presets)
ConfigGenerator-->>CLI: emit config + diagnostics plugin if enabled
Estimated code review effort: Suggested labels: Suggested reviewers:
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
🌿 Preview your docs: https://nvidia-preview-pr-4686.docs.buildwithfern.com/nemoclaw |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 1 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/policies.test.ts (1)
148-151:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreset count assertion is stale after adding the new OTEL preset.
You added a new preset name at Line 180, but the count assertion still expects
21at Line 150. This makes the test suite internally inconsistent and likely failing.✅ Suggested fix
- it("returns all 21 presets", () => { + it("returns all 22 presets", () => { const presets = policies.listPresets(); - expect(presets.length).toBe(21); + expect(presets.length).toBe(22); });Also applies to: 180-180
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/policies.test.ts` around lines 148 - 151, The test's hardcoded preset count is stale: update the assertion in the test that calls policies.listPresets() to reflect the added OTEL preset (change expected value from 21 to 22) or replace the fixed number check with a more robust assertion (e.g., assert presets includes the new preset name) to keep policies.test.ts (the it block invoking policies.listPresets()) consistent with the new preset added.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Dockerfile`:
- Line 508: The Dockerfile only declares ARG NEMOCLAW_OPENCLAW_OTEL while
patchStagedDockerfile() expects/rewrites NEMOCLAW_OPENCLAW_OTEL_ENDPOINT,
NEMOCLAW_OPENCLAW_OTEL_SERVICE_NAME, and NEMOCLAW_OPENCLAW_OTEL_SAMPLE_RATE; add
matching ARG declarations for NEMOCLAW_OPENCLAW_OTEL_ENDPOINT,
NEMOCLAW_OPENCLAW_OTEL_SERVICE_NAME, and NEMOCLAW_OPENCLAW_OTEL_SAMPLE_RATE
(with sensible defaults) near the existing ARG NEMOCLAW_OPENCLAW_OTEL so the
build-time rewrites in patchStagedDockerfile() take effect and OTEL
endpoint/service/sample-rate values propagate correctly.
In `@docs/reference/commands.mdx`:
- Around line 1860-1867: Replace the `console` fenced block containing the
Jaeger startup and onboard commands with a copyable bash block (no `$` prompts);
specifically change the block that includes the lines starting with "docker run
--rm --name nemoclaw-jaeger \\" and "NEMOCLAW_OPENCLAW_OTEL=1 nemoclaw onboard"
to use ```bash as the fence and remove any leading `$` prompt characters so the
two commands are directly copyable.
In `@src/lib/onboard.ts`:
- Line 3405: The added standalone line setting agentName causes a +1 net change;
remove this line and instead incorporate the defaulting without increasing file
size by either folding agentName into the existing object literal (use the same
object where the other fields are defined and place agentName: agent?.name ??
"openclaw" inline with those properties) or move the defaulting logic into the
onboarding policy initializer (update the symbol in initial-policy.ts that
constructs the agent payload to set agentName = agent?.name ?? "openclaw");
update references to the existing object or initializer (e.g., the object
literal where agentName was added and the initializer in initial-policy.ts) so
the default value is applied without adding new lines to this file.
In `@src/lib/onboard/dockerfile-patch.ts`:
- Around line 209-212: The current replacement of the Dockerfile ARG using
dockerfile = dockerfile.replace(new RegExp(`^ARG ${envKey}=.*$`, "m"), `ARG
${envKey}=${sanitizeDockerArg(rawValue)}`) can silently do nothing if no
matching ARG exists; update the code to detect whether the RegExp matched and
fail fast: run the RegExp.test() (or use String.prototype.match) against the
dockerfile for `^ARG ${envKey}=.*$` before performing the replace, and if there
is no match throw a descriptive Error (or return a failing result) that includes
envKey and rawValue so callers know the OTEL env key was not found; keep using
sanitizeDockerArg(...) for the replacement when a match exists.
In `@src/lib/onboard/gateway-destroy.ts`:
- Around line 43-52: The retry call to runOpenshell(["gateway","destroy",...])
should only run when the earlier probe indicates the CLI supports that legacy
subcommand; change the IIFE so after a failed runOpenshell(["gateway","remove",
gatewayName], ...) you check the probe (use the existing lifecycleCommands or
hasLifecycleCommands() value used earlier) and only invoke
runOpenshell(["gateway","destroy","-g", gatewayName], ...) when that probe shows
gateway destroy is available; otherwise return false immediately. Ensure you
reference the same probe variable/function used on line 40 (e.g.,
lifecycleCommands or hasLifecycleCommands()) and keep gatewayName and
runOpenshell unchanged.
In `@src/lib/onboard/initial-policy.test.ts`:
- Around line 239-254: The test for "merges openclaw-diagnostics-otel-local at
create time when OTEL is enabled" is flaky because it only toggles
NEMOCLAW_OPENCLAW_OTEL and doesn't control any OTEL endpoint env vars; ensure
hermetic behavior by saving and then clearing (or setting to a deterministic
value) NEMOCLAW_OPENCLAW_OTEL_ENDPOINT (and any other related OTEL endpoint vars
if present) before calling prepareInitialSandboxCreatePolicy, and restore the
originals in the finally block so the assertion about appliedPresets remains
deterministic.
In `@src/lib/onboard/openclaw-otel-policy-presets.ts`:
- Around line 19-24: requiredOpenclawOtelPolicyPresets currently always injects
OPENCLAW_OTEL_LOCAL_POLICY_PRESET when OTEL is enabled; change it to only add
the local preset when the configured OTLP/collector endpoint is the local
default (or not set). Update requiredOpenclawOtelPolicyPresets to read the OTLP
endpoint from the environment (the same env var your codebase uses for OTEL
endpoint), and only return [OPENCLAW_OTEL_LOCAL_POLICY_PRESET] when that value
is empty/undefined or explicitly points to the local host (e.g.,
host.openshell.internal:4318); otherwise return [] so non-local collectors
aren’t forced to use the local preset. Use or add a small helper like
isOpenclawOtelEndpointLocal(env) if one doesn’t exist to keep the check
readable.
In `@src/lib/onboard/policy-selection.ts`:
- Around line 168-170: The openclaw branch is still using process.env directly;
update the suggestion flow to use the injected env by threading deps.env into
the suggestion options and calling isOpenclawOtelEnabled(env) instead of the
global process.env. Locate the openclaw block in policy-selection.ts (the agent
=== "openclaw" branch), ensure the current SetupPresetSuggestionOptions receives
the env parameter from the caller (pass deps.env or the existing env variable)
and replace the isOpenclawOtelEnabled() call with isOpenclawOtelEnabled(env) so
OTEL suggestions reflect the merged/applied env. Ensure any types/signatures
that construct or accept SetupPresetSuggestionOptions are updated to include the
env argument.
---
Outside diff comments:
In `@test/policies.test.ts`:
- Around line 148-151: The test's hardcoded preset count is stale: update the
assertion in the test that calls policies.listPresets() to reflect the added
OTEL preset (change expected value from 21 to 22) or replace the fixed number
check with a more robust assertion (e.g., assert presets includes the new preset
name) to keep policies.test.ts (the it block invoking policies.listPresets())
consistent with the new preset added.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: be99779e-bfdd-45e1-b300-b5177629f060
📒 Files selected for processing (23)
.agents/skills/nemoclaw-user-reference/references/commands.mdDockerfiledocs/reference/commands.mdxnemoclaw-blueprint/policies/presets/openclaw-diagnostics-otel-local.yamlscripts/generate-openclaw-config.mtsscripts/openclaw-build-messaging-plugins.pysrc/lib/onboard.tssrc/lib/onboard/dockerfile-patch.test.tssrc/lib/onboard/dockerfile-patch.tssrc/lib/onboard/gateway-destroy.test.tssrc/lib/onboard/gateway-destroy.tssrc/lib/onboard/gateway-lifecycle.tssrc/lib/onboard/initial-policy.test.tssrc/lib/onboard/initial-policy.tssrc/lib/onboard/machine/handlers/policies.tssrc/lib/onboard/openclaw-otel-policy-presets.test.tssrc/lib/onboard/openclaw-otel-policy-presets.tssrc/lib/onboard/policy-presets.tssrc/lib/onboard/policy-selection.tstest/generate-openclaw-config.test.tstest/onboard-policy-suggestions.test.tstest/openclaw-build-messaging-plugins.test.tstest/policies.test.ts
Tighten OTEL Dockerfile propagation, policy preset selection, and gateway cleanup behavior so diagnostics setup remains opt-in and endpoint-aware.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard/policy-presets.ts`:
- Line 10: The branch that currently only checks for the literal "openclaw"
should use the same OpenClaw semantics as requiredOpenclawOtelPolicyPresets so
null/blank agents get the same suggestions; update the condition that adds
"openclaw-pricing" and OTEL suggestions to treat agent === null or agent.trim()
=== "" as OpenClaw (or call the existing helper/isOpenclaw check) so callers
relying on the default agent receive the OpenClaw suggestions consistent with
requiredOpenclawOtelPolicyPresets.
In `@src/lib/onboard/policy-selection.ts`:
- Around line 17-20: The suggestion branch currently checks for the literal
"openclaw" which diverges from the shared predicate; update the branch that
checks agent === "openclaw" to call the shared predicate
(requiredOpenclawOtelPolicyPresets(agent) or the common isOpenClaw/OpenClaw
predicate used elsewhere) so null/empty/whitespace agents are treated the same
and the OTEL preset and openclaw-pricing logic follow the later merge; make the
same replacement in the other occurrence (the branch around lines 172-177) so
both suggestion paths use the shared predicate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e95edbc0-245e-4f6b-9461-dd478f9cfd3d
📒 Files selected for processing (14)
Dockerfiledocs/reference/commands.mdxsrc/lib/onboard/dockerfile-patch.test.tssrc/lib/onboard/dockerfile-patch.tssrc/lib/onboard/gateway-destroy.test.tssrc/lib/onboard/gateway-destroy.tssrc/lib/onboard/initial-policy.test.tssrc/lib/onboard/initial-policy.tssrc/lib/onboard/openclaw-otel-policy-presets.test.tssrc/lib/onboard/openclaw-otel-policy-presets.tssrc/lib/onboard/policy-presets.tssrc/lib/onboard/policy-selection.tstest/onboard-policy-suggestions.test.tstest/policies.test.ts
🚧 Files skipped from review as they are similar to previous changes (8)
- src/lib/onboard/gateway-destroy.test.ts
- src/lib/onboard/dockerfile-patch.ts
- src/lib/onboard/initial-policy.ts
- src/lib/onboard/gateway-destroy.ts
- src/lib/onboard/initial-policy.test.ts
- src/lib/onboard/openclaw-otel-policy-presets.test.ts
- docs/reference/commands.mdx
- test/policies.test.ts
Use the shared OpenClaw agent predicate so default, blank, and explicit OpenClaw agents receive consistent policy suggestions. Signed-off-by: Angel Mata <amata@nvidia.com>
Account for default OpenClaw policy suggestions preserving openclaw-pricing during suggested re-onboard flows. Signed-off-by: Angel Mata <amata@nvidia.com>
|
✨ |
Summary
Closes feat(openclaw): add OTEL diagnostics setup for conversation latency traces #4368.
Test plan
Signed-off-by: Angel Mata amata@nvidia.com
Summary by CodeRabbit
New Features
Policy
Documentation
Tests