
Commit 8ce92cc

feat(diagnostics): classify skill and tool usage
1 parent 7fc691a commit 8ce92cc

20 files changed

Lines changed: 657 additions & 19 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Docs: https://docs.openclaw.ai
 
 ### Changes
 
+- Diagnostics: export bounded skill usage metrics/spans and tool source/owner labels for core, plugin, MCP, and channel tool execution without exposing raw paths or session identifiers. (#80370) Thanks @gauravprasadgp.
 - Agents/subagents: limit default sub-agent bootstrap context to `AGENTS.md` and `TOOLS.md`, keeping persona, identity, user, memory, heartbeat, and setup files out of delegated workers by default. (#85283) Thanks @100yenadmin.
 - Maintainer skills: exclude plugin SDK/API boundary work from `openclaw-landable-bug-sweep` so bugbash sweeps stay focused on small paper-cut fixes.
 - Plugin SDK: add a generic channel-message poll sender so channel plugins can expose poll delivery without depending on channel-specific SDK facades.

docs/gateway/opentelemetry.md

Lines changed: 6 additions & 5 deletions
@@ -72,8 +72,8 @@ openclaw plugins enable diagnostics-otel
 
 | Signal | What goes in it |
 | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Metrics** | Counters and histograms for token usage, cost, run duration, message flow, Talk events, queue lanes, session state/recovery, exec, and memory pressure. |
-| **Traces** | Spans for model usage, model calls, harness lifecycle, tool execution, exec, webhook/message processing, context assembly, and tool loops. |
+| **Metrics** | Counters and histograms for token usage, cost, run duration, skill usage, message flow, Talk events, queue lanes, session state/recovery, tool execution, exec, and memory pressure. |
+| **Traces** | Spans for model usage, model calls, harness lifecycle, skill usage, tool execution, exec, webhook/message processing, context assembly, and tool loops. |
 | **Logs** | Structured `logging.file` records exported over OTLP when `diagnostics.otel.logs` is enabled. |
 
 Toggle `traces`, `metrics`, and `logs` independently. All three default to on
@@ -126,9 +126,9 @@ when `diagnostics.otel.enabled` is true.
 ## Privacy and content capture
 
 Raw model/tool content is **not** exported by default. Spans carry bounded
-identifiers (channel, provider, model, error category, hash-only request ids)
-and never include prompt text, response text, tool inputs, tool outputs, or
-session keys.
+identifiers (channel, provider, model, error category, hash-only request ids,
+tool source, tool owner, and skill name/source) and never include prompt text,
+response text, tool inputs, tool outputs, skill file paths, or session keys.
 Talk metrics export only bounded event metadata such as mode, transport,
 provider, and event type. They do not include transcripts, audio payloads,
 session ids, turn ids, call ids, room ids, or handoff tokens.
@@ -182,6 +182,7 @@ When any subkey is enabled, model and tool spans get bounded, redacted
 - `openclaw.model_call.request_bytes` (histogram, UTF-8 byte size of the final model request payload; no raw payload content)
 - `openclaw.model_call.response_bytes` (histogram, UTF-8 byte size of streamed model response events; no raw response content)
 - `openclaw.model_call.time_to_first_byte_ms` (histogram, elapsed time before the first streamed response event)
+- `openclaw.skill.used` (counter, attrs: `openclaw.skill.name`, `openclaw.skill.source`, `openclaw.skill.activation`, optional `openclaw.agent`, optional `openclaw.toolName`)
 
 ### Message flow
 
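The privacy guarantee in these docs rests on every exported attribute value being bounded. The diff calls `lowCardinalityAttr(value, fallback)` for skill and tool attributes but does not show its body, so the following is only a sketch of what such a helper could look like under an assumed policy: missing or blank values, and anything longer than an assumed 64-character limit, become the fallback, and only identifier-like tokens (no `/` or `\`, so no file paths) pass through. `boundAttr`, `MAX_ATTR_LENGTH`, and `SAFE_TOKEN` are hypothetical names.

```typescript
// Hypothetical bounding helper in the spirit of `lowCardinalityAttr`; the real
// rules are not part of this diff. Assumed policy: only short, identifier-like
// tokens are exported verbatim and everything else becomes the fallback.
const MAX_ATTR_LENGTH = 64; // assumed limit

// Letters, digits, ".", "_", ":", "-"; no "/" or "\", so paths never match.
const SAFE_TOKEN = /^[A-Za-z0-9][A-Za-z0-9._:-]*$/;

function boundAttr(value: string | undefined, fallback = "unknown"): string {
  if (value === undefined) {
    return fallback;
  }
  const trimmed = value.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_ATTR_LENGTH) {
    return fallback;
  }
  return SAFE_TOKEN.test(trimmed) ? trimmed : fallback;
}

console.log(boundAttr("tiny-llm-brainstorm", "skill")); // tiny-llm-brainstorm
console.log(boundAttr("/home/u/skills/tiny-llm-brainstorm/SKILL.md", "skill")); // skill
console.log(boundAttr(undefined)); // unknown
```

A character allowlist is used here because it fails closed: any unexpected value shape collapses to the fallback instead of creating a new, possibly sensitive, attribute value.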
docs/gateway/prometheus.md

Lines changed: 6 additions & 2 deletions
@@ -96,8 +96,9 @@ For traces, logs, OTLP push, and OpenTelemetry GenAI semantic attributes, see [O
 | `openclaw_model_tokens_total` | counter | `agent`, `channel`, `model`, `provider`, `token_type` |
 | `openclaw_gen_ai_client_token_usage` | histogram | `model`, `provider`, `token_type` |
 | `openclaw_model_cost_usd_total` | counter | `agent`, `channel`, `model`, `provider` |
-| `openclaw_tool_execution_total` | counter | `error_category`, `outcome`, `params_kind`, `tool` |
-| `openclaw_tool_execution_duration_seconds` | histogram | `error_category`, `outcome`, `params_kind`, `tool` |
+| `openclaw_skill_used_total` | counter | `activation`, `agent`, `skill`, `source` |
+| `openclaw_tool_execution_total` | counter | `error_category`, `outcome`, `params_kind`, `tool`, `tool_owner`, `tool_source` |
+| `openclaw_tool_execution_duration_seconds` | histogram | `error_category`, `outcome`, `params_kind`, `tool`, `tool_owner`, `tool_source` |
 | `openclaw_harness_run_total` | counter | `channel`, `error_category`, `harness`, `model`, `outcome`, `phase`, `plugin`, `provider` |
 | `openclaw_harness_run_duration_seconds` | histogram | `channel`, `error_category`, `harness`, `model`, `outcome`, `phase`, `plugin`, `provider` |
 | `openclaw_message_received_total` | counter | `channel`, `source` |
@@ -172,6 +173,9 @@ histogram_quantile(
 sum by (le, lane) (rate(openclaw_queue_lane_wait_seconds_bucket[5m]))
 ) < 2
 
+# Skill usage, split by bounded source
+sum by (skill, source) (increase(openclaw_skill_used_total[24h]))
+
 # Dropped Prometheus series (cardinality alarm)
 increase(openclaw_prometheus_series_dropped_total[15m]) > 0
 ```
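The new `openclaw_skill_used_total` series uses the Prometheus text exposition format, and the Prometheus service test in this commit expects its labels in alphabetical order (`activation`, `agent`, `skill`, `source`). Below is a minimal in-memory sketch of a labeled counter that renders that way. `CounterStore` is a hypothetical stand-in, not the plugin's `createPrometheusMetricStore`, and it skips HELP-text escaping for brevity.

```typescript
type LabelSet = Record<string, string>;

// Label values must escape backslash, double quote, and newline in the
// Prometheus text exposition format.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Sorting label names makes the rendered label block a stable series key.
function formatLabels(labels: LabelSet): string {
  const parts = Object.keys(labels)
    .sort()
    .map((name) => `${name}="${escapeLabelValue(labels[name])}"`);
  return parts.length > 0 ? `{${parts.join(",")}}` : "";
}

// Hypothetical minimal counter store (not the plugin's implementation).
class CounterStore {
  private readonly help = new Map<string, string>();
  private readonly series = new Map<string, Map<string, number>>();

  // Increments the series identified by `name` plus its sorted label block.
  counter(name: string, help: string, labels: LabelSet, value = 1): void {
    this.help.set(name, help);
    let byLabels = this.series.get(name);
    if (!byLabels) {
      byLabels = new Map<string, number>();
      this.series.set(name, byLabels);
    }
    const key = formatLabels(labels);
    byLabels.set(key, (byLabels.get(key) ?? 0) + value);
  }

  // Emits HELP/TYPE once per metric, then one sample line per label set.
  render(): string {
    const lines: string[] = [];
    this.series.forEach((byLabels, name) => {
      lines.push(`# HELP ${name} ${this.help.get(name) ?? ""}`);
      lines.push(`# TYPE ${name} counter`);
      byLabels.forEach((value, labels) => {
        lines.push(`${name}${labels} ${value}`);
      });
    });
    return lines.join("\n") + "\n";
  }
}

const store = new CounterStore();
// Field order here is deliberately not alphabetical.
store.counter("openclaw_skill_used_total", "Skills used by agent runs.", {
  source: "workspace",
  skill: "tiny-llm-brainstorm",
  agent: "main",
  activation: "read",
});
console.log(store.render());
```

This prints a `# TYPE openclaw_skill_used_total counter` line followed by `openclaw_skill_used_total{activation="read",agent="main",skill="tiny-llm-brainstorm",source="workspace"} 1`. Because labels are sorted before they become the series key, callers can build label objects in any field order, and the `sum by (skill, source)` query above folds the `agent` and `activation` dimensions together.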

extensions/diagnostics-otel/src/service.test.ts

Lines changed: 45 additions & 0 deletions
@@ -1453,6 +1453,49 @@ describe("diagnostics-otel service", () => {
 await service.stop?.(ctx);
 });
 
+test("exports skill usage counter and span without raw identifiers", async () => {
+const service = createDiagnosticsOtelService();
+const ctx = createOtelContext(OTEL_TEST_ENDPOINT, { traces: true, metrics: true });
+await service.start(ctx);
+
+emitTrustedDiagnosticEvent({
+type: "skill.used",
+agentId: "main",
+runId: "run-should-not-export",
+sessionKey: "session-should-not-export",
+skillName: "tiny-llm-brainstorm",
+skillSource: "workspace",
+activation: "read",
+toolName: "read",
+trace: {
+traceId: TRACE_ID,
+spanId: TOOL_SPAN_ID,
+parentSpanId: CHILD_SPAN_ID,
+traceFlags: "01",
+},
+});
+await flushDiagnosticEvents();
+
+const expectedAttrs = {
+"openclaw.agent": "main",
+"openclaw.skill.activation": "read",
+"openclaw.skill.name": "tiny-llm-brainstorm",
+"openclaw.skill.source": "workspace",
+"openclaw.toolName": "read",
+};
+expect(telemetryState.counters.get("openclaw.skill.used")?.add).toHaveBeenCalledWith(
+1,
+expectedAttrs,
+);
+const skillSpanCall = telemetryState.tracer.startSpan.mock.calls.find(
+(call) => call[0] === "openclaw.skill.used",
+);
+expect(skillSpanCall?.[1]).toMatchObject({ attributes: expectedAttrs });
+expect(JSON.stringify(skillSpanCall)).not.toContain("run-should-not-export");
+expect(JSON.stringify(skillSpanCall)).not.toContain("session-should-not-export");
+await service.stop?.(ctx);
+});
+
 test("exports run, model call, and tool execution lifecycle spans", async () => {
 const service = createDiagnosticsOtelService();
 const ctx = createOtelContext(OTEL_TEST_ENDPOINT, { traces: true, metrics: true });
@@ -1587,6 +1630,7 @@ describe("diagnostics-otel service", () => {
 const toolCall = startedSpanCall("openclaw.tool.execution");
 const toolOptions = toolCall?.[1];
 expect(toolOptions?.attributes?.["openclaw.toolName"]).toBe("read");
+expect(toolOptions?.attributes?.["openclaw.tool.source"]).toBe("core");
 expect(toolOptions?.attributes?.["openclaw.errorCategory"]).toBe("TypeError");
 expect(toolOptions?.attributes?.["openclaw.errorCode"]).toBe("429");
 expect(toolOptions?.attributes?.["openclaw.tool.params.kind"]).toBe("object");
@@ -1629,6 +1673,7 @@ describe("diagnostics-otel service", () => {
 expect(Object.hasOwn(harnessDuration?.[1] ?? {}, "openclaw.sessionKey")).toBe(false);
 const toolDuration = lastHistogramRecord("openclaw.tool.execution.duration_ms");
 expect(toolDuration?.[0]).toBe(20);
+expect(toolDuration?.[1]?.["openclaw.tool.source"]).toBe("core");
 expect(Object.hasOwn(toolDuration?.[1] ?? {}, "openclaw.errorCode")).toBe(false);
 expect(Object.hasOwn(toolDuration?.[1] ?? {}, "openclaw.runId")).toBe(false);
 
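The new OTel test, and its Prometheus counterpart later in this commit, check privacy by planting sentinel values (`run-should-not-export`, `session-should-not-export`) in the event and asserting that they never appear in what reaches the exporter. A small sketch of that check as a reusable function; `findLeakedSentinels` is a hypothetical name, not a helper from this test suite.

```typescript
// Hypothetical helper: returns every sentinel that appears anywhere in what
// was captured. Strings (e.g. rendered Prometheus text) are searched directly;
// anything else (mock call arguments, span options) is JSON-serialized first.
function findLeakedSentinels(captured: unknown, sentinels: readonly string[]): string[] {
  const serialized = typeof captured === "string" ? captured : JSON.stringify(captured) ?? "";
  return sentinels.filter((sentinel) => serialized.includes(sentinel));
}

// Values planted in the test events, plus the skill file name that the
// Prometheus test also rules out.
const sentinels = ["run-should-not-export", "session-should-not-export", "SKILL.md"];

// Span options carrying exactly the attributes the OTel test expects.
const spanOptions = {
  attributes: {
    "openclaw.agent": "main",
    "openclaw.skill.activation": "read",
    "openclaw.skill.name": "tiny-llm-brainstorm",
    "openclaw.skill.source": "workspace",
    "openclaw.toolName": "read",
  },
};

console.log(findLeakedSentinels(spanOptions, sentinels)); // []
console.log(findLeakedSentinels({ runId: "run-should-not-export" }, sentinels)); // [ 'run-should-not-export' ]
```

Serializing before searching is deliberately blunt: it catches a leak in any attribute key or nested value, at the cost of also flagging a sentinel that happens to sit inside an unrelated string.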
extensions/diagnostics-otel/src/service.ts

Lines changed: 43 additions & 6 deletions
@@ -972,6 +972,10 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
 unit: "1",
 description: "Detected repetitive tool-call loop events",
 });
+const skillUsedCounter = meter.createCounter("openclaw.skill.used", {
+unit: "1",
+description: "Skills used by agent runs",
+});
 const modelCallDurationHistogram = meter.createHistogram("openclaw.model_call.duration_ms", {
 unit: "ms",
 description: "Model call duration",
@@ -2234,10 +2238,44 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
 >,
 ): Record<string, string | number | boolean> => ({
 "openclaw.toolName": evt.toolName,
+"openclaw.tool.source": lowCardinalityAttr(evt.toolSource, "core"),
 "gen_ai.tool.name": evt.toolName,
+...(evt.toolOwner ? { "openclaw.tool.owner": lowCardinalityAttr(evt.toolOwner) } : {}),
 ...paramsSummaryAttrs(evt.paramsSummary),
 });
 
+const skillUsedAttrs = (
+evt: Extract<DiagnosticEventPayload, { type: "skill.used" }>,
+): Record<string, string | number | boolean> => ({
+"openclaw.skill.name": lowCardinalityAttr(evt.skillName, "skill"),
+"openclaw.skill.source": lowCardinalityAttr(evt.skillSource),
+"openclaw.skill.activation": lowCardinalityAttr(evt.activation),
+...(evt.agentId ? { "openclaw.agent": lowCardinalityAttr(evt.agentId) } : {}),
+...(evt.toolName ? { "openclaw.toolName": lowCardinalityAttr(evt.toolName, "tool") } : {}),
+});
+
+const recordSkillUsed = (
+evt: Extract<DiagnosticEventPayload, { type: "skill.used" }>,
+metadata: DiagnosticEventMetadata,
+) => {
+if (!metadata.trusted) {
+return;
+}
+const attrs = skillUsedAttrs(evt);
+skillUsedCounter.add(1, attrs);
+if (!tracesEnabled) {
+return;
+}
+const spanAttrs: Record<string, string | number | boolean> = { ...attrs };
+addRunAttrs(spanAttrs, evt);
+const span = spanWithDuration("openclaw.skill.used", spanAttrs, 0, {
+parentContext: activeTrustedParentContext(evt, metadata),
+endTimeMs: evt.ts,
+});
+setSpanAttrs(span, spanAttrs);
+span.end(evt.ts);
+};
+
 const recordToolExecutionStarted = (
 evt: Extract<DiagnosticEventPayload, { type: "tool.execution.started" }>,
 metadata: DiagnosticEventMetadata,
@@ -2259,10 +2297,7 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
 evt: Extract<DiagnosticEventPayload, { type: "tool.execution.completed" }>,
 metadata: DiagnosticEventMetadata,
 ) => {
-const attrs = {
-"openclaw.toolName": evt.toolName,
-...paramsSummaryAttrs(evt.paramsSummary),
-};
+const attrs = toolExecutionBaseAttrs(evt);
 toolExecutionDurationHistogram.record(evt.durationMs, attrs);
 if (!tracesEnabled) {
 return;
@@ -2291,9 +2326,8 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
 metadata: DiagnosticEventMetadata,
 ) => {
 const attrs = {
-"openclaw.toolName": evt.toolName,
+...toolExecutionBaseAttrs(evt),
 "openclaw.errorCategory": lowCardinalityAttr(evt.errorCategory, "other"),
-...paramsSummaryAttrs(evt.paramsSummary),
 };
 toolExecutionDurationHistogram.record(evt.durationMs, attrs);
 if (!tracesEnabled) {
@@ -2629,6 +2663,9 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
 case "tool.execution.blocked":
 recordToolExecutionBlocked(evt, metadata);
 return;
+case "skill.used":
+recordSkillUsed(evt, metadata);
+return;
 case "exec.process.completed":
 recordExecProcessCompleted(evt);
 return;
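`recordSkillUsed` follows a fixed order: ignore untrusted events, always increment the counter, and emit a zero-duration span ending at `evt.ts` only when traces are enabled. The sketch below reproduces that control flow against in-memory fakes so it runs standalone. `FakeCounter`, `FakeTracer`, `recordInstantSpan`, and `makeRecorder` are hypothetical, and the sketch omits the value bounding (`lowCardinalityAttr`), run attributes (`addRunAttrs`), and trusted parent context that the real code attaches.

```typescript
type Attrs = Record<string, string | number | boolean>;

// Shape of the event as used in the tests; runId and sessionKey are present
// but must never be copied into exported attributes.
interface SkillUsedEvent {
  type: "skill.used";
  ts: number;
  skillName: string;
  skillSource?: string;
  activation: string;
  agentId?: string;
  toolName?: string;
  runId?: string;
  sessionKey?: string;
}

// Hypothetical in-memory fakes standing in for an OTel counter and tracer.
class FakeCounter {
  readonly calls: Array<[number, Attrs]> = [];
  add(value: number, attrs: Attrs): void {
    this.calls.push([value, attrs]);
  }
}

class FakeTracer {
  readonly spans: Array<{ name: string; attrs: Attrs; endTimeMs: number }> = [];
  recordInstantSpan(name: string, attrs: Attrs, endTimeMs: number): void {
    this.spans.push({ name, attrs, endTimeMs });
  }
}

function makeRecorder(counter: FakeCounter, tracer: FakeTracer, tracesEnabled: boolean) {
  return (evt: SkillUsedEvent, metadata: { trusted: boolean }): void => {
    // 1. Untrusted events are ignored entirely.
    if (!metadata.trusted) {
      return;
    }
    // 2. Attributes are picked field by field, so runId/sessionKey cannot leak.
    const attrs: Attrs = {
      "openclaw.skill.name": evt.skillName,
      "openclaw.skill.source": evt.skillSource ?? "unknown", // assumed fallback
      "openclaw.skill.activation": evt.activation,
      ...(evt.agentId ? { "openclaw.agent": evt.agentId } : {}),
      ...(evt.toolName ? { "openclaw.toolName": evt.toolName } : {}),
    };
    // 3. The counter is always incremented for trusted events.
    counter.add(1, attrs);
    // 4. A span is emitted only when tracing is on; it ends at the event time.
    if (!tracesEnabled) {
      return;
    }
    tracer.recordInstantSpan("openclaw.skill.used", { ...attrs }, evt.ts);
  };
}

const counter = new FakeCounter();
const tracer = new FakeTracer();
const record = makeRecorder(counter, tracer, true);
const evt: SkillUsedEvent = {
  type: "skill.used",
  ts: 1700000000000,
  skillName: "tiny-llm-brainstorm",
  skillSource: "workspace",
  activation: "read",
  agentId: "main",
  toolName: "read",
  runId: "run-should-not-export",
  sessionKey: "session-should-not-export",
};
record(evt, { trusted: false }); // dropped
record(evt, { trusted: true });
console.log(counter.calls.length, tracer.spans.length); // 1 1
```

Keeping the counter ahead of the `tracesEnabled` check means metrics-only deployments still see skill usage, which matches the early return in the diff.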

extensions/diagnostics-prometheus/src/service.test.ts

Lines changed: 31 additions & 1 deletion
@@ -144,12 +144,42 @@ describe("diagnostics-prometheus service", () => {
 const rendered = testApi.renderPrometheusMetrics(store);
 
 expect(rendered).toContain(
-'openclaw_tool_execution_total{error_category="other",outcome="error",params_kind="unknown",tool="tool"} 1',
+'openclaw_tool_execution_total{error_category="other",outcome="error",params_kind="unknown",tool="tool",tool_owner="none",tool_source="core"} 1',
 );
 expect(rendered).not.toContain("Bearer");
 expect(rendered).not.toContain("sk-secret");
 });
 
+it("records skill usage metrics without raw paths or session identifiers", () => {
+const store = testApi.createPrometheusMetricStore();
+
+testApi.recordDiagnosticEvent(
+store,
+{
+...baseEvent(),
+type: "skill.used",
+agentId: "main",
+runId: "run-should-not-export",
+sessionKey: "session-should-not-export",
+skillName: "tiny-llm-brainstorm",
+skillSource: "workspace",
+activation: "read",
+toolName: "read",
+},
+trusted,
+);
+
+const rendered = testApi.renderPrometheusMetrics(store);
+
+expect(rendered).toContain("# TYPE openclaw_skill_used_total counter");
+expect(rendered).toContain(
+'openclaw_skill_used_total{activation="read",agent="main",skill="tiny-llm-brainstorm",source="workspace"} 1',
+);
+expect(rendered).not.toContain("run-should-not-export");
+expect(rendered).not.toContain("session-should-not-export");
+expect(rendered).not.toContain("SKILL.md");
+});
+
 it("bounds messaging labels without exporting raw chat identifiers", () => {
 const store = testApi.createPrometheusMetricStore();
 
extensions/diagnostics-prometheus/src/service.ts

Lines changed: 21 additions & 0 deletions
@@ -316,6 +316,8 @@ function toolExecutionLabels(evt: {
 errorCategory?: string;
 paramsSummary?: { kind: string };
 toolName: string;
+toolOwner?: string;
+toolSource?: string;
 type: string;
 }): LabelSet {
 return {
@@ -326,6 +328,22 @@ function toolExecutionLabels(evt: {
 outcome: evt.type === "tool.execution.error" ? "error" : "completed",
 params_kind: lowCardinalityLabel(evt.paramsSummary?.kind),
 tool: lowCardinalityLabel(evt.toolName, "tool"),
+tool_owner: lowCardinalityLabel(evt.toolOwner, "none"),
+tool_source: lowCardinalityLabel(evt.toolSource, "core"),
+};
+}
+
+function skillLabels(evt: {
+activation: string;
+agentId?: string;
+skillName: string;
+skillSource?: string;
+}): LabelSet {
+return {
+activation: lowCardinalityLabel(evt.activation, "unknown"),
+agent: lowCardinalityLabel(evt.agentId),
+skill: lowCardinalityLabel(evt.skillName, "skill"),
+source: lowCardinalityLabel(evt.skillSource),
 };
 }
 
@@ -497,6 +515,9 @@ function recordDiagnosticEvent(
 toolExecutionLabels(evt),
 );
 return;
+case "skill.used":
+store.counter("openclaw_skill_used_total", "Skills used by agent runs.", skillLabels(evt));
+return;
 case "harness.run.completed":
 case "harness.run.error":
 store.histogram(
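On the Prometheus side, missing optional fields become fixed placeholder labels instead of being omitted: the updated test expects `tool_owner="none",tool_source="core"` for a tool with no owner or source, and `params_kind="unknown"` suggests that `lowCardinalityLabel` falls back to `unknown` when no fallback is passed. Below is a sketch of that labeling with a simplified, hypothetical `toLabel` helper (the real `lowCardinalityLabel` body is not in this diff); `toolClassificationLabels` is also hypothetical and covers only the classification subset of `toolExecutionLabels`.

```typescript
type LabelSet = Record<string, string>;

// Hypothetical, simplified stand-in for `lowCardinalityLabel`: keeps short,
// identifier-like values and replaces anything else with a fixed fallback.
function toLabel(value: string | undefined, fallback = "unknown"): string {
  if (!value || value.length > 64 || !/^[A-Za-z0-9._:-]+$/.test(value)) {
    return fallback;
  }
  return value;
}

// Classification subset of the tool labels: absent owner/source become
// "none"/"core", mirroring the fallbacks passed in the diff.
function toolClassificationLabels(evt: {
  toolName: string;
  toolOwner?: string;
  toolSource?: string;
}): LabelSet {
  return {
    tool: toLabel(evt.toolName, "tool"),
    tool_owner: toLabel(evt.toolOwner, "none"),
    tool_source: toLabel(evt.toolSource, "core"),
  };
}

// Same field-to-label mapping and fallbacks as `skillLabels` in the diff.
function skillLabels(evt: {
  activation: string;
  agentId?: string;
  skillName: string;
  skillSource?: string;
}): LabelSet {
  return {
    activation: toLabel(evt.activation, "unknown"),
    agent: toLabel(evt.agentId),
    skill: toLabel(evt.skillName, "skill"),
    source: toLabel(evt.skillSource),
  };
}

console.log(toolClassificationLabels({ toolName: "read" }));
// { tool: 'read', tool_owner: 'none', tool_source: 'core' }
console.log(skillLabels({ activation: "read", skillName: "/tmp/skills/x/SKILL.md" }).skill);
// skill
```

Placeholders keep the label names identical across every series of a metric, so a query such as `sum by (tool_source)` never meets a missing label. Which concrete `toolSource` values core emits (the changelog mentions core, plugin, MCP, and channel tools) is not shown in this diff.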
