Skip to content

feat(telemetry): add daemon OTel metrics and structured log records#4749

Merged
doudouOUC merged 11 commits into
daemon_mode_b_mainfrom
worktree-wiggly-meandering-lightning
Jun 5, 2026
Merged

feat(telemetry): add daemon OTel metrics and structured log records#4749
doudouOUC merged 11 commits into
daemon_mode_b_mainfrom
worktree-wiggly-meandering-lightning

Conversation

@doudouOUC

Copy link
Copy Markdown
Collaborator

Summary

  • Adds 11 OTel metric instruments to the daemon serve path covering HTTP request rate/latency, session/channel lifecycle, prompt queue wait & duration, bridge errors, cancellation, and active SSE connections
  • Uses ObservableGauge (not UpDownCounter) for session/SSE/heap gauges to prevent +1/-1 drift across complex lifecycle paths
  • Extends BridgeTelemetry interface with optional metrics sub-object for bridge decoupling
  • Generalizes emitDaemonLog to support custom event names and severity levels for structured lifecycle log records
  • Adds service.instance.id to OTel Resource for process incarnation detection
  • Adds forceFlushMetrics() for pre-shutdown metric snapshot

Closes #4554 §6.

Design Highlights

Metric Type Attributes
daemon.http.request.count Counter route, status_class
daemon.http.request.duration Histogram route
daemon.session.active ObservableGauge
daemon.session.lifecycle Counter action
daemon.channel.lifecycle Counter action, expected
daemon.prompt.queue_wait Histogram
daemon.prompt.duration Histogram
daemon.bridge.error.count Counter error_type
daemon.cancel.count Counter
daemon.sse.active ObservableGauge
daemon.process.heap_used ObservableGauge

Max time-series cardinality: ~200 (all attributes bounded by design).

Test plan

  • 17 unit tests in daemon-metrics.test.ts covering initialization, recording, gauge callbacks, error normalization, and no-op before init
  • TypeScript type-check passes for all 3 packages (core, acp-bridge, cli)
  • Pre-commit hooks (prettier + eslint) pass
  • Manual verification with OTLP collector

🤖 Generated with Qwen Code

Copilot AI review requested due to automatic review settings June 3, 2026 18:00
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR adds comprehensive OpenTelemetry metric instrumentation to the daemon serve path, implementing 11 metric instruments covering HTTP requests, session/channel lifecycle, prompt queue wait/duration, bridge errors, cancellations, and active SSE connections. The implementation is well-structured with proper separation of concerns between core metrics definitions and bridge integration points. Overall, this is a solid observability enhancement that follows OTel best practices.

🔍 General Feedback

  • Strong design decisions: Using ObservableGauge instead of UpDownCounter for session/SSE/heap gauges is the right call to prevent +1/-1 drift across complex lifecycle paths — this shows good understanding of OTel semantics
  • Good cardinality control: The ~200 max time-series cardinality estimate is well-reasoned with bounded attributes
  • Comprehensive test coverage: 17 unit tests covering initialization, recording, gauge callbacks, error normalization, and no-op behavior before init
  • Clean interface design: The BridgeTelemetry interface extension with optional metrics sub-object maintains proper decoupling
  • Defensive programming: Good use of initialization guards and no-op behavior before metrics are initialized

🎯 Specific Feedback

🟡 High

  • File: packages/core/src/telemetry/daemon-metrics.ts:58 - The normalizeErrorType function returns typeof err for non-Error objects, which will produce strings like "string" or "number". Consider returning "unknown" consistently for non-Error throws instead of the type name, as the type information isn't useful for monitoring.

    function normalizeErrorType(err: unknown): string {
      const name = err instanceof Error ? err.name : typeof err;
      return KNOWN_ERROR_TYPES.has(name) ? name : 'unknown';
    }

    Suggestion: Change to return err instanceof Error && KNOWN_ERROR_TYPES.has(name) ? name : 'unknown';

  • File: packages/core/src/telemetry/daemon-metrics.ts - The registerDaemonGaugeCallbacks function silently swallows all exceptions in gauge callbacks with empty catch {} blocks. While this prevents telemetry from affecting daemon behavior, consider adding debug logging so operators can diagnose why gauges might show stale/zero values.

    .addCallback((result) => {
      try {
        result.observe(callbacks.sessionCount());
      } catch {
        /* no-op */
      }
    });

    Suggestion: Add createDebugLogger('OTEL').debug('Gauge callback failed', err); inside catch blocks

🟢 Medium

  • File: packages/core/src/telemetry/daemon-metrics.ts:77-82 - The histogram bucket boundaries for daemon.http.request.duration jump from 30s to no upper bound. Consider adding a 60s bucket to catch pathological slow requests that don't hit timeout.

    explicitBucketBoundaries: [
      1, 2, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 30000,
    ],

    Suggestion: Add 60000 to the array for better tail latency visibility

  • File: packages/core/src/telemetry/daemon-metrics.ts:102-107 - The daemon.prompt.queue_wait histogram buckets may miss sub-millisecond queue waits common in low-load scenarios. Consider adding a 0.5 or 0.1 bucket.

    explicitBucketBoundaries: [
      1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 30000, 60000,
    ],

    Suggestion: Add 0.5 as the first bucket for sub-millisecond precision

  • File: packages/acp-bridge/src/bridge.ts:2272-2406 - The promptQueueWait and promptDuration metrics are recorded inside the then callback of entry.promptQueue.then(). If the promise rejects before dispatch, neither metric fires. Consider adding a catch block to record queue wait even on failures.
    Suggestion: Add .catch((err) => { telemetry.metrics?.promptQueueWait(Date.now() - queuedAt); throw err; }) to capture queue wait on early failures

🔵 Low

  • File: packages/core/src/telemetry/daemon-metrics.ts:1 - Copyright year is 2026, which is inconsistent with other files in the repo (2025). While technically forward-looking, this may cause confusion.
    Suggestion: Use 2025 to match existing files, or verify this is intentional

  • File: packages/core/src/telemetry/daemon-metrics.test.ts:60-66 - The mock setup for ValueType uses magic numbers (INT: 1, DOUBLE: 2). Consider importing the actual enum values from @opentelemetry/api for better maintainability.

    vi.mock('@opentelemetry/api', () => ({
      metrics: { getMeter: vi.fn().mockReturnValue(mockMeterInstance) },
      ValueType: { INT: 1, DOUBLE: 2 },
    }));

    Suggestion: Use ValueType: { INT: ValueType.INT, DOUBLE: ValueType.DOUBLE } if possible, or add a comment explaining the enum values

  • File: packages/core/src/telemetry/daemon-tracing.ts:200-214 - The emitDaemonLog function extension adds optional eventName and severityNumber parameters, but the default behavior still uses EVENT_DAEMON_ERROR for all logs. Consider whether structured lifecycle logs should use different event names by default.
    Suggestion: Add a comment or example in JSDoc showing when callers should override the event name

  • File: packages/core/src/telemetry/index.ts:185-196 - The exports are well-organized, but consider grouping related exports (e.g., all daemon metrics functions together with a comment separator) for better discoverability as the telemetry module grows.

✅ Highlights

  • Excellent metric naming conventions: All metric names follow the qwen-code.daemon.<domain>.<metric> pattern consistently
  • Thoughtful gauge design: Using ObservableGauge with callback-based observation prevents drift issues that plague +1/-1 counter patterns
  • Comprehensive error type normalization: The KNOWN_ERROR_TYPES set provides good coverage of bridge-specific errors while gracefully handling unknowns
  • Clean integration pattern: The optional metrics interface on BridgeTelemetry allows gradual adoption and easy testing with no-op implementations
  • Good test structure: Tests verify both happy paths and edge cases (non-Error throws, before-initialization behavior, gauge callback exceptions)
  • Proper initialization guards: All recording functions check initialized flag before accessing instrument references, preventing crashes during boot

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds OpenTelemetry metrics and structured daemon lifecycle log records for qwen serve’s daemon path, improving operational visibility into HTTP traffic, session/channel lifecycle, prompt timing, cancellations, bridge errors, active SSE connections, and daemon heap usage.

Changes:

  • Adds daemon-specific metric instruments (counters, histograms, observable gauges) plus unit tests for recording/gauge behavior.
  • Extends daemon bridge telemetry to optionally emit metric events and richer structured log records (custom event names + severity).
  • Adds a metric pre-shutdown flush helper and emits service.instance.id as a resource attribute for daemon process incarnation.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/core/src/telemetry/sdk.ts Tracks the active metric reader and adds forceFlushMetrics() to flush metric exports on demand.
packages/core/src/telemetry/index.ts Re-exports daemon metrics APIs and forceFlushMetrics from core telemetry.
packages/core/src/telemetry/daemon-tracing.ts Extends daemon log emission options and allows daemon bridge telemetry to carry optional metrics hooks.
packages/core/src/telemetry/daemon-metrics.ts Introduces daemon metrics instruments and recording helpers (HTTP, lifecycle, prompt timing, errors, cancel, gauges).
packages/core/src/telemetry/daemon-metrics.test.ts Adds unit tests validating daemon metrics initialization, recording, normalization, and gauge callbacks.
packages/cli/src/serve/server.ts Records daemon HTTP metrics and tracks active SSE connection count; records bridge error metrics on sendBridgeError.
packages/cli/src/serve/runQwenServe.ts Initializes daemon telemetry+metrics, registers gauge callbacks, emits structured lifecycle logs, and flushes metrics pre-shutdown.
packages/acp-bridge/src/bridgeOptions.ts Extends the bridge telemetry contract with an optional metrics interface for decoupled metric emission.
packages/acp-bridge/src/bridge.ts Calls the optional bridge telemetry metrics hooks for channel/session lifecycle, queue wait, duration, and cancellation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/cli/src/serve/runQwenServe.ts
Comment thread packages/cli/src/serve/runQwenServe.ts Outdated
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review fix summary (233ca93)

Comment Verdict Action
Copilot: service.instance.id unconditional overwrite Agree Fixed — now a fallback default, operator values take precedence
Copilot: expected attribute in spawn log Agree Fixed — only emitted for exit events

🤖 Generated with Qwen Code

@doudouOUC doudouOUC requested review from chiga0 and wenshao June 4, 2026 01:51

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] packages/acp-bridge/src/bridge.ts:4557sessionLifecycle('die') not recorded during daemon shutdown

The shutdown loop at line 4557 publishes session_died events for each entry but does not call telemetry.metrics?.sessionLifecycle('die'). The channel-exit handler at line 1163 normally records this metric, but it's bypassed during shutdown because byId.clear() runs first (line 4549), making byId.get(...) return undefined.

Sessions alive at daemon shutdown time are never counted as "died", causing the counter-derived net (spawns − closes − dies) to permanently drift per restart cycle.

Fix: add telemetry.metrics?.sessionLifecycle('die') in the for (const e of entries) loop:

for (const e of entries) {
  telemetry.metrics?.sessionLifecycle('die');
  try {
    e.events.publish({
      type: 'session_died',
      data: { sessionId: e.sessionId, reason: 'daemon_shutdown' },
    });
  } catch {
    /* bus already closed */
  }
  e.events.close();
}

(Posted as a body-level comment because this line is not in the PR diff.)

— qwen3.7-max via Qwen Code /review

Comment thread packages/acp-bridge/src/bridgeOptions.ts Outdated
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review fix summary (e8b3949)

Comment Author Verdict Action
Duplicate interface BridgeTelemetryMetrics vs DaemonBridgeTelemetryMetrics wenshao Agree Fixed — BridgeTelemetryMetrics is now a type alias re-exporting from core
sessionLifecycle('die') missing in shutdown loop wenshao Agree Fixed — added in the for (const e of entries) loop in bridge.shutdown()

🤖 Generated with Qwen Code

@doudouOUC doudouOUC requested a review from wenshao June 4, 2026 08:09
Comment thread packages/cli/src/serve/runQwenServe.ts Outdated
Comment thread packages/core/src/telemetry/daemon-metrics.ts
Comment thread packages/cli/src/serve/server.ts
Comment thread packages/core/src/telemetry/daemon-metrics.ts
Comment thread packages/core/src/telemetry/daemon-metrics.ts
Comment thread packages/core/src/telemetry/daemon-metrics.ts
Comment thread packages/core/src/telemetry/daemon-tracing.ts
Comment thread packages/cli/src/serve/runQwenServe.ts
Comment thread packages/core/src/telemetry/sdk.ts
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review fix summary (49c64f3)

# Comment Author Verdict Action
1 [Critical] forceFlushMetrics fire-and-forget race wenshao Agree Fixed — voidawait
2 registerDaemonGaugeCallbacks no idempotency guard wenshao Agree Fixed — added gaugesRegistered flag
3 activeSseCount no double-fire guard wenshao Agree Fixed — added sseCounted flag
4 KNOWN_ERROR_TYPES no compile-time coupling wenshao Defer Follow-up PR (test infra)
5 Gauge callbacks silent swallow wenshao Defer Follow-up PR (diag logging)
6 Non-null assertions fragile wenshao Agree Fixed — !?. on all recording functions
7 emitDaemonLog options no test coverage wenshao Defer Follow-up PR
8 expected ?? true contradicts severity logic wenshao Agree Fixed — !expectedexpected === false
9 forceFlushMetrics no test coverage wenshao Defer Follow-up PR

5 fixed, 4 deferred to follow-up.

🤖 Generated with Qwen Code

Comment thread packages/cli/src/serve/runQwenServe.ts Outdated
Comment thread packages/core/src/telemetry/sdk.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts
Comment thread packages/core/src/telemetry/daemon-metrics.test.ts
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review fix summary R2 (77d3500)

# Comment Author Verdict Action
1 [Critical] TS1308: await in non-async executor wenshao Agree Fixed — restructured to .then() chain
2 forceFlushMetrics no timeout wenshao Agree Fixed — 5s timeout via Promise.race
3 promptDuration no status attribute wenshao Defer Scope expansion (interface change), follow-up
4 registerDaemonGaugeCallbacks no idempotency test wenshao Agree Fixed — added test

3 fixed, 1 deferred.

🤖 Generated with Qwen Code

Comment thread packages/cli/src/serve/runQwenServe.ts Outdated
Comment thread packages/cli/src/serve/server.ts
Comment thread packages/core/src/telemetry/daemon-metrics.ts Outdated
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Remaining threads resolved (3eae310)

Comment Author Verdict Action
die severity always ERROR on shutdown wenshao Agree Fixed — removed ERROR severity from die; unexpected exits covered by channelLifecycle WARN
recordDaemonBridgeError only reaches unknown wenshao Push back By design — error counter is for unclassified 500s; known 4xx use HTTP status_class
gaugesRegistered set before gauge creation wenshao Agree Fixed — moved to end of function

2 fixed, 1 pushed back. All threads resolved (0 remaining).

🤖 Generated with Qwen Code

@doudouOUC doudouOUC requested a review from wenshao June 4, 2026 14:18
Comment thread packages/core/src/telemetry/sdk.ts
Comment thread packages/cli/src/serve/runQwenServe.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts
Comment thread packages/cli/src/serve/server.ts
@doudouOUC doudouOUC requested a review from wenshao June 4, 2026 15:26
@doudouOUC doudouOUC force-pushed the worktree-wiggly-meandering-lightning branch from 3eae310 to 0062df9 Compare June 5, 2026 06:10
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review Round 4 — wenshao suggestions (4 threads)

# File Issue Action Commit
1 sdk.ts:657 flush promise unhandled rejection in forceFlushMetrics race Fixed — added flush.catch(() => {}) 0062df985
2 runQwenServe.ts:754 Log body ${expected} vs attribute expected ?? true inconsistency Fixed — changed to ${expected ?? true} 0062df985
3 bridge.ts:1231 channelLifecycle('exit') fires without matching spawn on handshake failure Fixed — added handshakeComplete flag, exit metric guarded 0062df985
4 server.ts:424 Route resolver doesn't cover heartbeat/GET routes Pushed back — intentional cardinality control; heartbeat is high-frequency noise, GET routes are read-only/low-latency

🤖 Generated with Qwen Code

@wenshao

wenshao commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

PR #4749 Local Verification Report

Branch: worktree-wiggly-meandering-lightning | Commits: 9 (456166c20062df98) | Base: daemon_mode_b_main

Changes Summary

  • packages/core/src/telemetry/daemon-metrics.ts (+236, new) — 11 OTel metric instruments: counters, histograms, ObservableGauges
  • packages/core/src/telemetry/daemon-metrics.test.ts (+357, new) — 18 unit tests covering init, recording, gauges, error normalization, idempotency
  • packages/core/src/telemetry/sdk.ts (+29) — forceFlushMetrics() with 5s timeout + service.instance.id resource attribute
  • packages/core/src/telemetry/daemon-tracing.ts (+14 −1) — emitDaemonLog generalized with custom event name/severity
  • packages/core/src/telemetry/index.ts (+14) — Export new metric functions and types
  • packages/cli/src/serve/runQwenServe.ts (+128 −52) — Wire metrics into daemon lifecycle (session/channel/prompt/cancel)
  • packages/cli/src/serve/server.ts (+32 −5) — HTTP request count/latency + bridge error recording
  • packages/acp-bridge/src/bridge.ts (+26 −7) — Channel lifecycle metrics + session die on shutdown
  • packages/acp-bridge/src/bridgeOptions.ts (+7 −1) — BridgeTelemetryMetrics type via re-export from core

Test Results

Check Result Details
Unit Tests ✅ PASS 18/18 passed (daemon-metrics.test.ts, --no-cache)
ESLint ✅ PASS 0 warnings, 0 errors (all 8 changed source files)
TypeCheck (core) ✅ PASS 0 errors
TypeCheck (acp-bridge) 🟡 +1 expected 3 errors (base has 2). +1 is DaemonBridgeTelemetryMetrics not in built dist/ — resolves after npm run build --workspace=packages/core
TypeCheck (cli) 🟡 +19 expected 112 errors (base has 93). +16 are new metric function imports not in built core dist/ (same build-order issue). +3 are from merged PR #4751 (preheat) + implicit any params. All resolve after rebuilding core.
Whitespace ✅ PASS git diff --check clean

TypeCheck Analysis

The new typecheck errors fall into two categories:

  1. Build-order dependency (17 errors): New exports (initializeDaemonMetrics, recordDaemon*, emitDaemonLog, forceFlushMetrics, DaemonBridgeTelemetryMetrics) exist in core source and pass core's own typecheck, but cli/bridge resolve against stale dist/. Standard for cross-package changes — resolves after npm run build --workspace=packages/core.
  2. Cross-PR (2 errors): preheat from merged feat(daemon): optimize ACP child lifecycle — skip relaunch, preheat, idle keep-alive #4751, implicit any flushErr param — both pre-existing on current daemon_mode_b_main HEAD.

Verdict

Ready to merge. All 18 unit tests pass with thorough coverage (init, recording, gauge callbacks, error normalization, idempotency, no-op before init). Core typecheck clean. Lint clean. The cross-package typecheck errors are expected build-order artifacts that resolve after rebuilding core — no actual type mismatches.


Verified by wenshao

Comment thread packages/cli/src/serve/runQwenServe.ts
wenshao
wenshao previously approved these changes Jun 5, 2026
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review Round 5 — wenshao post-approval suggestion (1 thread)

# File Issue Action Commit
1 runQwenServe.ts:1209 / sdk.ts forceFlushMetrics 5s timeout inflates worst-case shutdown budget past K8s 30s Fixed — reduced to 2s d8cc96844

wenshao APPROVED the PR. All threads resolved (0 unresolved).

🤖 Generated with Qwen Code

wenshao
wenshao previously approved these changes Jun 5, 2026
wenshao
wenshao previously approved these changes Jun 5, 2026
export async function forceFlushMetrics(): Promise<void> {
if (!telemetryInitialized || !activeMetricReader) return;
const flush = activeMetricReader.forceFlush();
flush.catch(() => {});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] flush.catch(() => {}) creates a detached promise chain that silently discards the actual error from activeMetricReader.forceFlush(). The original flush promise is still raced in Promise.race, but the no-op catch means when the flush rejects (collector down, TLS error, network partition), the real error is lost. The caller in runQwenServe.ts:1209 only sees either the generic timeout message "forceFlushMetrics timed out after 2000ms" or silent resolution — no diagnostic about why the flush failed.

At 3 AM when the OTLP collector is unreachable, the operator has no metric flush diagnostics to distinguish collector outage from DNS failure from exporter misconfiguration.

Suggested change
flush.catch(() => {});
flush.catch((err) => {
debugLogger.debug?.('[metrics] flush error:', err instanceof Error ? err.message : String(err));
});

— qwen3.7-max via Qwen Code /review

doudouOUC added 11 commits June 5, 2026 15:38
Adds 11 OTel metric instruments to the daemon serve path, covering:
- HTTP request count/latency by route and status class
- Session lifecycle (spawn/close/die) counter
- Channel lifecycle (spawn/exit) counter
- Prompt queue wait and end-to-end duration histograms
- Bridge error counter with normalized error type allowlist
- Cancel request counter
- ObservableGauges for active sessions, SSE connections, heap usage

Key design decisions:
- ObservableGauge (not UpDownCounter) for gauge-like values — immune to
  +1/-1 drift across complex lifecycle paths
- Error type normalization via allowlist (19 known types + 'unknown')
  prevents unbounded cardinality
- Explicit histogram bucket boundaries tuned for daemon latency profiles
- Bridge decoupled via BridgeTelemetry.metrics optional sub-object
- emitDaemonLog generalized with optional eventName/severityNumber
- service.instance.id added to Resource for process incarnation detection
- Pre-shutdown forceFlushMetrics for best-effort final metric export

Closes #4554 §6.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- service.instance.id now serves as a fallback default rather than
  unconditional override, so operators can set a stable instance id
  via telemetry.resourceAttributes
- channelLifecycle('spawn') log no longer carries the misleading
  'expected' attribute (only meaningful for 'exit' events)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…down die metric

- BridgeTelemetryMetrics now re-exports DaemonBridgeTelemetryMetrics
  from core instead of duplicating the interface definition
- Add sessionLifecycle('die') in bridge shutdown loop so sessions
  alive at daemon shutdown are counted in the lifecycle counter

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- channelExitExpected now checks info.isDying in addition to
  shuttingDown, so deliberate channel kills from closeSession/
  killSession are correctly recorded as expected=true
- Add diag stub to the @opentelemetry/api mock in daemon-metrics
  tests for robustness

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- [Critical] forceFlushMetrics: void → await to prevent race with
  shutdownTelemetry tearing down the metric reader mid-flush
- registerDaemonGaugeCallbacks: add idempotency guard (gaugesRegistered)
  to prevent duplicate ObservableGauge callbacks on re-entry
- activeSseCount: add double-fire guard to prevent negative counter
  from abnormal close events
- Non-null assertions (!) → optional chaining (?.) on all recording
  functions for resilience against SDK misconfiguration
- expected ?? true vs !expected severity logic: use explicit
  expected === false to avoid contradictory signals when undefined

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- [Critical] Fix TS1308: await inside non-async Promise executor.
  Restructured to .then() chain so forceFlushMetrics completes
  before bridge.shutdown() starts, without requiring async executor.
- forceFlushMetrics: add 5s timeout via Promise.race to prevent
  indefinite blocking on unreachable OTLP collector.
- Add idempotency test for registerDaemonGaugeCallbacks.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- sessionLifecycle('die') no longer emits ERROR severity — unexpected
  exits are already covered by channelLifecycle('exit', false) WARN
- gaugesRegistered = true moved to end of registerDaemonGaugeCallbacks
  for consistency with initializeDaemonMetrics and retry resilience

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Add flush.catch() in forceFlushMetrics to prevent unhandled rejection
  when timeout wins the Promise.race (sdk.ts)
- Fix log body inconsistency: use expected ?? true to match attribute
  (runQwenServe.ts)
- Guard channelLifecycle('exit') with handshakeComplete flag to prevent
  exit count exceeding spawn count on handshake failures (bridge.ts)
Keeps worst-case shutdown budget under Kubernetes default 30s grace
period. Healthy daemon flushes in <100ms; 2s is sufficient headroom.
@doudouOUC doudouOUC force-pushed the worktree-wiggly-meandering-lightning branch from d28a619 to 03a7310 Compare June 5, 2026 07:40

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one blocking metrics correctness issue.

Comment thread packages/cli/src/serve/server.ts

@BZ-D BZ-D left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@doudouOUC doudouOUC merged commit 5164547 into daemon_mode_b_main Jun 5, 2026
6 checks passed
@doudouOUC doudouOUC deleted the worktree-wiggly-meandering-lightning branch June 5, 2026 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants