Skip to content

feat(serve): add daemon file logger (#4548)#4559

Merged
wenshao merged 14 commits into
daemon_mode_b_mainfrom
feat/support_daemon_logger
May 27, 2026
Merged

feat(serve): add daemon file logger (#4548)#4559
wenshao merged 14 commits into
daemon_mode_b_mainfrom
feat/support_daemon_logger

Conversation

@doudouOUC

@doudouOUC doudouOUC commented May 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Adds a per-process daemon file logger at ~/.qwen/debug/daemon/serve-<pid>-<workspaceHash>.log (configurable via QWEN_RUNTIME_DIR, opt-out via QWEN_DAEMON_LOG_FILE=0).
  • Routes runQwenServe lifecycle messages, sendBridgeError route errors, writeServeDebugLine debug breadcrumbs, and ACP child stderr into the daemon log without removing existing stderr output.
  • Adds BridgeOptions.onDiagnosticLine and createSpawnChannelFactory({ onDiagnosticLine }) to keep acp-bridge ignorant of cli.
  • Maintains a daemon/latest symlink for tail -f.

Closes #4548.

Architecture

  • New module: packages/cli/src/serve/daemonLogger.tsinitDaemonLogger(opts) → DaemonLogger
    • info/warn/error → file + stderr (tee)
    • raw → file only (for lines whose caller already writes to stderr)
    • Async append queue with one-shot degraded mode
    • Boot-time synchronous writability probe; falls back to no-op on failure
  • acp-bridge changes: DiagnosticLineSink type + callback injection (no cli dependency)
  • runQwenServe wiring: init at boot → pass to bridge/server → flush on shutdown

Test plan

  • packages/cli/src/serve/daemonLogger.test.ts — 21 specs (formatter, opt-out, file init, info/warn/error, raw, latest symlink, degraded fallback)
  • packages/acp-bridge/src/bridge.test.ts — 2 new specs (onDiagnosticLine tee)
  • packages/acp-bridge/src/spawnChannel.test.ts — 6 new specs (createStderrForwarder)
  • packages/cli/src/serve/server.test.ts — 2 new specs (sendBridgeError → daemonLog routing)
  • packages/cli/src/serve/runQwenServe.test.ts — 1 new integration test (boot banner + daemon file)
  • TypeScript clean (tsc --noEmit) for both packages
  • Full packages/acp-bridge suite: 308/308 pass
  • packages/cli/src/serve/ suite: 706/708 pass (2 pre-existing test-isolation flakes unrelated to this PR)

🤖 Generated with Qwen Code


📝 描述准确性更新(2026-05-31,作者自查)

更正:Closes #4548 未实际关联(issue 仍 open);64KiB 截断实际位于 spawnChannel 的 stderr 转发器,非 daemonLogger(后者不截断)。

Copilot AI review requested due to automatic review settings May 26, 2026 16:52
@github-actions

Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR adds a per-process daemon file logger for qwen serve that captures lifecycle messages, route errors, ACP child stderr, and debug breadcrumbs to ~/.qwen/debug/daemon/serve-<pid>-<workspaceHash>.log. The implementation is well-structured with comprehensive test coverage (21 specs for the logger alone), maintains backward compatibility, and follows existing project patterns. The architecture cleanly separates concerns between the CLI-local logger and the acp-bridge's diagnostic callback injection.

🔍 General Feedback

  • Architecture is sound: The decision to keep acp-bridge ignorant of CLI internals via the DiagnosticLineSink callback pattern is excellent design.
  • Test coverage is thorough: 21 specs for daemonLogger, 6 for spawnChannel, 2 for server, 2 for bridge, and 1 integration test for runQwenServe.
  • Backward compatibility preserved: All existing stderr behavior remains intact; the file log is purely additive.
  • Error handling is defensive: Init failures gracefully degrade to no-op loggers with warnings; write failures trigger one-time warnings without blocking operation.
  • Documentation is complete: User-facing docs in docs/users/qwen-serve.md clearly explain the feature, opt-out mechanism, and relationship to session debug logs.

🎯 Specific Feedback

🟢 Medium

  • packages/cli/src/serve/daemonLogger.ts:175 — The degraded flag logic only emits a warning on the first write failure, then continues attempting writes without tracking subsequent failures. Consider whether operators would benefit from knowing if every append is failing (e.g., disk full) vs. a transient error. Current behavior accepts this trade-off, but it's worth noting that after the first failure, the operator won't see additional warnings even if all subsequent writes also fail.

  • packages/cli/src/serve/daemonLogger.ts:144 — The synchronous appendFileSync for the boot line uses the default file mode (0o666 masked by umask). Consider explicitly setting file permissions (e.g., { mode: 0o600 }) since daemon logs may contain sensitive diagnostic information about sessions, routes, and errors. This is a minor hardening suggestion.

  • packages/acp-bridge/src/spawnChannel.ts:42 — The StderrForwarderOptions.onDiagnosticLine callback signature uses 'info' | 'warn' | 'error' for the level, but all calls pass 'warn' (line 34, 48). This seems intentional since child stderr is inherently warning-level output, but the type suggests flexibility that isn't used. Consider whether the level parameter should be optional or if the callback should always receive 'warn' for child stderr lines.

🔵 Low

  • packages/cli/src/serve/daemonLogger.ts:78 — The FIXED_CTX_ORDER array includes 'clientId', 'childPid', and 'channelId', but these don't appear to be used in any of the call sites shown in the diff. The context rendering logic is well-designed for future extensibility, but unused fields could be removed or documented as "reserved for future use."

  • packages/cli/src/serve/daemonLogger.ts:203 — The raw() method accepts level?: 'info' | 'warn' | 'error' with a default of 'info', but all call sites in the diff pass explicit levels or rely on the default. Consider whether the default should be 'warn' instead, since raw() is primarily used for ACP child stderr and debug lines, which are more warning/diagnostic in nature than informational.

  • docs/users/qwen-serve.md:483 — The documentation states "Rotate manually if it grows large" but doesn't provide guidance on what "large" means or typical growth rates. A brief note like "Typical growth is ~X MB/hour under normal load" or "Consider rotation when the file exceeds 100 MB" would help operators.

  • packages/cli/src/serve/runQwenServe.ts:467 — The boot banner writeStderrLine uses the legacy helper instead of daemonLog.info(). This is intentional per the comment about circularity, but the daemon log file won't contain this specific boot message. Consider whether operators would benefit from having the daemon log path recorded in the daemon log itself (even if the boot banner goes to stderr only).

  • packages/cli/src/serve/server.ts:2724 — The sendPermissionVoteErrorImpl function calls sendBridgeErrorImpl at the end, which means errors get logged twice if both functions have daemonLog routing. Looking at the code flow, this appears intentional (permission vote errors have their own formatting before delegating to the general bridge error handler), but it's worth verifying that the error isn't double-logged.

✅ Highlights

  • Excellent test isolation: Tests use temp directories, mocked time, and fake bridges to avoid spawning real child processes — this makes tests fast and deterministic.

  • Clean separation of concerns: The daemonLogger module is self-contained with no dependencies on CLI internals beyond Storage.getGlobalDebugDir() and updateSymlink, making it easy to test and reason about.

  • Thoughtful tee semantics: The distinction between info/warn/error (tee to file + stderr) and raw (file only) shows careful consideration of operator experience — avoiding stderr flooding while ensuring diagnostic lines are captured.

  • Graceful degradation: The logger handles init failures, write failures, and symlink failures without breaking the daemon or losing stderr output — this is production-ready error handling.

  • Well-documented architecture: The design spec (docs/superpowers/specs/2026-05-26-daemon-logger-design.md) provides clear rationale for every design decision, making future maintenance easier.

@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Thanks for the review.

  • degraded flag (Medium pre-release: fix ci #1): Intentional one-shot per design spec §11 — subsequent failures are swallowed to avoid log-storm feedback loops when the fs is down. Operators still see the initial warning.
  • File permissions (Medium Where is the config saved? #2): Fair point; the default umask typically gives 0o644 which is reasonable for local daemon logs. Deferring to a follow-up if the team wants stricter perms.
  • StderrForwarder level param (Medium 如何自定义密钥文件 .env可能与其他文件冲突 #3): Kept for forward-compat — future callers (e.g. child stdout forwarding) may use 'info'.
  • Low items: ctx fields are used by Tasks 9/10 call sites (route/sessionId), raw() default is per spec, boot banner circularity is documented, double-logging concern addressed below.

Re Low #5 (double-logging): sendPermissionVoteErrorImpl builds a specialized 4xx response body (vote-specific fields) then delegates to sendBridgeErrorImpl only for the 5xx fallback path — the if (daemonLog) branch in sendBridgeErrorImpl fires once, not twice, because the 4xx early-returns prevent reaching it. No double-log.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a daemon-scoped file logger for qwen serve so serve/bridge diagnostics are persisted under the runtime debug directory (with a daemon/latest symlink), while keeping existing stderr behavior for operators.

Changes:

  • Introduces initDaemonLogger() (CLI) to write daemon diagnostics to ${runtimeDir}/debug/daemon/serve-<pid>-<workspaceHash>.log and tee selected messages to stderr.
  • Wires the daemon logger through runQwenServe and server.ts (routing sendBridgeError 5xx diagnostics into the daemon log).
  • Extends @qwen-code/acp-bridge to accept an onDiagnosticLine callback and a configurable createSpawnChannelFactory({ onDiagnosticLine }) to tee ACP child stderr and serve debug breadcrumbs into the daemon log.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
packages/cli/src/serve/daemonLogger.ts New daemon file logger (formatting, append queue, symlink update).
packages/cli/src/serve/daemonLogger.test.ts Unit coverage for formatter + init + raw + symlink behavior.
packages/cli/src/serve/runQwenServe.ts Initializes daemon logger, wires diagnostic sink into bridge/spawn, uses daemon logger for lifecycle diagnostics.
packages/cli/src/serve/runQwenServe.test.ts Integration test verifying boot creates daemon log file under runtime dir.
packages/cli/src/serve/server.ts Injects optional daemon logger into error helpers; routes 5xx bridge errors through it.
packages/cli/src/serve/server.test.ts Verifies sendBridgeError writes structured daemon log output when injected.
packages/acp-bridge/src/bridgeOptions.ts Adds DiagnosticLineSink and optional onDiagnosticLine hook.
packages/acp-bridge/src/bridge.ts Tees writeServeDebugLine output into onDiagnosticLine when serve debug is enabled.
packages/acp-bridge/src/bridge.test.ts Tests the onDiagnosticLine tee behavior under QWEN_SERVE_DEBUG.
packages/acp-bridge/src/spawnChannel.ts Refactors spawn factory to createSpawnChannelFactory + extracts/testable stderr forwarder; tees stderr lines via callback.
packages/acp-bridge/src/spawnChannel.test.ts Tests stderr forwarder buffering/truncation + callback behavior.
docs/users/qwen-serve.md Documents daemon log file path, latest symlink, opt-out, and relationship to session logs.
docs/superpowers/specs/2026-05-26-daemon-logger-design.md Design doc for the daemon logger feature.
docs/superpowers/plans/2026-05-26-daemon-logger.md Implementation plan doc for the feature.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1049 to 1052
server.removeAllListeners('error');
server.on('error', (err) => {
writeStderrLine(
`qwen serve: server error: ${err instanceof Error ? err.message : String(err)}`,
);
daemonLog.error('server error', err instanceof Error ? err : null);
});

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — won't take this one. Node server/bridge errors are always Error instances in practice; wrapping unknown values adds complexity for a codepath that doesn't fire.

Comment on lines +127 to +129
export function initDaemonLogger(opts: InitDaemonLoggerOptions): DaemonLogger {
if (isOptedOut()) return NOOP_LOGGER;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — agreed, will fix. The opt-out / init-failure path should still tee to stderr (only file writes disabled). Spec §10 guarantees 'stderr unchanged' when file logging is off.

Comment on lines +42 to +46
for (const key of FIXED_CTX_ORDER) {
const v = ctx[key];
if (v !== undefined && v !== null) {
parts.push(`${key}=${String(v)}`);
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — won't take this one. The asymmetry is intentional: fixed keys (route, sessionId) are well-known identifiers safe to render unquoted; extra keys are arbitrary and need quoting for safe log parsing. Test fixtures encode this contract.

Comment on lines 883 to 887
} catch (err) {
writeStderrLine(
`qwen serve: force-kill error: ${err instanceof Error ? err.message : String(err)}`,
daemonLog.error(
'force-kill error',
err instanceof Error ? err : null,
);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — won't take this one. Node server/bridge errors are always Error instances in practice; wrapping unknown values adds complexity for a codepath that doesn't fire.

process.exit(0);
} catch (err) {
writeStderrLine(`qwen serve: shutdown error: ${String(err)}`);
daemonLog.error('shutdown error', err instanceof Error ? err : null);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — won't take this one. Node server/bridge errors are always Error instances in practice; wrapping unknown values adds complexity for a codepath that doesn't fire.

Comment on lines +984 to 987
daemonLog.error(
'bridge shutdown error',
err instanceof Error ? err : null,
);

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — won't take this one. Node server/bridge errors are always Error instances in practice; wrapping unknown values adds complexity for a codepath that doesn't fire.

err instanceof Error ? err.message : String(err)
}`,
);
return NOOP_LOGGER;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — agreed, will fix. The opt-out / init-failure path should still tee to stderr (only file writes disabled). Spec §10 guarantees 'stderr unchanged' when file logging is off.

@doudouOUC doudouOUC requested a review from wenshao May 26, 2026 17:09
@doudouOUC doudouOUC self-assigned this May 26, 2026
@doudouOUC doudouOUC force-pushed the feat/support_daemon_logger branch from ba45d2e to fc83f5c Compare May 26, 2026 17:26

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Test gap in runQwenServe.test.ts — the test "creates a daemon log file at boot and flushes on shutdown" asserts expect(finalContent).toContain('daemon started') after handle.close(), which is the same assertion made before close. It doesn't verify that flush() actually wrote any new content during shutdown. Consider asserting finalContent.length > logContent.length to confirm the flush produced output.

Additional test gaps: no coverage for sendPermissionVoteErrorImpl's daemonLog fallthrough path in server.test.ts, and no test for createSpawnChannelFactory in spawnChannel.test.ts (only createStderrForwarder is tested). — qwen3.7-max via Qwen Code /review

for (const key of FIXED_CTX_ORDER) {
const v = ctx[key];
if (v !== undefined && v !== null) {
parts.push(`${key}=${String(v)}`);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] renderCtx applies String(v) for fixed-order keys (line 45) but renderCtxValue(v) for extra keys (line 54). renderCtxValue JSON-quotes values containing whitespace or =, while String(v) performs no sanitization.

sessionId comes from req.params['id'] in server routes (Express URL-decodes path parameters). A crafted URL like /session/foo%0Afake-entry/prompt produces a sessionId with an embedded newline. When sendBridgeError passes it through daemonLog.error(msg, err, { sessionId }), the newline is written raw into the log file — enabling log line forgery.

The codebase already treats this class of injection as a real concern elsewhere (e.g., the workspace_mismatch handler uses JSON.stringify for the same reason).

Suggested change
parts.push(`${key}=${String(v)}`);
parts.push(`${key}=${renderCtxValue(v)}`);

— qwen3.7-max via Qwen Code /review

// inside createHttpAcpBridge we use this wrapper exclusively.
const teeServeDebugLine = (message: string): void => {
writeServeDebugLine(message);
if (opts.onDiagnosticLine && isServeDebugLoggingEnabled()) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] teeServeDebugLine gates the onDiagnosticLine callback on isServeDebugLoggingEnabled(). This means bridge-internal diagnostics (permission vote rejections, broadcast failures, workspace event publish errors) are not written to the daemon log unless QWEN_SERVE_DEBUG=1 was set at boot.

The daemon log — the whole point of this PR — silently misses bridge-internal diagnostics by default. An operator reading ~/.qwen/debug/daemon/latest at 3 AM will see a gap exactly where they need data most.

The stderr gate (writeServeDebugLine on line 704) should remain. But the file path (onDiagnosticLine on line 705) should fire unconditionally:

Suggested change
if (opts.onDiagnosticLine && isServeDebugLoggingEnabled()) {
const teeServeDebugLine = (message: string): void => {
writeServeDebugLine(message);
if (opts.onDiagnosticLine) {
opts.onDiagnosticLine(`qwen serve debug: ${message}`, 'info');
}
};

— qwen3.7-max via Qwen Code /review

const flush = (line: string) => {
if (line.length > 0) {
process.stderr.write(prefix + line + '\n');
if (onDiagnosticLine) onDiagnosticLine(prefix + line, 'warn');

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] onDiagnosticLine(prefix + line, 'warn') is called without try/catch inside the child.stderr.on('data') event handler. If the callback throws, the exception propagates through Node's EventEmitter 'data' handler — potentially detaching it and silently dropping all remaining child stderr output for the lifetime of that child process.

The current callback (daemonLog.raw) is unlikely to throw, but defense-in-depth suggests wrapping:

Suggested change
if (onDiagnosticLine) onDiagnosticLine(prefix + line, 'warn');
if (onDiagnosticLine) {
try { onDiagnosticLine(prefix + line, 'warn'); } catch { /* swallow */ }
}

— qwen3.7-max via Qwen Code /review

writeStderrLine(`qwen serve: received ${signal}, draining...`);
daemonLog.warn(`received ${signal}, draining`);
try {
await handle.close();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The signal handler correctly flushes here after handle.close(). However, handle.close() itself (line 989+) orchestrates bridge.shutdown(), device-flow registry dispose, and server.close() — without calling daemonLog.flush().

Programmatic callers (tests, embedded deployments) calling handle.close() directly will not get async log entries flushed. Shutdown-phase diagnostics (bridge shutdown errors, force-close timeouts) are lost.

Add await daemonLog.flush().catch(() => {}) at the end of the close() closure (before finish() resolve), or expose flush on the RunHandle interface.

— qwen3.7-max via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Test gap in runQwenServe.test.ts — the test "creates a daemon log file at boot and flushes on shutdown" asserts expect(finalContent).toContain('daemon started') after handle.close(), which is the same assertion made before close. It doesn't verify that flush() actually wrote any new content during shutdown. Consider asserting finalContent.length > logContent.length to confirm the flush produced output.

Additional test gaps: no coverage for sendPermissionVoteErrorImpl's daemonLog fallthrough path in server.test.ts, and no test for createSpawnChannelFactory in spawnChannel.test.ts (only createStderrForwarder is tested). — qwen3.7-max via Qwen Code /review

for (const key of FIXED_CTX_ORDER) {
const v = ctx[key];
if (v !== undefined && v !== null) {
parts.push(`${key}=${String(v)}`);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] renderCtx applies String(v) for fixed-order keys (line 45) but renderCtxValue(v) for extra keys (line 54). renderCtxValue JSON-quotes values containing whitespace or =, while String(v) performs no sanitization.

sessionId comes from req.params['id'] in server routes (Express URL-decodes path parameters). A crafted URL like /session/foo%0Afake-entry/prompt produces a sessionId with an embedded newline. When sendBridgeError passes it through daemonLog.error(msg, err, { sessionId }), the newline is written raw into the log file — enabling log line forgery.

The codebase already treats this class of injection as a real concern elsewhere (e.g., the workspace_mismatch handler uses JSON.stringify for the same reason).

Suggested change
parts.push(`${key}=${String(v)}`);
parts.push(`${key}=${renderCtxValue(v)}`);

— qwen3.7-max via Qwen Code /review

// inside createHttpAcpBridge we use this wrapper exclusively.
const teeServeDebugLine = (message: string): void => {
writeServeDebugLine(message);
if (opts.onDiagnosticLine && isServeDebugLoggingEnabled()) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] teeServeDebugLine gates the onDiagnosticLine callback on isServeDebugLoggingEnabled(). This means bridge-internal diagnostics (permission vote rejections, broadcast failures, workspace event publish errors) are not written to the daemon log unless QWEN_SERVE_DEBUG=1 was set at boot.

The daemon log — the whole point of this PR — silently misses bridge-internal diagnostics by default. An operator reading ~/.qwen/debug/daemon/latest at 3 AM will see a gap exactly where they need data most.

The stderr gate (writeServeDebugLine on line 704) should remain. But the file path (onDiagnosticLine on line 705) should fire unconditionally:

Suggested change
if (opts.onDiagnosticLine && isServeDebugLoggingEnabled()) {
const teeServeDebugLine = (message: string): void => {
writeServeDebugLine(message);
if (opts.onDiagnosticLine) {
opts.onDiagnosticLine(`qwen serve debug: ${message}`, 'info');
}
};

— qwen3.7-max via Qwen Code /review

const flush = (line: string) => {
if (line.length > 0) {
process.stderr.write(prefix + line + '\n');
if (onDiagnosticLine) onDiagnosticLine(prefix + line, 'warn');

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] onDiagnosticLine(prefix + line, 'warn') is called without try/catch inside the child.stderr.on('data') event handler. If the callback throws, the exception propagates through Node's EventEmitter 'data' handler — potentially detaching it and silently dropping all remaining child stderr output for the lifetime of that child process.

The current callback (daemonLog.raw) is unlikely to throw, but defense-in-depth suggests wrapping:

Suggested change
if (onDiagnosticLine) onDiagnosticLine(prefix + line, 'warn');
if (onDiagnosticLine) {
try { onDiagnosticLine(prefix + line, 'warn'); } catch { /* swallow */ }
}

— qwen3.7-max via Qwen Code /review

writeStderrLine(`qwen serve: received ${signal}, draining...`);
daemonLog.warn(`received ${signal}, draining`);
try {
await handle.close();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The signal handler correctly flushes here after handle.close(). However, handle.close() itself (line 989+) orchestrates bridge.shutdown(), device-flow registry dispose, and server.close() — without calling daemonLog.flush().

Programmatic callers (tests, embedded deployments) calling handle.close() directly will not get async log entries flushed. Shutdown-phase diagnostics (bridge shutdown errors, force-close timeouts) are lost.

Add await daemonLog.flush().catch(() => {}) at the end of the close() closure (before finish() resolve), or expose flush on the RunHandle interface.

— qwen3.7-max via Qwen Code /review

@doudouOUC doudouOUC force-pushed the daemon_mode_b_main branch 2 times, most recently from 1284383 to 1ca0572 Compare May 27, 2026 02:36
doudouOUC added 14 commits May 27, 2026 10:48
Document the architecture, daemon-id scheme, API surface, tee
semantics, boot/shutdown flow, and test plan for adding a daemon-
specific file sink to qwen serve diagnostics. Companion to issue
#4548.
Bite-sized task list covering: pure formatter, file init, info/warn/
error + flush, raw file-only tee, latest symlink, acp-bridge sink
injection, spawn factory refactor, runQwenServe wiring, docs, and
final verification + PR creation. Companion to the design spec.
- updateSymlink: re-export from core barrel first, then import
- bridge.test.ts harness: use makeBridge/makeChannel from testUtils
  (MockStream was hallucinated)
- writeServeDebugLine: enumerate all 6 call sites, not 2
- createServeApp: correct 3-arg signature (opts, getPort, deps);
  daemonLog goes in deps, not as a 1st-arg key
}
}),
);
};

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] After degraded is set to true, enqueueAppend continues chaining .then(() => appendFile(...).catch(...)) for every subsequent call. Each doomed write triggers a microtask + I/O syscall that produces no output. Under high-volume degraded logging (e.g., a crashing child flooding stderr while the log dir is gone), this accumulates thousands of wasted microtasks per second on the event loop.

Suggested change
};
const enqueueAppend = (line: string): void => {
if (degraded) return;
pending = pending.then(() =>
nodeFs.promises.appendFile(logPath, line).catch((err) => {
if (!degraded) {
degraded = true;
stderr(
`qwen serve: daemon log write failed — entering degraded mode: ${
err instanceof Error ? err.message : String(err)
}`,
);
}
}),
);
};

— qwen3.7-max via Qwen Code /review

);
} catch {
// Defensive: any sync throw is ignored.
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] flush() returns the tail of the pending promise chain with no timeout. In the force-exit path (runQwenServe.ts: await daemonLog.flush().catch(() => {}); process.exit(1)), a wedged disk (NFS hang, FUSE deadlock) causes appendFile to never resolve or reject — the .catch only guards against rejections, not hangs. The operator's last-resort double-Ctrl+C escape hatch becomes a deadlock.

Consider wrapping flush() with a timeout at the call sites, or inside flush() itself:

flush: () => Promise.race([
  pending,
  new Promise<void>((resolve) => setTimeout(resolve, 2_000)),
]),

— qwen3.7-max via Qwen Code /review

for (let i = 0; i < 50; i++) {
expect(msgLines[i]).toContain(`msg-${i}`);
}
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Test gap: the "warns once on append failure and keeps trying" test passes stderr: () => {} (discards all output) and only asserts "no throw". The degraded-mode stderr warning — the logger's primary resilience mechanism — is never actually verified. If the if (!degraded) guard were accidentally removed or inverted, this test would still pass.

Suggested fix: capture stderr lines and assert:

  1. Exactly one degraded-mode warning after the first failed write
  2. No second warning after the second write (confirming the degraded flag suppresses duplicates)

Similarly, the signal handler rewrite in runQwenServe.ts (5 daemonLog.warn/error/flush calls) has zero test coverage — shutdown logging, the PR's core motivation, is untested.

— qwen3.7-max via Qwen Code /review

// Init daemon logger early so all subsequent lifecycle events
// (bridge spawn diagnostics, shutdown errors) are captured to file.
const daemonLog: DaemonLogger = initDaemonLogger({ boundWorkspace });
writeStderrLine(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This writeStderrLine and the subsequent boot banners (workspace binding ~line 922, bearer auth status ~lines 926/936) still go to stderr only, not to the daemon log file. Since daemonLog is already initialized above, consider routing these through daemonLog.info(...) — this preserves stderr output (tee behavior) while also capturing boot-critical context (workspace, auth config) in the file for post-mortem analysis:

daemonLog.info(`daemon log → ${daemonLog.getLogPath() || '(disabled)'}`);
// ...later in app.listen callback:
daemonLog.info(`bound to workspace ${JSON.stringify(boundWorkspace)}`);

— qwen3.7-max via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new review findings in this round. All prior-round suggestions remain open (boot banner bypass, degraded mode, flush timeout, renderCtx quoting, teeServeDebugLine gating, onDiagnosticLine try/catch, handle.close() flush). CI all green (5/5), 593 tests pass. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented May 27, 2026

Copy link
Copy Markdown
Collaborator

本地验证报告(maintainer review)

PR 目标 daemon_mode_b_main(不是 main),PR head f15261440直接基于 origin/daemon_mode_b_main 头部(1ca057251),git merge 无任何前向 commit,所以等同于已是 "合并后状态"。Diff 仍是 14 文件 +3028/-217(其中 1497+280 = 1777 行是 docs/superpowers/{plans,specs}/2026-05-26-daemon-logger-*.md 的设计文档,纯代码改动约 1248 行),与 PR 描述一致。

结论

✅ 建议 MERGE

  • PR 自带 32 条新单测(21 daemonLogger + 2 bridge + 6 spawnChannel + 2 server + 1 runQwenServe)在 macOS 本机全过;4 个相关 workspace 测试套件 906 / 906(cli serve 3 文件 380、acp-bridge 全量 315、cli server 339、cli runQwenServe 20 + daemonLogger 21)。PR 描述里提到的"706/708 是 2 个 pre-existing 测试隔离 flake",实测确实是 isolation flake:单文件跑 339/339 全过,并行跑会偶现 device-flow 测试干扰,与本 PR 无关
  • 写了 12 个场景共 39 条 real-I/O 断言(绕开测试 mock,直接用 真 tmpdir + 真 fs + 真 appendFile)。全过(详见 §3)。
  • 在 tmux 里实跑 qwen serve(含 bundle CLI、SIGTERM 走 drain、tail -f daemon/latest),逐项触发 PR 描述里的每条 sink(boot banner / lifecycle warn / opt-out / runtime-dir 覆盖 / latest symlink)。全部行为符合 spec(详见 §4)。
  • 14 个 PR commit 反映完整 incremental delivery:先骨架(opt-out → fs init → 三级 API → raw → symlink → formatter),再 wire-in(acp-bridge / server / runQwenServe),最后 docs。没有 mega-commit,每一步都可独立 review。

§1. 验证矩阵

检查 结果 备注
daemonLogger.test.ts 21 / 21 全部本 PR 新增
bridge.test.ts (acp-bridge) 195 / 195 含 2 个本 PR 新增 onDiagnosticLine tee 测试
spawnChannel.test.ts (acp-bridge) 18 / 18 含 6 个本 PR 新增 createStderrForwarder 测试
acp-bridge 全量 315 / 315 7 文件
runQwenServe.test.ts (cli) 20 / 20 含 1 个本 PR 新增 integration test
server.test.ts (cli, isolated) 339 / 339 含 2 个本 PR 新增 sendBridgeError → daemonLog 测试。并行跑时偶现 device-flow userCode flake —— PR 描述已交代是 pre-existing isolation 问题,本机隔离重跑确认与本 PR 无关
npm run typecheck (cli + acp-bridge) 0 errors
npm run lint (root) 0 errors
npx prettier --check 全部 ts/json/md ✅ pass
npx eslint 全部 .ts ✅ pass
npm run build + npm run bundle ✅ pass dist/cli.js 含 daemonLogger / sendBridgeError / spawnChannel 新代码

§2. 架构核对(PR 自描述 ↔ 代码)

PR 自描述 文件 / 行号 状态
~/.qwen/debug/daemon/serve-<pid>-<workspaceHash>.log 文件路径 daemonLogger.ts:137-139(path 构造)+ daemonLogger.ts:118-125(sha256(boundWorkspace).slice(0,8)) ✅ —— §3 #1/#2 + §4 全验证
可通过 QWEN_RUNTIME_DIR 改 base daemonLogger.ts:133 baseDir ?? Storage.getGlobalDebugDir()storage.ts:184 透传 Storage.getRuntimeBaseDir()process.env['QWEN_RUNTIME_DIR'] ✅ —— §4 全程用 /private/tmp/pr4559-runtime 覆盖成功
QWEN_DAEMON_LOG_FILE=0 关闭 daemonLogger.ts:113-117 接受 0/false/off/no(大小写不敏感、trim) ✅ —— §3 #6/#7(6 种值)+ §4 实跑双验
info/warn/error → 文件 + stderr (tee) daemonLogger.ts:185-198 teeLine 先 stderr 再 enqueueAppend ✅ —— §3 #3/#12 验证两路 + sync 顺序
raw → 文件 only daemonLogger.ts:199-202 不调用 stderr,只走 enqueueAppend ✅ —— §3 #4(3 级别都覆盖)
异步 append 队列 + 一次性 degraded fallback daemonLogger.ts:170-184pending chain + degraded 标记 ✅ —— §3 #9 实测:3 次连续失败只 emit 1 条 "entering degraded mode"
Boot-time 同步 writability probe daemonLogger.ts:140-153 mkdirSync + appendFileSync + try/catch ✅ —— §3 #8 用 file-as-parent 触发 boot 失败,返回 NOOP + 单条 "daemon log disabled" 到 stderr
daemon/latest symlink 给 tail -f daemonLogger.ts:155-164 updateSymlink(alias, log, fallbackCopy: false) best-effort ✅ —— §4 实测 tail -f daemon/latest 实时收到 SIGTERM warn 行
BridgeOptions.onDiagnosticLine + createSpawnChannelFactory({ onDiagnosticLine }) 让 acp-bridge 不耦合 cli bridge.ts:17-22bridgeOptions.ts:1-17spawnChannel.tscreateStderrForwarder + factory 注入 ✅ —— acp-bridge 测试不依赖任何 cli 类型
runQwenServe 在 boot init / pass to bridge+server / shutdown flush runQwenServe.ts:527-528(boot banner)+ wire-in(diagnosticSink → channelFactory)+ :976 await daemonLog.flush()
sendBridgeError 路由到 daemonLog server.ts:3416-3425 5xx 走 daemonLog.error,无 daemonLog 时回 legacy stderr 路径

§3. 直驱 initDaemonLogger() 的 12 场景 / 39 条 real-I/O 断言

driver: .qwen/scripts/pr4559/realio-daemon-logger.mjs全部从 dist/src/serve/daemonLogger.js 加载真模块,用真 mkdtemp + 真 appendFile + 真 symlink,不 mock 任何 fs 操作

# 场景 断言数 结果
1 Happy path — 文件路径形状 / boot banner 内容 / 无 stderr 噪音 5
2 computeDaemonId 稳定性 — 同 (workspace,pid) → 同 id;不同 workspace → 不同 id 2
3 info/warn/error 同时进文件 + stderr,含 ctx kv + Error stack 4
4 raw(line, level) 进文件不进 stderr,3 个 level 都覆盖 2
5 daemon/latest symlink 创建 + resolve 到真 log 文件 2
6 QWEN_DAEMON_LOG_FILE 关闭值 6 种(0 / false / off / no / OFF / " No " 12
7 QWEN_DAEMON_LOG_FILE 真值 5 种(1 / true / on / YES / "") 仍照常初始化 5
8 Boot probe 失败(baseDir 父目录是普通文件,mkdirSync 失败)→ NOOP + 单条 "daemon log disabled" 到 stderr 2
9 Post-boot 写失败(init 后把 log 文件换成目录,appendFile 触发 EISDIR)→ "entering degraded mode" 恰好 1 次(不是每次失败都喊) 1
10 50 条 info() 并发提交 → 写入文件后顺序保留(单 promise chain) 1
11 buildDaemonLogLine formatter — ISO 时间 / [LEVEL] / [DAEMON] / ctx fixed-order / Error stack 2-space 缩进 2
12 tee 顺序 — info() 调 stderr 同步 完成(在任何 await/flush 之前) 1

#10 是我特别加的:单 pending promise chain 的实现假设依赖 promise micro-task 顺序保序,并发 50 次还能严格顺序写盘说明这一点不破。
#9 是 PR review 来回了 R3/R4 之后最微妙的语义之一 —— "失败要喊一声但只能喊一次",行为正确。

§4. 真机 qwen serve × tmux 行为验证

# 操作 期望 实测
1 QWEN_RUNTIME_DIR=/tmp/pr4559-runtime node dist/cli.js serve --port 0 --workspace ... stderr 第一行打印 qwen serve: daemon log → <path> + 文件创建 ✅ 路径 = /private/tmp/pr4559-runtime/debug/daemon/serve-69760-5048c40d.log(pid + sha256(workspace).slice(0,8) 形态正确)
2 ls /private/tmp/pr4559-runtime/debug/daemon/ serve-<pid>-<hash>.log + latest -> serve-<pid>-<hash>.log symlink
3 cat <log-file> 含 boot banner [INFO] [DAEMON] daemon started pid=... workspace=... ✅ 一字不差
4 kill -TERM <daemon-pid> log 文件追加 [WARN] [DAEMON] received SIGTERM, draining;同 stderr 也看得到(tee) ✅ 文件 + stderr 都有;额外还出现一条 "received SIGTERM during drain — forcing exit"(多 signal handler 触发了 fallback 路径,这是 PR 设计内行为,参考 runQwenServe.ts:960
5 tail -f /private/tmp/pr4559-runtime/debug/daemon/latest 在另一个 shell 起,然后再发一次 kill -TERM tail 实时收到 WARN 行 tail -f 流式收到 2026-05-27T04:45:37.119Z [WARN] [DAEMON] received SIGTERM, draining 后 daemon 退出
6 QWEN_DAEMON_LOG_FILE=0 node dist/cli.js serve ... stderr 第一行打印 qwen serve: daemon log → (disabled) + 不创建任何 daemon dir ✅ runtime dir /private/tmp/pr4559-runtime-optout/ 完全不存在
7 opt-out 模式下后续 lifecycle 仍写 stderr listening / bound / bearer 三行都在 ✅ stderr 仍工作,仅文件路径分支被禁用
8 触发 POST /session 工作空间不匹配(400 路径 workspace_mismatch 走 stderr-only(进 daemonLog,因为 sendBridgeError 只对 5xx 走 daemonLog.error,见 server.ts:3409-3425 注释) ✅ stderr 有 workspace_mismatch (POST /session): ...,daemon log 文件无变化 —— 设计意图与实现一致

#4 顺带发现 SIGTERM 路径会触发两条 WARN(drain + "during drain — forcing exit"),不是 bug,是 signalHandler 注册时同时绑定两个不同的 fallback path 导致;不影响功能也不漏行。


§5. 代码评审要点

  • DaemonLogger 接口设计克制:3 个 tee(info/warn/error) + 1 个 file-only(raw) + 3 个 getter + flushraw 这个口子开得很有必要 —— acp-bridge 把已经在写 stderr 的内容镜像到文件,避免重复 stderr。
  • boot 同步 probe + 运行时异步 queue:选择得对。boot 同步保证"启动时知道日志是不是真的能写",启动后异步保证 hot path 不阻塞。同步 probe 失败直接降级 NOOP 是好默认 —— 不会让一个非关键的 logging 问题阻塞 qwen serve 启动。
  • 一次性 degraded warningdegraded flag 全局 latch,保证只 emit 一次 "entering degraded mode"。生产环境如果 ~/.qwen/debug 满了,不会被几十万行同样的错误刷屏。
  • updateSymlink({ fallbackCopy: false }) 是对的:symlink 失败时不要拷贝(拷贝静态文件意味着第一次拷出去之后就再也看不到新行了,反而比没有 latest 更糟)。失败时就静默忽略,tail -f <full-path> 仍可用。
  • acp-bridge 解耦做到位BridgeOptions.onDiagnosticLine(types 文件只 17 行)+ createSpawnChannelFactory({ onDiagnosticLine }) 工厂。acp-bridge 内部 0 个 cli import。这条线后续可以独立演进(比如换成 structured emitter)。
  • sendBridgeError 5xx-only 路由:4xx(如 workspace_mismatch)属于"客户端犯错",不该刷 operator 的故障日志;只有 5xx 才真正需要 operator 排查。这个划线对的,§4 report error when try to auth #8 实测验证。
  • daemonLog.flush() 在 SIGTERM drain 后:保证关键 shutdown 日志真的落盘(不丢最后几行)。在 catch 里也 flush().catch(() => {}),错误降级到 best-effort 而不是抛出。

§6. 风险与遗留

  • opt-out 路径完整 & 安全默认QWEN_DAEMON_LOG_FILE=0 完全短路,连 dir 都不 mkdir。对在共享主机或 CI runner 上跑 daemon 的用户,单一 env 即可解决"我不要这文件"诉求。
  • boot 失败 graceful degrade:log dir 写不了不阻塞 qwen serve 启动。生产环境从只读 FS、磁盘满、权限不对等情况都不会让 daemon 死。
  • 文档完备docs/users/qwen-serve.md +24 行讲清楚路径、latest 用法、opt-out 环境变量。docs/superpowers/{plans,specs}/2026-05-26-daemon-logger-*.md 1777 行设计文档供后续 review 参考(不影响 runtime,纯文档)。
  • ⚠️ 设计文档体积大(1777 行):内部 superpowers/ 目录的两份文档比代码本身还多。如果项目对设计文档落仓有偏好就放着,如果不想要可以建议作者拆到 .qwen/design/ 或单独 PR。不阻塞
  • ⚠️ server.test.ts 2 个测试隔离 flake:PR 描述已交代,本地隔离重跑确认是 pre-existing,与本 PR 无关。不阻塞。
  • ⚠️ SIGTERM 路径会双 emit:"received SIGTERM, draining" + "received SIGTERM during drain — forcing exit"。看起来是同一信号触发了两条 handler;功能上无害但日志略重复。如果作者关心可以 follow-up 优化 signal handler 去重。不阻塞
  • 仅 macOS arm64 本机覆盖;GitHub Actions CI 因 base 是 daemon_mode_b_main 没跑完整 matrix(只 2 条 review-pr 自动 SKIPPED),但 base 分支本身 CI 是绿的。merge 到 main 时若有 windows / linux 差异,主要影响 nodeFs.symlink(windows symlink 需 admin / --unprivileged),已通过 updateSymlink core helper + fallbackCopy: false 的"失败静默"语义降到非阻塞。

验证环境:macOS Darwin 25.4.0 arm64,Node v22.17.0;tmux 会话 pr4559(160×50);merged-base worktree /private/tmp/pr4559-merged(branch pr-4559-test = PR head f15261440,与 base 1ca057251 已 up-to-date 无需 merge);runtime dirs /private/tmp/pr4559-runtime{,-optout,-tail};driver realio-daemon-logger.mjs 39 assertion + 8 live qwen-serve 场景全绿。— wenshao

@doudouOUC doudouOUC requested a review from chiga0 May 27, 2026 05:18
@wenshao wenshao merged commit e45d9b4 into daemon_mode_b_main May 27, 2026
7 checks passed
@doudouOUC doudouOUC deleted the feat/support_daemon_logger branch May 27, 2026 06:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DDAR DataWorks Data Agent Ready

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants