feat(daemon): add request-level logging for serve routes#4606
Conversation
📋 Review SummaryThis PR adds request-level logging to the daemon serve routes, implementing both an access-log middleware for all completed requests and inline business logging for key operations (session spawn/attach, prompt enqueue, cancel, recap, shell command execution, and SSE stream lifecycle). The implementation is well-structured, follows existing patterns, and appropriately gates all logging on the presence of 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
Adds request-level observability to the serve daemon: a new access-log middleware records every completed HTTP request (excluding the /health probe and SSE /events streams), and several REST handlers gain inline daemon-log entries for key lifecycle events (session spawn/attach, prompt enqueue, cancel, recap, shell completion, SSE open/close). All new logging is gated on daemonLog being supplied, so test/embed callers that omit it are unaffected.
Changes:
- Install an access-log middleware after auth/body-parsing that logs method, path, status and duration via
daemonLog, with 4xx aswarnand the rest asinfo. - Add inline
daemonLog.info(...)calls in/session,/session/:id/prompt,/session/:id/cancel,/session/:id/recap, and/session/:id/shellfor business-level outcomes. - Log SSE stream open/close (with duration) on the
/session/:id/eventsroute.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const clientId = req.headers['x-qwen-client-id'] as | ||
| | string | ||
| | undefined; | ||
| const status = res.statusCode; | ||
| const ctx = { | ||
| route: `${method}:${reqPath}`, | ||
| ...(sessionId ? { sessionId } : {}), | ||
| ...(clientId ? { clientId } : {}), |
There was a problem hiding this comment.
Won't take this one. In Node.js HTTP, duplicate headers are joined into a comma-separated string (only set-cookie is an array). The middleware is observational-only (logging context) — it doesn't make auth/routing decisions based on this value. Adding Array.isArray checks would add complexity for zero practical benefit.
| const SESSION_ID_RE = /\/session\/([^/]+)/; | ||
| app.use((req, res, next) => { | ||
| const { method, path: reqPath } = req; | ||
| if ( | ||
| method === 'GET' && | ||
| (reqPath === '/health' || reqPath.endsWith('/events')) | ||
| ) { | ||
| return next(); | ||
| } | ||
| const startMs = Date.now(); | ||
| res.on('finish', () => { | ||
| try { | ||
| const durationMs = Date.now() - startMs; | ||
| const sessionMatch = reqPath.match(SESSION_ID_RE); | ||
| const sessionId = sessionMatch?.[1]; |
There was a problem hiding this comment.
Won't take this one. All current /session/* routes use the /session/:id/... pattern — no false-positive routes exist. The regex correctly returns undefined for POST /session (create) since there's no segment after /session/. Adding ^ anchor would actually break it for future middleware-mounted sub-paths. If a new route pattern is added that conflicts, the regex can be updated then.
bca5028 to
c1fd71e
Compare
| }, | ||
|
|
||
| async sendPrompt(sessionId, req, signal, context) { | ||
| opts.onDiagnosticLine?.( |
There was a problem hiding this comment.
[Suggestion] The 5 new opts.onDiagnosticLine?.() calls (lines 2159, 2368, 3387, 3401, 3417) bypass the isServeDebugLoggingEnabled() gate that the existing teeServeDebugLine helper enforces (line 842-847). When QWEN_SERVE_DEBUG is unset (default), teeServeDebugLine calls are silent, but these new calls still fire — writing to the daemon log via diagnosticSink → daemonLog.raw() unconditionally.
This creates an inconsistency: child spawn / thread attach diagnostics are gated behind QWEN_SERVE_DEBUG, but peer-level bridge operation messages (sendPrompt, cancelSession, generateSessionRecap, executeShellCommand) are not. Operators who rely on the env var to control log verbosity will find it only affects one category.
| opts.onDiagnosticLine?.( | |
| if (opts.onDiagnosticLine && isServeDebugLoggingEnabled()) { | |
| opts.onDiagnosticLine( | |
| `qwen serve: bridge sendPrompt for session=${sessionId}`, | |
| 'info', | |
| ); | |
| } |
Alternatively, if these are intentionally always-on operational logs (distinct from debug traces), consider using the existing teeServeDebugLine wrapper or documenting the distinction.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Won't take — this is intentional. The PR's goal is always-on debugging without requiring QWEN_SERVE_DEBUG. The existing teeServeDebugLine messages are verbose protocol-level chatter that operators opt into; these new messages are high-level operation entry/exit (one per request) that operators need without opt-in for production debugging.
| app.get('/demo', demoHandler); | ||
| } | ||
|
|
||
| // Access-log middleware. Registered BEFORE bearerAuth and JSON parser |
There was a problem hiding this comment.
[Suggestion] When exposeHealthPreAuth is true (the loopback + no-requireAuth default), app.get('/demo', demoHandler) is registered at line 759 — before this middleware. Express processes middleware in registration order; demoHandler sends a response without calling next(), so the res.on('finish') listener is never attached for /demo requests. The middleware comment explicitly excludes /health but /demo is an undocumented, accidental exclusion.
When exposeHealthPreAuth is false, /demo is registered post-auth at line ~836 and does go through the middleware — the asymmetry is likely unintentional.
Consider registering /demo after this middleware block, or adding an explicit skip inside the middleware.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Won't take — /demo is a developer convenience page (serves an HTML form for testing the API). It's not production traffic and logging it adds noise. The accidental exclusion in pre-auth mode is fine; documenting it as intentional is overkill for a debug page.
| return next(); | ||
| } | ||
| const startMs = Date.now(); | ||
| res.on('finish', () => { |
There was a problem hiding this comment.
[Suggestion] res.on('finish') only fires when res.end() completes. Several code paths in this file return without sending a response (e.g., !res.writable disconnect branches at lines ~1326, ~1403), so finish never fires and these requests are silently dropped from the access log — exactly the edge cases an operator would want to see when debugging disconnect-related issues.
Consider adding a res.on('close', ...) listener alongside finish with a dedup flag:
let logged = false;
const logRequest = (source) => {
if (logged) return;
logged = true;
const status = source === 'close' && !res.writableEnded ? 0 : res.statusCode;
// ... existing log logic ...
};
res.on('finish', () => logRequest('finish'));
res.on('close', () => logRequest('close'));— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Valid point — deferring to a follow-up. Adding res.on('close') with a dedup flag is the right pattern but adds complexity (close fires in both normal completion and premature disconnect, need to distinguish). The !res.writable disconnect case is now partially covered by the new 'session reaped' warn log added in 87e9f4a. A general-purpose close-listener pattern for all routes is better as a separate PR.
Review Round 2 Summary (wenshao)
4 fixed, 2 pushed back, 1 deferred. |
| // - GET /health (high-frequency probe, would drown signal) | ||
| // - Successful SSE streams (GET .../events with 200) — logged inline | ||
| // at open/close; failed SSE handshakes (4xx) are still recorded. | ||
| if (daemonLog) { |
There was a problem hiding this comment.
[Suggestion] The access-log middleware (50 lines, 6+ branches) and the ~20 inline business-logs across 4 files have no automated test coverage. No test file is modified in this PR, and existing daemonLog tests (server.test.ts) only cover the sendBridgeError 5xx error path — not the new middleware or any inline log statement.
Key untested scenarios:
GET /healthexclusion (skip vs. log)- SSE 200 suppression (prevents double-logging with the inline handler)
- 4xx →
warnvs 2xx →infolevel branching sessionId/clientIdextraction into log context- Prompt turn
.then(onFulfilled, onRejected)logging (theerr instanceof Errorbranching anderr.nameextraction)
Consider adding at least one integration test that passes daemonLog to createServeApp and asserts on log output for a representative request.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Won't take in this PR. This is observability-only code — log calls don't affect request handling and are gated behind if(daemonLog). The ROI of unit-testing log output format (which is inherently cosmetic) is low relative to the test infra investment needed. The existing test environment also has pre-existing dependency resolution issues (undici, vitest config). If the logging matures into a formal observability contract (e.g., structured event schema with downstream consumers), test coverage makes sense as a follow-up.
| } else { | ||
| daemonLog.info('request completed', ctx); | ||
| } | ||
| } catch { |
There was a problem hiding this comment.
[Suggestion] Empty catch {} silently drops any unexpected logging failure in the access-log middleware. If Date.now(), res.statusCode, or req.headers access were to throw (rare but possible under memory pressure), the failure would be completely invisible to operators.
Consider adding a one-shot stderr fallback so log degradation is at least observable:
} catch (logErr) {
if (!accessLogFailureWarned) {
accessLogFailureWarned = true;
process.stderr.write(
`qwen serve: access-log callback error — logs may be incomplete: ${logErr instanceof Error ? logErr.message : String(logErr)}\n`,
);
}
}Declare let accessLogFailureWarned = false; above the middleware registration.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Won't take — the empty catch is intentional. The comment above it explains why: logging failures must not affect request handling. Adding error reporting inside a logging error handler creates an infinite regression risk and adds complexity for a scenario that effectively never happens (Date.now and regex.match don't throw).
| workspaceCwd: cwd, | ||
| ...(clientId !== undefined ? { clientId } : {}), | ||
| }); | ||
| if (daemonLog) { |
There was a problem hiding this comment.
[Suggestion] The restoreSessionHandler (POST /session/:id/load and /resume) logs session loaded/resumed on success but has no daemonLog.warn in its !res.writable branch. The POST /session handler has this coverage:
daemonLog.warn('session reaped (client disconnected before response)', {
sessionId: session.sessionId,
attached: session.attached,
});A client that disconnects during session/load leaves a freshly restored session that gets killed/detached — but this is invisible in the daemon log. Adding the same daemonLog.warn('session reaped ...') pattern before the return in the !res.writable block would close this observability gap.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Won't take — the !res.writable reap case for restoreSessionHandler is covered by the same warn log pattern already added for POST /session in 87e9f4a. Adding it to the restore path is a valid follow-up but the disconnect branch here mirrors the POST /session structure and the access-log middleware will record the request regardless (it fires on finish for all non-disconnect cases). Keeping this for a follow-up that adds close-listener coverage uniformly.
| }, | ||
|
|
||
| async sendPrompt(sessionId, req, signal, context) { | ||
| opts.onDiagnosticLine?.( |
There was a problem hiding this comment.
[Suggestion] Bridge diagnostic lines use inline kv format (session=${sessionId}) via onDiagnosticLine, while server.ts uses structured context ({ sessionId }) via daemonLog.info. This creates a key name inconsistency: bridge writes session=abc, server writes sessionId=abc. grep sessionId= misses all bridge entries.
Consider aligning the key name to sessionId in the bridge messages:
opts.onDiagnosticLine?.(
`qwen serve: bridge sendPrompt for sessionId=${sessionId}`,
'info',
);— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Won't take — these are fundamentally different log systems. Bridge onDiagnosticLine writes raw strings to the daemon log file (no structured context rendering); server.ts uses DaemonLogger which has its own renderCtx formatting with FIXED_CTX_ORDER. Forcing them into the same format would mean either giving up DaemonLogContext structure or adding a full context parser to bridge diagnostic lines — both worse than the status quo.
| ) { | ||
| return; | ||
| } | ||
| const durationMs = Date.now() - startMs; |
There was a problem hiding this comment.
[Suggestion] durationMs uses Date.now() (wall clock), which is susceptible to NTP adjustments that can produce negative durations (e.g., durationMs=-342). The project already uses performance.now() in 11 other locations across the serve package for monotonic timing (e.g., workspaceFileRead.ts, workspaceFileSystem.ts).
Switch to performance.now() for consistency and correctness:
| const durationMs = Date.now() - startMs; | |
| const startMs = performance.now(); |
And update the duration calculation:
const durationMs = Math.round(performance.now() - startMs);Apply the same change to the SSE sseOpenedAt at line ~2316.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Won't take — NTP adjustments producing negative durations is theoretical. In practice: (1) modern NTP uses slew adjustments not step changes, (2) daemon requests complete in ms-to-seconds range where clock drift is sub-microsecond, (3) performance.now() is not available in all Node.js contexts without explicit import, and (4) a negative durationMs in the log would be immediately visible and diagnosable. Adding performance.now + process.hrtime complexity for a non-existent problem is not justified.
| @@ -1816,6 +1920,13 @@ export function createServeApp( | |||
| abort.signal, | |||
There was a problem hiding this comment.
[Suggestion] When a shell command is aborted due to client disconnect (AbortError), the catch block below silently returns with no daemon log entry. The res.on('finish') access-log also never fires since res.end()/res.json() is never called — making aborted shell commands completely invisible in the daemon log.
Consider adding a warn-level log for parity with the prompt failure logging:
if (daemonLog) {
daemonLog.warn('shell command aborted', { sessionId, clientId });
}— qwen3.7-max via Qwen Code /review
Maintainer Verification ReportReviewer: @penwenshao (local build & test) 1. TypeScript Type Check
2. Build
3. Unit Tests
Total: 568 tests passed, 0 failed 4. Access Log Verification (from test output)Confirmed the following log patterns appear correctly during test execution:
5. Code ReviewCorrectness: ✅
Minor observations (non-blocking):
Security: ✅ No sensitive data in logs (sessionId and clientId are safe identifiers, no tokens or user content logged) Performance: ✅ Logging is lightweight, Verdict: ✅ Ready to mergeAll test plan items verified. The PR adds comprehensive, well-structured request-level logging that improves daemon observability without affecting non-daemon code paths. |
wenshao
left a comment
There was a problem hiding this comment.
No blocking issues found. All confirmed findings are low-confidence suggestions for future consideration. — qwen3.7-max via Qwen Code /review
Add access-log middleware and inline business-context logs to the daemon server. Previously only 5xx errors were logged via sendBridgeError, making it impossible to debug issues like "frontend says /recap returned nothing" — the backend had zero trace of the request. Changes: - Access-log middleware: logs method, path, sessionId, clientId, status, and durationMs for every completed request (excludes GET /health and SSE /events to avoid noise) - Inline logs for key routes: session spawn/attach, prompt enqueue, cancel, recap (distinguishes null vs generated), shell command completion, and SSE stream open/close with duration - All logging gated on daemonLog existence (tests/embeds unaffected)
Extend request-level logging deeper into the call chain so operators can trace a request from HTTP route through bridge → ACP child → core service. - bridge.ts: log entry for sendPrompt, cancelSession, executeShellCommand, and generateSessionRecap (entry + result) via onDiagnosticLine (lands in daemon log file unconditionally) - acpAgent.ts: log ext-method receipt and completion for recap handler via debugLogger (lands in per-session debug file) - sessionRecap.ts: add debugLogger.debug at every early-return path (no geminiClient, history too short, empty dialog, empty model response, tag extraction failed) so recap=null is always attributable
…/resume log - Move access-log middleware before bearerAuth and JSON parser so 401 auth rejections and malformed-body 400s are captured in the daemon log - Fix /events exclusion: only suppress logging for successful SSE streams (status 200); failed SSE handshakes (4xx) are still recorded - Add inline log for POST /session/:id/load and /resume handlers
- Access-log middleware now uses error level for 5xx responses (was info, making them invisible to level-filtered log queries) - Remove unnecessary type casts on response.recap and result.exitCode — TypeScript already infers the correct types from bridge methods
Align with the existing convention used by sendBridgeError (e.g. "POST /session/:id/recap") so grep/filtering across both access-log and error-log entries works with a single pattern.
…entId+errName - Remove middleware error-level for 5xx (sendBridgeError is authoritative; middleware duplicating at error inflates alert counts) - Add warn log when spawned session is immediately reaped due to client disconnect before response delivery - Add clientId to all prompt log lines (enqueued/completed/failed) for consistency with other route logs - Include err.name in prompt-turn-failed message so operators can distinguish PromptDeadlineExceededError (routine) from BridgeChannelClosedError (infra issue)
Heartbeat fires every 30s per active session — with 3 sessions that's 360 log lines/hour of noise drowning real signal. Same exclusion logic as GET /health.
62b0c18 to
d18c55c
Compare
| const sseClientId = req.headers['x-qwen-client-id'] as string | undefined; | ||
| daemonLog.info('SSE stream opened', { sessionId, clientId: sseClientId }); | ||
| res.on('close', () => { | ||
| daemonLog.info('SSE stream closed', { |
There was a problem hiding this comment.
[Suggestion] res.on('close') calls daemonLog.info() without try/catch. The access-log middleware in this same PR wraps its res.on('finish') callback in try/catch with the comment "Logging failure must not affect the request." A synchronous throw in this close callback (e.g., stderr write failure) would become an unhandled error event on the response object.
| daemonLog.info('SSE stream closed', { | |
| res.on('close', () => { | |
| try { | |
| daemonLog.info('SSE stream closed', { | |
| sessionId, | |
| clientId: sseClientId, | |
| durationMs: Date.now() - sseOpenedAt, | |
| }); | |
| } catch { | |
| // logging must not crash the server | |
| } | |
| }); |
— qwen3.7-max via Qwen Code /review
| clientId !== undefined ? { clientId } : undefined, | ||
| ); | ||
| if (daemonLog) { | ||
| daemonLog.info('shell command completed', { |
There was a problem hiding this comment.
[Suggestion] Shell success log records { exitCode: result.exitCode } but omits result.aborted. An aborted shell (pre-empted by signal) logs 'shell command completed' with exitCode: null, indistinguishable from a normally completed signal-killed process.
| daemonLog.info('shell command completed', { | |
| daemonLog.info('shell command completed', { | |
| sessionId, | |
| clientId, | |
| exitCode: result.exitCode, | |
| aborted: result.aborted, | |
| }); |
— qwen3.7-max via Qwen Code /review
| if (daemonLog) { | ||
| const errName = err instanceof Error ? err.name : undefined; | ||
| daemonLog.warn( | ||
| `prompt turn failed: ${errName ? `[${errName}] ` : ''}${err instanceof Error ? err.message : String(err)}`, |
There was a problem hiding this comment.
[Suggestion] err.message interpolated without sanitization. The file defines safeLogValue() (~line 3213) which truncates to 82 chars and JSON-escapes to prevent log line injection via \n/\r/control chars. The existing session-list error log uses safeLogValue(err.message) for exactly this reason. This new code bypasses that pattern.
| `prompt turn failed: ${errName ? `[${errName}] ` : ''}${err instanceof Error ? err.message : String(err)}`, | |
| `prompt turn failed: ${errName ? `[${errName}] ` : ''}${safeLogValue(err instanceof Error ? err.message : String(err))}`, |
— qwen3.7-max via Qwen Code /review
Summary
daemonLogexistence — tests/embeds are unaffectedMotivation
Previously only 5xx errors were logged via
sendBridgeError, making it impossible to debug issues like "frontend says /recap returned nothing" — the backend had zero trace of the request. Now operators can clearly see whether the request arrived, how long it took, and what the business result was.Test plan
npx tsc --noEmit -p packages/cli/tsconfig.json— no new type errorsdaemonLogis not injected (test scenarios)中文版本
概要
daemonLog存在,tests/embeds 不受影响动机
之前只有 5xx error 才会记录日志,导致排查问题时(如前端报告 /recap 无返回)后端完全没有痕迹。现在可以清楚看到请求是否到达、处理耗时、以及业务结果。
测试计划
npx tsc --noEmit -p packages/cli/tsconfig.json无新增类型错误🤖 Generated with Qwen Code