fix(API): capture staging commit errors in logs and stream all logs to journalctl#1592
Conversation
The commit endpoint's catch block used console.trace, which writes to stderr and never reaches the structured winston log files. This made 400 responses from POST /v1/staging/commit opaque in logs, forcing operators to correlate with pm2/journald output to find the root cause. Route the error through the existing logger so the reason lands in combined.log and error.log alongside the request metadata.
With NODE_ENV=production, the logger had no Console transport, so logger.error(...) calls only reached the on-disk log files and were invisible to journalctl/pm2/docker. The previous console.trace fallback in the staging commit handler was the only thing operators saw in systemd for that endpoint. Add a production-only Console transport at error level with stderrLevels so errors are written to process.stderr and picked up by the process supervisor, without pushing verbose/info/debug traffic into journalctl. File transports are unchanged, so the full log stream still lives on disk.
The previous production Console transport was filtered to error level, which kept journalctl quiet but left operators who consume logs via journalctl/pm2/docker without the info/warn context around failures. Drop the level filter so the transport inherits the logger's configured level (APP.LOG_LEVEL, default info). Errors still go to stderr via stderrLevels; everything else goes to stdout. File transports are unchanged, so the full on-disk log files continue to be written. Set LOG_LEVEL=debug in CADT config to also stream verbose/debug.
| stderrLevels: ['error'], | ||
| format: format.combine(format.json()), | ||
| }), | ||
| ); |
There was a problem hiding this comment.
Missing level: 'error' floods production console with info logs
High Severity
The production Console transport is missing the level: 'error' option that the PR description explicitly specifies. Without it, the transport inherits the logger's default level (getConfig().APP.LOG_LEVEL || 'info' from line 48), so all info/warn/error messages will be written to the console in production. The stderrLevels: ['error'] option only controls which output stream is used — it does not filter non-error messages. Info and warn traffic will flood stdout (journalctl/pm2/docker), contradicting the stated goal of "no new noise." The inline comment on lines 120–123 ("every log line") also reflects this unintended behavior.
Reviewed by Cursor Bugbot for commit 0adf362. Configure here.
Winston 3 treats a transport's `level` independently from the logger's level -- the logger's level is only the default for transports that don't specify their own. Without setting a level on the new production Console transport, it inherited the logger's level (APP.LOG_LEVEL, default info), so verbose/debug lines never reached journalctl even though the existing debug-%DATE%.log file captured them. Set `level: 'silly'` on the Console transport so journalctl receives every log line the app emits, regardless of APP.LOG_LEVEL. File transports are untouched.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 0f54cc6. Configure here.
Address Bugbot feedback on #1592. The previous 'silly' level meant the per-query Sequelize logging in src/config/config.js (logger.silly) would flood journalctl/stdout on busy instances, regardless of APP.LOG_LEVEL. That contradicts the goal of giving operators a clean operational log stream. Tighten the transport to 'debug', which still guarantees that error, warn, info, verbose, and debug reach journalctl (matching the application-%DATE%.log file content), but excludes the silly SQL trace. Operators who actually need per-query output can flip the transport level locally.


Summary
Close two observability gaps:
POST /v1/staging/commit400 responses had no entry in the Winston JSON logs because the catch block usedconsole.trace.logger.*(...)calls only hit on-disk log files — operators consuming logs viajournalctl/ pm2 / docker saw nothing from the application.After this PR, error/warn/info/verbose/debug all go to both on-disk files and the process's stdout/stderr.
Changes
1.
src/controllers/staging.controller.js— log commit failures via WinstonError message and stack now land in
error.log/combined.log/application-*.log, and (because of change 2 below) also in journalctl.2.
src/config/logger.js— production Console transport at debug levelPreviously no Console transport was added in production. Now:
Behavior in production:
level: 'debug'— accepts error, warn, info, verbose, debug. Deliberately excludessilly, which is used insrc/config/config.jsfor per-query Sequelize output and would flood journalctl on a busy instance.APP.LOG_LEVEL.process.stderr; warn/info/verbose/debug →process.stdout.File-transport cleanup (consolidating to 3 rotated files and removing legacy unbounded files) is handled in #1597.
Verification
Smoke test against
winston@3.19.0Console transport withlevel: 'debug',stderrLevels: ['error'], with logger levelinfo:Correct routing: all normal levels reach journalctl, silly SQL query output is excluded.
Impact on existing deployments
debug-%DATE%.log; no per-query Sequelize spam.Scope
Intentionally narrow:
commithandler. Otherconsole.trace(error)sites inorganization,project,units,offercontrollers andsync-registries*tasks remain for a follow-up sweep.Test plan
npm run testpasses locallyNODE_ENV=production, trigger a 400 fromPOST /v1/staging/commit(e.g. empty staging table) and verify:error.log/combined.logunder~/.chia/mainnet/cadt/v1/logs/contain a structured entry withlevel: error,message: POST /v1/staging/commit failed: ..., and the stackjournalctl -u cadtshows the same error on stderrNODE_ENV=production, confirm info/warn/verbose/debug lines appear injournalctlon stdout; silly-level Sequelize queries do NOTNODE_ENV!=production, confirm the pretty-printed colorized dev console output still works (no regression)Notes on the failing CI run
The v2 integration test failure (
V2 Issuance API - Basic CRUD Tests > DELETE /v2/issuance/:id > should stage issuance deletion with no children) is unrelated to this PR:NODE_ENV=test, notproduction, so this PR's Console transport change is not active during test runs.waitForV2DataLayerSync()followed by an issuance create + delete; the 400 is coming from one of the preconditions inv2 issuance-v2.controller.jsdestroy(assertV2IfReadOnlyMode/assertV2HomeOrgExists/assertNoPendingCommitsExcludingTransfers) — likely stale staging state from a preceding test.src/controllers/staging.controller.jschange only touches the v1 commit handler.