Skip to content

fix(API): capture staging commit errors in logs and stream all logs to journalctl#1592

Merged
TheLastCicada merged 5 commits into
v2-rc2from
fix/log-staging-commit-errors
Apr 23, 2026
Merged

fix(API): capture staging commit errors in logs and stream all logs to journalctl#1592
TheLastCicada merged 5 commits into
v2-rc2from
fix/log-staging-commit-errors

Conversation

@TheLastCicada

@TheLastCicada TheLastCicada commented Apr 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Close two observability gaps:

  1. POST /v1/staging/commit 400 responses had no entry in the Winston JSON logs because the catch block used console.trace.
  2. In production, the logger had no Console transport at all, so logger.*(...) calls only hit on-disk log files — operators consuming logs via journalctl / pm2 / docker saw nothing from the application.

After this PR, error/warn/info/verbose/debug all go to both on-disk files and the process's stdout/stderr.

Changes

1. src/controllers/staging.controller.js — log commit failures via Winston

- } catch (error) {
-   console.trace(error);
+ } catch (error) {
+   logger.error(`POST /v1/staging/commit failed: ${error.message}`, {
+     stack: error.stack,
+   });
    res.status(400).json({ ... });
  }

Error message and stack now land in error.log / combined.log / application-*.log, and (because of change 2 below) also in journalctl.

2. src/config/logger.js — production Console transport at debug level

Previously no Console transport was added in production. Now:

versionLogger.add(
  new transports.Console({
    level: 'debug',
    stderrLevels: ['error'],
    format: format.combine(format.json()),
  }),
);

Behavior in production:

  • level: 'debug' — accepts error, warn, info, verbose, debug. Deliberately excludes silly, which is used in src/config/config.js for per-query Sequelize output and would flood journalctl on a busy instance.
  • In winston 3 a transport's level is independent of the logger's, so this guarantees debug-level coverage in journalctl regardless of APP.LOG_LEVEL.
  • Errors → process.stderr; warn/info/verbose/debug → process.stdout.
  • All existing File / DailyRotateFile transports are unchanged.

File-transport cleanup (consolidating to 3 rotated files and removing legacy unbounded files) is handled in #1597.

Verification

Smoke test against winston@3.19.0 Console transport with level: 'debug', stderrLevels: ['error'], with logger level info:

STDOUT: warn, info, verbose, debug
STDERR: error
FILTERED: silly

Correct routing: all normal levels reach journalctl, silly SQL query output is excluded.

Impact on existing deployments

  • Disk logs: unchanged.
  • journalctl / pm2 / docker: operators now see error/warn/info/verbose/debug where they previously saw nothing from the application.
  • Log volume: bounded to ~the same set of messages that already hit debug-%DATE%.log; no per-query Sequelize spam.

Scope

Intentionally narrow:

Test plan

  • npm run test passes locally
  • With NODE_ENV=production, trigger a 400 from POST /v1/staging/commit (e.g. empty staging table) and verify:
    • error.log / combined.log under ~/.chia/mainnet/cadt/v1/logs/ contain a structured entry with level: error, message: POST /v1/staging/commit failed: ..., and the stack
    • journalctl -u cadt shows the same error on stderr
  • With NODE_ENV=production, confirm info/warn/verbose/debug lines appear in journalctl on stdout; silly-level Sequelize queries do NOT
  • With NODE_ENV!=production, confirm the pretty-printed colorized dev console output still works (no regression)
  • HTTP response body of the 400 is unchanged

Notes on the failing CI run

The v2 integration test failure (V2 Issuance API - Basic CRUD Tests > DELETE /v2/issuance/:id > should stage issuance deletion with no children) is unrelated to this PR:

  • Tests run with NODE_ENV=test, not production, so this PR's Console transport change is not active during test runs.
  • The test uses a fixed 500 ms waitForV2DataLayerSync() followed by an issuance create + delete; the 400 is coming from one of the preconditions in v2 issuance-v2.controller.js destroy (assertV2IfReadOnlyMode / assertV2HomeOrgExists / assertNoPendingCommitsExcludingTransfers) — likely stale staging state from a preceding test.
  • The earlier commits of this PR passed the v2 integration tests, and the src/controllers/staging.controller.js change only touches the v1 commit handler.

The commit endpoint's catch block used console.trace, which writes to
stderr and never reaches the structured winston log files. This made
400 responses from POST /v1/staging/commit opaque in logs, forcing
operators to correlate with pm2/journald output to find the root
cause. Route the error through the existing logger so the reason
lands in combined.log and error.log alongside the request metadata.
With NODE_ENV=production, the logger had no Console transport, so
logger.error(...) calls only reached the on-disk log files and were
invisible to journalctl/pm2/docker. The previous console.trace fallback
in the staging commit handler was the only thing operators saw in
systemd for that endpoint.

Add a production-only Console transport at error level with stderrLevels
so errors are written to process.stderr and picked up by the process
supervisor, without pushing verbose/info/debug traffic into journalctl.
File transports are unchanged, so the full log stream still lives on
disk.
@TheLastCicada TheLastCicada changed the title fix(API): log staging commit errors via winston instead of stderr fix(API): log staging commit errors via winston and mirror to stderr Apr 20, 2026
The previous production Console transport was filtered to error level,
which kept journalctl quiet but left operators who consume logs via
journalctl/pm2/docker without the info/warn context around failures.

Drop the level filter so the transport inherits the logger's configured
level (APP.LOG_LEVEL, default info). Errors still go to stderr via
stderrLevels; everything else goes to stdout. File transports are
unchanged, so the full on-disk log files continue to be written.

Set LOG_LEVEL=debug in CADT config to also stream verbose/debug.
@TheLastCicada TheLastCicada changed the title fix(API): log staging commit errors via winston and mirror to stderr fix(API): capture staging commit errors in logs and stream all logs to journalctl Apr 20, 2026
Comment thread src/config/logger.js
stderrLevels: ['error'],
format: format.combine(format.json()),
}),
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing level: 'error' floods production console with info logs

High Severity

The production Console transport is missing the level: 'error' option that the PR description explicitly specifies. Without it, the transport inherits the logger's default level (getConfig().APP.LOG_LEVEL || 'info' from line 48), so all info/warn/error messages will be written to the console in production. The stderrLevels: ['error'] option only controls which output stream is used — it does not filter non-error messages. Info and warn traffic will flood stdout (journalctl/pm2/docker), contradicting the stated goal of "no new noise." The inline comment on lines 120–123 ("every log line") also reflects this unintended behavior.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0adf362. Configure here.

Winston 3 treats a transport's `level` independently from the logger's
level -- the logger's level is only the default for transports that
don't specify their own. Without setting a level on the new production
Console transport, it inherited the logger's level (APP.LOG_LEVEL,
default info), so verbose/debug lines never reached journalctl even
though the existing debug-%DATE%.log file captured them.

Set `level: 'silly'` on the Console transport so journalctl receives
every log line the app emits, regardless of APP.LOG_LEVEL. File
transports are untouched.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0f54cc6. Configure here.

Comment thread src/config/logger.js
Address Bugbot feedback on #1592. The previous 'silly' level meant the
per-query Sequelize logging in src/config/config.js (logger.silly) would
flood journalctl/stdout on busy instances, regardless of APP.LOG_LEVEL.
That contradicts the goal of giving operators a clean operational log
stream.

Tighten the transport to 'debug', which still guarantees that error,
warn, info, verbose, and debug reach journalctl (matching the
application-%DATE%.log file content), but excludes the silly SQL
trace. Operators who actually need per-query output can flip the
transport level locally.
@TheLastCicada TheLastCicada changed the base branch from develop to v2-rc2 April 23, 2026 22:17
@TheLastCicada TheLastCicada merged commit bb6ffba into v2-rc2 Apr 23, 2026
26 checks passed
@TheLastCicada TheLastCicada deleted the fix/log-staging-commit-errors branch April 23, 2026 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant