feat(API): add /diagnostics endpoint for system-wide debug snapshot #1627
Merged
Adds GET /diagnostics, a single JSON endpoint that summarizes the running CADT, Chia, DataLayer, and host machine state. The endpoint is intended for sysadmins debugging a CADT install: it reports CADT version, configured / actual chia network, wallet/full-node/datalayer reachability + sync status, wallet balance, trusted-peer cross-reference, DataLayer subscriptions with per-store sync status, governance body IDs, V1/V2 home org IDs, datalayer URLs, CADT config + database paths, CPU/RAM/disk numbers, a chia process scan, and a chia-tools probe.

The endpoint is mounted on the root app (not /v1 or /v2) and lives in HEALTH_ENDPOINTS so it bypasses the rate limiter, startup gates, and the chia/datalayer assertions -- the whole point of a diagnostics endpoint is to be useful when those subsystems are broken.

Every external call goes through a settle() wrapper with a per-call timeout and Promise.all fan-out, so one slow or wedged RPC can't block the rest of the response. Worst-case wall-clock is ~30s (subscription enumeration budget); healthy responses come back in well under a second.

Authentication is enforced by the existing global API-key middleware (no duplicate check needed). When V1 or V2 READ_ONLY is set, the response is reduced to non-sensitive public data and short-circuits before the wallet / datalayer RPC fan-out, matching the precedent in wallet-health.js.

Also adds isHealthEndpoint() skips to the wallet-synced, home-org-synced, and all-data-synced header middlewares so /diagnostics (and /health*) don't hang on the wallet RPC's 300s socket timeout or on waitForMigrations when the database layer is the broken subsystem.
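For readers unfamiliar with the pattern, here is a minimal sketch of a settle()-style wrapper and the Promise.all fan-out described above. It is not the code from this PR: only the `settle(label, producer, timeoutMs)` signature and the `{ ok: false, error }` failure shape come from the description; the `collect` helper and the error wording are assumptions made for illustration.

```javascript
// Hypothetical sketch, not the PR's implementation.
// Runs `producer` and always resolves: { ok: true, value } on success,
// { ok: false, error } if it throws, rejects, or exceeds `timeoutMs`.
function settle(label, producer, timeoutMs) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(
      () => resolve({ ok: false, error: `${label} timed out after ${timeoutMs}ms` }),
      timeoutMs,
    );
  });
  const work = Promise.resolve()
    .then(producer) // also converts a synchronous throw into a rejection
    .then(
      (value) => ({ ok: true, value }),
      (err) => ({ ok: false, error: String(err?.message || err) }),
    );
  // Whichever finishes first wins; the timer is always cleaned up.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Fan-out (assumed helper): every probe is wrapped, so Promise.all cannot
// reject and each probe is bounded by its own timeout.
async function collect(probes, timeoutMs) {
  const entries = Object.entries(probes);
  const results = await Promise.all(
    entries.map(([label, producer]) => settle(label, producer, timeoutMs)),
  );
  return Object.fromEntries(entries.map(([label], i) => [label, results[i]]));
}
```

Because each probe resolves to a plain result object, a wedged RPC shows up as one `{ ok: false }` entry instead of delaying or failing the whole response.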
…en network match

Two fixes from bugbot review of #1627:

1. Move dynamic imports of wallet / fullNodeRpc / persistance / V1+V2 models / fullNode AFTER the read-only short-circuit. The PR description said the read-only path short-circuits before fetch, but the heavy imports were happening unconditionally, which (a) made public observer nodes load DB and wallet modules they don't need and (b) caused /diagnostics to fail in the exact scenarios it's designed to survive (e.g. V2 model module body failing to initialize).

2. Use exact equality (not substring containment) for the network match. `"testnet10".includes("testnet1")` is true, so the previous code reported a false match when the configured network was a prefix of the actual one. The diagnostics endpoint's job is to tell the truth; the existing assertChiaNetworkMatchInConfiguration still uses the substring rule, but the `actual` and `configured` fields in the response let operators spot whether the loose-match assertion would also have considered them equivalent.
The previous test only logged the response keys, which is useless for actually inspecting what /diagnostics returns. Pretty-print the full JSON body so the response is visible in every live-api workflow log -- there is no other sanitized artifact to eyeball the endpoint's real-environment output.
Defensive fix from bugbot review of #1627. Node's fs.statfs() currently only exposes bsize (libuv's uv_statfs_t doesn't copy frsize from the underlying statfs(2) syscall), and on Linux+ext4 'blocks * bsize' matches 'df -B1' byte-exactly. But POSIX statvfs denominates blocks in frsize, not bsize, and on exotic filesystems (e.g. VirtioFS on Docker) the two can differ. Using 'stats.frsize ?? stats.bsize' is a zero-cost hedge that automatically picks up frsize if a future Node version or a polyfill exposes it, while keeping today's behavior unchanged on every mainstream environment.
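A minimal sketch of the block-size hedge described above. The helper name, the use of `bavail` for free space, and the rounding of `percentUsed` are assumptions for illustration; only the `stats.frsize ?? stats.bsize` expression comes from the commit message.

```javascript
// Hypothetical helper: converts a statfs-style result into byte counts.
// `stats` is assumed to have the shape returned by fs.promises.statfs().
function diskUsageFromStatfs(stats) {
  // Prefer frsize if the runtime ever exposes it; today Node only provides bsize.
  const blockSize = stats.frsize ?? stats.bsize;
  const totalBytes = stats.blocks * blockSize;
  // bavail = blocks available to unprivileged users (assumed choice for "free").
  const freeBytes = stats.bavail * blockSize;
  const percentUsed = totalBytes > 0
    ? Math.round(((totalBytes - freeBytes) / totalBytes) * 1000) / 10
    : null;
  return { totalBytes, freeBytes, percentUsed };
}

// Usage (not executed here): diskUsageFromStatfs(await fs.promises.statfs(chiaRoot))
```

On current Node versions `stats.frsize` is undefined, so the result is identical to multiplying by `bsize`.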
…tics
CADT's CHIA_NETWORK config is a binary mainnet-vs-testnet flag, not an
exact chia network name. Cross-referenced against every use in the
codebase:
- defaultConfig.js sets it to 'mainnet'
- config-loader.js forces it to the literal string 'testnet' when
USE_SIMULATOR=true, regardless of the actual underlying network
- coin-management.js branches on '=== mainnet ? XCH : TXCH'
- data-assertions.js accepts any chia network whose name contains
CHIA_NETWORK as a substring
The CI run on commit 286fcbb confirmed the previous strict-equality
check gave the wrong answer in the real world: chia reported 'testneta',
CADT config was 'testnet', diagnostics reported matches:false even
though CADT itself treats them as a match.
Normalize both sides to mainnet|testnet before comparing. This is both
strictly more correct than the original substring rule (no
testnet1/testnet10 false positive) and operationally aligned with how
the rest of CADT interprets CHIA_NETWORK.
The existing assertChiaNetworkMatchInConfiguration still uses the
substring rule -- harmonising the assertion with this normalised
comparison is a sensible follow-up but is left out of this PR to keep
the scope tight.
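A minimal sketch of the normalised comparison, under stated assumptions: the function names are hypothetical, and treating every name that starts with "testnet" as a testnet is a guess at the mapping (the actual rule in the PR may differ, for example for simulator networks).

```javascript
// Hypothetical helpers; only the idea of collapsing both sides to
// mainnet|testnet before comparing comes from the commit message.
function normalizeNetwork(name) {
  if (typeof name !== 'string') return null;
  const lower = name.trim().toLowerCase();
  if (lower === 'mainnet') return 'mainnet';
  if (lower.startsWith('testnet')) return 'testnet'; // assumed mapping
  return null; // unknown network: never report a match
}

function networksMatch(chiaNetwork, cadtNetwork) {
  const chia = normalizeNetwork(chiaNetwork);
  const cadt = normalizeNetwork(cadtNetwork);
  return chia !== null && chia === cadt;
}
```

With this rule the CI case from commit 286fcbb (chia reports 'testneta', CADT config is 'testnet') is reported as a match, while 'mainnet' against 'testnet' is not.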
The syncMode ternary used a truthy check for state.sync?.synced while the synced field on the same line used === true. A non-boolean truthy value (e.g. 1) would produce synced: false alongside syncMode: 'synced'.
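One way to avoid this class of mismatch is to derive both fields from a single strict boolean. The sketch below is illustrative only: the function name and the 'syncing' / 'not_synced' labels are assumptions, and `sync_mode` is assumed to be the flag in the full node's blockchain state.

```javascript
// Hypothetical sketch: compute `synced` once and reuse it for `syncMode`.
function syncFields(state) {
  const synced = state?.sync?.synced === true; // only boolean true counts
  const syncing = state?.sync?.sync_mode === true;
  const syncMode = synced ? 'synced' : syncing ? 'syncing' : 'not_synced';
  return { synced, syncMode };
}
```

A non-boolean truthy value such as `1` now yields `synced: false` together with a non-'synced' mode, so the two fields cannot disagree.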
- Promote network match to top-level "network" key with "chia"/"cadt" sub-keys instead of "chia.network.actual"/"configured"
- Rename "processes" to "runningProcesses", drop redundant "supported" and "platform" fields (already in system section)
- Add "percentUsed" to both memory and disk sections
- Rename disk "path" to "chiaRootPath" for clarity
- Add "runningLocally" to fullNode section; skip full-node RPC calls when process scan confirms no local chia_full_node (falls back to optimistic probe when scan is unsupported or fails)
The diagnostics route handler used strict === true while the rest of middleware uses truthy checks (|| false). A non-boolean truthy config value (e.g. 1, "true") would bypass the read-only protection and serve the full response with sensitive fields.
Query the wallet RPC get_version endpoint and surface the installed Chia version in chia.version (null when the wallet is unreachable).
Remove the internal try/catch so RPC failures surface in the settle() debug log, consistent with every other wallet RPC function.
scanReliable only checked for processesValue.note (Windows) but not
processesValue.error (ps failure in containers). A caught ps failure
returns { matches: [], error } without a note, causing the scan to be
incorrectly treated as reliable and skipping full-node RPC probes.
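A sketch of the corrected reliability check and the resulting probe decision. The `{ matches, note, error }` shape follows the commit message; the function names and the `command` field on each match are assumptions for illustration.

```javascript
// Hypothetical sketch of the scan-reliability rule.
function isScanReliable(processesValue) {
  if (!processesValue) return false;
  if (processesValue.note) return false;  // platform unsupported (e.g. Windows)
  if (processesValue.error) return false; // ps failed (e.g. inside a container)
  return Array.isArray(processesValue.matches);
}

// Skip the full-node RPC only when a reliable scan found no chia_full_node;
// otherwise probe optimistically.
function shouldProbeFullNode(processesValue) {
  if (!isScanReliable(processesValue)) return true;
  return processesValue.matches.some((m) => /chia_full_node/.test(m.command));
}
```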
getWalletConnections catches errors internally and returns
{ success: false, error } instead of throwing, so settle() always
sees ok: true. Check value.success before treating connections as
valid, and surface the inner error when the wallet call failed.
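The two-level check can be sketched as follows. The function name, the `connections` field on the success path, and the fallback error text are assumptions; the `{ success: false, error }` failure shape and the `ok` flag from settle() are taken from the commit messages.

```javascript
// Hypothetical sketch: unwrap a settle() result whose inner value may itself
// report failure without throwing.
function interpretConnections(settled) {
  if (!settled.ok) {
    return { connections: null, error: settled.error }; // settle-level failure
  }
  const value = settled.value;
  if (!value || value.success !== true) {
    // The wallet helper swallowed the error; surface it instead of an empty list.
    return { connections: null, error: value?.error ?? 'wallet call failed' };
  }
  return { connections: value.connections ?? [], error: null };
}
```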
…eRpc

Restore try-catch in getChiaVersion so it is safe for callers outside of settle(). Add explicit NODE_TLS_REJECT_UNAUTHORIZED to fullNodeRpc.js so it does not rely on wallet.js import order as a side effect.
…pping fields

Return 403 when READ_ONLY is set rather than serving a reduced response. Removes buildReadOnlyResponse and the readOnly parameter from getDiagnosticsResponse since the middleware now gates access.
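A minimal sketch of such a gate as Express-style middleware. The function names, the `V1.READ_ONLY` / `V2.READ_ONLY` config layout, and the response body are assumptions for illustration; the 403 status and the truthy (rather than `=== true`) check follow the commit messages.

```javascript
// Hypothetical sketch of a read-only gate for /diagnostics.
function rejectWhenReadOnly(getConfig) {
  return function diagnosticsReadOnlyGate(req, res, next) {
    const config = getConfig();
    // Truthy check, so values such as 1 or "true" also count as read-only.
    const readOnly = Boolean(config?.V1?.READ_ONLY || config?.V2?.READ_ONLY);
    if (readOnly) {
      res.status(403).json({
        success: false,
        message: 'diagnostics is not available on read-only nodes', // assumed wording
      });
      return;
    }
    next();
  };
}
```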
getWalletHealthResponse already calls walletIsSynced internally, so the parallel direct call doubled RPC load and raced on the module-level lastWalletSyncError variable (each call clears it to null). Derive synced from the wallet health response instead.
Add ok/warning/critical status with messages to diagnostics sections: disk, memory, cpu, chiaTools, datalayer, fullNode, wallet, network. Remove redundant services section. Log full diagnostics JSON at startup (fire-and-forget) so READ_ONLY nodes also have a baseline snapshot.
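To illustrate the status model, here is a sketch of classifying one metric and picking the worst status across sections. The function names, the `status` / `message` field names, and the caller-supplied thresholds are all assumptions; the PR's actual cut-offs and message text are not shown in this conversation.

```javascript
const SEVERITY = { ok: 0, warning: 1, critical: 2 };

// Hypothetical: thresholds are passed in rather than guessed.
function classifyPercentUsed(percentUsed, { warningAt, criticalAt }) {
  if (!Number.isFinite(percentUsed)) {
    return { status: 'warning', message: 'usage could not be determined' };
  }
  const message = `${percentUsed}% used`;
  if (percentUsed >= criticalAt) return { status: 'critical', message };
  if (percentUsed >= warningAt) return { status: 'warning', message };
  return { status: 'ok', message };
}

// Hypothetical: the most severe status among all sections.
function worstStatus(sections) {
  return Object.values(sections).reduce(
    (worst, section) => (SEVERITY[section.status] > SEVERITY[worst] ? section.status : worst),
    'ok',
  );
}
```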
getDiagnosticsResponse handles all subsystem failures internally via settle(), so an exception reaching the outer catch is a genuine bug that should be surfaced to monitoring tools as a 500.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 28b0a1e.

Summary
Adds a new top-level `GET /diagnostics` endpoint that returns a single JSON object summarizing CADT, Chia, DataLayer, and host machine state, intended for sysadmins and users debugging a CADT install. Mounted as a sibling of `/health` (not under `/v1` or `/v2`); branched off `v2-rc2`.

What the response includes

- `{ enabled, readOnly, isGovernanceBody, apiKeyConfigured, governanceBodyId, homeOrgId }`.
- … `get_network_info`) and a `matches` flag.
- `rpcUrl`, `reachable`, `connectionError` (string when unreachable, including for `AggregateError`s with empty messages), `synced`, `balanceXch`, pending-transactions block (reuses `wallet-health.js` shaping), trusted-peer cross-reference against `wallet.trusted_peers` from chia's `config.yaml`.
- … `get_blockchain_state` + `get_connections`); degrades to `reachable: false` if the certs aren't present.
- `synced` flag (`Number.isFinite` guard so `undefined === undefined` doesn't falsely report synced); bounded-concurrency worker pool with a 30 s wall-clock budget that emits `truncated: true` if cut off.
- `walletReachable` / `fullNodeReachable` / `datalayerReachable`.
- `chia-tools --version` (then `version` as fallback) with a 5 s timeout; distinguishes "not installed" from "installed but broken".
- `ps -eo` scan for chia binaries with multi-version detection; reports `supported: false` on Windows.
- … `os.cpus()`), total/free RAM, free/total disk on the partition holding `${CHIA_ROOT}` (via `fs.promises.statfs`).

Design properties

- `settle(label, producer, timeoutMs)` that races against a hard wall-clock deadline. Failures become `{ ok: false, error }` rather than throwing; one wedged subsystem can't block the rest of the response. Worst-case wall-clock ~30 s; healthy responses come back in well under a second.
- `/diagnostics` lives in `HEALTH_ENDPOINTS`, and `isHealthEndpoint` skips were added to the wallet-synced, home-org-synced, and all-data-synced header middlewares so the endpoint isn't blocked by the wallet RPC's 300 s socket timeout or by `waitForMigrations()` when the DB layer itself is the broken subsystem.
- … `src/routes/wallet-health.js` so a public observer node doesn't make per-request authenticated wallet RPC calls on unauthenticated public hits.

Files

- `src/routes/diagnostics.js`: `settle`, `collectSubscriptions`, `buildTrustedPeerView`, `normalizeNodeId`
- `src/utils/system-info.js`: … `os`) + disk (`fs.promises.statfs`)
- `src/utils/chia-process-scan.js`: `ps` scan + chia-binary regex
- `src/utils/chia-tools-probe.js`: `chia-tools --version` / `version` fallback, `extractVersion` regex
- `src/datalayer/fullNodeRpc.js`: `get_blockchain_state` / `get_connections`
- `src/datalayer/wallet.js`: `getWalletConnections()` for trusted-peer cross-reference
- `src/middleware.js`: … `GET /diagnostics`, adds it to `HEALTH_ENDPOINTS`, adds health-endpoint skip to three header middlewares
- `tests/v2/integration/diagnostics.spec.js`
- `tests/v2/integration/diagnostics-helpers.spec.js`
- `tests/v2/live-api/wallet-health.live.spec.js`

Test plan

- `npm run test:v1`: 146 passing, 0 failing
- `npm run test:v2`: 1619 passing, 0 failing (includes 46 new diagnostics tests)
- `npm run test:v2:live:wallet-health` against a real CADT + Chia install (runs in CI; the live test added a `/diagnostics` describe block that asserts response shape, types, and reasonable value ranges without pinning specific environment values).

Matrix coverage (READ_ONLY × API key configured × key provided)

- accepts requests with the correct x-api-key header
- rejects requests with a wrong-length x-api-key header
- rejects requests with a wrong-but-equal-length x-api-key header
- rejects unauthenticated requests when CADT_API_KEY is configured
- public observer node (READ_ONLY=true, no API key): returns 200 with reduced fields without auth
- READ_ONLY=true + API key configured + correct key: returns 200 with reduced fields
- READ_ONLY=true + API key configured + wrong key: rejects with 403
- READ_ONLY=true + API key configured + no key provided: rejects with 403

Manual smoke check (suggested for reviewers)
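A smoke check along these lines could be scripted as below. This is only a sketch: the helper name, the injectable `fetchImpl` parameter, and the base URL in the usage comment are assumptions, and it relies on the global `fetch` available in Node 18 and later. Only the `GET /diagnostics` path and the `x-api-key` header come from this PR.

```javascript
// Hypothetical smoke-check helper: requests /diagnostics and reports the
// HTTP status plus the top-level keys of the JSON body.
async function smokeCheck(baseUrl, apiKey, fetchImpl = fetch) {
  const headers = apiKey ? { 'x-api-key': apiKey } : {};
  const res = await fetchImpl(new URL('/diagnostics', baseUrl), { headers });
  const body = res.status === 200 ? await res.json() : null;
  return { status: res.status, topLevelKeys: body ? Object.keys(body) : [] };
}

// Usage (assumed local address; adjust to your install):
//   smokeCheck('http://localhost:31310', process.env.CADT_API_KEY).then(console.log);
```

A non-200 status with an empty key list is the expected result when the key is missing or wrong.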
Note
Medium Risk
Adds a new unauthenticated-by-default (unless API key configured) diagnostics surface that executes local process/CLI probes and multiple RPC calls; mistakes could leak sensitive operational data or add startup/runtime latency despite timeouts and read-only gating.
Overview
Adds a new top-level `GET /diagnostics` endpoint that returns a single JSON snapshot of CADT config/state plus Chia wallet/full-node/datalayer reachability and host system metrics, with per-section `ok`/`warning`/`critical` status aggregation and hard timeouts to degrade gracefully.

To support this, introduces new helpers for OS info (`system-info.js`), local Chia process scanning (`chia-process-scan.js`), `chia-tools` detection (`chia-tools-probe.js`), and a minimal mTLS full-node RPC client (`fullNodeRpc.js`), plus extends `wallet.js` with `get_version` and `get_connections` helpers.

Middleware now treats `/diagnostics` as a health-style endpoint (bypassing rate limits, startup gates, and synced-header probes) while explicitly blocking the route when `READ_ONLY` is enabled; startup additionally logs a fire-and-forget diagnostics snapshot after migrations complete. Tests add broad unit/integration/live-api coverage for response shape, auth gating, and helper edge cases.

Reviewed by Cursor Bugbot for commit 28b0a1e.
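The hard timeouts mentioned here include the subscription enumeration budget from the PR summary (a bounded-concurrency worker pool with a 30 s wall-clock budget that emits `truncated: true` if cut off). A minimal sketch of that technique follows; the function name, option names, and result shape are assumptions, not the PR's `collectSubscriptions` implementation.

```javascript
// Hypothetical sketch: process `items` with at most `concurrency` workers in
// flight and stop waiting once `budgetMs` of wall-clock time has elapsed.
async function collectWithBudget(items, worker, { concurrency, budgetMs }) {
  const results = [];
  let next = 0;
  let truncated = false;

  // Each lane pulls the next unclaimed item until the list or budget runs out.
  async function lane() {
    while (!truncated && next < items.length) {
      const item = items[next++];
      try {
        const value = await worker(item);
        if (!truncated) results.push({ item, ok: true, value });
      } catch (err) {
        if (!truncated) results.push({ item, ok: false, error: String(err?.message || err) });
      }
    }
  }

  let timer;
  const budget = new Promise((resolve) => {
    timer = setTimeout(() => { truncated = true; resolve(); }, budgetMs);
  });
  const lanes = Array.from({ length: Math.min(concurrency, items.length) }, () => lane());
  await Promise.race([Promise.all(lanes), budget]);
  clearTimeout(timer);
  return { results, truncated };
}
```

Results that arrive after the budget expires are discarded, so the caller receives a consistent partial list together with the `truncated` flag.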