feat(gc): pgserve gc orchestration verb (singleton G3 verb 3)#91
Conversation
The 240-orphan-disease fix from the v1.0 mission, now landed as a CLI verb. Composes the merged G3 foundations: - src/gc/orphan-detection.js → classifyOrphans (PR #90) - src/gc/audit-log.js → writeGcAudit (PR #90) - src/schema/pgserve-meta.js → PGSERVE_META_TABLE (PR #88) - src/lib/admin-json.js → readAdminJson for port discovery src/gc/pg-queries.js (new): Modeled on src/upgrade/steps/cosign-meta-migration.js#pgQuery (PR #79) — psql shellout via spawnSync with stdin pipe, no shell expansion. - pgQuery({ sql, db, port, ... }) — primitive - selectMetaRows({ port }) → array of meta rows; throws ENOPGSERVE_META if the table doesn't exist (caller decides whether that's a clean exit-2) - selectExistingDbs({ port }) → Set<datname> excluding system DBs - selectActiveDbs({ port }) → Set<datname> with ≥1 backend in pg_stat_activity (excluding self) - dropDatabase({ database, role, port }) → pg_terminate_backend + DROP DATABASE IF EXISTS + DROP ROLE IF EXISTS (best-effort on role) - deleteMetaRow({ fingerprint, port }) → DELETE FROM public.pgserve_meta Defensive identifier + literal quoting for the CREATE/DROP path; the identifiers are constrained by deriveProvisionedNames() but we still belt-and-suspenders to defend against admins who manually inserted meta rows. src/commands/gc.js (new): - default mode = dry-run; `--apply` is required to actually DROP. - `--stale-after-days <N>` overrides the 30-day idle staleness window. - `--json` emits a single summary object (per-event audit log is always written regardless). - `--port <N>` / `-p` overrides the postgres port discovery; default is read from admin.json then 5432. - exit codes: 0 success 1 user error (bad flags) 2 pgserve_meta absent on host (no provisions yet — clean signal for monitoring rather than crash) 3 one or more drops failed (partial sweep still audited) Every retention decision audits a 'skip' event so an operator can answer "why was X kept?" days later from the JSON-lines log alone. Dry-run mode logs the *intent* with `dryRun: true` so the audit layer also captures the would-have-dropped before the operator flips the switch. bin/pgserve-wrapper.cjs: Add 'gc' to __installSubcommands. Pure node + child_process; must skip the bun probe. src/cli-install.cjs: Add `case 'gc':` to dispatch with the dynamic-import pattern used by verify / trust. Tests (26 pass, 0 fail): tests/cli/gc.test.js (new): - parseFlags: dry-run default, --apply, --json, --stale-after-days validation, --port range check, --help, unknown-flag throw. - resolvePort: explicit port wins; fallback returns a valid integer in [1, 65535] regardless of admin.json presence on the test host. - runGc surface: --help → 0; bad flag → 1. tests/gc/pg-queries.test.js (new): - quoteIdent / quoteLiteral: shape + injection-style escapes. - SYSTEM_DBS contains template0/template1/postgres. - DEFAULT_PORT = 5432. - dropDatabase refuses empty + system DB names. - deleteMetaRow rejects empty fingerprint. - pgQuery rejects empty/non-string sql. Smoke test (HOME=$(mktemp -d) node bin/pgserve-wrapper.cjs gc --help): routes through wrapper → dispatch → runGc and prints the usage block, confirming the allowlist + dispatch wiring. Cohort: Singleton G3 verb 3 of 4. Verbs 1 (doctor) + 2 (trust) shipped; verb 4 (provision orchestration) is the only G3 leftover. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements the pgserve gc subcommand, which provides a mechanism to identify and sweep orphaned databases. The implementation includes a new CLI command handler, PostgreSQL query utilities that utilize psql shellouts for minimal runtime overhead, and unit tests for flag parsing and SQL generation. Review feedback identifies a security risk regarding hardcoded database passwords, suggests refactoring duplicated error-handling logic for better maintainability, and notes a type inconsistency where a boolean value was passed as a string in the audit logging logic.
| password = process.env.PGPASSWORD || 'postgres', | ||
| captureStdout = false, | ||
| } = {}) { | ||
| if (typeof sql !== 'string' || sql.length === 0) { | ||
| throw new TypeError('pgQuery: sql must be a non-empty string'); | ||
| } | ||
| const env = { ...process.env, PGPASSWORD: password }; |
There was a problem hiding this comment.
Hardcoding a fallback password of 'postgres' is a security risk and prevents users from using other authentication methods like .pgpass files. If PGPASSWORD is not set in the environment, psql should be allowed to use its default authentication flow (e.g., prompt for a password, use .pgpass).
I suggest removing the fallback and only setting PGPASSWORD for the child process if it's explicitly provided.
| password = process.env.PGPASSWORD || 'postgres', | |
| captureStdout = false, | |
| } = {}) { | |
| if (typeof sql !== 'string' || sql.length === 0) { | |
| throw new TypeError('pgQuery: sql must be a non-empty string'); | |
| } | |
| const env = { ...process.env, PGPASSWORD: password }; | |
| password = process.env.PGPASSWORD, | |
| captureStdout = false, | |
| } = {}) { | |
| if (typeof sql !== 'string' || sql.length === 0) { | |
| throw new TypeError('pgQuery: sql must be a non-empty string'); | |
| } | |
| const env = { ...process.env, ...(password !== undefined && { PGPASSWORD: password }) }; |
| writeGcAudit({ action: 'error', reason: 'select_meta_rows_failed', detail: err.message }); | ||
| if (opts.json) emitJson({ ...summary, error: err.message }); | ||
| else process.stderr.write(`pgserve gc: ${err.message}\n`); | ||
| return 3; |
There was a problem hiding this comment.
This error handling logic is very similar to the logic in the catch block on lines 189-192. To improve maintainability and reduce code duplication, consider extracting this logic into a helper function. This function could take the error, a reason string, and other context as arguments, and handle both audit logging and outputting the error.
| if (!opts.apply) { | ||
| writeGcAudit({ | ||
| action: 'skip', | ||
| dryRun: 'true', |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 12990dbb3e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| }); | ||
| if (!out) return []; | ||
| return out.split('\n').filter(Boolean).map((line) => { | ||
| const [fingerprint, database_name, role_name, publisher, source_path, last_used_at] = line.split('\t'); |
There was a problem hiding this comment.
Parse psql output with the configured delimiter
pgQuery invokes psql -At without -F, so unaligned output uses | as the field separator by default, but selectMetaRows splits each row on \t. In hosts where pgserve_meta has rows, each line is parsed into a single token, leaving database_name empty and causing downstream classification to fail (or misclassify) instead of sweeping orphans. Set -F '\t' (or split on |) so parsing matches the actual CLI output format.
Useful? React with 👍 / 👎.
… bool PR #91 bot reviews found three real correctness gaps and one consistency nit: [CRITICAL — Codex P1] psql -At unaligned output defaults to '|', not '\t'. selectMetaRows / selectActiveDbs / selectExistingDbs all split on '\t' — every multi-column SELECT silently produced one-token rows (database_name='db|role|...', everything else undefined). The fast- path classification was effectively non-functional. Fix: pass -F '\t' explicitly so the separator matches what callers parse against. [CRITICAL] psql in script mode (-f) by default continues past statement errors and still exits 0. A failed CREATE DATABASE / GRANT in the provision verb's sequence could be invisible. (Caught by Codex P1 on the sibling PR #92, but the bug applies to gc's pgQuery helper too — this PR's pg-queries.js is the same helper.) Fix: pass -v ON_ERROR_STOP=1 so SQL errors surface as non-zero. [HIGH — Gemini] Hardcoded password = 'postgres' fallback prevented .pgpass / peer-auth on hosts that use them. Fix: only set PGPASSWORD in the child env when the caller or the caller's environment supplies one; let psql fall through to its default authentication flow otherwise. [MEDIUM — Gemini] dryRun: 'true' (string) was inconsistent with other boolean fields in the audit-log payload. Fix: dryRun: true. Filtered out (low-value): - "Extract duplicate error-handling helper" (Gemini medium) — cosmetic; the two error paths handle different layers (meta-rows vs pg_state). Validation: 26/26 tests pass; lint clean. The integration round-trip now requires that callers strip empty trailing-tab tokens correctly when reading single-column queries — verified manually against the existing parsing in selectExistingDbs / selectActiveDbs which split on '\n' first, then on '\t' is benign because there's only one field. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rrency honesty PR #92 bot reviews found four real correctness gaps: [CRITICAL — Codex P2 + Gemini HIGH] psql -At unaligned output defaults to '|', not '\t'. selectMetaRow split on '\t' → database_name picked up the entire 'db|role' string and role_name was undefined, so the fast-path idempotency check missed every existing database and the verb re-ran the create sequence on every replay. Fix: pass -F '\t' explicitly in pgQuery so the separator matches what callers parse against. [CRITICAL — Codex P1] psql in script mode (-f) by default continues past statement errors and STILL exits 0. A failed CREATE DATABASE / GRANT could be invisible — the metadata write would still happen, producing a pgserve_meta row pointing at a database that doesn't exist. Fix: pass -v ON_ERROR_STOP=1 so SQL errors surface as non-zero exits. [HIGH — Gemini] Hardcoded password = 'postgres' fallback prevented .pgpass / peer-auth on hosts that use them. Fix: only set PGPASSWORD in the child env when the caller or the caller's environment supplies one. [HIGH — Gemini] The advisory-lock-only-tx step provided ZERO concurrency protection. Every pgQuery call starts a NEW psql session via spawnSync, so an xact-scoped lock releases on the lock-step's COMMIT and a session-scoped lock can't survive across separate spawnSync invocations either. The "second SELECT inside the catch path" claim in the file header was also wishful — never implemented. Fix: drop the misleading lock-only-tx step entirely. The honest serialization mechanism is **idempotency under racing replays**, documented now in the file header: - CREATE ROLE wrapped in DO + IF NOT EXISTS. - CREATE DATABASE catches "already exists" race. - GRANT is set-style (re-run adds nothing). - INSERT ... ON CONFLICT (fingerprint) DO UPDATE. 10 concurrent provisioners with the same fingerprint converge to one (database, role, meta-row) trio without any explicit lock. The advisory-lock helpers in src/provision/advisory-lock.js are retained on main for future single-session callers (daemon mode); removing the stale import here, plus the bigintLiteral helper that only existed to feed the now-deleted lock SQL. [MEDIUM — Gemini] psql does NOT emit SQLSTATE codes (42P04) by default. The catch-path text-match against '42P04' would never fire on a real race; psql emits the human-readable "ERROR: database "<name>" already exists" message instead. Fix: match against 'already exists' in stderr. src/lib/pg-query.js: Same -F '\t' + ON_ERROR_STOP + PGPASSWORD-only-when-set fixes (the shared primitive that gc + provision both use; mirrors the parallel fix-up commit on PR #91's src/gc/pg-queries.js). Validation: 21/21 tests pass; lint clean; knip clean. Filtered out (none — every bot finding addressed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
The 240-orphan-disease fix from the v1.0 mission, now landed as a CLI verb. Composes the merged G3 foundations into a working `pgserve gc`.
What ships
Decisions
Exit codes
Test plan
Cohort
Singleton G3 verb 3 of 4. After this lands:
🤖 Generated with Claude Code