fix(copilot-sweep batch 1): security + correctness across this week's epics by jayzalowitz · Pull Request #226 · jayzalowitz/skytwin

jayzalowitz · 2026-05-08T23:17:40Z

Summary

Audit of Copilot review comments across this week's 15 epic PRs (#198,
#206-#215, #218, #219, #221, #222) found ~70 unresolved REAL issues. This
PR ships batch 1: the security-critical and obviously-broken items.
Lower-severity items (frontend nits, observability rollup architecture,
adaptive-prompt input mismatches, vault rotation polish, daemon scripts)
are scoped into themed follow-up PRs.

Security

Shell injection in `@skytwin/memory-gbrain` searchSemantic — switched
from `execSync` (shell) to `execFileSync` (no shell) so the user
query cannot inject (`$()`, backticks, `;`, etc.). Regression test
added.
`redactPII` and `redactPayload` skipped arrays — PII inside
array-of-object payloads (e.g. email recipients) leaked to provenance
and API responses. Both helpers now recurse arrays.
`GET /api/capabilities/suggestions` was spreading the full row,
returning raw `evidence_sources` (unredacted source signals) along
with the redacted preview. Switched to explicit projection.
Email-redaction regex `[A-Z|a-z]{2,}` matched literal `|` as TLD
char — fixed.

Correctness

Credential vault encryption never engaged in production: the only
`DbTokenStore` instantiation (in worker) never called `setKeyCache`,
so lazy migration never fired. Worker now owns a `KeyCache` and wires
it. Cross-process unlock IPC is a feat(#183 follow-up): credential vault encryption #212 follow-up.
DXT routes returned 400 unless `?userId=` was passed: `getUserId`
read `req.user?.id` but production middleware sets
`req.authenticatedUserId`. Field switched; tests updated to mirror
production middleware.
Twin MCP provenance only fired on success — invariant says every
external-agent tool call must be audited. Wrapped each handler in
try/finally; provenance failures never mask the original tool error.
Migration 027 used inline `INDEX (...) WHERE` inside `CREATE TABLE`,
which CockroachDB does not accept. Pulled partial indexes out as
standalone `CREATE INDEX IF NOT EXISTS` — idempotent.
Onboarding always recorded `first_run_choice='about-me'` regardless of
path. Track `_wizardState.firstRunChoice` on welcome-screen choice;
read downstream.
Briefing generator capped at `LIMIT 500` silently dropped users.
Switched to paged scan over `mcp_servers`.
Zero-trust helpers exist but aren't wired into the decision pipeline.
Updated CHANGELOG and capability-detail copy to be honest about the
current scope; feat(#183 AC#4 partial): zero-trust mode policy + UI #222 follow-up tracks the wiring.

Follow-up PRs (batch 2+, themed sweeps)

DXT route hardening (feat(#180 partial): DXT export/import scaffold #219): `sourceInstanceId` semantics, skills
always empty, registryId/display_name fallback, unreachable base64 try/catch,
`listForUser` loading blobs.
Vault rotation polish (feat(#183 follow-up): credential vault encryption #212, feat(#183 vault follow-up): credential vault passphrase rotation #214): `unpackEncrypted` length validation,
`updateAccessToken` stale-encrypted column, rotation rejecting null
refresh tokens, `/status` endpoint missing keyVersion/lastRotated, tx
error log omits the error.
Adaptive prompt sweep (feat(#189): adaptive judgment layer — wire runPrompt into 10 deterministic paths #207): 8+ `runPrompt` callers pass camelCase
keys vs snake_case prompt templates; tests use schema-invalid mocks.
Observability rollup architecture (feat(#183): Observability + audit trail + cost ceilings (3/5 ACs) #210): worker rolls up metrics
buffered in API process — empty rollup in production. Plus
`writeBucket` overwriting latency percentiles.
Daemon (feat(#191 partial): always-on service — headless daemon scaffold #218): `parseInt` NaN port, port:0 logged wrong,
systemd `StartLimit*` in wrong section, launchd path inconsistency,
Windows `fileURLToPath`.
Frontend nits (feat(#177): trust ceremony + provenance lineage + reverse capability flow + Twin Briefing #206, feat(#184): Provenance graph + multi-modal evidence + DXT transfer doc #211, feat(#184 follow-up): capability changelog flow + new-skill opt-in #213, feat(#191 partial): always-on service — headless daemon scaffold #218 changelog): `escapeHtml` on
`textContent`, double-badges after accept/reject, `high_autonomy` copy
branch, `_promotionSseWired` flag set before wire confirmed,
duplicate `[unreleased]` headings.

Test plan

`pnpm build --concurrency=1` clean
`pnpm test` — 68 turbo tasks, all pass
`pnpm lint` — 56 turbo tasks, all pass
New regression tests for shell injection, recursive PII redaction
(3 cases each), and `capabilities()` empty-when-not-installed
Manual: verify migration 027 applies cleanly on a fresh CRDB
(cannot do from this workspace; will verify in deploy)

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR delivers “batch 1” fixes from a cross-epic audit, focusing on security-critical and clearly broken behaviors across the memory-gbrain CLI integration, PII redaction, capability suggestions API, worker credential-vault wiring, MCP provenance auditing, CockroachDB migration compatibility, onboarding telemetry, and briefing generation.

Changes:

Hardened security boundaries: removed shell execution from memory-gbrain search, fixed email-redaction regex, ensured PII redaction recurses into arrays, and prevented leaking raw evidence_sources from suggestions responses.
Fixed correctness gaps in production wiring: worker now wires a KeyCache into DbTokenStore, DXT routes now read req.authenticatedUserId, MCP provenance is written on both success and failure, onboarding records the correct first_run_choice, and briefing generation no longer silently caps at 500 users.
Improved migration/runtime honesty: adjusted migration 027 partial-index creation for CRDB compatibility and updated user-facing copy / CHANGELOG to reflect current zero-trust wiring status.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
pnpm-lock.yaml	Adds workspace linkage for `@skytwin/credential-vault` dependency.
packages/memory-gbrain/src/gbrain-port.ts	Switches to `execFileSync` for shell-injection safety; returns empty capabilities when gbrain isn’t installed.
packages/memory-gbrain/src/tests/gbrain-port.test.ts	Updates mocks for `execFileSync` and adds regression coverage for argv-based invocation.
packages/db/src/migrations/027-capability-acquisition.sql	Moves partial indexes to standalone `CREATE INDEX IF NOT EXISTS` statements for CRDB compatibility.
CHANGELOG.md	Documents the batch-1 sweep and clarifies zero-trust enforcement status.
apps/worker/src/jobs/briefing-generator.ts	Replaces hard LIMIT 500 with paging over active users.
apps/worker/src/index.ts	Wires a worker-local `KeyCache` into `DbTokenStore` for vault decrypt/lazy migration readiness.
apps/worker/package.json	Adds `@skytwin/credential-vault` dependency.
apps/web/public/js/pages/onboarding.js	Persists and reuses welcome-screen first-run choice instead of hard-coding `'about-me'`.
apps/web/public/js/pages/capability-detail.js	Updates copy to accurately reflect current zero-trust enforcement status.
apps/twin-mcp-server/src/server.ts	Ensures external-agent provenance writes fire via `try/finally` and don’t mask tool errors.
apps/twin-mcp-server/src/audit/provenance-writer.ts	Fixes recursive PII redaction to traverse arrays and nested structures.
apps/twin-mcp-server/src/tests/auth.test.ts	Adds regression coverage for array-of-object PII redaction.
apps/api/src/routes/dxt.ts	Fixes user identity extraction to prefer `req.authenticatedUserId` and fall back safely.
apps/api/src/routes/capabilities.ts	Fixes recursive redaction for arrays, tightens suggestions response projection, fixes email regex.
apps/api/src/tests/provenance-graph-routes.test.ts	Updates expectations and adds coverage for array redaction recursion.
apps/api/src/tests/dxt-routes.test.ts	Aligns tests with production auth shape (`req.authenticatedUserId`).

Files not reviewed (1)

pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

 );
+-- Partial index — pulled out of CREATE TABLE because some CRDB versions
+-- choke on inline `INDEX (...) WHERE ...`. Standalone form (matches the
+-- pattern in 011-sessions.sql).


-  INDEX (user_id) WHERE status = 'pending'
+  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
 );
+-- Pulled out of CREATE TABLE for the same CRDB-compat reason as above.


+  let offset = 0;
+  for (;;) {
+    const result = await query<{ user_id: string }>(
+      `SELECT DISTINCT user_id
+       FROM mcp_servers
+       WHERE status IN ('active', 'installed', 'authorized')
+       ORDER BY user_id
+       LIMIT $1 OFFSET $2`,
+      [BRIEFING_USER_PAGE_SIZE, offset],
+    );
+    if (result.rows.length === 0) break;
+    for (const row of result.rows) allIds.push(row.user_id);
+    if (result.rows.length < BRIEFING_USER_PAGE_SIZE) break;
+    offset += BRIEFING_USER_PAGE_SIZE;


+ * `?userId=` then to a legacy `req.user.id`. Other route modules use the same
+ * order — keep them in sync.


+import { execFileSync } from 'node:child_process';
 import { isGbrainInstalled } from '../cli-detector.js';
 import { GbrainMemoryPort, NotImplementedError } from '../gbrain-port.js';
 import type { SemanticHit } from '@skytwin/memory-port';

-const mockExecSync = execSync as ReturnType<typeof vi.fn>;
+const mockExecSync = execFileSync as ReturnType<typeof vi.fn>;
 const mockIsInstalled = isGbrainInstalled as ReturnType<typeof vi.fn>;


…ents on #226 Per the new merge gate (CLAUDE.md), every Copilot inline comment is addressed before merge. Five comments on #226, all valid: - Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form ever ran successfully (CRDB version-dependent), it would have created an auto-named partial index. Added defensive `DROP INDEX IF EXISTS <table>@<auto-name>` for both mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a re-run doesn't leave duplicate partial indexes covering the same predicate. - briefing-generator: switched OFFSET pagination to keyset pagination (`AND user_id > $last`). OFFSET pagination on DISTINCT scales quadratically with user count — the briefing job's runtime would grow unboundedly. Keyset stays linear and the (user_id, status) index serves the range scan directly. - DXT route docstring: removed the "other route modules use the same order" claim — it was misleading (other routes still vary in precedence). Now just notes a shared-helper #226 follow-up if the ordering proves load-bearing. - gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the variable name matches the API under test (execFileSync). Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Per CLAUDE.md "Review Discipline": post-/review fixes get their own commits AND a CHANGELOG subsection so the audit trail of "what review caught" is visible to future readers. VERSION not bumped — this batch lands as part of the stacked sweep (#226 → #232); a single VERSION bump will follow when the full chain merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… epics Audited Copilot review comments across PRs #198, #206-#215, #218, #219, Themed follow-ups will land separately. Security - Shell injection in @skytwin/memory-gbrain searchSemantic (#215): switch from execSync with shell to execFileSync (no shell), so query metacharacters cannot inject. Added regression test. - redactPII / redactPayload skipped arrays — PII in array-of-object payloads leaked to provenance and API responses (#209, #210, #211). Both helpers now recurse arrays. Tests updated (one previously asserted the bug as expected behavior). - /api/capabilities/suggestions spread the row, leaking raw evidence_sources alongside the redacted preview (#211). Switched to explicit field projection. - Email-redaction regex [A-Z|a-z] matched literal | as TLD char (#211). Fixed to [A-Za-z]. Correctness - Credential vault never engaged: only the worker creates DbTokenStore but never called setKeyCache, so at-rest encryption + lazy migration were dead weight (#212). Worker now owns a KeyCache and wires it; cross-process unlock IPC is a #212 follow-up. - DXT routes broken under real auth: getUserId read req.user?.id but production middleware sets req.authenticatedUserId (#219). Switched field; tests updated to mirror production middleware. - Twin MCP provenance only fired on success (#209). Wrapped each tool handler in try/finally so audit fires for both success and failure; provenance failures never mask the underlying tool result. - Migration 027 used inline INDEX (...) WHERE inside CREATE TABLE, which CockroachDB does not accept (#198). Pulled the partial indexes out as standalone CREATE INDEX IF NOT EXISTS — idempotent. - Onboarding hard-coded first_run_choice='about-me' at three call sites (#208). Track _wizardState.firstRunChoice on welcome-screen selection, read everywhere downstream. - Briefing generator capped at LIMIT 500 silently dropped users past the cap (#206). Switched to a 500-row paged scan. - Zero-trust mode helpers exist but aren't wired into the decision pipeline (#222). Updated CHANGELOG and capability-detail copy to be honest; #222 follow-up tracks the wiring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ents on #226 Per the new merge gate (CLAUDE.md), every Copilot inline comment is addressed before merge. Five comments on #226, all valid: - Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form ever ran successfully (CRDB version-dependent), it would have created an auto-named partial index. Added defensive `DROP INDEX IF EXISTS <table>@<auto-name>` for both mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a re-run doesn't leave duplicate partial indexes covering the same predicate. - briefing-generator: switched OFFSET pagination to keyset pagination (`AND user_id > $last`). OFFSET pagination on DISTINCT scales quadratically with user count — the briefing job's runtime would grow unboundedly. Keyset stays linear and the (user_id, status) index serves the range scan directly. - DXT route docstring: removed the "other route modules use the same order" claim — it was misleading (other routes still vary in precedence). Now just notes a shared-helper #226 follow-up if the ordering proves load-bearing. - gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the variable name matches the API under test (execFileSync). Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Per CLAUDE.md "Review Discipline": post-/review fixes get their own commits AND a CHANGELOG subsection so the audit trail of "what review caught" is visible to future readers. VERSION not bumped — this batch lands as part of the stacked sweep (#226 → #232); a single VERSION bump will follow when the full chain merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 8, 2026 23:17

Copilot started reviewing on behalf of jayzalowitz May 8, 2026 23:18 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

jayzalowitz force-pushed the jayzalowitz/audit-copilot-comments branch from 3b28a89 to 4eda18d Compare May 9, 2026 02:19

jayzalowitz and others added 3 commits May 8, 2026 22:32

jayzalowitz force-pushed the jayzalowitz/audit-copilot-comments branch from 86e8d04 to fc40c27 Compare May 9, 2026 02:32

jayzalowitz merged commit f45cb3d into main May 9, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(copilot-sweep batch 1): security + correctness across this week's epics#226

fix(copilot-sweep batch 1): security + correctness across this week's epics#226
jayzalowitz merged 3 commits into
mainfrom
jayzalowitz/audit-copilot-comments

jayzalowitz commented May 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		* `?userId=` then to a legacy `req.user.id`. Other route modules use the same
		* order — keep them in sync.

Conversation

jayzalowitz commented May 8, 2026

Summary

Security

Correctness

Follow-up PRs (batch 2+, themed sweeps)

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants