Skip to content

fix(copilot-sweep batch 1): security + correctness across this week's epics#226

Merged
jayzalowitz merged 3 commits into
mainfrom
jayzalowitz/audit-copilot-comments
May 9, 2026
Merged

fix(copilot-sweep batch 1): security + correctness across this week's epics#226
jayzalowitz merged 3 commits into
mainfrom
jayzalowitz/audit-copilot-comments

Conversation

@jayzalowitz

Copy link
Copy Markdown
Owner

Summary

Audit of Copilot review comments across this week's 15 epic PRs (#198,
#206-#215, #218, #219, #221, #222) found ~70 unresolved REAL issues. This
PR ships batch 1: the security-critical and obviously-broken items.
Lower-severity items (frontend nits, observability rollup architecture,
adaptive-prompt input mismatches, vault rotation polish, daemon scripts)
are scoped into themed follow-up PRs.

Security

  • Shell injection in `@skytwin/memory-gbrain` searchSemantic — switched
    from `execSync` (shell) to `execFileSync` (no shell) so the user
    query cannot inject (`$()`, backticks, `;`, etc.). Regression test
    added.
  • `redactPII` and `redactPayload` skipped arrays — PII inside
    array-of-object payloads (e.g. email recipients) leaked to provenance
    and API responses. Both helpers now recurse arrays.
  • `GET /api/capabilities/suggestions` was spreading the full row,
    returning raw `evidence_sources` (unredacted source signals) along
    with the redacted preview. Switched to explicit projection.
  • Email-redaction regex `[A-Z|a-z]{2,}` matched literal `|` as TLD
    char — fixed.

Correctness

  • Credential vault encryption never engaged in production: the only
    `DbTokenStore` instantiation (in worker) never called `setKeyCache`,
    so lazy migration never fired. Worker now owns a `KeyCache` and wires
    it. Cross-process unlock IPC is a feat(#183 follow-up): credential vault encryption #212 follow-up.
  • DXT routes returned 400 unless `?userId=` was passed: `getUserId`
    read `req.user?.id` but production middleware sets
    `req.authenticatedUserId`. Field switched; tests updated to mirror
    production middleware.
  • Twin MCP provenance only fired on success — invariant says every
    external-agent tool call must be audited. Wrapped each handler in
    try/finally; provenance failures never mask the original tool error.
  • Migration 027 used inline `INDEX (...) WHERE` inside `CREATE TABLE`,
    which CockroachDB does not accept. Pulled partial indexes out as
    standalone `CREATE INDEX IF NOT EXISTS` — idempotent.
  • Onboarding always recorded `first_run_choice='about-me'` regardless of
    path. Track `_wizardState.firstRunChoice` on welcome-screen choice;
    read downstream.
  • Briefing generator capped at `LIMIT 500` silently dropped users.
    Switched to paged scan over `mcp_servers`.
  • Zero-trust helpers exist but aren't wired into the decision pipeline.
    Updated CHANGELOG and capability-detail copy to be honest about the
    current scope; feat(#183 AC#4 partial): zero-trust mode policy + UI #222 follow-up tracks the wiring.

Follow-up PRs (batch 2+, themed sweeps)

Test plan

  • `pnpm build --concurrency=1` clean
  • `pnpm test` — 68 turbo tasks, all pass
  • `pnpm lint` — 56 turbo tasks, all pass
  • New regression tests for shell injection, recursive PII redaction
    (3 cases each), and `capabilities()` empty-when-not-installed
  • Manual: verify migration 027 applies cleanly on a fresh CRDB
    (cannot do from this workspace; will verify in deploy)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 8, 2026 23:17

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR delivers “batch 1” fixes from a cross-epic audit, focusing on security-critical and clearly broken behaviors across the memory-gbrain CLI integration, PII redaction, capability suggestions API, worker credential-vault wiring, MCP provenance auditing, CockroachDB migration compatibility, onboarding telemetry, and briefing generation.

Changes:

  • Hardened security boundaries: removed shell execution from memory-gbrain search, fixed email-redaction regex, ensured PII redaction recurses into arrays, and prevented leaking raw evidence_sources from suggestions responses.
  • Fixed correctness gaps in production wiring: worker now wires a KeyCache into DbTokenStore, DXT routes now read req.authenticatedUserId, MCP provenance is written on both success and failure, onboarding records the correct first_run_choice, and briefing generation no longer silently caps at 500 users.
  • Improved migration/runtime honesty: adjusted migration 027 partial-index creation for CRDB compatibility and updated user-facing copy / CHANGELOG to reflect current zero-trust wiring status.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
pnpm-lock.yaml Adds workspace linkage for @skytwin/credential-vault dependency.
packages/memory-gbrain/src/gbrain-port.ts Switches to execFileSync for shell-injection safety; returns empty capabilities when gbrain isn’t installed.
packages/memory-gbrain/src/tests/gbrain-port.test.ts Updates mocks for execFileSync and adds regression coverage for argv-based invocation.
packages/db/src/migrations/027-capability-acquisition.sql Moves partial indexes to standalone CREATE INDEX IF NOT EXISTS statements for CRDB compatibility.
CHANGELOG.md Documents the batch-1 sweep and clarifies zero-trust enforcement status.
apps/worker/src/jobs/briefing-generator.ts Replaces hard LIMIT 500 with paging over active users.
apps/worker/src/index.ts Wires a worker-local KeyCache into DbTokenStore for vault decrypt/lazy migration readiness.
apps/worker/package.json Adds @skytwin/credential-vault dependency.
apps/web/public/js/pages/onboarding.js Persists and reuses welcome-screen first-run choice instead of hard-coding 'about-me'.
apps/web/public/js/pages/capability-detail.js Updates copy to accurately reflect current zero-trust enforcement status.
apps/twin-mcp-server/src/server.ts Ensures external-agent provenance writes fire via try/finally and don’t mask tool errors.
apps/twin-mcp-server/src/audit/provenance-writer.ts Fixes recursive PII redaction to traverse arrays and nested structures.
apps/twin-mcp-server/src/tests/auth.test.ts Adds regression coverage for array-of-object PII redaction.
apps/api/src/routes/dxt.ts Fixes user identity extraction to prefer req.authenticatedUserId and fall back safely.
apps/api/src/routes/capabilities.ts Fixes recursive redaction for arrays, tightens suggestions response projection, fixes email regex.
apps/api/src/tests/provenance-graph-routes.test.ts Updates expectations and adds coverage for array redaction recursion.
apps/api/src/tests/dxt-routes.test.ts Aligns tests with production auth shape (req.authenticatedUserId).
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

);
-- Partial index — pulled out of CREATE TABLE because some CRDB versions
-- choke on inline `INDEX (...) WHERE ...`. Standalone form (matches the
-- pattern in 011-sessions.sql).
INDEX (user_id) WHERE status = 'pending'
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Pulled out of CREATE TABLE for the same CRDB-compat reason as above.
Comment on lines +53 to +66
let offset = 0;
for (;;) {
const result = await query<{ user_id: string }>(
`SELECT DISTINCT user_id
FROM mcp_servers
WHERE status IN ('active', 'installed', 'authorized')
ORDER BY user_id
LIMIT $1 OFFSET $2`,
[BRIEFING_USER_PAGE_SIZE, offset],
);
if (result.rows.length === 0) break;
for (const row of result.rows) allIds.push(row.user_id);
if (result.rows.length < BRIEFING_USER_PAGE_SIZE) break;
offset += BRIEFING_USER_PAGE_SIZE;
Comment thread apps/api/src/routes/dxt.ts Outdated
Comment on lines +26 to +27
* `?userId=` then to a legacy `req.user.id`. Other route modules use the same
* order — keep them in sync.
Comment on lines +12 to 18
import { execFileSync } from 'node:child_process';
import { isGbrainInstalled } from '../cli-detector.js';
import { GbrainMemoryPort, NotImplementedError } from '../gbrain-port.js';
import type { SemanticHit } from '@skytwin/memory-port';

const mockExecSync = execSync as ReturnType<typeof vi.fn>;
const mockExecSync = execFileSync as ReturnType<typeof vi.fn>;
const mockIsInstalled = isGbrainInstalled as ReturnType<typeof vi.fn>;
jayzalowitz added a commit that referenced this pull request May 9, 2026
…ents on #226

Per the new merge gate (CLAUDE.md), every Copilot inline comment is
addressed before merge. Five comments on #226, all valid:

- Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form
  ever ran successfully (CRDB version-dependent), it would have created
  an auto-named partial index. Added defensive
  `DROP INDEX IF EXISTS <table>@<auto-name>` for both
  mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a
  re-run doesn't leave duplicate partial indexes covering the same
  predicate.
- briefing-generator: switched OFFSET pagination to keyset pagination
  (`AND user_id > $last`). OFFSET pagination on DISTINCT scales
  quadratically with user count — the briefing job's runtime would
  grow unboundedly. Keyset stays linear and the (user_id, status) index
  serves the range scan directly.
- DXT route docstring: removed the "other route modules use the same
  order" claim — it was misleading (other routes still vary in
  precedence). Now just notes a shared-helper #226 follow-up if the
  ordering proves load-bearing.
- gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the
  variable name matches the API under test (execFileSync).

Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz jayzalowitz force-pushed the jayzalowitz/audit-copilot-comments branch from 3b28a89 to 4eda18d Compare May 9, 2026 02:19
jayzalowitz added a commit that referenced this pull request May 9, 2026
…ents on #226

Per the new merge gate (CLAUDE.md), every Copilot inline comment is
addressed before merge. Five comments on #226, all valid:

- Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form
  ever ran successfully (CRDB version-dependent), it would have created
  an auto-named partial index. Added defensive
  `DROP INDEX IF EXISTS <table>@<auto-name>` for both
  mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a
  re-run doesn't leave duplicate partial indexes covering the same
  predicate.
- briefing-generator: switched OFFSET pagination to keyset pagination
  (`AND user_id > $last`). OFFSET pagination on DISTINCT scales
  quadratically with user count — the briefing job's runtime would
  grow unboundedly. Keyset stays linear and the (user_id, status) index
  serves the range scan directly.
- DXT route docstring: removed the "other route modules use the same
  order" claim — it was misleading (other routes still vary in
  precedence). Now just notes a shared-helper #226 follow-up if the
  ordering proves load-bearing.
- gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the
  variable name matches the API under test (execFileSync).

Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request May 9, 2026
Per CLAUDE.md "Review Discipline": post-/review fixes get their own
commits AND a CHANGELOG subsection so the audit trail of "what review
caught" is visible to future readers.

VERSION not bumped — this batch lands as part of the stacked sweep
(#226#232); a single VERSION bump will follow when the full chain
merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz and others added 3 commits May 8, 2026 22:32
… epics

Audited Copilot review comments across PRs #198, #206-#215, #218, #219,
Themed follow-ups will land separately.

Security
- Shell injection in @skytwin/memory-gbrain searchSemantic (#215):
  switch from execSync with shell to execFileSync (no shell), so query
  metacharacters cannot inject. Added regression test.
- redactPII / redactPayload skipped arrays — PII in array-of-object
  payloads leaked to provenance and API responses (#209, #210, #211).
  Both helpers now recurse arrays. Tests updated (one previously asserted
  the bug as expected behavior).
- /api/capabilities/suggestions spread the row, leaking raw
  evidence_sources alongside the redacted preview (#211). Switched to
  explicit field projection.
- Email-redaction regex [A-Z|a-z] matched literal | as TLD char (#211).
  Fixed to [A-Za-z].

Correctness
- Credential vault never engaged: only the worker creates DbTokenStore
  but never called setKeyCache, so at-rest encryption + lazy migration
  were dead weight (#212). Worker now owns a KeyCache and wires it;
  cross-process unlock IPC is a #212 follow-up.
- DXT routes broken under real auth: getUserId read req.user?.id but
  production middleware sets req.authenticatedUserId (#219). Switched
  field; tests updated to mirror production middleware.
- Twin MCP provenance only fired on success (#209). Wrapped each tool
  handler in try/finally so audit fires for both success and failure;
  provenance failures never mask the underlying tool result.
- Migration 027 used inline INDEX (...) WHERE inside CREATE TABLE,
  which CockroachDB does not accept (#198). Pulled the partial indexes
  out as standalone CREATE INDEX IF NOT EXISTS — idempotent.
- Onboarding hard-coded first_run_choice='about-me' at three call
  sites (#208). Track _wizardState.firstRunChoice on welcome-screen
  selection, read everywhere downstream.
- Briefing generator capped at LIMIT 500 silently dropped users
  past the cap (#206). Switched to a 500-row paged scan.
- Zero-trust mode helpers exist but aren't wired into the decision
  pipeline (#222). Updated CHANGELOG and capability-detail copy to be
  honest; #222 follow-up tracks the wiring.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ents on #226

Per the new merge gate (CLAUDE.md), every Copilot inline comment is
addressed before merge. Five comments on #226, all valid:

- Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form
  ever ran successfully (CRDB version-dependent), it would have created
  an auto-named partial index. Added defensive
  `DROP INDEX IF EXISTS <table>@<auto-name>` for both
  mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a
  re-run doesn't leave duplicate partial indexes covering the same
  predicate.
- briefing-generator: switched OFFSET pagination to keyset pagination
  (`AND user_id > $last`). OFFSET pagination on DISTINCT scales
  quadratically with user count — the briefing job's runtime would
  grow unboundedly. Keyset stays linear and the (user_id, status) index
  serves the range scan directly.
- DXT route docstring: removed the "other route modules use the same
  order" claim — it was misleading (other routes still vary in
  precedence). Now just notes a shared-helper #226 follow-up if the
  ordering proves load-bearing.
- gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the
  variable name matches the API under test (execFileSync).

Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per CLAUDE.md "Review Discipline": post-/review fixes get their own
commits AND a CHANGELOG subsection so the audit trail of "what review
caught" is visible to future readers.

VERSION not bumped — this batch lands as part of the stacked sweep
(#226#232); a single VERSION bump will follow when the full chain
merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz jayzalowitz force-pushed the jayzalowitz/audit-copilot-comments branch from 86e8d04 to fc40c27 Compare May 9, 2026 02:32
@jayzalowitz jayzalowitz merged commit f45cb3d into main May 9, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants