Skip to content

fix(copilot-sweep batch 7): adaptive prompt input/output sweep (#207 follow-up)#232

Merged
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/copilot-sweep-7-adaptive
May 9, 2026
Merged

fix(copilot-sweep batch 7): adaptive prompt input/output sweep (#207 follow-up)#232
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/copilot-sweep-7-adaptive

Conversation

@jayzalowitz

Copy link
Copy Markdown
Owner

Top of the stack: #231#230#229#228#227#226.

The big one. The "adaptive judgment layer" introduced in #207 was
silently always falling back to deterministic logic because every
`runPrompt` caller passed camelCase input keys but the prompt templates
use snake_case placeholders. Unmatched placeholders rendered literally,
the LLM output failed schema validation, and the deterministic fallback
ran every time.

Files fixed (caller → template input rename)

  • `packages/policy-engine/src/trust-tier-engine.ts` → tier-promotion-judgment
  • `packages/registry-client/src/oauth-recovery.ts` → oauth-recovery
  • `packages/capability-engine/src/inference-engine.ts` → service-detection
    • capability-ranking
  • `apps/api/src/routes/risk-profile.ts` → risk-profile-interpretation
  • `apps/api/src/routes/about-me.ts` → self-portrait
  • `apps/api/src/routes/onboarding.ts` → onboarding-dialogue
  • `apps/api/src/routes/capabilities.ts` → recipe-recommendation +
    reverse-capability-intent (output type for recipe-recommendation
    also corrected to match the prompt's documented schema)
  • `apps/worker/src/jobs/briefing-generator.ts` → briefing-prose
    (output type was `{prose}`; prompt returns `{briefing}`)
  • `apps/worker/src/jobs/dormancy-check.ts` → dormancy-judgment

Performance

  • `CapabilityInferenceEngine` now caches `registry.getAll()` across one
    `run()` lifecycle (was called 2-3× per run from independent paths).

Test plan

  • Full suite passes (1453 tests across 14 packages)
  • Runtime verification that adaptive paths now actually engage
    cannot be done from this workspace; will need a manual smoke-test
    with a real LLM provider configured

🤖 Generated with Claude Code

@jayzalowitz jayzalowitz force-pushed the jayzalowitz/copilot-sweep-6-observability branch from 6eec655 to afea1d8 Compare May 9, 2026 02:24
@jayzalowitz jayzalowitz force-pushed the jayzalowitz/copilot-sweep-7-adaptive branch from 9f05e3e to 875cfa7 Compare May 9, 2026 02:24
jayzalowitz added a commit that referenced this pull request May 9, 2026
Per CLAUDE.md "Review Discipline": post-/review fixes get their own
commits AND a CHANGELOG subsection so the audit trail of "what review
caught" is visible to future readers.

VERSION not bumped — this batch lands as part of the stacked sweep
(#226#232); a single VERSION bump will follow when the full chain
merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request May 9, 2026
Per CLAUDE.md "Review Discipline": post-/review fixes get their own
commits AND a CHANGELOG subsection so the audit trail of "what review
caught" is visible to future readers.

VERSION not bumped — this batch lands as part of the stacked sweep
(#226#232); a single VERSION bump will follow when the full chain
merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request May 9, 2026
… epics (#226)

* fix(copilot-sweep batch 1): security + correctness across this week's epics

Audited Copilot review comments across PRs #198, #206-#215, #218, #219,
Themed follow-ups will land separately.

Security
- Shell injection in @skytwin/memory-gbrain searchSemantic (#215):
  switch from execSync with shell to execFileSync (no shell), so query
  metacharacters cannot inject. Added regression test.
- redactPII / redactPayload skipped arrays — PII in array-of-object
  payloads leaked to provenance and API responses (#209, #210, #211).
  Both helpers now recurse arrays. Tests updated (one previously asserted
  the bug as expected behavior).
- /api/capabilities/suggestions spread the row, leaking raw
  evidence_sources alongside the redacted preview (#211). Switched to
  explicit field projection.
- Email-redaction regex [A-Z|a-z] matched literal | as TLD char (#211).
  Fixed to [A-Za-z].

Correctness
- Credential vault never engaged: only the worker creates DbTokenStore
  but never called setKeyCache, so at-rest encryption + lazy migration
  were dead weight (#212). Worker now owns a KeyCache and wires it;
  cross-process unlock IPC is a #212 follow-up.
- DXT routes broken under real auth: getUserId read req.user?.id but
  production middleware sets req.authenticatedUserId (#219). Switched
  field; tests updated to mirror production middleware.
- Twin MCP provenance only fired on success (#209). Wrapped each tool
  handler in try/finally so audit fires for both success and failure;
  provenance failures never mask the underlying tool result.
- Migration 027 used inline INDEX (...) WHERE inside CREATE TABLE,
  which CockroachDB does not accept (#198). Pulled the partial indexes
  out as standalone CREATE INDEX IF NOT EXISTS — idempotent.
- Onboarding hard-coded first_run_choice='about-me' at three call
  sites (#208). Track _wizardState.firstRunChoice on welcome-screen
  selection, read everywhere downstream.
- Briefing generator capped at LIMIT 500 silently dropped users
  past the cap (#206). Switched to a 500-row paged scan.
- Zero-trust mode helpers exist but aren't wired into the decision
  pipeline (#222). Updated CHANGELOG and capability-detail copy to be
  honest; #222 follow-up tracks the wiring.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(copilot-sweep batch 1, post-/review): address Copilot inline comments on #226

Per the new merge gate (CLAUDE.md), every Copilot inline comment is
addressed before merge. Five comments on #226, all valid:

- Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form
  ever ran successfully (CRDB version-dependent), it would have created
  an auto-named partial index. Added defensive
  `DROP INDEX IF EXISTS <table>@<auto-name>` for both
  mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a
  re-run doesn't leave duplicate partial indexes covering the same
  predicate.
- briefing-generator: switched OFFSET pagination to keyset pagination
  (`AND user_id > $last`). OFFSET pagination on DISTINCT scales
  quadratically with user count — the briefing job's runtime would
  grow unboundedly. Keyset stays linear and the (user_id, status) index
  serves the range scan directly.
- DXT route docstring: removed the "other route modules use the same
  order" claim — it was misleading (other routes still vary in
  precedence). Now just notes a shared-helper #226 follow-up if the
  ordering proves load-bearing.
- gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the
  variable name matches the API under test (execFileSync).

Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(CHANGELOG): add post-/review fixes subsection for #226

Per CLAUDE.md "Review Discipline": post-/review fixes get their own
commits AND a CHANGELOG subsection so the audit trail of "what review
caught" is visible to future readers.

VERSION not bumped — this batch lands as part of the stacked sweep
(#226#232); a single VERSION bump will follow when the full chain
merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz jayzalowitz force-pushed the jayzalowitz/copilot-sweep-6-observability branch from afea1d8 to bf54aaf Compare May 9, 2026 02:37
…follow-up)

Adaptive judgment paths fired runPrompt with camelCase input keys but
the prompt templates use snake_case placeholders ({{current_tier}},
{{risk_profile}}, …). The runner left unmatched placeholders literal,
the LLM returned garbage, schema validation failed, and the
deterministic fallback ran every time. Net effect: the "adaptive
judgment layer" introduced in #207 never engaged in any of the call
sites below.

Fixed (caller → template input rename):
- packages/policy-engine/src/trust-tier-engine.ts → tier-promotion-judgment
  ({current_tier, target_tier, risk_profile, feedback_history,
  decision_summary} from internal camelCase + ApprovalStats fields).
- packages/registry-client/src/oauth-recovery.ts → oauth-recovery
  ({failure_trace, server_auth_metadata}).
- packages/capability-engine/src/inference-engine.ts → service-detection
  ({signals, risk_profile}) and capability-ranking
  ({capabilities, user_profile_summary, language, risk_profile}; was
  passing the entire suggestions array under the wrong key).
- apps/api/src/routes/risk-profile.ts → risk-profile-interpretation
  ({risk_profile_text}).
- apps/api/src/routes/about-me.ts → self-portrait ({memory_facts}).
- apps/api/src/routes/onboarding.ts → onboarding-dialogue ({previous_turns,
  goal, language, risk_profile}; dropped redundant `history` alias and
  unused `current_capabilities`).
- apps/api/src/routes/capabilities.ts:
  - recipe-recommendation → ({registry, detected_services, risk_profile})
    plus output type now matches the prompt's documented shape and is
    transformed into a CapabilityRecipe before returning to clients.
  - reverse-capability-intent → ({user_message, installed_capabilities,
    risk_profile}).
- apps/worker/src/jobs/briefing-generator.ts → briefing-prose
  ({date, events, language, pending_tasks, risk_profile}; output type
  matches the prompt's documented {briefing} field, not {prose}).
- apps/worker/src/jobs/dormancy-check.ts → dormancy-judgment
  ({server_name, server_id, activity_history, user_activity,
  risk_profile}).

Performance:
- inference-engine.ts CapabilityInferenceEngine now caches
  registry.getAll() across the run() lifecycle; previously it was
  called from runDeterministic, verifyAgainstRegistry, and
  aggregateAndScore independently.

Tests: all suites pass (1453 total, 401 api / 144 policy-engine /
38 capability-engine / 47 mcp-host / 48 worker / etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz jayzalowitz force-pushed the jayzalowitz/copilot-sweep-7-adaptive branch from 875cfa7 to 124e282 Compare May 9, 2026 02:38
@jayzalowitz jayzalowitz changed the base branch from jayzalowitz/copilot-sweep-6-observability to main May 9, 2026 02:38
@jayzalowitz jayzalowitz merged commit 388f25e into main May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant