fix(copilot-sweep batch 7): adaptive prompt input/output sweep (#207 follow-up) by jayzalowitz · Pull Request #232 · jayzalowitz/skytwin

jayzalowitz · 2026-05-09T00:21:08Z

Top of the stack: #231 → #230 → #229 → #228 → #227 → #226.

The big one. The "adaptive judgment layer" introduced in #207 was
silently always falling back to deterministic logic because every
`runPrompt` caller passed camelCase input keys but the prompt templates
use snake_case placeholders. Unmatched placeholders rendered literally,
the LLM output failed schema validation, and the deterministic fallback
ran every time.

Files fixed (caller → template input rename)

`packages/policy-engine/src/trust-tier-engine.ts` → tier-promotion-judgment
`packages/registry-client/src/oauth-recovery.ts` → oauth-recovery
`packages/capability-engine/src/inference-engine.ts` → service-detection
- capability-ranking
`apps/api/src/routes/risk-profile.ts` → risk-profile-interpretation
`apps/api/src/routes/about-me.ts` → self-portrait
`apps/api/src/routes/onboarding.ts` → onboarding-dialogue
`apps/api/src/routes/capabilities.ts` → recipe-recommendation +
reverse-capability-intent (output type for recipe-recommendation
also corrected to match the prompt's documented schema)
`apps/worker/src/jobs/briefing-generator.ts` → briefing-prose
(output type was `{prose}`; prompt returns `{briefing}`)
`apps/worker/src/jobs/dormancy-check.ts` → dormancy-judgment

Performance

`CapabilityInferenceEngine` now caches `registry.getAll()` across one
`run()` lifecycle (was called 2-3× per run from independent paths).

Test plan

Full suite passes (1453 tests across 14 packages)
Runtime verification that adaptive paths now actually engage
cannot be done from this workspace; will need a manual smoke-test
with a real LLM provider configured

🤖 Generated with Claude Code

Per CLAUDE.md "Review Discipline": post-/review fixes get their own commits AND a CHANGELOG subsection so the audit trail of "what review caught" is visible to future readers. VERSION not bumped — this batch lands as part of the stacked sweep (#226 → #232); a single VERSION bump will follow when the full chain merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… epics (#226) * fix(copilot-sweep batch 1): security + correctness across this week's epics Audited Copilot review comments across PRs #198, #206-#215, #218, #219, Themed follow-ups will land separately. Security - Shell injection in @skytwin/memory-gbrain searchSemantic (#215): switch from execSync with shell to execFileSync (no shell), so query metacharacters cannot inject. Added regression test. - redactPII / redactPayload skipped arrays — PII in array-of-object payloads leaked to provenance and API responses (#209, #210, #211). Both helpers now recurse arrays. Tests updated (one previously asserted the bug as expected behavior). - /api/capabilities/suggestions spread the row, leaking raw evidence_sources alongside the redacted preview (#211). Switched to explicit field projection. - Email-redaction regex [A-Z|a-z] matched literal | as TLD char (#211). Fixed to [A-Za-z]. Correctness - Credential vault never engaged: only the worker creates DbTokenStore but never called setKeyCache, so at-rest encryption + lazy migration were dead weight (#212). Worker now owns a KeyCache and wires it; cross-process unlock IPC is a #212 follow-up. - DXT routes broken under real auth: getUserId read req.user?.id but production middleware sets req.authenticatedUserId (#219). Switched field; tests updated to mirror production middleware. - Twin MCP provenance only fired on success (#209). Wrapped each tool handler in try/finally so audit fires for both success and failure; provenance failures never mask the underlying tool result. - Migration 027 used inline INDEX (...) WHERE inside CREATE TABLE, which CockroachDB does not accept (#198). Pulled the partial indexes out as standalone CREATE INDEX IF NOT EXISTS — idempotent. - Onboarding hard-coded first_run_choice='about-me' at three call sites (#208). Track _wizardState.firstRunChoice on welcome-screen selection, read everywhere downstream. - Briefing generator capped at LIMIT 500 silently dropped users past the cap (#206). Switched to a 500-row paged scan. - Zero-trust mode helpers exist but aren't wired into the decision pipeline (#222). Updated CHANGELOG and capability-detail copy to be honest; #222 follow-up tracks the wiring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(copilot-sweep batch 1, post-/review): address Copilot inline comments on #226 Per the new merge gate (CLAUDE.md), every Copilot inline comment is addressed before merge. Five comments on #226, all valid: - Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form ever ran successfully (CRDB version-dependent), it would have created an auto-named partial index. Added defensive `DROP INDEX IF EXISTS <table>@<auto-name>` for both mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a re-run doesn't leave duplicate partial indexes covering the same predicate. - briefing-generator: switched OFFSET pagination to keyset pagination (`AND user_id > $last`). OFFSET pagination on DISTINCT scales quadratically with user count — the briefing job's runtime would grow unboundedly. Keyset stays linear and the (user_id, status) index serves the range scan directly. - DXT route docstring: removed the "other route modules use the same order" claim — it was misleading (other routes still vary in precedence). Now just notes a shared-helper #226 follow-up if the ordering proves load-bearing. - gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the variable name matches the API under test (execFileSync). Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(CHANGELOG): add post-/review fixes subsection for #226 Per CLAUDE.md "Review Discipline": post-/review fixes get their own commits AND a CHANGELOG subsection so the audit trail of "what review caught" is visible to future readers. VERSION not bumped — this batch lands as part of the stacked sweep (#226 → #232); a single VERSION bump will follow when the full chain merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…follow-up) Adaptive judgment paths fired runPrompt with camelCase input keys but the prompt templates use snake_case placeholders ({{current_tier}}, {{risk_profile}}, …). The runner left unmatched placeholders literal, the LLM returned garbage, schema validation failed, and the deterministic fallback ran every time. Net effect: the "adaptive judgment layer" introduced in #207 never engaged in any of the call sites below. Fixed (caller → template input rename): - packages/policy-engine/src/trust-tier-engine.ts → tier-promotion-judgment ({current_tier, target_tier, risk_profile, feedback_history, decision_summary} from internal camelCase + ApprovalStats fields). - packages/registry-client/src/oauth-recovery.ts → oauth-recovery ({failure_trace, server_auth_metadata}). - packages/capability-engine/src/inference-engine.ts → service-detection ({signals, risk_profile}) and capability-ranking ({capabilities, user_profile_summary, language, risk_profile}; was passing the entire suggestions array under the wrong key). - apps/api/src/routes/risk-profile.ts → risk-profile-interpretation ({risk_profile_text}). - apps/api/src/routes/about-me.ts → self-portrait ({memory_facts}). - apps/api/src/routes/onboarding.ts → onboarding-dialogue ({previous_turns, goal, language, risk_profile}; dropped redundant `history` alias and unused `current_capabilities`). - apps/api/src/routes/capabilities.ts: - recipe-recommendation → ({registry, detected_services, risk_profile}) plus output type now matches the prompt's documented shape and is transformed into a CapabilityRecipe before returning to clients. - reverse-capability-intent → ({user_message, installed_capabilities, risk_profile}). - apps/worker/src/jobs/briefing-generator.ts → briefing-prose ({date, events, language, pending_tasks, risk_profile}; output type matches the prompt's documented {briefing} field, not {prose}). - apps/worker/src/jobs/dormancy-check.ts → dormancy-judgment ({server_name, server_id, activity_history, user_activity, risk_profile}). Performance: - inference-engine.ts CapabilityInferenceEngine now caches registry.getAll() across the run() lifecycle; previously it was called from runDeterministic, verifyAgainstRegistry, and aggregateAndScore independently. Tests: all suites pass (1453 total, 401 api / 144 policy-engine / 38 capability-engine / 47 mcp-host / 48 worker / etc.). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

jayzalowitz mentioned this pull request May 9, 2026

docs(CLAUDE.md): add three-step PR merge gate (/review + Copilot + /document-release) #234

Merged

2 tasks

jayzalowitz force-pushed the jayzalowitz/copilot-sweep-6-observability branch from 6eec655 to afea1d8 Compare May 9, 2026 02:24

jayzalowitz force-pushed the jayzalowitz/copilot-sweep-7-adaptive branch from 9f05e3e to 875cfa7 Compare May 9, 2026 02:24

jayzalowitz force-pushed the jayzalowitz/copilot-sweep-6-observability branch from afea1d8 to bf54aaf Compare May 9, 2026 02:37

jayzalowitz force-pushed the jayzalowitz/copilot-sweep-7-adaptive branch from 875cfa7 to 124e282 Compare May 9, 2026 02:38

jayzalowitz changed the base branch from jayzalowitz/copilot-sweep-6-observability to main May 9, 2026 02:38

jayzalowitz merged commit 388f25e into main May 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(copilot-sweep batch 7): adaptive prompt input/output sweep (#207 follow-up)#232

fix(copilot-sweep batch 7): adaptive prompt input/output sweep (#207 follow-up)#232
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/copilot-sweep-7-adaptive

jayzalowitz commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jayzalowitz commented May 9, 2026

Files fixed (caller → template input rename)

Performance

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant