fix(copilot-sweep batch 7): adaptive prompt input/output sweep (#207 follow-up)#232
Merged
Merged
Conversation
2 tasks
6eec655 to
afea1d8
Compare
9f05e3e to
875cfa7
Compare
jayzalowitz
added a commit
that referenced
this pull request
May 9, 2026
Per CLAUDE.md "Review Discipline": post-/review fixes get their own commits AND a CHANGELOG subsection so the audit trail of "what review caught" is visible to future readers. VERSION not bumped — this batch lands as part of the stacked sweep (#226 → #232); a single VERSION bump will follow when the full chain merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz
added a commit
that referenced
this pull request
May 9, 2026
Per CLAUDE.md "Review Discipline": post-/review fixes get their own commits AND a CHANGELOG subsection so the audit trail of "what review caught" is visible to future readers. VERSION not bumped — this batch lands as part of the stacked sweep (#226 → #232); a single VERSION bump will follow when the full chain merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz
added a commit
that referenced
this pull request
May 9, 2026
… epics (#226) * fix(copilot-sweep batch 1): security + correctness across this week's epics Audited Copilot review comments across PRs #198, #206-#215, #218, #219, Themed follow-ups will land separately. Security - Shell injection in @skytwin/memory-gbrain searchSemantic (#215): switch from execSync with shell to execFileSync (no shell), so query metacharacters cannot inject. Added regression test. - redactPII / redactPayload skipped arrays — PII in array-of-object payloads leaked to provenance and API responses (#209, #210, #211). Both helpers now recurse arrays. Tests updated (one previously asserted the bug as expected behavior). - /api/capabilities/suggestions spread the row, leaking raw evidence_sources alongside the redacted preview (#211). Switched to explicit field projection. - Email-redaction regex [A-Z|a-z] matched literal | as TLD char (#211). Fixed to [A-Za-z]. Correctness - Credential vault never engaged: only the worker creates DbTokenStore but never called setKeyCache, so at-rest encryption + lazy migration were dead weight (#212). Worker now owns a KeyCache and wires it; cross-process unlock IPC is a #212 follow-up. - DXT routes broken under real auth: getUserId read req.user?.id but production middleware sets req.authenticatedUserId (#219). Switched field; tests updated to mirror production middleware. - Twin MCP provenance only fired on success (#209). Wrapped each tool handler in try/finally so audit fires for both success and failure; provenance failures never mask the underlying tool result. - Migration 027 used inline INDEX (...) WHERE inside CREATE TABLE, which CockroachDB does not accept (#198). Pulled the partial indexes out as standalone CREATE INDEX IF NOT EXISTS — idempotent. - Onboarding hard-coded first_run_choice='about-me' at three call sites (#208). Track _wizardState.firstRunChoice on welcome-screen selection, read everywhere downstream. - Briefing generator capped at LIMIT 500 silently dropped users past the cap (#206). Switched to a 500-row paged scan. - Zero-trust mode helpers exist but aren't wired into the decision pipeline (#222). Updated CHANGELOG and capability-detail copy to be honest; #222 follow-up tracks the wiring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(copilot-sweep batch 1, post-/review): address Copilot inline comments on #226 Per the new merge gate (CLAUDE.md), every Copilot inline comment is addressed before merge. Five comments on #226, all valid: - Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form ever ran successfully (CRDB version-dependent), it would have created an auto-named partial index. Added defensive `DROP INDEX IF EXISTS <table>@<auto-name>` for both mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a re-run doesn't leave duplicate partial indexes covering the same predicate. - briefing-generator: switched OFFSET pagination to keyset pagination (`AND user_id > $last`). OFFSET pagination on DISTINCT scales quadratically with user count — the briefing job's runtime would grow unboundedly. Keyset stays linear and the (user_id, status) index serves the range scan directly. - DXT route docstring: removed the "other route modules use the same order" claim — it was misleading (other routes still vary in precedence). Now just notes a shared-helper #226 follow-up if the ordering proves load-bearing. - gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the variable name matches the API under test (execFileSync). Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(CHANGELOG): add post-/review fixes subsection for #226 Per CLAUDE.md "Review Discipline": post-/review fixes get their own commits AND a CHANGELOG subsection so the audit trail of "what review caught" is visible to future readers. VERSION not bumped — this batch lands as part of the stacked sweep (#226 → #232); a single VERSION bump will follow when the full chain merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
afea1d8 to
bf54aaf
Compare
…follow-up)
Adaptive judgment paths fired runPrompt with camelCase input keys but
the prompt templates use snake_case placeholders ({{current_tier}},
{{risk_profile}}, …). The runner left unmatched placeholders literal,
the LLM returned garbage, schema validation failed, and the
deterministic fallback ran every time. Net effect: the "adaptive
judgment layer" introduced in #207 never engaged in any of the call
sites below.
Fixed (caller → template input rename):
- packages/policy-engine/src/trust-tier-engine.ts → tier-promotion-judgment
({current_tier, target_tier, risk_profile, feedback_history,
decision_summary} from internal camelCase + ApprovalStats fields).
- packages/registry-client/src/oauth-recovery.ts → oauth-recovery
({failure_trace, server_auth_metadata}).
- packages/capability-engine/src/inference-engine.ts → service-detection
({signals, risk_profile}) and capability-ranking
({capabilities, user_profile_summary, language, risk_profile}; was
passing the entire suggestions array under the wrong key).
- apps/api/src/routes/risk-profile.ts → risk-profile-interpretation
({risk_profile_text}).
- apps/api/src/routes/about-me.ts → self-portrait ({memory_facts}).
- apps/api/src/routes/onboarding.ts → onboarding-dialogue ({previous_turns,
goal, language, risk_profile}; dropped redundant `history` alias and
unused `current_capabilities`).
- apps/api/src/routes/capabilities.ts:
- recipe-recommendation → ({registry, detected_services, risk_profile})
plus output type now matches the prompt's documented shape and is
transformed into a CapabilityRecipe before returning to clients.
- reverse-capability-intent → ({user_message, installed_capabilities,
risk_profile}).
- apps/worker/src/jobs/briefing-generator.ts → briefing-prose
({date, events, language, pending_tasks, risk_profile}; output type
matches the prompt's documented {briefing} field, not {prose}).
- apps/worker/src/jobs/dormancy-check.ts → dormancy-judgment
({server_name, server_id, activity_history, user_activity,
risk_profile}).
Performance:
- inference-engine.ts CapabilityInferenceEngine now caches
registry.getAll() across the run() lifecycle; previously it was
called from runDeterministic, verifyAgainstRegistry, and
aggregateAndScore independently.
Tests: all suites pass (1453 total, 401 api / 144 policy-engine /
38 capability-engine / 47 mcp-host / 48 worker / etc.).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
875cfa7 to
124e282
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Top of the stack: #231 → #230 → #229 → #228 → #227 → #226.
The big one. The "adaptive judgment layer" introduced in #207 was
silently always falling back to deterministic logic because every
`runPrompt` caller passed camelCase input keys but the prompt templates
use snake_case placeholders. Unmatched placeholders rendered literally,
the LLM output failed schema validation, and the deterministic fallback
ran every time.
Files fixed (caller → template input rename)
reverse-capability-intent (output type for recipe-recommendation
also corrected to match the prompt's documented schema)
(output type was `{prose}`; prompt returns `{briefing}`)
Performance
`run()` lifecycle (was called 2-3× per run from independent paths).
Test plan
cannot be done from this workspace; will need a manual smoke-test
with a real LLM provider configured
🤖 Generated with Claude Code