Conversation
|
The following issues are detected both Claude Code and OpenAI Codex and reviewed by me as well. Critical Issues (Blocks Merge)1. Missing file:
|
|
@yi-john-huang thanks for the thoughtful detailed review and happy Lunar New Year! PR was intentionally a sketch to validate direction before diving into implementation. I’ll go through your comments carefully and think about a concrete plan for addressing them. |
|
Hi @yi-john-huang , I've addressed your feedback and wanted to summarize the changes and our architectural direction. SummaryFocused on making the plan a complete artifact. In this iteration:
The enhanced plan is embedded in the current workflow — it's created but not consumed. Tests added. Next steps: Focus on plan creation — embedding user info (paths, configs, authorization, etc.). Plan structure will evolve. May drop JSON schema eventually, but it's good enough for now. Critical Issues (1-7) — All Fixed
Major/Minor Issues — FixedArchitectural Feedback — Addressed"execute() overlaps with enforce()" → Separated into layers:
"EnhancedExecutionPlan doesn't integrate" → "Two execution paradigms conflict" → not dealing with that, execution layer is a stub for now. Focus is on plan creation. "JSON schema and Pydantic divergent" → Intentionally keeping |
|
I mirrored this branch upstream here: yi-john-huang#14 (same code), just in case. |
750f805 to
72a10d8
Compare
What's Good
Critical (Must Fix)1.
Fix: Move 2. Two problems in one call site: if self._enhancement_enabled:
self.create_enhanced_plan(plan, effective_session_id, user_id, token)
Fix: Wrap the call in try/except and log on failure. Separately, decide how the enhanced plan reaches the caller (persist to store, add an 3. self._planner = PlanGenerator(patterns_path)
Fix: Pass schema path explicitly through middleware settings, or resolve it to an absolute path at construction time. 4. Missing
High (Should Fix)5. Off-by-one in retry loop With Fix: Rename to 6. def _should_skip(self, sequence, plan):
for cond in plan.conditionals:
if sequence in cond.if_true or sequence in cond.if_false:
for result in plan.state.step_results:
if result.status == StepStatus.FAILED:
if sequence in cond.if_true:
return True
return False
This makes the conditional system logically incorrect for any plan with more than 2 steps. 7. Full
Fix: Scrub sensitive fields from 8. Unused imports in
9.
Medium
|
|
Hi @yi-john-huang, Thanks for the thorough review. I've addressed all the issues. General Notes:
Enhanced plan not persisted: Currently created but not saved. The goal is to eventually transfer meaningful context to the user and store a useful token — this requires refinement of what the plan actually contains first. Next steps: Plan refinement — using full schema and embedding user info (paths, configs, authorization, etc.). That's when the stubbed conditional logic comes into play. Here's the summary: Critical (1-4) — All Fixed
High (5-9) — All Fixed
Medium — Fixed
Tests updated to reflect all changes. |
General notes / clarifications Conditional logic (_should_skip) The previous implementation was broken and half-implemented. Rather than ship incorrect branching semantics, I’ve stubbed it to return False for now. Proper conditional execution will be implemented once we finalize the conditional expression semantics and the richer plan schema. execution-plan-big.json This is intentionally kept as a landmark for the full schema vision. The smaller execution-plan.json is the one used in practice today. Enhanced plan persistence Enhanced plans are now created safely (non-blocking) but are not yet persisted. The intention is to eventually store a refined enhanced plan and issue a meaningful token once the schema and user-context embedding (paths, configs, auth, etc.) are finalized. Next steps Refining the execution plan schema and enriching it with real user/runtime context — this is where proper conditional logic will come back into play. Summary of fixes Critical (1–4) — All fixed 1 RecoveryPath / ConditionalBranch defined after use Classes moved before EnhancedExecutionPlan 2 create_enhanced_plan() discarded + no error handling Wrapped in try/except, logged failures, returns EnhancedExecutionPlan (non-blocking) 3 Schema path CWD-relative Resolved to absolute path derived from patterns_path 4 Missing src/llm/__init__.py Added High (5–9) — All fixed 5 Retry off-by-one max_attempts = 1 + max_retries 6 _should_skip() broken Stubbed to return False until semantics are defined 7 Sensitive args sent to LLM Added _sanitize_for_llm() + security/audit logging 8 Unused imports in executor.py Removed 9 LLMClient hardening Timeout, env-configurable model, error handling, dependency declared Medium — Fixed Issue Fix Duration log always “incomplete” completed_at set before _calc_total_duration() session_id=None crashes ExecutionContext initialize_state() validates and errors early isort violation Import order fixed Two schema files execution-plan-big.json kept intentionally as reference Test asserting AttributeError Updated to assert ExecutionError Tests Updated to reflect all changes
Moves the Anthropic import behind LLMClient initialization so optional LLM enhancement no longer affects import-time behavior or core governance paths. This change ensures: - Governance evaluation and execution remain functional when LLM enhancement is disabled or misconfigured - Missing dependencies or API keys fail locally and explicitly, not at module import time - Clear error messages guide correct installation and configuration Adds targeted tests covering: - Missing anthropic dependency - Missing ANTHROPIC_API_KEY - Successful lazy initialization when enhancement is enabled This reduces blast radius for optional integrations and tightens the boundary between core governance logic and external LLM providers.
Tighten failure handling around LLM-based plan enhancement. Enhancement remains best-effort and non-authoritative: failures are caught locally and never affect core governance evaluation or execution. Improvements include: - Richer warning logs with plan_id, error type, and full traceback to aid debugging and post-mortem analysis - Removal of a redundant ExecutionError definition to avoid duplicate error types - Integration test coverage asserting that enhancement failures are logged with sufficient diagnostic detail This reduces silent failure modes while preserving strict isolation between optional LLM enhancement and core execution paths.
…lidation Tighten security and determinism around LLM-based plan enhancement. This change introduces an allowlist-based sanitization strategy for all data sent to external LLMs. Only explicitly permitted structural fields and safe argument keys are allowed through; everything else is redacted. This replaces the previous sensitive-key denylist with a stricter, governance-appropriate posture. Additional improvements: - Validate LLM output against the execution-plan JSON schema before constructing an EnhancedExecutionPlan - Make schema path resolution deterministic and independent of CWD - Move LLM dependencies behind an optional dependency group - Expand tests to cover allowlist sanitization, schema validation, and schema path resolution behavior The goal is to reduce blast radius, prevent credential or PII leakage, and ensure LLM enhancement remains auditable, deterministic, and non-authoritative.
Moved jsonschema to core dependencies (it's small, reasonable)
Remove try/except ImportError in complete() - anthropic is guaranteed available since __init__ succeeded. Delete corresponding test that simulated module disappearing mid-process (impossible scenario).
- Rename agentContext to agent_context (snake_case consistency) - Add EXPERIMENTAL docstring to engine.py, injected_context.py - Clarify ValidationError handling comment in middleware.py - Update tests to use snake_case key
Move execution-plan-big.json to docs/reference/ and rename to execution-plan-reference.json. This schema is not used in production - it documents the full vision for plan structure to guide future work.
Add an explicit note explaining why EnhancedExecutionPlan remains mutable in this iteration. The plan primitive is still being refined, and execution state is embedded temporarily to avoid prematurely locking lifecycle and ownership boundaries. Once the plan schema stabilizes, execution state will be extracted into a separate mutable container owned by the execution engine and the plan model will be frozen.
Add debug logging when skipping malformed recovery_paths and conditionals. Update tests to verify logging occurs.
Add @requires_llm marker to tests that need real Anthropic client. Tests auto-skip with clear reason when key is unavailable.
c454d22 to
ad4ae41
Compare
Guard asyncio.create_task for sync callers like evaluate_governance().
Summary
Introduces an explicit Execution Plan–driven execution engine in the governance layer.
The planner now produces a schema-defined
execution-plan.jsonartifact, and execution is routed through a newengine.py, supporting both deterministic execution and LLM-guided (agent) execution. Middleware responsibilities are split between evaluation and execution.This makes execution explicit, inspectable, and mode-selectable, without changing existing validation logic.
Motivation
Previously, execution semantics were implicit:
This change introduces a plan-first execution boundary:
Nothing executes unless it exists in an explicit execution plan.
What Changed
Execution Plan Schema
Introduced a schema-defined execution-plan.json that represents the complete execution setup
The schema captures:
This artifact is now the canonical contract between planning and execution
Planner → Execution Plan Artifact
New Governance Engine
New file:
src/governance/engine.pyResponsibilities:
Middleware Refactor (Evaluate vs Execute)
Middleware responsibilities are now explicitly split:
evaluate(...)execute(...)engine.pyThis removes execution logic from evaluation and prevents accidental execution during planning or validation.
Execution Modes
Execution mode is selected from plan constraints:
"constraints": {
"executionMode": "engine" | "agent"
}
Engine Mode (default)
Agent Mode
What Did NOT Change
This is a purely additive and structural change.
Architecture
New Flow
evaluate()
→ Planner (execution-plan.json)
→ Validator (unchanged)
→ Execution decision
execute()
→ Engine
→ Tools (engine mode)
→ LLM context injection (agent mode)
Why This Matters
Breaking Changes
None.
This change is additive and preserves existing behavior by default (
enginemode).