Execution plan by nayname · Pull Request #1 · nayname/openclaw-secure-stack

nayname · 2026-02-09T13:05:44Z

Summary

Introduces an explicit Execution Plan–driven execution engine in the governance layer.
The planner now produces a schema-defined execution-plan.json artifact, and execution is routed through a new engine.py, supporting both deterministic execution and LLM-guided (agent) execution. Middleware responsibilities are split between evaluation and execution.

This makes execution explicit, inspectable, and mode-selectable, without changing existing validation logic.

Motivation

Previously, execution semantics were implicit:

The planner produced internal structures rather than a stable artifact
Execution was tightly coupled to LLM tool calling

This change introduces a plan-first execution boundary:

Nothing executes unless it exists in an explicit execution plan.

What Changed

Execution Plan Schema

Introduced a schema-defined execution-plan.json that represents the complete execution setup

The schema captures:

execution intent
ordered operations
execution constraints
execution mode (engine vs agent)

This artifact is now the canonical contract between planning and execution

Planner → Execution Plan Artifact

Planner now produces a schema-conformant JSON instance
The plan is the single source of truth for execution intent
Validation logic is left unchanged and continues to operate as-is

New Governance Engine

New file:

src/governance/engine.py

Responsibilities:

Execute a plan according to its declared execution mode
Act as the single execution authority
Support both:
- Engine mode — tools executed outside the LLM (execution layer)
- Agent mode — plan injected into LLM context (no tool execution)

Middleware Refactor (Evaluate vs Execute)

Middleware responsibilities are now explicitly split:

evaluate(...)
- Planning
- Validation
- Execution mode selection
- No side effects
execute(...)
- Plan execution via engine.py
- Tool invocation or agent-context generation
- All side effects live here

This removes execution logic from evaluation and prevents accidental execution during planning or validation.

Execution Modes

Execution mode is selected from plan constraints:

"constraints": {
"executionMode": "engine" | "agent"
}

Engine Mode (default)

Tools executed deterministically outside the LLM
Governed by the execution engine
Suitable for auditable, replayable execution

Agent Mode

Plan is presented to the LLM as structured context
No tools are executed by the engine
Suitable for guided reasoning or interactive flows

What Did NOT Change

Existing policy enforcement semantics remain intact
No breaking changes to existing governance rules
No changes to tool implementations

This is a purely additive and structural change.

Architecture

New Flow

evaluate()
→ Planner (execution-plan.json)
→ Validator (unchanged)
→ Execution decision

execute()
→ Engine
→ Tools (engine mode)
→ LLM context injection (agent mode)

Why This Matters

Makes execution explicit and inspectable
Enables plan persistence, replay, and audit
Supports multiple execution strategies without branching logic in middleware
Aligns governance with a compiler/runtime model rather than agent heuristics

Breaking Changes

None.

This change is additive and preserves existing behavior by default (engine mode).

yi-john-huang · 2026-02-16T17:40:51Z

The following issues are detected both Claude Code and OpenAI Codex and reviewed by me as well.

Critical Issues (Blocks Merge)

1. Missing file: `engine_adapter.py` — ImportError on startup

middleware.py:21 imports:

from src.governance.engine_adapter import adapt_execution_plan

This file does not exist in the PR or the fork. The proxy will crash at import time.

2. `await` inside non-async function — SyntaxError

middleware.py:247 (the new execute() method):

def execute(self, evaluation, session, tool_executor):
    ...
    state = await engine.execute(...)  # SyntaxError: 'await' outside async function

The method is defined with def, not async def.

3. Breaking constructor: `PlanGenerator.init` now requires `llm`

planner.py:77:

def __init__(self, patterns_path: str, llm) -> None:

But GovernanceMiddleware.__init__ (line 87) still calls:

self._planner = PlanGenerator(patterns_path)  # TypeError: missing 'llm'

This breaks the existing code path. Every test and production instantiation of PlanGenerator will crash.

4. `GovernanceEnforcer()` called without required args

middleware.py:243:

engine = ExecutionEngine(
    enforcer=GovernanceEnforcer(),  # Missing db_path and secret!
    ...
)

GovernanceEnforcer.__init__ requires (db_path: str, secret: str). This is a TypeError.

5. `self.plan` doesn't exist on `GovernanceMiddleware`

middleware.py:226:

exec_plan = self.plan

The middleware has no plan attribute. The evaluate/enforce cycle works through PlanStore, not instance state. This is an AttributeError.

6. Two `generate()` methods with incompatible signatures

The PR adds a new generate() to PlanGenerator:

def generate(self, *, user_message, context, session_id, ttl_minutes):

This shadows the existing generate():

def generate(self, intent, request_body, session_id):

The middleware calls the OLD signature. The new method would take over and crash with TypeError.

7. `state=None` passed to `AgentContextInjector.generate_context()`

middleware.py:231:

agent_context = injector.generate_context(plan=exec_plan, state=None)

But generate_context() accesses state.current_sequence and state.step_results[...]. Passing None → AttributeError.

Major Issues

8. Placeholder prompt: `EXECUTION_PLAN_PROMPT = "\n...\n"`

planner.py:31-33:

EXECUTION_PLAN_PROMPT = """
...
"""

The new generate() method sends this literal "..." to the LLM. It can never produce a valid plan. This is dead code.

9. Deprecated `datetime.utcnow()`

planner.py:118:

now = datetime.utcnow()  # DeprecationWarning in Python 3.12+

The project already uses datetime.now(UTC) in engine.py. Inconsistent and deprecated.

10. Silent exception swallowing in engine

engine.py:520-522:

except Exception as e:
    state.status = StepStatus.FAILED
    # Log error but don't crash

The comment says "log error" but no logging happens. The exception is silently dropped. This makes debugging execution failures impossible and violates the project's error handling rules (.claude/rules/error-handling.md).

11. Zero tests — 1,094 lines of untested code

The project has 520+ Python tests with coverage fail_under=90. This PR adds zero tests for:

ExecutionEngine (the core new component)
AgentContextInjector
New Pydantic models (8 new model classes)
New generate() method on PlanGenerator
New execute() method on middleware

12. Attribute mismatch: `planId` vs `plan_id`

middleware.py:237,254: References exec_plan.planId (camelCase), but all Pydantic models use plan_id (snake_case). The JSON schema uses camelCase, but the Python models don't define aliases. This is an AttributeError.

Minor / Style Issues

13. Uses `Dict`, `List` from `typing` instead of builtins

planner.py:16:

from typing import Any, List, Dict

Python 3.12+ project — should use dict[str, Any] and list[...] directly (as done everywhere else in the codebase).

14. Formatting inconsistency in imports

middleware.py:25:

    ToolCall, ExecutionContext,

Multiple symbols on one line with trailing comma — doesn't match existing import style.

15. Trailing newline removed from `middleware.py`

The diff shows \ No newline at end of file — removed the trailing newline from the existing file.

16. Schema file never loaded or referenced by code

config/execution-plan.json is a JSON Schema but no code loads, validates against, or references it. It's a documentation artifact, not enforced infrastructure.

17. `EnhancedExecutionPlan.conditionals` and `recovery_paths` are commented out

models.py (new):

# conditionals: list[ConditionalBranch] = Field(default_factory=list)
# recovery_paths: list[RecoveryPath] = Field(default_factory=list)

But engine.py accesses plan.conditionals and plan.recovery_paths — these will be AttributeError.

Architectural Feedback

The design idea — separating evaluate (side-effect-free) from execute (side-effects) with an explicit plan artifact — is sound and aligns well with the existing governance architecture. However:

The existing codebase already separates evaluation from enforcement. evaluate() produces plans, enforce() validates tool calls at runtime. The PR's execute() introduces a third layer that overlaps with enforce() without clear boundaries.
The EnhancedExecutionPlan wraps ExecutionPlan but doesn't integrate. There's no code path that creates an EnhancedExecutionPlan from the existing planner output. The bridge (engine_adapter.py) is missing.
Two execution paradigms conflict. The existing system is a "validate-then-proxy" model — the proxy evaluates, gets a token, then forwards individual tool calls through enforcement. The new engine wants to drive execution itself. These aren't reconciled.
The JSON schema and Pydantic models are divergent. The schema defines one contract (camelCase, operations, allow/deny rules), the models define another (snake_case, PlannedAction, risk scores). There's no serialization bridge.

Recommendation

Before this can be considered for merge:

Fix all 7 critical bugs (items 1-7) — the code doesn't run
Add comprehensive tests — at minimum for ExecutionEngine, AgentContextInjector, and the new models
Reconcile the two generate() methods — either extend the existing one or create a separate entry point that doesn't shadow it
Write engine_adapter.py — or remove the import
Fill in EXECUTION_PLAN_PROMPT — or remove the LLM-based generate path
Decide how execute() relates to enforce() — document the boundary
Add logging in the exception handler (engine.py:520)
Fix the async/sync boundary — make execute() properly async

nayname · 2026-02-16T18:46:27Z

@yi-john-huang thanks for the thoughtful detailed review and happy Lunar New Year! PR was intentionally a sketch to validate direction before diving into implementation. I’ll go through your comments carefully and think about a concrete plan for addressing them.

nayname · 2026-02-18T22:00:51Z

Hi @yi-john-huang ,

I've addressed your feedback and wanted to summarize the changes and our architectural direction.

Summary

Focused on making the plan a complete artifact. In this iteration:

Enhanced plan creation — adapted from basic plan using LLM + JSON schema
Separated execution — moved out of middleware.py into src/execution/ (stub for now)

The enhanced plan is embedded in the current workflow — it's created but not consumed. Tests added.

Next steps: Focus on plan creation — embedding user info (paths, configs, authorization, etc.). Plan structure will evolve. May drop JSON schema eventually, but it's good enough for now.

Critical Issues (1-7) — All Fixed

#	Issue	Fix
1	Missing `engine_adapter.py`	implemented as `planner.enhance()`
2	`await` in non-async	`Executor.execute()` is `async def`
3	`PlanGenerator.__init__` requires `llm`	`llm` passed to `enhance()`, not `__init__`
4	`GovernanceEnforcer()` missing args	Dependency injection in `Executor`
5	`self.plan` doesn't exist	Removed
6	Two `generate()` methods	Separate `generate()` and `enhance()`
7	`state=None` crash	`initialize_state()` called before use

Major/Minor Issues — Fixed

Architectural Feedback — Addressed

"execute() overlaps with enforce()" → Separated into layers:

src/governance/ — evaluate, validate, enforce (existing)
src/execution/ — engine, executor, context injection (new, stub)

"EnhancedExecutionPlan doesn't integrate" → planner.enhance() creates it from basic plan + LLM output

"Two execution paradigms conflict" → not dealing with that, execution layer is a stub for now. Focus is on plan creation.

"JSON schema and Pydantic divergent" → Intentionally keeping _parse_* bridge methods for now. Schema is still evolving; adding Pydantic aliases would create maintenance burden. Will revisit when schema stabilizes.

nayname · 2026-02-19T15:55:40Z

I mirrored this branch upstream here: yi-john-huang#14 (same code), just in case.
Discussion and review still live in this PR - I’ll align any upstream follow-ups based on feedback here.

yi-john-huang · 2026-02-21T01:52:04Z

What's Good

Clean evaluate/execute split — evaluation is side-effect-free, execution owns all side effects. Strong boundary.
Schema-first design — defining the execution plan as a JSON Schema artifact is the right approach for audit/replay.
ToolExecutorAdapter ABC — clean abstraction for tool execution, easy to test and extend.
Recovery path modeling — retry/skip/fail-fast strategy is well-structured.
Test volume — good coverage across happy paths and error paths.

Critical (Must Fix)

1. EnhancedExecutionPlan references RecoveryPath / ConditionalBranch before they're defined
src/governance/models.py

EnhancedExecutionPlan has fields typed as list[RecoveryPath] and list[ConditionalBranch], but both classes are defined after EnhancedExecutionPlan in the file. Even with from __future__ import annotations, Pydantic v2 evaluates Field(default_factory=...) at class-definition time — it will raise NameError or PydanticUserError on import.

Fix: Move RecoveryPath and ConditionalBranch above EnhancedExecutionPlan, or call EnhancedExecutionPlan.model_rebuild() at module end after all classes are defined.

2. create_enhanced_plan() return value is discarded and its exceptions crash evaluate()
src/governance/middleware.py

Two problems in one call site:

if self._enhancement_enabled:
    self.create_enhanced_plan(plan, effective_session_id, user_id, token)

The EnhancedExecutionPlan is computed and immediately garbage-collected. EvaluationResult still returns the base plan's plan_id and token. Enhancement is a no-op at runtime.
create_enhanced_plan() has no error handling. Any LLM failure (network error, missing API key, bad JSON) propagates up and crashes the entire evaluate() call — blocking all governance evaluation for what's supposed to be an optional enhancement.

Fix: Wrap the call in try/except and log on failure. Separately, decide how the enhanced plan reaches the caller (persist to store, add an enhanced_plan_id field to EvaluationResult, etc.).

3. PlanGenerator instantiated with CWD-relative schema path
src/governance/middleware.py

self._planner = PlanGenerator(patterns_path)

PlanGenerator.__init__ now has a schema_path parameter defaulting to "config/execution-plan.json" — a relative path. In any environment where CWD isn't the project root (pytest tmp_path, production deployment), self._schema will be None, and enhance() will raise RuntimeError: Schema not found whenever enhancement is enabled.

Fix: Pass schema path explicitly through middleware settings, or resolve it to an absolute path at construction time.

4. Missing src/llm/__init__.py

src/llm/client.py exists, but there's no __init__.py. The import from src.llm.client import LLMClient in middleware.py will fail with ModuleNotFoundError at startup.

High (Should Fix)

5. Off-by-one in retry loop
src/execution/engine.py

With max_retries=3, while retry_count < max_retries runs 3 total attempts — not 3 retries after the first. A user setting max_retries=1 (the minimum, per ge=1) gets exactly one attempt with no retry at all.

Fix: Rename to max_attempts, or change semantics so max_retries=3 means 3 retries (4 total attempts).

6. _should_skip() conditional logic is broken
src/execution/engine.py

def _should_skip(self, sequence, plan):
    for cond in plan.conditionals:
        if sequence in cond.if_true or sequence in cond.if_false:
            for result in plan.state.step_results:
                if result.status == StepStatus.FAILED:
                    if sequence in cond.if_true:
                        return True
    return False

if_false branch is never evaluated — steps in if_false are never skipped
Any prior step failure triggers skip, not the specific step the conditional references
cond.condition (e.g., "step_0_success") is parsed but never evaluated

This makes the conditional system logically incorrect for any plan with more than 2 steps.

7. Full ToolCall.arguments sent unsanitized to external LLM
src/governance/planner.py

enhance() serializes the full ExecutionPlan — including all ToolCall.arguments — and sends it to the Anthropic API. Those arguments may contain passwords, tokens, file paths, or PII extracted from the user's request. This violates the project's security rules (OWASP A03/A09).

Fix: Scrub sensitive fields from ToolCall.arguments before including them in the prompt. Log a security audit event when calling the external API.

8. Unused imports in executor.py will fail ruff
src/execution/executor.py

Session and GovernanceDecision are imported but never used. The project runs ruff — this will fail CI.

9. LLMClient needs hardening
src/llm/client.py

Anthropic() raises AuthenticationError at construction if ANTHROPIC_API_KEY is unset — no helpful error message
No timeout — LLM calls can hang indefinitely
response.content[0] can IndexError on empty responses
Model is hardcoded (claude-sonnet-4-20250514) — should come from config

Medium

Duration log always shows "incomplete" — engine.py: _calc_total_duration() is called before plan.state.completed_at is set. Move the assignment up.
session_id: str | None passed to initialize_state() — middleware.py: ExecutionContext.session_id requires str, not None. Will raise ValidationError when no session exists.
isort violation — models.py: from datetime import UTC, datetime appears after third-party imports. Ruff will flag this.
Two divergent schema files — execution-plan-big.json (draft-2020-12) and execution-plan.json (draft-07) define different schemas for the same concept. Only the latter is used. Remove execution-plan-big.json or document its purpose.
test_execute_raises_when_state_not_initialized — asserts AttributeError from an uninitialized state, which is testing a bug rather than a contract. Should assert a meaningful ExecutionError instead.

nayname · 2026-02-21T16:06:47Z

Hi @yi-john-huang,

Thanks for the thorough review. I've addressed all the issues.

General Notes:

_should_skip() stubbed: The conditional logic was broken and half-implemented. Rather than ship buggy branching, stubbed it to return False. Will implement properly when we define conditional semantics.

execution-plan-big.json kept: Serves as a landmark for the full schema vision. The smaller execution-plan.json is used in practice.

Enhanced plan not persisted: Currently created but not saved. The goal is to eventually transfer meaningful context to the user and store a useful token — this requires refinement of what the plan actually contains first.

Next steps: Plan refinement — using full schema and embedding user info (paths, configs, authorization, etc.). That's when the stubbed conditional logic comes into play.

Here's the summary:

Critical (1-4) — All Fixed

#	Issue	Fix
1	`RecoveryPath`/`ConditionalBranch` defined after use	Moved classes before `EnhancedExecutionPlan`
2	`create_enhanced_plan()` return discarded, no error handling	Added try/except, logging, returns `EnhancedExecutionPlan`, still not saved
3	Schema path CWD-relative	Resolved to absolute path from `patterns_path`
4	Missing `src/llm/__init__.py`	Created

High (5-9) — All Fixed

#	Issue	Fix
5	Off-by-one in retry loop	`max_attempts = 1 + max_retries` (1 initial + N retries)
6	`_should_skip()` broken	Stubbed with `return False`
7	Sensitive args sent to LLM	Added `_sanitize_for_llm()` + security audit logging
8	Unused imports in `executor.py`	Removed `GovernanceDecision`, `Session`
9	`LLMClient` needs hardening	Added timeout, env var config, error handling

Medium — Fixed

Issue	Fix
Duration log "incomplete"	`completed_at` set before `_calc_total_duration()`
`session_id: None` crashes `ExecutionContext`	`initialize_state()` raises `ValueError` if None
isort violation	Fixed import order
Two schema files	`execution-plan-big.json` stays as reference/landmark for full schema vision
Test asserts `AttributeError`	Now asserts `ExecutionError`

Tests updated to reflect all changes.

General notes / clarifications Conditional logic (_should_skip) The previous implementation was broken and half-implemented. Rather than ship incorrect branching semantics, I’ve stubbed it to return False for now. Proper conditional execution will be implemented once we finalize the conditional expression semantics and the richer plan schema. execution-plan-big.json This is intentionally kept as a landmark for the full schema vision. The smaller execution-plan.json is the one used in practice today. Enhanced plan persistence Enhanced plans are now created safely (non-blocking) but are not yet persisted. The intention is to eventually store a refined enhanced plan and issue a meaningful token once the schema and user-context embedding (paths, configs, auth, etc.) are finalized. Next steps Refining the execution plan schema and enriching it with real user/runtime context — this is where proper conditional logic will come back into play. Summary of fixes Critical (1–4) — All fixed 1 RecoveryPath / ConditionalBranch defined after use Classes moved before EnhancedExecutionPlan 2 create_enhanced_plan() discarded + no error handling Wrapped in try/except, logged failures, returns EnhancedExecutionPlan (non-blocking) 3 Schema path CWD-relative Resolved to absolute path derived from patterns_path 4 Missing src/llm/__init__.py Added High (5–9) — All fixed 5 Retry off-by-one max_attempts = 1 + max_retries 6 _should_skip() broken Stubbed to return False until semantics are defined 7 Sensitive args sent to LLM Added _sanitize_for_llm() + security/audit logging 8 Unused imports in executor.py Removed 9 LLMClient hardening Timeout, env-configurable model, error handling, dependency declared Medium — Fixed Issue Fix Duration log always “incomplete” completed_at set before _calc_total_duration() session_id=None crashes ExecutionContext initialize_state() validates and errors early isort violation Import order fixed Two schema files execution-plan-big.json kept intentionally as reference Test asserting AttributeError Updated to assert ExecutionError Tests Updated to reflect all changes

Moves the Anthropic import behind LLMClient initialization so optional LLM enhancement no longer affects import-time behavior or core governance paths. This change ensures: - Governance evaluation and execution remain functional when LLM enhancement is disabled or misconfigured - Missing dependencies or API keys fail locally and explicitly, not at module import time - Clear error messages guide correct installation and configuration Adds targeted tests covering: - Missing anthropic dependency - Missing ANTHROPIC_API_KEY - Successful lazy initialization when enhancement is enabled This reduces blast radius for optional integrations and tightens the boundary between core governance logic and external LLM providers.

Tighten failure handling around LLM-based plan enhancement. Enhancement remains best-effort and non-authoritative: failures are caught locally and never affect core governance evaluation or execution. Improvements include: - Richer warning logs with plan_id, error type, and full traceback to aid debugging and post-mortem analysis - Removal of a redundant ExecutionError definition to avoid duplicate error types - Integration test coverage asserting that enhancement failures are logged with sufficient diagnostic detail This reduces silent failure modes while preserving strict isolation between optional LLM enhancement and core execution paths.

…lidation Tighten security and determinism around LLM-based plan enhancement. This change introduces an allowlist-based sanitization strategy for all data sent to external LLMs. Only explicitly permitted structural fields and safe argument keys are allowed through; everything else is redacted. This replaces the previous sensitive-key denylist with a stricter, governance-appropriate posture. Additional improvements: - Validate LLM output against the execution-plan JSON schema before constructing an EnhancedExecutionPlan - Make schema path resolution deterministic and independent of CWD - Move LLM dependencies behind an optional dependency group - Expand tests to cover allowlist sanitization, schema validation, and schema path resolution behavior The goal is to reduce blast radius, prevent credential or PII leakage, and ensure LLM enhancement remains auditable, deterministic, and non-authoritative.

Moved jsonschema to core dependencies (it's small, reasonable)

Remove try/except ImportError in complete() - anthropic is guaranteed available since __init__ succeeded. Delete corresponding test that simulated module disappearing mid-process (impossible scenario).

- Rename agentContext to agent_context (snake_case consistency) - Add EXPERIMENTAL docstring to engine.py, injected_context.py - Clarify ValidationError handling comment in middleware.py - Update tests to use snake_case key

Move execution-plan-big.json to docs/reference/ and rename to execution-plan-reference.json. This schema is not used in production - it documents the full vision for plan structure to guide future work.

Add an explicit note explaining why EnhancedExecutionPlan remains mutable in this iteration. The plan primitive is still being refined, and execution state is embedded temporarily to avoid prematurely locking lifecycle and ownership boundaries. Once the plan schema stabilizes, execution state will be extracted into a separate mutable container owned by the execution engine and the plan model will be frozen.

Add debug logging when skipping malformed recovery_paths and conditionals. Update tests to verify logging occurs.

Add @requires_llm marker to tests that need real Anthropic client. Tests auto-skip with clear reason when key is unavailable.

Guard asyncio.create_task for sync callers like evaluate_governance().

nayname mentioned this pull request Feb 9, 2026

Execution governance for delegated tool calls yi-john-huang/openclaw-secure-stack#5

Closed

nayname self-assigned this Feb 15, 2026

nayname mentioned this pull request Feb 19, 2026

Execution governance yi-john-huang/openclaw-secure-stack#14

Merged

This was referenced Feb 20, 2026

feat: execution plan governance plugin openclaw/openclaw#21648

Closed

Execution governance #4

Merged

nayname force-pushed the execution_governance branch from 750f805 to 72a10d8 Compare February 20, 2026 14:18

nayname mentioned this pull request Feb 25, 2026

Execution visibility when using tool calling/MCP programmatically sreenathmmenon/llmswap#10

Open

nayname added 17 commits February 27, 2026 12:04

Execution plan

ce7d109

Cleaning up

26f169a

Tests

549e24b

LLM in enhanced plan creation

fb60f11

This engine.py is not used

a808603

Planner oimport cleaning

304d15f

Delete src/execution/execution.py

97465a2

Delete src/llm/simle_call.py

5a545c8

fix: Move jsonschema to core dependencies

3e69b04

Moved jsonschema to core dependencies (it's small, reasonable)

fix: Making create_enhanced_plan async

55d20b7

docs: added docstring that Exutor as a dead code now

6318d0f

fix: moved docstring to correct place

06f9232

fix: remove redundant import guard and impossible test case

c007bd8

Remove try/except ImportError in complete() - anthropic is guaranteed available since __init__ succeeded. Delete corresponding test that simulated module disappearing mid-process (impossible scenario).

nayname added 5 commits February 27, 2026 12:04

style: fix naming conventions and add EXPERIMENTAL docstring

b960a9a

- Rename agentContext to agent_context (snake_case consistency) - Add EXPERIMENTAL docstring to engine.py, injected_context.py - Clarify ValidationError handling comment in middleware.py - Update tests to use snake_case key

docs: move reference schema out of config

6a9a3b2

Move execution-plan-big.json to docs/reference/ and rename to execution-plan-reference.json. This schema is not used in production - it documents the full vision for plan structure to guide future work.

fix(planner): log malformed LLM output entries instead of silent skip

3a022a0

Add debug logging when skipping malformed recovery_paths and conditionals. Update tests to verify logging occurs.

test: skip LLM integration tests when ANTHROPIC_API_KEY not set

ad4ae41

Add @requires_llm marker to tests that need real Anthropic client. Tests auto-skip with clear reason when key is unavailable.

nayname force-pushed the execution_governance branch from c454d22 to ad4ae41 Compare February 27, 2026 12:12

fix(middleware): skip async enhancement when no event loop running

59c30cc

Guard asyncio.create_task for sync callers like evaluate_governance().

nayname merged commit 1f35f57 into master Mar 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Execution plan#1

Execution plan#1
nayname merged 23 commits intomasterfrom
execution_governance

nayname commented Feb 9, 2026 •

edited

Loading

Uh oh!

yi-john-huang commented Feb 16, 2026

Uh oh!

nayname commented Feb 16, 2026 •

edited

Loading

Uh oh!

nayname commented Feb 18, 2026

Uh oh!

nayname commented Feb 19, 2026

Uh oh!

yi-john-huang commented Feb 21, 2026

Uh oh!

nayname commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nayname commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

What Changed

Execution Plan Schema

Planner → Execution Plan Artifact

New Governance Engine

Middleware Refactor (Evaluate vs Execute)

Execution Modes

Engine Mode (default)

Agent Mode

What Did NOT Change

Architecture

New Flow

Why This Matters

Breaking Changes

Uh oh!

yi-john-huang commented Feb 16, 2026

Critical Issues (Blocks Merge)

1. Missing file: engine_adapter.py — ImportError on startup

2. await inside non-async function — SyntaxError

3. Breaking constructor: PlanGenerator.__init__ now requires llm

4. GovernanceEnforcer() called without required args

5. self.plan doesn't exist on GovernanceMiddleware

6. Two generate() methods with incompatible signatures

7. state=None passed to AgentContextInjector.generate_context()

Major Issues

8. Placeholder prompt: EXECUTION_PLAN_PROMPT = "\n...\n"

9. Deprecated datetime.utcnow()

10. Silent exception swallowing in engine

11. Zero tests — 1,094 lines of untested code

12. Attribute mismatch: planId vs plan_id

Minor / Style Issues

13. Uses Dict, List from typing instead of builtins

14. Formatting inconsistency in imports

15. Trailing newline removed from middleware.py

16. Schema file never loaded or referenced by code

17. EnhancedExecutionPlan.conditionals and recovery_paths are commented out

Architectural Feedback

Recommendation

Uh oh!

nayname commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nayname commented Feb 18, 2026

Summary

Critical Issues (1-7) — All Fixed

Major/Minor Issues — Fixed

Architectural Feedback — Addressed

Uh oh!

nayname commented Feb 19, 2026

Uh oh!

yi-john-huang commented Feb 21, 2026

What's Good

Critical (Must Fix)

High (Should Fix)

Medium

Uh oh!

nayname commented Feb 21, 2026

General Notes:

Critical (1-4) — All Fixed

High (5-9) — All Fixed

Medium — Fixed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nayname commented Feb 9, 2026 •

edited

Loading

1. Missing file: `engine_adapter.py` — ImportError on startup

2. `await` inside non-async function — SyntaxError

3. Breaking constructor: `PlanGenerator.init` now requires `llm`

4. `GovernanceEnforcer()` called without required args

5. `self.plan` doesn't exist on `GovernanceMiddleware`

6. Two `generate()` methods with incompatible signatures

7. `state=None` passed to `AgentContextInjector.generate_context()`

8. Placeholder prompt: `EXECUTION_PLAN_PROMPT = "\n...\n"`

9. Deprecated `datetime.utcnow()`

12. Attribute mismatch: `planId` vs `plan_id`

13. Uses `Dict`, `List` from `typing` instead of builtins

15. Trailing newline removed from `middleware.py`

17. `EnhancedExecutionPlan.conditionals` and `recovery_paths` are commented out

nayname commented Feb 16, 2026 •

edited

Loading