Skip to content

feat: implement LLM decomposition strategy and workspace isolation#174

Merged
Aureliolo merged 3 commits intomainfrom
feat/llm-decomposition-workspace-isolation
Mar 8, 2026
Merged

feat: implement LLM decomposition strategy and workspace isolation#174
Aureliolo merged 3 commits intomainfrom
feat/llm-decomposition-workspace-isolation

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • LLM decomposition strategy (engine/decomposition/llm.py): Tool-calling-based task decomposition with JSON content fallback, depth validation, blank model rejection, and configurable LLM parameters
  • Prompt builder (engine/decomposition/llm_prompt.py): Structured system/user prompt construction with JSON schema for subtask extraction, required field validation
  • Workspace isolation (engine/workspace/): Git worktree-based concurrent workspace isolation with PlannerWorktreeStrategy, MergeOrchestrator (completion/priority/manual merge order + conflict escalation), and WorkspaceIsolationService (lifecycle orchestration with rollback-on-failure setup, best-effort teardown)
  • Workspace models: Frozen Pydantic models with cross-field validation (success/conflicts/SHA consistency), ConflictEscalation enum for escalation field, datetime for created_at
  • Git safety: Asyncio lock serialization, _validate_git_ref() for argument injection prevention, branch cleanup on worktree failure
  • Observability: 19 new workspace event constants, integration with structured logging throughout
  • Error hierarchy: WorkspaceError base with WorkspaceSetupError, WorkspaceMergeError, WorkspaceCleanupError, WorkspaceLimitError
  • Documentation: Updated DESIGN_SPEC.md §6.8/§6.9/§15.3, CLAUDE.md package structure, README.md features

Test plan

  • 3847 tests pass (uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80)
  • 96% code coverage
  • mypy strict passes
  • ruff lint + format clean
  • Unit tests for all workspace components (models, config, protocol, git_worktree, merge, service)
  • Unit tests for LLM decomposition (strategy, prompt builder)
  • Integration test for workspace lifecycle
  • Cross-field validation tests (success+conflicts, success+no SHA, failure+SHA)
  • Rollback-on-failure and best-effort teardown tests
  • Merge error handling and workspace sort edge case tests
  • Event constants discovery and dot-pattern tests

Review coverage

Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 42 findings addressed.

Closes #168
Closes #133

🤖 Generated with Claude Code

, #133)

Add LlmDecompositionStrategy using tool calling for structured LLM output
with content fallback and retry loop. Add workspace isolation module with
git-worktree-based PlannerWorktreeStrategy, MergeOrchestrator, and
WorkspaceIsolationService for concurrent agent execution.
…ecomposition

Pre-reviewed by 9 agents, 42 findings addressed:
- Add cross-field model_validator on MergeResult for success/conflicts/SHA consistency
- Change Workspace.created_at from str to datetime, MergeResult.escalation to ConflictEscalation enum
- Add git ref validation to prevent argument injection via dash-prefixed strings
- Fix returncode handling (None → -1), add explicit UTF-8 encoding with error replacement
- Serialize merge_workspace and teardown_workspace with asyncio.Lock
- Add branch cleanup on worktree creation failure (prevent orphaned branches)
- Check merge --abort return code, raise WorkspaceMergeError on failure
- Add rollback in setup_group, best-effort teardown in teardown_group
- Handle WorkspaceMergeError in merge_all (create failure result vs abort)
- Append unmentioned workspaces in _sort_workspaces (prevent silent drops)
- Add model validation for blank LLM model string
- Add structured logging for conflict collection, enum defaults, validation errors
- Use NotBlankStr for worktree_base_dir config field
- Update DESIGN_SPEC.md project structure with workspace/ and decomposition files
- Update CLAUDE.md engine description, README.md feature list
Copilot AI review requested due to automatic review settings March 8, 2026 18:34
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 8, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • LLM-based task decomposition strategy with intelligent subtask generation and error recovery
    • Concurrent workspace isolation enabling parallel task execution with Git-backed isolation
    • Configurable merge policies for workspace synchronization with conflict escalation options
    • Enhanced workspace lifecycle management with automatic rollback on setup failures
  • Documentation

    • Updated design specifications and README with new decomposition and workspace capabilities

Walkthrough

This pull request introduces two major features: an LLM-based task decomposition strategy that uses an LLM provider to autonomously decompose tasks into subtasks with tool-calling support and fallback JSON parsing, and a complete workspace isolation subsystem supporting git worktree-based concurrent development with sequential merge orchestration, conflict escalation, and configurable merge strategies.

Changes

Cohort / File(s) Summary
Documentation
CLAUDE.md, DESIGN_SPEC.md, README.md
Updated documentation to reflect M4+ capabilities for workspace isolation and LLM-based decomposition; added feature descriptions and implementation references.
Core Enums
src/ai_company/core/enums.py
Added MergeOrder (COMPLETION, PRIORITY, MANUAL) and ConflictEscalation (HUMAN, REVIEW_AGENT) enum classes for workspace merge configuration.
Decomposition Errors & Exports
src/ai_company/engine/__init__.py
Exposed five new workspace error types (WorkspaceCleanupError, WorkspaceError, WorkspaceLimitError, WorkspaceMergeError, WorkspaceSetupError) via public API exports.
LLM-Based Decomposition Strategy
src/ai_company/engine/decomposition/llm.py, src/ai_company/engine/decomposition/llm_prompt.py, src/ai_company/engine/decomposition/__init__.py
Implemented LlmDecompositionStrategy with configurable temperature and retry logic, tool-call parsing with JSON content fallback, depth/subtask validation, and comprehensive prompt builders for system/task/retry messages.
Workspace Error Hierarchy
src/ai_company/engine/errors.py
Created new WorkspaceError base exception and four specialized subclasses (WorkspaceSetupError, WorkspaceMergeError, WorkspaceCleanupError, WorkspaceLimitError) for workspace lifecycle failures.
Workspace Isolation Protocol & Configuration
src/ai_company/engine/workspace/protocol.py, src/ai_company/engine/workspace/config.py
Defined pluggable WorkspaceIsolationStrategy protocol with setup/merge/teardown/list methods; added frozen config models (PlannerWorktreesConfig, WorkspaceIsolationConfig) with merge order, conflict escalation, and concurrency constraints.
Workspace Domain Models
src/ai_company/engine/workspace/models.py
Introduced immutable domain models: WorkspaceRequest, Workspace, MergeConflict, MergeResult (with success/conflict/commit consistency validation), and WorkspaceGroupResult with computed aggregate fields (all\_merged, total\_conflicts).
Git Worktree Implementation
src/ai_company/engine/workspace/git_worktree.py
Implemented PlannerWorktreeStrategy with git worktree lifecycle (setup validates limits and creates isolated branches; merge performs sequential no-ff merges with conflict collection; teardown cleans up worktrees/branches); includes internal git command execution, ref validation, and conflict parsing.
Merge Orchestration
src/ai_company/engine/workspace/merge.py
Created MergeOrchestrator to coordinate sequential workspace merges with configurable order (completion, priority, manual), conflict escalation policies (human stops merge, review\_agent continues), and optional post-merge cleanup with event logging.
Workspace Service & Exports
src/ai_company/engine/workspace/service.py, src/ai_company/engine/workspace/__init__.py
Added WorkspaceIsolationService as high-level orchestrator for setup\_group/merge\_group/teardown\_group with rollback-on-failure and error accumulation; centralized package exports for all workspace subsystem components.
Observability Events
src/ai_company/observability/events/decomposition.py, src/ai_company/observability/events/workspace.py
Added event constants for LLM decomposition lifecycle (call start/complete, parse error, retry) and workspace operations (setup/merge/teardown/group/sorting/limit).
Unit Tests
tests/unit/engine/test_decomposition_llm.py, tests/unit/engine/test_decomposition_llm_prompt.py, tests/unit/engine/test_workspace_*.py, tests/unit/observability/test_events.py
Comprehensive unit test suite covering LLM strategy behavior, prompt generation, response parsing, workspace lifecycle, config validation, protocol conformance, merge orchestration, and event constants.
Integration Tests & Test Fixtures
tests/integration/engine/test_workspace_integration.py, tests/unit/engine/conftest.py
Added integration tests exercising real git operations (worktree setup/merge/conflict/cleanup) and extended conftest with workspace/merge result builders and mocked provider instrumentation (recorded\_messages, recorded\_tools).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LlmDecompositionStrategy
    participant CompletionProvider
    participant PromptHelpers
    participant ResponseParser
    participant DecompositionService

    User->>LlmDecompositionStrategy: decompose(task, context)
    LlmDecompositionStrategy->>LlmDecompositionStrategy: _check_depth(context)
    LlmDecompositionStrategy->>PromptHelpers: _build_initial_messages(task, context)
    PromptHelpers-->>LlmDecompositionStrategy: [system_msg, task_msg]
    
    loop retry_loop (max_retries)
        LlmDecompositionStrategy->>CompletionProvider: complete(messages, tool=decomposition_tool)
        CompletionProvider-->>LlmDecompositionStrategy: CompletionResponse
        LlmDecompositionStrategy->>ResponseParser: _parse_response(response)
        
        alt Tool Call Present
            ResponseParser->>ResponseParser: parse_tool_call_response()
            ResponseParser-->>LlmDecompositionStrategy: DecompositionPlan
        else Fallback to Content
            ResponseParser->>ResponseParser: parse_content_response()
            ResponseParser-->>LlmDecompositionStrategy: DecompositionPlan
        end
        
        alt Parse Success
            LlmDecompositionStrategy->>LlmDecompositionStrategy: _validate_plan(plan, context)
            break Return Plan
                LlmDecompositionStrategy-->>DecompositionService: DecompositionPlan
            end
        else Parse Failure
            LlmDecompositionStrategy->>PromptHelpers: build_retry_message(error)
            PromptHelpers-->>LlmDecompositionStrategy: retry_msg
        end
    end
    
    LlmDecompositionStrategy-->>DecompositionService: DecompositionError (retries exhausted)
Loading
sequenceDiagram
    participant Client
    participant WorkspaceIsolationService
    participant PlannerWorktreeStrategy
    participant MergeOrchestrator
    participant GitRuntime

    Client->>WorkspaceIsolationService: setup_group(requests)
    loop For Each Request
        WorkspaceIsolationService->>PlannerWorktreeStrategy: setup_workspace(request)
        PlannerWorktreeStrategy->>PlannerWorktreeStrategy: _validate_git_ref()
        PlannerWorktreeStrategy->>PlannerWorktreeStrategy: check max_concurrent limit
        PlannerWorktreeStrategy->>GitRuntime: create branch workspace/{task_id}
        GitRuntime-->>PlannerWorktreeStrategy: branch_ref
        PlannerWorktreeStrategy->>GitRuntime: add worktree at path
        GitRuntime-->>PlannerWorktreeStrategy: success
        PlannerWorktreeStrategy-->>WorkspaceIsolationService: Workspace
    end
    WorkspaceIsolationService-->>Client: tuple[Workspace]

    Client->>WorkspaceIsolationService: merge_group(workspaces)
    WorkspaceIsolationService->>MergeOrchestrator: merge_all(workspaces, order)
    
    loop For Each Workspace (sorted)
        MergeOrchestrator->>PlannerWorktreeStrategy: merge_workspace(workspace)
        PlannerWorktreeStrategy->>GitRuntime: checkout base_branch
        PlanplannerWorktreeStrategy->>GitRuntime: git merge --no-ff workspace/branch
        
        alt Merge Success
            GitRuntime-->>PlannerWorktreeStrategy: merge_commit_sha
            PlannerWorktreeStrategy-->>MergeOrchestrator: MergeResult(success=true)
        else Merge Conflict
            GitRuntime-->>PlannerWorktreeStrategy: conflict files
            PlannerWorktreeStrategy->>PlannerWorktreeStrategy: _collect_conflicts()
            PlannerWorktreeStrategy->>GitRuntime: git merge --abort
            PlannerWorktreeStrategy-->>MergeOrchestrator: MergeResult(conflicts=[...])
        end
        
        alt Escalation Policy
            MergeOrchestrator->>MergeOrchestrator: apply_escalation (HUMAN stops, REVIEW_AGENT continues)
        end
    end
    
    MergeOrchestrator-->>WorkspaceIsolationService: tuple[MergeResult]
    WorkspaceIsolationService-->>Client: WorkspaceGroupResult

    Client->>WorkspaceIsolationService: teardown_group(workspaces)
    loop For Each Workspace
        WorkspaceIsolationService->>PlannerWorktreeStrategy: teardown_workspace(workspace)
        PlannerWorktreeStrategy->>GitRuntime: remove worktree
        GitRuntime-->>PlannerWorktreeStrategy: success
        PlannerWorktreeStrategy->>GitRuntime: delete branch
        GitRuntime-->>PlannerWorktreeStrategy: success
    end
    WorkspaceIsolationService-->>Client: complete
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • PR #170: Modifies engine decomposition surface (models/service/validation) with overlapping changes to decomposition-related classes and validations.
  • PR #86: Introduces CompletionProvider types and provider models/protocols that the new LLM decomposition strategy directly depends on for completion calls.
  • PR #165: Extends engine decomposition subsystem with foundational decomposition infrastructure that this PR builds upon with LLM-based strategy and prompt parsing additions.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main changes: implementing LLM decomposition strategy and workspace isolation, which are the two core features added.
Description check ✅ Passed The PR description comprehensively details both features (LLM decomposition and workspace isolation), test coverage, and references the linked issues, directly relating to the changeset.
Linked Issues check ✅ Passed The PR fully implements both linked issue requirements: LlmDecompositionStrategy with context validation, error handling, and unit tests (#168); WorkspaceIsolationStrategy protocol with PlannerWorktreeStrategy, MergeOrchestrator, merge ordering, conflict escalation, and comprehensive tests (#133).
Out of Scope Changes check ✅ Passed All changes are directly scoped to the two linked issues: decomposition and workspace isolation implementations, supporting enums/models/config, observability events, tests, and documentation updates. No unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 90.36% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/llm-decomposition-workspace-isolation
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/llm-decomposition-workspace-isolation

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly advances the AI company framework by introducing sophisticated LLM-driven task decomposition and a robust Git-based workspace isolation system. These features are crucial for enabling multiple AI agents to collaboratively and safely work on a codebase, improving the system's scalability, reliability, and overall intelligence in handling complex development tasks. The changes lay the groundwork for more advanced multi-agent coordination and efficient code management.

Highlights

  • LLM Decomposition Strategy: Implemented a tool-calling-based task decomposition strategy using LLMs, featuring JSON content fallback, depth validation, blank model rejection, and configurable LLM parameters. This enhances the system's ability to break down complex tasks automatically.
  • Workspace Isolation: Introduced a robust workspace isolation system utilizing Git worktrees, enabling concurrent multi-agent development. This includes a PlannerWorktreeStrategy, MergeOrchestrator for sequential merging with configurable conflict escalation, and a WorkspaceIsolationService for lifecycle management with rollback and best-effort teardown.
  • Enhanced Models and Enums: Added new Pydantic models for workspace requests, active workspaces, merge results, and group results, including cross-field validation for consistency. New enums MergeOrder and ConflictEscalation were introduced to manage merge behavior.
  • Git Safety and Observability: Improved Git operation safety with asyncio lock serialization and input validation to prevent argument injection. Integrated 19 new workspace event constants with structured logging for better monitoring and debugging.
  • Comprehensive Error Handling: Established a new WorkspaceError hierarchy, including WorkspaceSetupError, WorkspaceMergeError, WorkspaceCleanupError, and WorkspaceLimitError, to provide clearer error reporting for workspace-related failures.
  • Documentation Updates: Updated DESIGN_SPEC.md, CLAUDE.md, and README.md to reflect the new LLM decomposition and workspace isolation features, ensuring documentation aligns with the implemented functionality.
Changelog
  • CLAUDE.md
    • Updated the description of the engine/ directory to include workspace isolation.
    • Updated the package structure to reflect new workspace and LLM decomposition modules.
  • DESIGN_SPEC.md
    • Updated section 6.8 to reflect the implementation of WorkspaceIsolationStrategy, PlannerWorktreeStrategy, MergeOrchestrator, and WorkspaceIsolationService.
    • Updated section 6.9 to include LlmDecompositionStrategy in the description of task decomposition capabilities.
    • Added new file structure entries for llm.py, llm_prompt.py within engine/decomposition/.
    • Added new file structure entries for workspace/ and its submodules within engine/.
    • Added workspace.py to the observability events section.
  • README.md
    • Updated the 'Task Decomposition & Routing' feature description to include LLM-based decomposition.
    • Added a new feature entry for 'Workspace Isolation' describing Git worktree-based concurrent workspace isolation.
  • src/ai_company/core/enums.py
    • Added MergeOrder enum to define strategies for merging workspace branches.
    • Added ConflictEscalation enum to define strategies for handling merge conflicts.
  • src/ai_company/engine/init.py
    • Imported and exported new workspace-related error classes: WorkspaceCleanupError, WorkspaceError, WorkspaceLimitError, WorkspaceMergeError, and WorkspaceSetupError.
  • src/ai_company/engine/decomposition/init.py
    • Imported and exported LlmDecompositionConfig and LlmDecompositionStrategy.
  • src/ai_company/engine/decomposition/llm.py
    • Added LlmDecompositionStrategy for LLM-based task decomposition with tool calling and JSON fallback.
    • Added LlmDecompositionConfig for configuring the LLM decomposition strategy.
  • src/ai_company/engine/decomposition/llm_prompt.py
    • Added prompt building and response parsing utilities for LLM-based decomposition, including tool definitions and message construction.
  • src/ai_company/engine/errors.py
    • Added a new base exception WorkspaceError and its specific subclasses for setup, merge, cleanup, and limit errors.
  • src/ai_company/engine/workspace/init.py
    • Added __init__.py to export workspace isolation components like MergeConflict, MergeOrchestrator, MergeResult, PlannerWorktreeStrategy, PlannerWorktreesConfig, Workspace, WorkspaceGroupResult, WorkspaceIsolationConfig, WorkspaceIsolationService, and WorkspaceIsolationStrategy.
  • src/ai_company/engine/workspace/config.py
    • Added PlannerWorktreesConfig for Git worktree strategy configuration.
    • Added WorkspaceIsolationConfig for top-level workspace isolation configuration.
  • src/ai_company/engine/workspace/git_worktree.py
    • Added PlannerWorktreeStrategy implementing Git worktree-based workspace isolation.
    • Implemented methods for setting up, merging, and tearing down workspaces, including Git command execution and safety checks.
  • src/ai_company/engine/workspace/merge.py
    • Added MergeOrchestrator for sequencing workspace merges and handling conflict escalation based on configured strategies.
  • src/ai_company/engine/workspace/models.py
    • Added Pydantic models for WorkspaceRequest, Workspace, MergeConflict, MergeResult, and WorkspaceGroupResult, including cross-field validation.
  • src/ai_company/engine/workspace/protocol.py
    • Added WorkspaceIsolationStrategy protocol defining the interface for workspace isolation implementations.
  • src/ai_company/engine/workspace/service.py
    • Added WorkspaceIsolationService to coordinate the lifecycle of workspace groups, including setup, merge, and teardown with rollback and best-effort cleanup.
  • src/ai_company/observability/events/decomposition.py
    • Added new event constants for LLM decomposition, including call start/complete, parse errors, and retries.
  • src/ai_company/observability/events/workspace.py
    • Added a new file defining constants for various workspace-related events, such as setup, merge, teardown, and limit reached.
  • tests/integration/engine/test_workspace_integration.py
    • Added integration tests for PlannerWorktreeStrategy covering different file edits, same file conflicts, worktree cleanup, and limit enforcement.
  • tests/unit/engine/conftest.py
    • Updated MockCompletionProvider to record messages and tools passed to the complete method.
    • Added helper functions make_workspace and make_merge_result for creating test data for workspace models.
  • tests/unit/engine/test_decomposition_llm.py
    • Added unit tests for LlmDecompositionStrategy, covering happy paths, depth limits, subtask limits, retry mechanisms, and error propagation.
  • tests/unit/engine/test_decomposition_llm_prompt.py
    • Added unit tests for LLM decomposition prompt building and response parsing functions, including tool definition, system/task messages, and JSON parsing.
  • tests/unit/engine/test_workspace_config.py
    • Added unit tests for PlannerWorktreesConfig and WorkspaceIsolationConfig models, verifying defaults, custom values, immutability, and validation rules.
  • tests/unit/engine/test_workspace_git_worktree.py
    • Added unit tests for PlannerWorktreeStrategy, covering setup, merge, teardown, active workspace listing, and error handling for Git operations.
  • tests/unit/engine/test_workspace_merge.py
    • Added unit tests for MergeOrchestrator, verifying completion, priority, and manual merge orders, as well as conflict escalation behaviors.
  • tests/unit/engine/test_workspace_models.py
    • Added unit tests for WorkspaceRequest, Workspace, MergeConflict, MergeResult, and WorkspaceGroupResult models, including validation and computed properties.
  • tests/unit/engine/test_workspace_protocol.py
    • Added unit tests to ensure WorkspaceIsolationStrategy conforms to the runtime checkable protocol and defines expected methods.
  • tests/unit/engine/test_workspace_service.py
    • Added unit tests for WorkspaceIsolationService, covering group setup with rollback, group merge, and best-effort group teardown.
  • tests/unit/observability/test_events.py
    • Updated event discovery tests to include the new workspace module.
    • Added tests to verify the existence and values of new workspace event constants.
Activity
  • Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency).
  • 42 findings addressed during pre-review.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two major features: an LLM-based task decomposition strategy and a git worktree-based workspace isolation system. While the implementation demonstrates high quality, robust design, and thorough error handling, two significant security vulnerabilities were identified: a potential for Prompt Injection in the task decomposition prompt construction, and a Git Ref Namespace Escape vulnerability that could lead to the unintended deletion of critical branches like main during workspace teardown. Addressing these security concerns is essential. Additionally, consider improving the retry logic in the LLM decomposition strategy to provide better context to the model on failures.

Comment on lines +45 to +60
_SAFE_REF_RE = re.compile(r"^[A-Za-z0-9._/-]+$")


def _validate_git_ref(value: str, label: str) -> None:
"""Validate that a string is safe for use as a git ref argument.

Args:
value: The string to validate.
label: Human-readable label for error messages.

Raises:
WorkspaceSetupError: If the value is unsafe for git.
"""
if not value or value.startswith("-") or not _SAFE_REF_RE.match(value):
msg = f"Unsafe {label} for git: {value!r}"
raise WorkspaceSetupError(msg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The _validate_git_ref function uses a regular expression (_SAFE_REF_RE) that allows dots (.) and slashes (/), and it only checks if the value starts with a dash (-). This allows an attacker to provide a task_id containing path traversal sequences like ... When used to construct a branch name (e.g., f"workspace/{request.task_id}"), this can result in a branch name that escapes the intended workspace/ namespace.

Most critically, the teardown_workspace method uses git branch -D on the constructed branch name. If an attacker provides a task_id such as ../main, the resulting branch name workspace/../main may be resolved by git to main, leading to the unintended deletion of the main branch (or other critical branches) during workspace cleanup.

Recommendation: Update the _SAFE_REF_RE regular expression to disallow consecutive dots (..) and ensure that the task_id does not contain path traversal sequences. Alternatively, use git check-ref-format to validate branch names or strictly enforce that the resulting ref stays within the refs/heads/workspace/ hierarchy.

Comment on lines +175 to +180
f"Title: {task.title}",
f"Description: {task.description}",
]
if task.acceptance_criteria:
lines.append("Acceptance Criteria:")
lines.extend(f" - {c.description}" for c in task.acceptance_criteria)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The build_task_message function constructs a user message for the LLM by directly concatenating fields from the Task object (title, description, and acceptance_criteria) into the prompt string. If these fields contain user-supplied data, an attacker can craft a malicious task that performs a prompt injection attack. This could allow an attacker to manipulate the LLM's behavior during task decomposition, potentially injecting unauthorized or malicious subtasks (e.g., "Exfiltrate credentials", "Delete repository") into the workflow, which are subsequently executed by the system.

Recommendation: Sanitize and escape user-provided content before including it in prompts. Use clear delimiters (e.g., XML-style tags or triple quotes) and explicit instructions to the LLM to treat the content as untrusted data. Consider implementing a validation step or human-in-the-loop review for generated decomposition plans.

Comment on lines +135 to +208
for attempt in range(attempts):
if attempt > 0 and last_error is not None:
logger.info(
DECOMPOSITION_LLM_RETRY,
task_id=task.id,
attempt=attempt,
error=last_error,
)
messages = [
*messages,
build_retry_message(last_error),
]

logger.debug(
DECOMPOSITION_LLM_CALL_START,
task_id=task.id,
model=self._model,
attempt=attempt,
)

response = await self._provider.complete(
messages,
self._model,
tools=[tool_def],
config=comp_config,
)

logger.debug(
DECOMPOSITION_LLM_CALL_COMPLETE,
task_id=task.id,
finish_reason=response.finish_reason.value,
)

try:
plan = self._parse_response(response, task.id)
except DecompositionError as exc:
last_error = str(exc)
logger.warning(
DECOMPOSITION_LLM_PARSE_ERROR,
task_id=task.id,
attempt=attempt,
error=last_error,
)
continue

try:
self._validate_plan(plan, context)
except DecompositionError as exc:
last_error = str(exc)
logger.warning(
DECOMPOSITION_VALIDATION_ERROR,
task_id=task.id,
error=last_error,
)
continue

logger.debug(
DECOMPOSITION_COMPLETED,
task_id=task.id,
strategy="llm",
subtask_count=len(plan.subtasks),
)
return plan

msg = (
f"LLM decomposition retries exhausted after "
f"{attempts} attempts for task {task.id!r}"
)
logger.warning(
DECOMPOSITION_FAILED,
task_id=task.id,
error=msg,
)
raise DecompositionError(msg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The retry logic in this loop can be improved. The message history sent to the LLM on a retry does not include the assistant's previous failed response. This prevents the LLM from seeing its own mistake, increasing the chance of it repeating the same error.

A more robust approach is to build a complete conversation history. I recommend refactoring the loop to correctly manage the messages list by appending both the failed assistant response and the user retry prompt.

Here is a conceptual example of how this could be structured:

# In the `except DecompositionError as exc:` block:
# 1. Log the error.
# 2. Append the failed assistant response to the `messages` list.
from ai_company.providers.enums import MessageRole
messages.append(
    ChatMessage(
        role=MessageRole.ASSISTANT,
        content=response.content,
        tool_calls=response.tool_calls,
    )
)
# 3. Continue to the next iteration.

# In the `if attempt > 0:` block at the top of the loop:
# 1. Log the retry attempt.
# 2. Append the user's retry message.
messages.append(build_retry_message(last_error))

This ensures a correct conversational context is maintained for retries, helping the LLM to self-correct more effectively.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR delivers two substantial features: an LLM-based task decomposition strategy with tool-calling and JSON content fallback (engine/decomposition/llm.py + llm_prompt.py), and a complete git-worktree workspace isolation subsystem (engine/workspace/) that allows concurrent agents to operate on isolated branches with sequential merge-back and configurable conflict escalation. Both are well-tested (96% coverage, 3847 tests passing) and integrate cleanly with the existing error hierarchy, structured logging conventions, and frozen Pydantic model patterns.

Key findings:

  • _validate_git_ref raises the wrong exception type in merge and teardown contexts — it always raises WorkspaceSetupError regardless of whether it is called from merge_workspace or teardown_workspace, which would confuse callers that pattern-match on WorkspaceMergeError / WorkspaceCleanupError.
  • Git subprocess leaked on asyncio.CancelledError in _run_git — only TimeoutError is caught after asyncio.wait_for; task cancellation bypasses proc.kill(), potentially leaving orphaned git processes holding .git/index.lock or MERGE_HEAD and blocking all subsequent git operations.
  • The LLM decomposition retry conversation history, best-effort teardown broad exception catching, and "unknown" SHA validator bypass from previous review rounds are all correctly addressed in this version.

Confidence Score: 3/5

  • PR is functional and well-tested (96% coverage, 3847 tests pass) but has two production-impacting bugs in the git subprocess backend that should be resolved before use in a live multi-agent environment.
  • The architecture is sound and most components are solid (5/5 confidence). However, two bugs in git_worktree.py lower overall confidence: (1) _validate_git_ref raises the wrong exception type in merge/teardown contexts, breaking structured error handling for callers; (2) _run_git leaks subprocesses on asyncio.CancelledError, which can cause git lock files to persist and block future operations—a hard failure scenario in production. These are concrete, reproducible issues in a critical path.
  • src/ai_company/engine/workspace/git_worktree.py — both the _validate_git_ref exception type issue and the subprocess leak on CancelledError originate here.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant S as WorkspaceIsolationService
    participant M as MergeOrchestrator
    participant P as PlannerWorktreeStrategy
    participant G as Git (subprocess)

    C->>S: setup_group(requests)
    loop For each request
        S->>P: setup_workspace(request)
        P->>G: git branch workspace/task-id/uuid base_branch
        G-->>P: rc=0
        P->>G: git worktree add <path> <branch>
        G-->>P: rc=0
        P-->>S: Workspace
    end
    S-->>C: tuple[Workspace, ...]

    note over C,G: Agents work in isolated worktrees concurrently

    C->>S: merge_group(workspaces)
    S->>M: merge_all(workspaces)
    loop Sequential merge (serialized by asyncio.Lock)
        M->>P: merge_workspace(workspace)
        P->>G: git checkout base_branch
        G-->>P: rc=0
        P->>G: git merge --no-ff branch_name
        alt Success
            G-->>P: rc=0
            P->>G: git rev-parse HEAD
            G-->>P: commit_sha
            P-->>M: MergeResult(success=True, sha=...)
            opt cleanup_on_merge
                M->>P: teardown_workspace(workspace)
            end
        else Conflict
            G-->>P: rc=1
            P->>G: git diff --name-only --diff-filter=U
            G-->>P: conflicting files
            P->>G: git merge --abort
            G-->>P: rc=0
            P-->>M: MergeResult(success=False, conflicts=[...])
            alt HUMAN escalation
                M-->>S: stop (partial results)
            else REVIEW_AGENT escalation
                M-->>M: continue next workspace
            end
        end
    end
    M-->>S: tuple[MergeResult, ...]
    S-->>C: WorkspaceGroupResult

    C->>S: teardown_group(workspaces)
    loop Best-effort teardown
        S->>P: teardown_workspace(workspace)
        P->>G: git worktree remove <path> --force
        P->>G: git branch -D <branch>
        P-->>S: (errors collected, not raised)
    end
    S-->>C: (raises WorkspaceCleanupError if any failed)
Loading

Last reviewed commit: 5376f3f

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements two major features: an LLM-based task decomposition strategy that uses tool calling with a JSON content fallback, and a git worktree-based workspace isolation system that allows concurrent agents to work independently with sequential merge and configurable conflict escalation. It also adds 19 new workspace observability event constants, a workspace error hierarchy, MergeOrder/ConflictEscalation enums, and documentation updates.

Changes:

  • engine/decomposition/llm.py + llm_prompt.py: LLM decomposition strategy with tool-call-based structured output, JSON content fallback, retry logic, and depth/constraint validation
  • engine/workspace/: New workspace isolation subsystem — git worktree lifecycle (PlannerWorktreeStrategy), sequential merge orchestration (MergeOrchestrator), and group lifecycle service (WorkspaceIsolationService)
  • Supporting infrastructure: event constants, error hierarchy, enums, comprehensive unit + integration tests, and doc updates

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/ai_company/engine/decomposition/llm.py LLM strategy: provider calls, retry loop, depth guard
src/ai_company/engine/decomposition/llm_prompt.py Prompt/tool building and response parsing
src/ai_company/engine/decomposition/__init__.py Exports LlmDecompositionStrategy/Config
src/ai_company/engine/workspace/__init__.py Package exports for workspace subsystem
src/ai_company/engine/workspace/config.py Frozen Pydantic config models
src/ai_company/engine/workspace/models.py Domain models with cross-field validation
src/ai_company/engine/workspace/protocol.py Runtime-checkable protocol
src/ai_company/engine/workspace/git_worktree.py Git worktree CRUD with lock serialization and git-ref validation
src/ai_company/engine/workspace/merge.py Sequential merge orchestration
src/ai_company/engine/workspace/service.py Group lifecycle service with rollback/best-effort teardown
src/ai_company/engine/errors.py WorkspaceError hierarchy
src/ai_company/engine/__init__.py Workspace error exports
src/ai_company/core/enums.py MergeOrder/ConflictEscalation enums
src/ai_company/observability/events/workspace.py 19 workspace event constants
src/ai_company/observability/events/decomposition.py 4 LLM decomposition event constants
tests/unit/engine/conftest.py Workspace test helpers (make_workspace, make_merge_result)
tests/unit/engine/test_workspace_*.py Unit tests for all workspace components
tests/unit/engine/test_decomposition_llm*.py Unit tests for LLM decomposition
tests/integration/engine/test_workspace_integration.py Integration tests with real git operations
tests/unit/observability/test_events.py Workspace event constant discovery and value tests
DESIGN_SPEC.md, CLAUDE.md, README.md Documentation updates

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +10
"""Integration tests for workspace isolation using real git operations.

These tests create temporary git repositories and exercise the full
PlannerWorktreeStrategy lifecycle with real git commands.
"""

import subprocess
from pathlib import Path

import pytest
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration test file is missing the pytestmark module-level marker. All other integration test files in tests/integration/engine/ consistently set pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)] (e.g., test_agent_engine_integration.py:48, test_crash_recovery.py:35, test_multi_agent_delegation.py:85). Without the pytest.mark.integration marker, these tests will not be correctly filtered when running only integration tests, and without pytest.mark.timeout(30) they lack the module-level timeout guard that the rest of the integration suite has.

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +143
except WorkspaceCleanupError as exc:
logger.warning(
WORKSPACE_MERGE_FAILED,
workspace_id=workspace.workspace_id,
error=f"Post-merge cleanup failed: {exc}",
)
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wrong event constant is used here. WORKSPACE_MERGE_FAILED is emitted when a post-merge cleanup operation (workspace teardown) fails, but this event semantically represents a merge failure, not a teardown failure. The correct event is WORKSPACE_TEARDOWN_FAILED, which is the dedicated constant for teardown failures and is already imported in service.py. Observability dashboards filtering on workspace.merge.failed would incorrectly surface these teardown failures as merge failures.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/engine/decomposition/llm_prompt.py`:
- Around line 239-247: The code constructs SubtaskDefinition using
raw["subtasks"], raw.get("dependencies") and raw.get("required_skills") without
validating their types, which lets objects or strings be iterated into character
tuples; update the parsing logic around SubtaskDefinition (and the similar block
around lines handling subtasks at 268-273) to explicitly check that subtasks is
a list/tuple and that dependencies and required_skills are list/tuple (or None),
and if not raise a DecompositionError with a clear message; convert validated
lists into tuples for the SubtaskDefinition initializer
(dependencies=tuple(...), required_skills=tuple(...)) so malformed model output
yields a clean DecompositionError instead of character-by-character tuples or
obscure failures.
- Around line 327-341: The code currently raises DecompositionError on several
early-exit paths without emitting the DECOMPOSITION_LLM_PARSE_ERROR log; update
each failure path (the except DecompositionError re-raise in the _args_to_plan
call, the "No tool call for submit_decomposition_plan" path, and other
parse/JSON/content missing paths referenced around _args_to_plan and the later
block at 364-378) to call logger.warning(DECOMPOSITION_LLM_PARSE_ERROR,
error=<brief error message or exc>, exc_type=<type name>, context=<identifier
like parent_task_id or tc.arguments>) immediately before raising; ensure you
include the same structured fields (error and exc_type) used elsewhere so
telemetry captures every raise of DecompositionError.

In `@src/ai_company/engine/decomposition/llm.py`:
- Around line 277-302: The _validate_plan function currently only enforces
max_subtasks; additionally call the dependency-graph validator to reject cyclic
dependencies by invoking DependencyGraph.validate on the plan's graph (e.g.,
DependencyGraph.validate(plan.dependency_graph) or
plan.dependency_graph.validate()), catch any validation exception, log it with
DECOMPOSITION_VALIDATION_ERROR including subtask_count and max_subtasks, and
re-raise a DecompositionError with the validation message so cyclic plans are
rejected alongside the existing max_subtasks check.

In `@src/ai_company/engine/workspace/config.py`:
- Around line 20-21: The config models currently use ConfigDict(frozen=True)
which leaves Pydantic's extra="ignore" behavior and silently drops unknown keys;
update both ConfigDict instances used for PlannerWorktreesConfig and
WorkspaceIsolationConfig to include extra="forbid" (i.e.,
ConfigDict(frozen=True, extra="forbid")) so unknown fields are rejected and user
config typos surface as errors.

In `@src/ai_company/engine/workspace/git_worktree.py`:
- Around line 268-289: The code logs an error when "_run_git('rev-parse',
'HEAD')" fails but still returns a successful MergeResult with a synthetic
merged_commit_sha ("unknown"); change this so failure is propagated: when rc_sha
!= 0 (check the result from _run_git), do not set sha_out to "unknown" and
return success=True—either return a MergeResult with success=False and
merged_commit_sha=None (or raise/propagate an exception) so callers know the
merge did not yield a valid commit SHA. Update the block handling rc_sha in the
function that performs the merge (references: _run_git, rev-parse HEAD,
MergeResult, WORKSPACE_MERGE_FAILED, WORKSPACE_MERGE_COMPLETE, workspace) to log
the error and return a non-success MergeResult (or raise) instead of fabricating
"unknown".
- Around line 157-159: The branch name currently set as branch_name =
f"workspace/{request.task_id}" can collide across workspaces; change branch_name
to incorporate the unique workspace_id (or a deterministic short suffix of it)
so the ref becomes workspace/{request.task_id}-{workspace_id[:8]} (or similar)
and update any code that expects the old branch format; locate this in
git_worktree.py where workspace_id is generated and _resolve_worktree_path is
called and construct the branch name using workspace_id to ensure uniqueness.
- Around line 48-60: The _validate_git_ref function raises WorkspaceSetupError
for unsafe refs but doesn't log; before raising, emit a structured warning/error
log with context (include label, the rejected value, and the
_SAFE_REF_RE.pattern or reason) so failures produce telemetry; update
_validate_git_ref to call the module logger (e.g., logger.warning/error) or the
workspace event emitter with these fields and then raise WorkspaceSetupError as
before.

In `@src/ai_company/engine/workspace/merge.py`:
- Around line 133-143: The post-merge teardown currently swallows
WorkspaceCleanupError and only logs WORKSPACE_MERGE_FAILED while leaving
MergeResult as successful; extract the per-workspace merge+cleanup logic into a
small helper (e.g., _merge_and_cleanup_workspace(workspace)) that performs the
merge, calls self._strategy.teardown_workspace(workspace=workspace) when
self._cleanup_on_merge is True, and returns an explicit success/failure
indicator or raises an error; ensure the helper surfaces cleanup failures
(propagate or convert to a failed MergeResult entry) so MergeResult reflects
cleanup failures instead of reporting success, and keep the helper/function
under 50 lines.
- Around line 201-209: Deduplicate the requested order while preserving its
original sequence and avoid merging the same workspace twice by normalizing
`order` into a deterministic, unique list (e.g., build `normalized_order` by
iterating `order` and skipping repeats using a `seen` set) and use that instead
of `[ws_map[wid] for wid in order if wid in ws_map]`; for the fallback append,
preserve the original `workspaces` input order when extending omitted entries
instead of `sorted(missing)` by iterating the original `workspaces` list and
adding those whose ids are in `missing`, and update the code that builds
`result` (references: `order`, `ws_map`, `missing`, the list comprehension
producing `result`, and the `result.extend(...sorted(missing))` call).

In `@src/ai_company/engine/workspace/models.py`:
- Around line 29-32: The file_scope Field currently allows empty or
whitespace-only strings; tighten its element type so each tuple entry is a
non-empty, whitespace-stripped string (e.g., use
pydantic.constr(strip_whitespace=True, min_length=1) or an equivalent
NonEmptyStr type) to reject values like "" or "   ". Update the file_scope
declaration in src/ai_company/engine/workspace/models.py to use the constrained
string type for the tuple element (keeping the Field default and description) so
validation fails at the model boundary for blank hints.

In `@src/ai_company/engine/workspace/service.py`:
- Around line 67-117: The except path in setup_group currently only logs cleanup
failures and lets the original setup exception escape without group-level
context; add a log (logger.warning or logger.error) inside the except that
includes the group context (e.g., count=len(requests), identifiers if available)
and the caught exception before re-raising, and extract the rollback loop into a
new helper method (e.g., _rollback_workspaces(self, workspaces:
list[Workspace])) that iterates over workspaces and calls
self._strategy.teardown_workspace while handling WorkspaceCleanupError and
logging WORKSPACE_TEARDOWN_FAILED; update setup_group to call this helper so the
function stays concise and under 50 lines while preserving existing logging
constants (WORKSPACE_GROUP_SETUP_START, WORKSPACE_GROUP_SETUP_COMPLETE) and use
the same logger and error variables when re-raising.

In `@tests/unit/engine/test_workspace_git_worktree.py`:
- Around line 339-361: The test and reviewer point out that
PlannerWorktreeStrategy.merge_workspace currently converts a rev-parse failure
into a magic string "unknown"; instead change the merge handling so that when
PlannerWorktreeStrategy._run_git returns a non‑zero result for the
rev-parse/HEAD call you do not set merged_commit_sha to "unknown" — instead
either raise a WorkspaceMergeError from merge_workspace (preferred) or set
merged_commit_sha to None so callers can detect failure; update any
callers/tests that expect "unknown" accordingly and ensure error context from
_run_git is preserved in the exception or log.

In `@tests/unit/engine/test_workspace_merge.py`:
- Around line 172-205: The test currently only checks returned results but not
that the orchestrator actually stopped calling merge_workspace after the first
conflict; update test_human_escalation_stops_on_conflict to assert that
mock_strategy.merge_workspace was called exactly once (or assert it was not
called for "ws-2") after awaiting orch.merge_all, referencing
mock_strategy.merge_workspace, orch.merge_all, and the workspace ids/ws
variables (ws1, ws2) so the stop-on-conflict control flow is exercised; apply
the same extra assertion in the other similar test section mentioned (lines
309-335).

In `@tests/unit/engine/test_workspace_protocol.py`:
- Around line 12-17: Replace the fragile private-attribute checks in
test_protocol_is_runtime_checkable with a behavior-based conformance test:
implement a minimal local stub class that defines the required
methods/attributes of the WorkspaceIsolationStrategy protocol (use the same
method names/signatures referenced later in the file) and assert
isinstance(stub_instance, WorkspaceIsolationStrategy); do the same for the
related test around lines 29-42 so both tests validate runtime protocol
conformance via an actual stub implementing the protocol rather than checking
__protocol_attrs__ or _is_runtime_protocol or just names via dir().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 826579d7-edbc-41d7-b189-251f2251adb6

📥 Commits

Reviewing files that changed from the base of the PR and between c7f1b26 and 4a99398.

📒 Files selected for processing (29)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/core/enums.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/decomposition/__init__.py
  • src/ai_company/engine/decomposition/llm.py
  • src/ai_company/engine/decomposition/llm_prompt.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/workspace/__init__.py
  • src/ai_company/engine/workspace/config.py
  • src/ai_company/engine/workspace/git_worktree.py
  • src/ai_company/engine/workspace/merge.py
  • src/ai_company/engine/workspace/models.py
  • src/ai_company/engine/workspace/protocol.py
  • src/ai_company/engine/workspace/service.py
  • src/ai_company/observability/events/decomposition.py
  • src/ai_company/observability/events/workspace.py
  • tests/integration/engine/test_workspace_integration.py
  • tests/unit/engine/conftest.py
  • tests/unit/engine/test_decomposition_llm.py
  • tests/unit/engine/test_decomposition_llm_prompt.py
  • tests/unit/engine/test_workspace_config.py
  • tests/unit/engine/test_workspace_git_worktree.py
  • tests/unit/engine/test_workspace_merge.py
  • tests/unit/engine/test_workspace_models.py
  • tests/unit/engine/test_workspace_protocol.py
  • tests/unit/engine/test_workspace_service.py
  • tests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14
Line length: 88 characters (ruff enforced)

Files:

  • src/ai_company/engine/decomposition/__init__.py
  • tests/unit/engine/test_workspace_protocol.py
  • tests/unit/engine/test_workspace_merge.py
  • src/ai_company/observability/events/decomposition.py
  • src/ai_company/engine/workspace/protocol.py
  • src/ai_company/engine/workspace/config.py
  • tests/unit/engine/test_workspace_models.py
  • src/ai_company/engine/decomposition/llm.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/decomposition/llm_prompt.py
  • tests/unit/engine/test_decomposition_llm.py
  • tests/unit/engine/test_workspace_config.py
  • tests/unit/engine/conftest.py
  • src/ai_company/core/enums.py
  • tests/integration/engine/test_workspace_integration.py
  • src/ai_company/engine/workspace/git_worktree.py
  • src/ai_company/engine/workspace/models.py
  • src/ai_company/engine/workspace/service.py
  • tests/unit/engine/test_decomposition_llm_prompt.py
  • tests/unit/observability/test_events.py
  • src/ai_company/engine/__init__.py
  • tests/unit/engine/test_workspace_git_worktree.py
  • src/ai_company/engine/workspace/merge.py
  • tests/unit/engine/test_workspace_service.py
  • src/ai_company/observability/events/workspace.py
  • src/ai_company/engine/workspace/__init__.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All public functions require type hints; mypy strict mode enforced
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Use Pydantic v2: BaseModel, model_validator, computed_field, ConfigDict; use @computed_field for derived values instead of storing redundant fields
Use NotBlankStr from core.types for all identifier/name fields (including optional variants and tuples) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls) over bare create_task
Functions must be less than 50 lines; files must be less than 800 lines
Every module with business logic MUST import logger: from ai_company.observability import get_logger then assign logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code
Logger variable name must always be logger (not _logger, not log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider); import directly
Always use structured logging: logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names: example-provider, example-large-001, etc. Vendor names only in DESIGN_SPEC.md, .claude/ files, or third-party imports

Files:

  • src/ai_company/engine/decomposition/__init__.py
  • src/ai_company/observability/events/decomposition.py
  • src/ai_company/engine/workspace/protocol.py
  • src/ai_company/engine/workspace/config.py
  • src/ai_company/engine/decomposition/llm.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/decomposition/llm_prompt.py
  • src/ai_company/core/enums.py
  • src/ai_company/engine/workspace/git_worktree.py
  • src/ai_company/engine/workspace/models.py
  • src/ai_company/engine/workspace/service.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/workspace/merge.py
  • src/ai_company/observability/events/workspace.py
  • src/ai_company/engine/workspace/__init__.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow for test categorization
Prefer @pytest.mark.parametrize for testing similar cases
Use vendor-agnostic test provider names: test-provider, test-small-001, etc.

Files:

  • tests/unit/engine/test_workspace_protocol.py
  • tests/unit/engine/test_workspace_merge.py
  • tests/unit/engine/test_workspace_models.py
  • tests/unit/engine/test_decomposition_llm.py
  • tests/unit/engine/test_workspace_config.py
  • tests/unit/engine/conftest.py
  • tests/integration/engine/test_workspace_integration.py
  • tests/unit/engine/test_decomposition_llm_prompt.py
  • tests/unit/observability/test_events.py
  • tests/unit/engine/test_workspace_git_worktree.py
  • tests/unit/engine/test_workspace_service.py
DESIGN_SPEC.md

📄 CodeRabbit inference engine (CLAUDE.md)

When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality

Files:

  • DESIGN_SPEC.md
🧠 Learnings (1)
📚 Learning: 2026-03-08T18:28:46.654Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T18:28:46.654Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly

Applied to files:

  • src/ai_company/observability/events/decomposition.py
  • tests/unit/observability/test_events.py
  • src/ai_company/observability/events/workspace.py
🧬 Code graph analysis (17)
src/ai_company/engine/decomposition/__init__.py (1)
src/ai_company/engine/decomposition/llm.py (2)
  • LlmDecompositionConfig (54-76)
  • LlmDecompositionStrategy (79-302)
tests/unit/engine/test_workspace_protocol.py (1)
src/ai_company/engine/workspace/protocol.py (1)
  • WorkspaceIsolationStrategy (14-90)
src/ai_company/engine/workspace/protocol.py (2)
src/ai_company/engine/workspace/models.py (3)
  • MergeResult (93-140)
  • Workspace (35-64)
  • WorkspaceRequest (11-32)
src/ai_company/engine/workspace/git_worktree.py (5)
  • setup_workspace (122-215)
  • teardown_workspace (326-395)
  • merge_workspace (217-324)
  • list_active_workspaces (397-403)
  • get_strategy_type (405-411)
src/ai_company/engine/workspace/config.py (1)
src/ai_company/core/enums.py (2)
  • ConflictEscalation (366-374)
  • MergeOrder (354-363)
tests/unit/engine/test_workspace_models.py (1)
src/ai_company/engine/workspace/models.py (6)
  • MergeConflict (67-90)
  • MergeResult (93-140)
  • WorkspaceGroupResult (143-180)
  • WorkspaceRequest (11-32)
  • all_merged (168-172)
  • total_conflicts (178-180)
src/ai_company/engine/decomposition/llm.py (7)
src/ai_company/engine/decomposition/llm_prompt.py (4)
  • build_decomposition_tool (58-130)
  • build_retry_message (190-206)
  • build_system_message (133-158)
  • build_task_message (161-187)
src/ai_company/providers/models.py (3)
  • ChatMessage (138-210)
  • CompletionConfig (213-254)
  • CompletionResponse (257-306)
src/ai_company/core/task.py (1)
  • Task (45-261)
src/ai_company/engine/decomposition/models.py (2)
  • DecompositionContext (262-287)
  • DecompositionPlan (66-122)
src/ai_company/providers/protocol.py (1)
  • CompletionProvider (21-80)
src/ai_company/tools/base.py (1)
  • description (115-117)
src/ai_company/engine/shutdown.py (1)
  • strategy (352-354)
src/ai_company/engine/decomposition/llm_prompt.py (3)
src/ai_company/core/enums.py (3)
  • Complexity (214-220)
  • CoordinationTopology (325-336)
  • TaskStructure (313-322)
src/ai_company/engine/errors.py (1)
  • DecompositionError (50-51)
src/ai_company/providers/models.py (3)
  • ChatMessage (138-210)
  • CompletionResponse (257-306)
  • ToolDefinition (64-93)
tests/unit/engine/test_decomposition_llm.py (6)
src/ai_company/core/enums.py (4)
  • CoordinationTopology (325-336)
  • Priority (205-211)
  • TaskStructure (313-322)
  • TaskType (194-202)
src/ai_company/engine/decomposition/llm.py (4)
  • LlmDecompositionConfig (54-76)
  • LlmDecompositionStrategy (79-302)
  • decompose (103-208)
  • get_strategy_name (210-212)
src/ai_company/engine/decomposition/models.py (2)
  • DecompositionContext (262-287)
  • DecompositionPlan (66-122)
src/ai_company/engine/decomposition/protocol.py (1)
  • DecompositionStrategy (14-40)
src/ai_company/engine/errors.py (2)
  • DecompositionDepthError (58-59)
  • DecompositionError (50-51)
src/ai_company/providers/models.py (3)
  • CompletionResponse (257-306)
  • TokenUsage (12-35)
  • ToolCall (96-119)
tests/unit/engine/test_workspace_config.py (2)
src/ai_company/core/enums.py (2)
  • ConflictEscalation (366-374)
  • MergeOrder (354-363)
src/ai_company/engine/workspace/config.py (2)
  • PlannerWorktreesConfig (9-43)
  • WorkspaceIsolationConfig (46-63)
tests/integration/engine/test_workspace_integration.py (5)
src/ai_company/engine/errors.py (1)
  • WorkspaceLimitError (90-91)
src/ai_company/engine/workspace/config.py (1)
  • PlannerWorktreesConfig (9-43)
src/ai_company/engine/workspace/git_worktree.py (5)
  • PlannerWorktreeStrategy (63-459)
  • setup_workspace (122-215)
  • merge_workspace (217-324)
  • teardown_workspace (326-395)
  • list_active_workspaces (397-403)
src/ai_company/engine/workspace/models.py (1)
  • WorkspaceRequest (11-32)
src/ai_company/engine/workspace/protocol.py (4)
  • setup_workspace (21-38)
  • merge_workspace (55-74)
  • teardown_workspace (40-53)
  • list_active_workspaces (76-82)
src/ai_company/engine/workspace/git_worktree.py (6)
src/ai_company/engine/errors.py (4)
  • WorkspaceCleanupError (86-87)
  • WorkspaceLimitError (90-91)
  • WorkspaceMergeError (82-83)
  • WorkspaceSetupError (78-79)
src/ai_company/engine/workspace/config.py (1)
  • PlannerWorktreesConfig (9-43)
src/ai_company/engine/workspace/models.py (4)
  • MergeConflict (67-90)
  • MergeResult (93-140)
  • Workspace (35-64)
  • WorkspaceRequest (11-32)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/engine/parallel_models.py (2)
  • task_id (87-89)
  • agent_id (79-81)
src/ai_company/tools/sandbox/result.py (1)
  • success (26-28)
src/ai_company/engine/workspace/models.py (4)
src/ai_company/core/enums.py (1)
  • ConflictEscalation (366-374)
src/ai_company/engine/parallel_models.py (2)
  • task_id (87-89)
  • agent_id (79-81)
src/ai_company/tools/base.py (1)
  • description (115-117)
src/ai_company/tools/sandbox/result.py (1)
  • success (26-28)
src/ai_company/engine/workspace/service.py (5)
src/ai_company/engine/errors.py (1)
  • WorkspaceCleanupError (86-87)
src/ai_company/engine/workspace/merge.py (2)
  • MergeOrchestrator (31-210)
  • merge_all (64-150)
src/ai_company/engine/workspace/models.py (3)
  • Workspace (35-64)
  • WorkspaceGroupResult (143-180)
  • WorkspaceRequest (11-32)
src/ai_company/engine/workspace/config.py (1)
  • WorkspaceIsolationConfig (46-63)
src/ai_company/engine/workspace/protocol.py (3)
  • WorkspaceIsolationStrategy (14-90)
  • setup_workspace (21-38)
  • teardown_workspace (40-53)
src/ai_company/engine/__init__.py (1)
src/ai_company/engine/errors.py (5)
  • WorkspaceCleanupError (86-87)
  • WorkspaceError (74-75)
  • WorkspaceLimitError (90-91)
  • WorkspaceMergeError (82-83)
  • WorkspaceSetupError (78-79)
tests/unit/engine/test_workspace_git_worktree.py (4)
src/ai_company/engine/errors.py (4)
  • WorkspaceCleanupError (86-87)
  • WorkspaceLimitError (90-91)
  • WorkspaceMergeError (82-83)
  • WorkspaceSetupError (78-79)
src/ai_company/engine/workspace/git_worktree.py (7)
  • PlannerWorktreeStrategy (63-459)
  • get_strategy_type (405-411)
  • setup_workspace (122-215)
  • merge_workspace (217-324)
  • teardown_workspace (326-395)
  • list_active_workspaces (397-403)
  • _collect_conflicts (428-459)
src/ai_company/engine/workspace/models.py (2)
  • Workspace (35-64)
  • WorkspaceRequest (11-32)
src/ai_company/engine/workspace/protocol.py (6)
  • WorkspaceIsolationStrategy (14-90)
  • get_strategy_type (84-90)
  • setup_workspace (21-38)
  • merge_workspace (55-74)
  • teardown_workspace (40-53)
  • list_active_workspaces (76-82)
tests/unit/engine/test_workspace_service.py (3)
src/ai_company/engine/errors.py (2)
  • WorkspaceCleanupError (86-87)
  • WorkspaceSetupError (78-79)
src/ai_company/engine/workspace/config.py (1)
  • WorkspaceIsolationConfig (46-63)
src/ai_company/engine/workspace/models.py (6)
  • MergeConflict (67-90)
  • MergeResult (93-140)
  • WorkspaceGroupResult (143-180)
  • WorkspaceRequest (11-32)
  • all_merged (168-172)
  • total_conflicts (178-180)
src/ai_company/engine/workspace/__init__.py (6)
src/ai_company/engine/workspace/config.py (2)
  • PlannerWorktreesConfig (9-43)
  • WorkspaceIsolationConfig (46-63)
src/ai_company/engine/workspace/git_worktree.py (1)
  • PlannerWorktreeStrategy (63-459)
src/ai_company/engine/workspace/merge.py (1)
  • MergeOrchestrator (31-210)
src/ai_company/engine/workspace/models.py (5)
  • MergeConflict (67-90)
  • MergeResult (93-140)
  • Workspace (35-64)
  • WorkspaceGroupResult (143-180)
  • WorkspaceRequest (11-32)
src/ai_company/engine/workspace/protocol.py (1)
  • WorkspaceIsolationStrategy (14-90)
src/ai_company/engine/workspace/service.py (1)
  • WorkspaceIsolationService (38-192)
🪛 LanguageTool
README.md

[typographical] ~19-~19: To join two clauses or introduce examples, consider using an em dash.
Context: ...trail - Task Decomposition & Routing - DAG-based and LLM-based subtask decompos...

(DASH_RULE)


[typographical] ~20-~20: To join two clauses or introduce examples, consider using an em dash.
Context: ...agent-task scoring - Task Assignment - Pluggable strategies (manual, role-based...

(DASH_RULE)


[typographical] ~21-~21: To join two clauses or introduce examples, consider using an em dash.
Context: ...capable agents - Workspace Isolation - Git worktree-based concurrent workspace ...

(DASH_RULE)

🔇 Additional comments (2)
tests/unit/engine/test_decomposition_llm.py (1)

281-286: No action needed. The DecompositionStrategy protocol is properly decorated with @runtime_checkable (line 13 of src/ai_company/engine/decomposition/protocol.py), so the isinstance() check in this test works correctly without raising TypeError.

src/ai_company/engine/decomposition/llm.py (1)

103-208: ⚠️ Potential issue | 🟠 Major

Keep provider call failures inside the retry/error contract.

The loop only retries parse and validation problems. If self._provider.complete() raises on timeout or transport failure, the exception bypasses retries, DECOMPOSITION_FAILED is never emitted, and callers get a provider-specific exception instead of DecompositionError. Catch call failures here and move the single-attempt logic into a helper so decompose() can fail consistently.

As per coding guidelines, "All error paths must log at WARNING or ERROR with context before raising" and "Functions must be less than 50 lines."

⛔ Skipped due to learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T18:28:46.654Z
Learning: All provider calls go through `BaseCompletionProvider` which applies retry and rate limiting automatically; never implement retry logic in driver subclasses or calling code
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T18:28:46.654Z
Learning: Applies to src/ai_company/providers/**/*.py : Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError` — non-retryable errors raise immediately without retry

Comment on lines +239 to +247
deps = raw.get("dependencies") or []
skills = raw.get("required_skills") or []
return SubtaskDefinition(
id=raw["id"],
title=raw["title"],
description=raw["description"],
dependencies=tuple(deps),
estimated_complexity=complexity,
required_skills=tuple(skills),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate array-typed fields before iterating model output.

subtasks, dependencies, and required_skills are consumed as iterables without checking their shape. Malformed output like "subtasks": {} or "dependencies": "task-1" currently produces obscure parse errors or character-by-character tuples instead of a clean DecompositionError.

Suggested change
-    deps = raw.get("dependencies") or []
-    skills = raw.get("required_skills") or []
+    deps = raw.get("dependencies") or []
+    if not isinstance(deps, list):
+        msg = "Subtask field 'dependencies' must be an array"
+        raise DecompositionError(msg)
+    skills = raw.get("required_skills") or []
+    if not isinstance(skills, list):
+        msg = "Subtask field 'required_skills' must be an array"
+        raise DecompositionError(msg)
@@
-    raw_subtasks = args.get("subtasks")
-    if not raw_subtasks:
+    raw_subtasks = args.get("subtasks")
+    if not isinstance(raw_subtasks, list):
+        msg = "Field 'subtasks' must be an array"
+        raise DecompositionError(msg)
+    if not raw_subtasks:
         msg = "No subtasks found in response"
         raise DecompositionError(msg)
+    if any(not isinstance(subtask, dict) for subtask in raw_subtasks):
+        msg = "Each subtask must be an object"
+        raise DecompositionError(msg)

Also applies to: 268-273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/decomposition/llm_prompt.py` around lines 239 - 247,
The code constructs SubtaskDefinition using raw["subtasks"],
raw.get("dependencies") and raw.get("required_skills") without validating their
types, which lets objects or strings be iterated into character tuples; update
the parsing logic around SubtaskDefinition (and the similar block around lines
handling subtasks at 268-273) to explicitly check that subtasks is a list/tuple
and that dependencies and required_skills are list/tuple (or None), and if not
raise a DecompositionError with a clear message; convert validated lists into
tuples for the SubtaskDefinition initializer (dependencies=tuple(...),
required_skills=tuple(...)) so malformed model output yields a clean
DecompositionError instead of character-by-character tuples or obscure failures.

Comment on lines +327 to +341
try:
return _args_to_plan(tc.arguments, parent_task_id)
except DecompositionError:
raise
except Exception as exc:
logger.warning(
DECOMPOSITION_LLM_PARSE_ERROR,
error=str(exc),
exc_type=type(exc).__name__,
)
msg = f"Failed to parse tool call arguments: {exc}"
raise DecompositionError(msg) from exc

msg = "No tool call for submit_decomposition_plan found"
raise DecompositionError(msg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Emit DECOMPOSITION_LLM_PARSE_ERROR on the early exits.

The common failure paths here — re-raised DecompositionErrors, missing tool calls, missing content, and JSON decode failures — all raise without any structured warning. That makes the most common parse and validation failures invisible in telemetry.

Suggested change
             try:
                 return _args_to_plan(tc.arguments, parent_task_id)
-            except DecompositionError:
-                raise
+            except DecompositionError as exc:
+                logger.warning(
+                    DECOMPOSITION_LLM_PARSE_ERROR,
+                    error=str(exc),
+                    parent_task_id=parent_task_id,
+                )
+                raise
@@
     msg = "No tool call for submit_decomposition_plan found"
+    logger.warning(
+        DECOMPOSITION_LLM_PARSE_ERROR,
+        error=msg,
+        parent_task_id=parent_task_id,
+    )
     raise DecompositionError(msg)
@@
     if response.content is None:
         msg = "Response has no content to parse"
+        logger.warning(
+            DECOMPOSITION_LLM_PARSE_ERROR,
+            error=msg,
+            parent_task_id=parent_task_id,
+        )
         raise DecompositionError(msg)
@@
     except json.JSONDecodeError as exc:
         msg = f"Failed to parse JSON from content: {exc}"
+        logger.warning(
+            DECOMPOSITION_LLM_PARSE_ERROR,
+            error=msg,
+            parent_task_id=parent_task_id,
+        )
         raise DecompositionError(msg) from exc
@@
-    except DecompositionError:
-        raise
+    except DecompositionError as exc:
+        logger.warning(
+            DECOMPOSITION_LLM_PARSE_ERROR,
+            error=str(exc),
+            parent_task_id=parent_task_id,
+        )
+        raise
As per coding guidelines, "All error paths must log at WARNING or ERROR with context before raising."

Also applies to: 364-378

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/decomposition/llm_prompt.py` around lines 327 - 341,
The code currently raises DecompositionError on several early-exit paths without
emitting the DECOMPOSITION_LLM_PARSE_ERROR log; update each failure path (the
except DecompositionError re-raise in the _args_to_plan call, the "No tool call
for submit_decomposition_plan" path, and other parse/JSON/content missing paths
referenced around _args_to_plan and the later block at 364-378) to call
logger.warning(DECOMPOSITION_LLM_PARSE_ERROR, error=<brief error message or
exc>, exc_type=<type name>, context=<identifier like parent_task_id or
tc.arguments>) immediately before raising; ensure you include the same
structured fields (error and exc_type) used elsewhere so telemetry captures
every raise of DecompositionError.

Comment on lines +277 to +302
def _validate_plan(
plan: DecompositionPlan,
context: DecompositionContext,
) -> None:
"""Validate plan against context constraints.

Args:
plan: The parsed decomposition plan.
context: Decomposition constraints.

Raises:
DecompositionError: If subtask count exceeds limit.
"""
if len(plan.subtasks) > context.max_subtasks:
msg = (
f"Plan has {len(plan.subtasks)} subtasks, "
f"exceeds max_subtasks of "
f"{context.max_subtasks}"
)
logger.warning(
DECOMPOSITION_VALIDATION_ERROR,
subtask_count=len(plan.subtasks),
max_subtasks=context.max_subtasks,
error=msg,
)
raise DecompositionError(msg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject cyclic dependency graphs in _validate_plan().

This only enforces max_subtasks. DecompositionPlan already rejects duplicate IDs and unknown dependencies, but its own validator explicitly leaves cycle detection to DependencyGraph.validate(), so a cyclic LLM plan can still be returned from this strategy. Validate the dependency graph here before accepting the plan.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/decomposition/llm.py` around lines 277 - 302, The
_validate_plan function currently only enforces max_subtasks; additionally call
the dependency-graph validator to reject cyclic dependencies by invoking
DependencyGraph.validate on the plan's graph (e.g.,
DependencyGraph.validate(plan.dependency_graph) or
plan.dependency_graph.validate()), catch any validation exception, log it with
DECOMPOSITION_VALIDATION_ERROR including subtask_count and max_subtasks, and
re-raise a DecompositionError with the validation message so cyclic plans are
rejected alongside the existing max_subtasks check.

Comment on lines +20 to +21
model_config = ConfigDict(frozen=True)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/ai_company/engine/workspace/config.py | head -70

Repository: Aureliolo/ai-company

Length of output: 2560


🏁 Script executed:

# Verify Pydantic v2 default extra behavior
python <<'EOF'
from pydantic import BaseModel, ConfigDict

# Test default behavior
class DefaultConfig(BaseModel):
    name: str

# Test with frozen only
class FrozenConfig(BaseModel):
    model_config = ConfigDict(frozen=True)
    name: str

# Inspect what extra setting is in each
print("DefaultConfig model_config extra:", DefaultConfig.model_config.get("extra"))
print("FrozenConfig model_config extra:", FrozenConfig.model_config.get("extra"))

# Test silently ignoring unknown fields
try:
    obj = FrozenConfig(name="test", unknown_field="ignored")
    print("FrozenConfig allowed unknown field silently:", obj.model_dump())
except Exception as e:
    print("FrozenConfig rejected:", e)
EOF

Repository: Aureliolo/ai-company

Length of output: 202


Reject unknown keys on these config models.

Both PlannerWorktreesConfig and WorkspaceIsolationConfig use Pydantic's default extra="ignore" behavior. Typos in YAML like max_concurrent_worktrese or planner_worktree will be silently dropped, and defaults will be used instead. For user-facing config, this silently converts a configuration error into a runtime issue.

Add extra="forbid" to both ConfigDict calls (lines 20-21 and 54-55) to reject unknown fields and surface config mistakes to users.

🔧 Proposed fix
 class PlannerWorktreesConfig(BaseModel):
-    model_config = ConfigDict(frozen=True)
+    model_config = ConfigDict(frozen=True, extra="forbid")
 class WorkspaceIsolationConfig(BaseModel):
-    model_config = ConfigDict(frozen=True)
+    model_config = ConfigDict(frozen=True, extra="forbid")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
model_config = ConfigDict(frozen=True)
model_config = ConfigDict(frozen=True, extra="forbid")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/workspace/config.py` around lines 20 - 21, The config
models currently use ConfigDict(frozen=True) which leaves Pydantic's
extra="ignore" behavior and silently drops unknown keys; update both ConfigDict
instances used for PlannerWorktreesConfig and WorkspaceIsolationConfig to
include extra="forbid" (i.e., ConfigDict(frozen=True, extra="forbid")) so
unknown fields are rejected and user config typos surface as errors.

Comment on lines +48 to +60
def _validate_git_ref(value: str, label: str) -> None:
"""Validate that a string is safe for use as a git ref argument.

Args:
value: The string to validate.
label: Human-readable label for error messages.

Raises:
WorkspaceSetupError: If the value is unsafe for git.
"""
if not value or value.startswith("-") or not _SAFE_REF_RE.match(value):
msg = f"Unsafe {label} for git: {value!r}"
raise WorkspaceSetupError(msg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Log rejected refs before raising.

This helper is an error path, but it raises WorkspaceSetupError without emitting any workspace event. Invalid task_id or base_branch inputs will fail setup with no structured telemetry.

Suggested change
 def _validate_git_ref(value: str, label: str) -> None:
@@
     if not value or value.startswith("-") or not _SAFE_REF_RE.match(value):
         msg = f"Unsafe {label} for git: {value!r}"
+        logger.warning(
+            WORKSPACE_SETUP_FAILED,
+            error=msg,
+            label=label,
+            value=value,
+        )
         raise WorkspaceSetupError(msg)
As per coding guidelines, "All error paths must log at WARNING or ERROR with context before raising."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/workspace/git_worktree.py` around lines 48 - 60, The
_validate_git_ref function raises WorkspaceSetupError for unsafe refs but
doesn't log; before raising, emit a structured warning/error log with context
(include label, the rejected value, and the _SAFE_REF_RE.pattern or reason) so
failures produce telemetry; update _validate_git_ref to call the module logger
(e.g., logger.warning/error) or the workspace event emitter with these fields
and then raise WorkspaceSetupError as before.

Comment on lines +29 to +32
file_scope: tuple[str, ...] = Field(
default=(),
description="Optional file path hints",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reject blank file-scope hints at the model boundary.

file_scope is typed as tuple[str, ...], so values like ("", " ") validate even though those hints are semantically useless and can skew any overlap or planning logic that treats file_scope as meaningful input. Tighten the element type here.

Suggested change
-    file_scope: tuple[str, ...] = Field(
+    file_scope: tuple[NotBlankStr, ...] = Field(
         default=(),
         description="Optional file path hints",
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
file_scope: tuple[str, ...] = Field(
default=(),
description="Optional file path hints",
)
file_scope: tuple[NotBlankStr, ...] = Field(
default=(),
description="Optional file path hints",
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/workspace/models.py` around lines 29 - 32, The
file_scope Field currently allows empty or whitespace-only strings; tighten its
element type so each tuple entry is a non-empty, whitespace-stripped string
(e.g., use pydantic.constr(strip_whitespace=True, min_length=1) or an equivalent
NonEmptyStr type) to reject values like "" or "   ". Update the file_scope
declaration in src/ai_company/engine/workspace/models.py to use the constrained
string type for the tuple element (keeping the Field default and description) so
validation fails at the model boundary for blank hints.

Comment on lines +67 to +117
async def setup_group(
self,
*,
requests: tuple[WorkspaceRequest, ...],
) -> tuple[Workspace, ...]:
"""Create workspaces for a group of agent tasks.

Rolls back all already-created workspaces if any setup fails.

Args:
requests: Workspace creation requests.

Returns:
Tuple of created workspaces.

Raises:
WorkspaceLimitError: When max concurrent worktrees reached.
WorkspaceSetupError: When git operations fail.
"""
logger.info(
WORKSPACE_GROUP_SETUP_START,
count=len(requests),
)

workspaces: list[Workspace] = []
try:
for request in requests:
ws = await self._strategy.setup_workspace(
request=request,
)
workspaces.append(ws)
except Exception:
# Roll back already-created workspaces
for ws in workspaces:
try:
await self._strategy.teardown_workspace(
workspace=ws,
)
except WorkspaceCleanupError as cleanup_exc:
logger.warning(
WORKSPACE_TEARDOWN_FAILED,
workspace_id=ws.workspace_id,
error=f"Rollback cleanup failed: {cleanup_exc}",
)
raise

logger.info(
WORKSPACE_GROUP_SETUP_COMPLETE,
count=len(workspaces),
)
return tuple(workspaces)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Log setup failures at the service boundary, then factor out rollback handling.

The except Exception path only logs rollback cleanup failures, so the actual setup error can escape without any group-level context. Emit a warning/error before raise, and pull the rollback loop into a helper so setup_group() stays under the 50-line limit.

As per coding guidelines, "All error paths must log at WARNING or ERROR with context before raising" and "Functions must be less than 50 lines."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/workspace/service.py` around lines 67 - 117, The except
path in setup_group currently only logs cleanup failures and lets the original
setup exception escape without group-level context; add a log (logger.warning or
logger.error) inside the except that includes the group context (e.g.,
count=len(requests), identifiers if available) and the caught exception before
re-raising, and extract the rollback loop into a new helper method (e.g.,
_rollback_workspaces(self, workspaces: list[Workspace])) that iterates over
workspaces and calls self._strategy.teardown_workspace while handling
WorkspaceCleanupError and logging WORKSPACE_TEARDOWN_FAILED; update setup_group
to call this helper so the function stays concise and under 50 lines while
preserving existing logging constants (WORKSPACE_GROUP_SETUP_START,
WORKSPACE_GROUP_SETUP_COMPLETE) and use the same logger and error variables when
re-raising.

Comment on lines +339 to +361
async def test_merge_revparse_failure_uses_unknown(self) -> None:
"""When rev-parse fails, SHA is set to 'unknown'."""
strategy = _make_strategy()
ws = make_workspace()
strategy._active_workspaces[ws.workspace_id] = ws

mock_run_git = AsyncMock(
side_effect=[
(0, "", ""), # checkout
(0, "", ""), # merge
(1, "", "error: not a valid ref"), # rev-parse fails
],
)

with patch.object(
PlannerWorktreeStrategy,
"_run_git",
mock_run_git,
):
result = await strategy.merge_workspace(workspace=ws)

assert result.success is True
assert result.merged_commit_sha == "unknown"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't codify "unknown" as a successful merge SHA.

Line 361 bakes in "unknown" as merged_commit_sha when rev-parse HEAD fails. That turns a real lookup failure into a value that looks usable to downstream code. Either surface a WorkspaceMergeError here or keep merged_commit_sha=None; inventing a fake ref will be hard to distinguish from a valid one later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_workspace_git_worktree.py` around lines 339 - 361, The
test and reviewer point out that PlannerWorktreeStrategy.merge_workspace
currently converts a rev-parse failure into a magic string "unknown"; instead
change the merge handling so that when PlannerWorktreeStrategy._run_git returns
a non‑zero result for the rev-parse/HEAD call you do not set merged_commit_sha
to "unknown" — instead either raise a WorkspaceMergeError from merge_workspace
(preferred) or set merged_commit_sha to None so callers can detect failure;
update any callers/tests that expect "unknown" accordingly and ensure error
context from _run_git is preserved in the exception or log.

Comment on lines +172 to +205
async def test_human_escalation_stops_on_conflict(self) -> None:
"""HUMAN escalation stops merging on first conflict."""
ws1 = make_workspace(workspace_id="ws-1", task_id="task-1")
ws2 = make_workspace(workspace_id="ws-2", task_id="task-2")

conflict = _make_conflict()
mock_strategy = AsyncMock()
mock_strategy.merge_workspace = AsyncMock(
side_effect=[
make_merge_result(
workspace_id="ws-1",
success=False,
conflicts=(conflict,),
merged_commit_sha=None,
),
make_merge_result(workspace_id="ws-2"),
],
)
mock_strategy.teardown_workspace = AsyncMock()

orch = _make_orchestrator(
strategy=mock_strategy,
conflict_escalation=ConflictEscalation.HUMAN,
)
results = await orch.merge_all(
workspaces=(ws1, ws2),
completion_order=("ws-1", "ws-2"),
)

# Should stop after first conflict
assert len(results) == 1
assert results[0].success is False
assert results[0].escalation is ConflictEscalation.HUMAN

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert that HUMAN escalation stops after the first merge attempt.

These tests only validate the returned results. An implementation that still awaits merge_workspace() for ws2 and then drops that result would still pass. Add an assertion on the merge mock so the stop-on-conflict / stop-on-exception control flow is actually covered.

Suggested test tightening
         assert len(results) == 1
         assert results[0].success is False
         assert results[0].escalation is ConflictEscalation.HUMAN
+        assert mock_strategy.merge_workspace.await_count == 1
@@
         # Should stop after exception with HUMAN escalation
         assert len(results) == 1
         assert results[0].success is False
+        assert mock_strategy.merge_workspace.await_count == 1

Also applies to: 309-335

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_workspace_merge.py` around lines 172 - 205, The test
currently only checks returned results but not that the orchestrator actually
stopped calling merge_workspace after the first conflict; update
test_human_escalation_stops_on_conflict to assert that
mock_strategy.merge_workspace was called exactly once (or assert it was not
called for "ws-2") after awaiting orch.merge_all, referencing
mock_strategy.merge_workspace, orch.merge_all, and the workspace ids/ws
variables (ws1, ws2) so the stop-on-conflict control flow is exercised; apply
the same extra assertion in the other similar test section mentioned (lines
309-335).

Comment on lines +12 to +17
def test_protocol_is_runtime_checkable(self) -> None:
"""Protocol can be used with isinstance checks."""
assert hasattr(WorkspaceIsolationStrategy, "__protocol_attrs__") or (
hasattr(WorkspaceIsolationStrategy, "_is_runtime_protocol")
and WorkspaceIsolationStrategy._is_runtime_protocol
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n tests/unit/engine/test_workspace_protocol.py

Repository: Aureliolo/ai-company

Length of output: 1804


🏁 Script executed:

# Find the WorkspaceIsolationStrategy definition
fd -t f -e py | xargs rg "class WorkspaceIsolationStrategy" -B 2 -A 10

Repository: Aureliolo/ai-company

Length of output: 980


🏁 Script executed:

cat -n src/ai_company/engine/workspace/protocol.py

Repository: Aureliolo/ai-company

Length of output: 2915


🏁 Script executed:

# Check Python version constraints in pyproject.toml or similar
fd -t f \( -name "pyproject.toml" -o -name "setup.py" -o -name ".python-version" \) | xargs cat 2>/dev/null | head -50

Repository: Aureliolo/ai-company

Length of output: 294


🏁 Script executed:

find . -name "pyproject.toml" -o -name "setup.py" -o -name ".python-version" | head -5

Repository: Aureliolo/ai-company

Length of output: 120


🏁 Script executed:

# Check what Python version this targets
rg "python" pyproject.toml -A 5 -B 5 | head -40

Repository: Aureliolo/ai-company

Length of output: 1421


🏁 Script executed:

# Let's also check if there are any other implementations of the protocol to see the actual usage pattern
rg "WorkspaceIsolationStrategy" --type py -l

Repository: Aureliolo/ai-company

Length of output: 329


🏁 Script executed:

# Check line lengths in the test file
awk 'length($0) > 88 {print NR": "length($0)" chars: "$0}' tests/unit/engine/test_workspace_protocol.py

Repository: Aureliolo/ai-company

Length of output: 46


🏁 Script executed:

# Let's look at actual implementations to understand how this protocol is used
cat -n tests/unit/engine/test_workspace_git_worktree.py | head -50

Repository: Aureliolo/ai-company

Length of output: 1704


🏁 Script executed:

# Check the service that uses the protocol
rg "WorkspaceIsolationStrategy" src/ai_company/engine/workspace/service.py -B 3 -A 5

Repository: Aureliolo/ai-company

Length of output: 518


🏁 Script executed:

# Check if there's typing module import and understand the protocol setup better
rg "@runtime_checkable" -B 2 -A 15 src/ai_company/engine/workspace/protocol.py

Repository: Aureliolo/ai-company

Length of output: 503


🏁 Script executed:

# Let's understand: would a conforming stub actually pass isinstance in Python 3.14?
# Check if there are any actual protocol implementations we can learn from
rg "class.*WorkspaceIsolationStrategy" -A 20 src/ai_company/engine/workspace/git_worktree.py | head -40

Repository: Aureliolo/ai-company

Length of output: 46


🏁 Script executed:

# Let me verify the concern by checking what Python 3.14 actually provides for runtime_checkable
python3 << 'EOF'
from typing import Protocol, runtime_checkable

`@runtime_checkable`
class TestProto(Protocol):
    def method(self) -> None: ...

# Check what attributes exist
attrs = dir(TestProto)
print("Has __protocol_attrs__:", hasattr(TestProto, "__protocol_attrs__"))
print("Has _is_runtime_protocol:", hasattr(TestProto, "_is_runtime_protocol"))
if hasattr(TestProto, "_is_runtime_protocol"):
    print("Value of _is_runtime_protocol:", TestProto._is_runtime_protocol)

# Try a conforming stub
class ConformingStub:
    def method(self) -> None:
        pass

print("\nConforming stub passes isinstance:", isinstance(ConformingStub(), TestProto))

# Try non-conforming
class NonConforming:
    pass

print("Non-conforming passes isinstance:", isinstance(NonConforming(), TestProto))
EOF

Repository: Aureliolo/ai-company

Length of output: 241


Replace private attribute checks with a behavior-based protocol conformance test.

The check on line 14 for __protocol_attrs__ is dead code in Python 3.14 (the attribute does not exist). Line 15's _is_runtime_protocol is a private implementation detail. The dir() subset check on lines 39–42 only verifies method names exist, not that they match the protocol signature.

Instead, create a minimal conforming stub class with the required methods and verify it passes isinstance(..., WorkspaceIsolationStrategy). This approach is stable across Python releases and tests the actual runtime protocol contract.

Also applies to: 29-42

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_workspace_protocol.py` around lines 12 - 17, Replace
the fragile private-attribute checks in test_protocol_is_runtime_checkable with
a behavior-based conformance test: implement a minimal local stub class that
defines the required methods/attributes of the WorkspaceIsolationStrategy
protocol (use the same method names/signatures referenced later in the file) and
assert isinstance(stub_instance, WorkspaceIsolationStrategy); do the same for
the related test around lines 29-42 so both tests validate runtime protocol
conformance via an actual stub implementing the protocol rather than checking
__protocol_attrs__ or _is_runtime_protocol or just names via dir().

@Aureliolo Aureliolo merged commit aa0eefe into main Mar 8, 2026
6 of 7 checks passed
@Aureliolo Aureliolo deleted the feat/llm-decomposition-workspace-isolation branch March 8, 2026 19:06
Comment on lines +49 to +75
def _validate_git_ref(value: str, label: str) -> None:
"""Validate that a string is safe for use as a git command argument.

Prevents argument injection and path traversal. Does not fully
validate git ref format rules (e.g. consecutive slashes).

Args:
value: The string to validate.
label: Human-readable label for error messages.

Raises:
WorkspaceSetupError: If the value is unsafe for git.
"""
if (
not value
or value.startswith("-")
or ".." in value
or not _SAFE_REF_RE.match(value)
):
msg = f"Unsafe {label} for git: {value!r}"
logger.warning(
WORKSPACE_SETUP_FAILED,
label=label,
value=value,
error=msg,
)
raise WorkspaceSetupError(msg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong exception type raised in merge/teardown contexts

_validate_git_ref always raises WorkspaceSetupError, but it is called from three different lifecycle methods: setup_workspace (correct), merge_workspace (line 282), and teardown_workspace (line 395).

When validation fails during a merge or teardown, callers pattern-matching on WorkspaceMergeError or WorkspaceCleanupError will not catch the validation error since it raises WorkspaceSetupError instead. This breaks structured error handling and makes diagnostics misleading.

Consider parameterizing the exception type so the validator can raise context-appropriate exceptions:

def _validate_git_ref(
    value: str,
    label: str,
    error_cls: type[WorkspaceError] = WorkspaceSetupError,
) -> None:
    """..."""
    if (
        not value
        or value.startswith("-")
        or ".." in value
        or not _SAFE_REF_RE.match(value)
    ):
        msg = f"Unsafe {label} for git: {value!r}"
        logger.warning(
            WORKSPACE_SETUP_FAILED,
            label=label,
            value=value,
            error=msg,
        )
        raise error_cls(msg)

Then call with the appropriate exception type:

  • From setup_workspace: _validate_git_ref(..., error_cls=WorkspaceSetupError)
  • From merge_workspace: _validate_git_ref(..., error_cls=WorkspaceMergeError)
  • From teardown_workspace: _validate_git_ref(..., error_cls=WorkspaceCleanupError)
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/workspace/git_worktree.py
Line: 49-75

Comment:
**Wrong exception type raised in merge/teardown contexts**

`_validate_git_ref` always raises `WorkspaceSetupError`, but it is called from three different lifecycle methods: `setup_workspace` (correct), `merge_workspace` (line 282), and `teardown_workspace` (line 395).

When validation fails during a merge or teardown, callers pattern-matching on `WorkspaceMergeError` or `WorkspaceCleanupError` will not catch the validation error since it raises `WorkspaceSetupError` instead. This breaks structured error handling and makes diagnostics misleading.

Consider parameterizing the exception type so the validator can raise context-appropriate exceptions:

```python
def _validate_git_ref(
    value: str,
    label: str,
    error_cls: type[WorkspaceError] = WorkspaceSetupError,
) -> None:
    """..."""
    if (
        not value
        or value.startswith("-")
        or ".." in value
        or not _SAFE_REF_RE.match(value)
    ):
        msg = f"Unsafe {label} for git: {value!r}"
        logger.warning(
            WORKSPACE_SETUP_FAILED,
            label=label,
            value=value,
            error=msg,
        )
        raise error_cls(msg)
```

Then call with the appropriate exception type:
- From `setup_workspace`: `_validate_git_ref(..., error_cls=WorkspaceSetupError)`
- From `merge_workspace`: `_validate_git_ref(..., error_cls=WorkspaceMergeError)`
- From `teardown_workspace`: `_validate_git_ref(..., error_cls=WorkspaceCleanupError)`

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +131 to +151
try:
stdout_bytes, stderr_bytes = await asyncio.wait_for(
proc.communicate(),
timeout=cmd_timeout,
)
except TimeoutError:
proc.kill()
await proc.wait()
msg = f"git {args[0] if args else ''} timed out after {cmd_timeout}s"
logger.exception(
WORKSPACE_SETUP_FAILED,
error=msg,
args=args,
)
return (-1, "", msg)
rc = proc.returncode if proc.returncode is not None else -1
return (
rc,
stdout_bytes.decode("utf-8", errors="replace").strip(),
stderr_bytes.decode("utf-8", errors="replace").strip(),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subprocess orphaned on asyncio.CancelledError

The _run_git method only catches TimeoutError after asyncio.wait_for. If the calling coroutine is cancelled while the git subprocess is running (e.g., during shutdown or when the outer asyncio.TaskGroup is cancelled), CancelledError propagates immediately without calling proc.kill() or await proc.wait().

The subprocess becomes an orphan and may continue holding git locks (.git/index.lock, .git/MERGE_HEAD, etc.), blocking all subsequent git operations and causing hard failures in production.

Handle CancelledError alongside TimeoutError:

try:
    stdout_bytes, stderr_bytes = await asyncio.wait_for(
        proc.communicate(),
        timeout=cmd_timeout,
    )
except (TimeoutError, asyncio.CancelledError):
    proc.kill()
    await proc.wait()
    if isinstance(..., asyncio.CancelledError):
        raise  # Preserve cancellation semantics
    msg = f"git {args[0] if args else ''} timed out after {cmd_timeout}s"
    logger.exception(
        WORKSPACE_SETUP_FAILED,
        error=msg,
        args=args,
    )
    return (-1, "", msg)

Alternatively, use a finally block to ensure cleanup always occurs.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/workspace/git_worktree.py
Line: 131-151

Comment:
**Subprocess orphaned on `asyncio.CancelledError`**

The `_run_git` method only catches `TimeoutError` after `asyncio.wait_for`. If the calling coroutine is cancelled while the git subprocess is running (e.g., during shutdown or when the outer `asyncio.TaskGroup` is cancelled), `CancelledError` propagates immediately without calling `proc.kill()` or `await proc.wait()`.

The subprocess becomes an orphan and may continue holding git locks (`.git/index.lock`, `.git/MERGE_HEAD`, etc.), blocking all subsequent git operations and causing hard failures in production.

Handle `CancelledError` alongside `TimeoutError`:

```python
try:
    stdout_bytes, stderr_bytes = await asyncio.wait_for(
        proc.communicate(),
        timeout=cmd_timeout,
    )
except (TimeoutError, asyncio.CancelledError):
    proc.kill()
    await proc.wait()
    if isinstance(..., asyncio.CancelledError):
        raise  # Preserve cancellation semantics
    msg = f"git {args[0] if args else ''} timed out after {cmd_timeout}s"
    logger.exception(
        WORKSPACE_SETUP_FAILED,
        error=msg,
        args=args,
    )
    return (-1, "", msg)
```

Alternatively, use a `finally` block to ensure cleanup always occurs.

How can I resolve this? If you propose a fix, please make it concise.

Aureliolo added a commit that referenced this pull request Mar 8, 2026
PR #174 — git_worktree.py:
- _validate_git_ref now accepts error_cls/event params so merge context
  raises WorkspaceMergeError and teardown raises WorkspaceCleanupError
- _run_git catches asyncio.CancelledError to kill subprocess before
  re-raising, preventing orphaned git processes

PR #172 — task assignment:
- TaskAssignmentConfig.strategy validated against known strategy names
- max_concurrent_tasks_per_agent now enforced in _score_and_filter_candidates
  via new AssignmentRequest.max_concurrent_tasks field
- TaskAssignmentStrategy protocol docstring documents error signaling contract

PR #171 — worktree skill:
- rebase uses --left-right --count with triple-dot to detect behind-main
- setup reuse path uses correct git worktree add (without -b)
- setup handles dirty working tree with stash/abort prompt
- status table shows both ahead and behind counts
- tree command provides circular dependency recovery guidance

PR #170 — meeting parsing:
- Fix assigned? regex to assigned (prevents false-positive assignee
  extraction from "assign to X" in action item descriptions)
Aureliolo added a commit that referenced this pull request Mar 8, 2026
…176)

## Summary

- Fix CI failures on main: 2 test assertion mismatches in cost-optimized
assignment tests + mypy `attr-defined` error in strategy registry test
- Address all Greptile post-merge review findings across PRs #170#175
(14 fixes total)

### PR #175 — Test assertion fixes (CI blockers)
- `"no cost data"` → `"insufficient cost data"` to match implementation
wording
- `unknown-dev` → `known-dev` winner assertion (all-or-nothing fallback,
sort stability)
- `getattr()` for `_scorer` access on protocol type (Windows/Linux mypy
difference)

### PR #174 — Workspace isolation
- `_validate_git_ref` raises context-appropriate exception types
(`WorkspaceMergeError` in merge, `WorkspaceCleanupError` in teardown)
- `_run_git` catches `asyncio.CancelledError` to kill subprocess before
re-raising (prevents orphaned git processes)

### PR #172 — Task assignment
- `TaskAssignmentConfig.strategy` validated against 6 known strategy
names
- `max_concurrent_tasks_per_agent` enforced via new
`AssignmentRequest.max_concurrent_tasks` field in
`_score_and_filter_candidates`
- `TaskAssignmentStrategy` protocol docstring documents error signaling
contract (raises vs `selected=None`)

### PR #171 — Worktree skill
- `rebase` uses `--left-right --count` with triple-dot to detect
behind-main worktrees
- `setup` reuse path uses `git worktree add` without `-b` for existing
branches
- `setup` handles dirty working tree with stash/abort prompt
- `status` table shows both ahead and behind counts
- `tree` provides circular dependency recovery guidance

### PR #170 — Meeting parsing
- `assigned?` → `assigned` regex fix (prevents false-positive assignee
extraction from "assign to X")

## Test plan

- [x] All 3988 tests pass (10 new tests added)
- [x] mypy strict: 0 errors (463 source files)
- [x] ruff lint + format: all clean
- [x] Coverage: 96.53% (threshold: 80%)
- [x] Pre-commit hooks pass

## Review coverage

Quick mode — automated checks only (lint, type-check, tests, coverage).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](v0.0.0...v0.1.0)
(2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement LLM-based decomposition strategy Implement workspace isolation with WorkspaceIsolationStrategy protocol (DESIGN_SPEC §6.8)

2 participants