
feat: add coordination error taxonomy classification pipeline (#146)#181

Merged
Aureliolo merged 3 commits into main from feat/coordination-error-taxonomy
Mar 9, 2026

Conversation

@Aureliolo
Owner

Summary

  • Adds engine/classification/ subpackage with coordination error taxonomy classification pipeline (§10.5)
  • Four heuristic-based detectors: logical contradiction, numerical drift, context omission, coordination failure
  • Pipeline runs post-execution when enabled via AgentEngine(error_taxonomy_config=...), never blocks agent work
  • Per-detector isolation — one broken detector doesn't prevent others from running
  • Classification results are log-only for M5; programmatic access planned for future milestone
  • ErrorFinding model with turn_range validation (non-negative, start ≤ end) and AwareDatetime for timestamps
  • Debug logging at detector entry/exit, structured event constants for observability
  • Documentation updated: DESIGN_SPEC.md §15.3 project structure, §10.5 current state; CLAUDE.md engine description and logging examples

Test plan

  • 16 new tests added across unit and integration
  • AgentEngine classification integration tested (no config, enabled config, MemoryError propagation)
  • RecursionError propagation tested in pipeline
  • Zero-value drift edge cases (zero-to-nonzero, zero-to-zero)
  • Common capitalised words filtering verified
  • Empty conversation boundary for all four detectors
  • Multiple contradictions and combined tool errors + finish reasons
  • turn_range validation (negative indices, inverted range, valid range)
  • All 4190 tests pass, 96.32% coverage
  • Pre-reviewed by 8 agents, 24 findings addressed
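The zero-value drift edge cases in the test plan can be illustrated with a toy relative-drift check. This is not the project's detect_numerical_drift, just a sketch of why zero baselines need special handling:

```python
def drift_exceeds(earlier: float, later: float, threshold: float = 0.5) -> bool:
    """Toy relative-drift check, not the project's detector.

    Relative change from a zero baseline is undefined, so the two
    edge cases are handled explicitly: zero-to-nonzero is treated
    as drift, zero-to-zero as no drift.
    """
    if earlier == 0.0:
        return later != 0.0  # zero-to-nonzero counts as drift
    return abs(later - earlier) / abs(earlier) > threshold
```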

Closes #146

Copilot AI review requested due to automatic review settings March 9, 2026 08:08
@coderabbitai

coderabbitai bot commented Mar 9, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6d7505e3-177c-42a8-895f-61298d97e95d

📥 Commits

Reviewing files that changed from the base of the PR and between b3f8a55 and 421be92.

📒 Files selected for processing (18)
  • .github/workflows/dependency-review.yml
  • CLAUDE.md
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/classification/__init__.py
  • src/ai_company/engine/classification/detectors.py
  • src/ai_company/engine/classification/models.py
  • src/ai_company/engine/classification/pipeline.py
  • src/ai_company/engine/validation.py
  • src/ai_company/observability/events/classification.py
  • tests/integration/engine/test_error_taxonomy_integration.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_classification_detectors.py
  • tests/unit/engine/test_classification_models.py
  • tests/unit/engine/test_classification_pipeline.py
  • tests/unit/observability/test_events.py

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optional post-execution error taxonomy classification that automatically detects and categorizes coordination errors including logical contradictions, numerical drift, context omissions, and tool failures.
    • Classification is fully configurable and can be enabled or disabled per execution.
  • Documentation

    • Updated design specifications and README to document the new error classification capabilities.

Walkthrough

Adds an opt-in post-execution coordination error taxonomy: new engine.classification package (models, detectors, pipeline), observability events, AgentEngine integration via error_taxonomy_config, and tests exercising detectors, pipeline behavior, and integration.

Changes

Cohort / File(s) Summary
Design & Docs
CLAUDE.md, DESIGN_SPEC.md, README.md
Documented the coordination error taxonomy (M5), runtime/analytics scope updates, and engine responsibilities for classification.
Engine Public Surface
src/ai_company/engine/__init__.py, src/ai_company/engine/agent_engine.py
Re-exported classification symbols; AgentEngine gains optional error_taxonomy_config ctor arg and invokes classification in post-execution pipeline; replaced some private validators with validation module calls.
Validation Helpers
src/ai_company/engine/validation.py
New pre-flight validation functions: validate_run_inputs, validate_agent, validate_task and EXECUTABLE_STATUSES constant (used by AgentEngine).
Classification — Models & API
src/ai_company/engine/classification/models.py, src/ai_company/engine/classification/__init__.py
Added ErrorSeverity enum, immutable ErrorFinding and ClassificationResult Pydantic models, and package re-exports.
Classification — Detectors
src/ai_company/engine/classification/detectors.py
Four stateless detectors added: detect_logical_contradictions, detect_numerical_drift, detect_context_omissions, detect_coordination_failures with evidence/turn ranges and logging hooks.
Classification — Pipeline
src/ai_company/engine/classification/pipeline.py
Async classify_execution_errors entrypoint, per-detector safe execution wrapper, aggregation into ClassificationResult, and structured logging events (start/finding/complete/error/skipped).
Observability Events
src/ai_company/observability/events/classification.py
Added classification lifecycle and detector event constants (CLASSIFICATION_*, DETECTOR_*).
Tests — Integration & Unit
tests/integration/engine/test_error_taxonomy_integration.py, tests/unit/engine/test_classification_*.py, tests/unit/engine/test_agent_engine.py, tests/unit/observability/test_events.py
New integration test for pipeline and extensive unit tests for models, detectors, pipeline behavior, AgentEngine integration, and event constants.
CI/Workflow Comment
.github/workflows/dependency-review.yml
Minor comment relocation related to license detection (non-functional).

Sequence Diagram(s)

sequenceDiagram
    participant AE as AgentEngine
    participant Pipeline as Classification Pipeline
    participant Detectors as Detector Suite
    participant Obs as Observability

    AE->>AE: Execution completes (messages, turns collected)
    AE->>Pipeline: classify_execution_errors(execution_result, agent_id, task_id, config)
    alt Config disabled
        Pipeline->>Obs: log CLASSIFICATION_SKIPPED
        Pipeline-->>AE: return None
    else Config enabled
        Pipeline->>Obs: log CLASSIFICATION_START (agent_id, task_id, execution_id, categories)
        Pipeline->>Detectors: _run_detectors(enabled_categories)
        par Detector Execution
            Detectors->>Detectors: detect_logical_contradictions(conversation)
            Detectors->>Detectors: detect_numerical_drift(conversation, threshold)
            Detectors->>Detectors: detect_context_omissions(conversation)
            Detectors->>Detectors: detect_coordination_failures(conversation, turns)
        end
        Detectors->>Obs: log CLASSIFICATION_FINDING (per finding)
        Detectors-->>Pipeline: aggregate findings
        Pipeline->>Pipeline: Build ClassificationResult
        Pipeline->>Obs: log CLASSIFICATION_COMPLETE (execution_id, finding_count)
        Pipeline-->>AE: return ClassificationResult
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage — ⚠️ Warning. Docstring coverage is 46.39%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (4 passed)
Title check — ✅ Passed. The title accurately and concisely summarizes the main change: adding a coordination error taxonomy classification pipeline, which aligns with the primary objective and substantial code additions.
Description check — ✅ Passed. The description provides relevant detail on the implementation, test coverage, and documentation updates, explaining the purpose, functionality, and scope of the error taxonomy classification pipeline.
Linked Issues check — ✅ Passed. The PR implements all acceptance criteria from issue #146: four error detection categories (logical contradiction, numerical drift, context omission, coordination failure), opt-in configuration with per-category control, a post-execution pipeline that never blocks, structured logging, graceful degradation, and comprehensive unit/integration tests.
Out of Scope Changes check — ✅ Passed. All changes are directly aligned with issue #146: the new classification subpackage, detectors, models, pipeline integration, observability events, tests, and documentation updates are all in scope for the M5 error taxonomy feature.



@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request significantly enhances the agent engine's analytical capabilities by introducing a robust coordination error taxonomy classification pipeline. This new system allows for the identification and categorization of various agent coordination issues, such as logical contradictions and numerical discrepancies, without interfering with the primary agent execution flow. By providing structured insights into potential errors, it lays the groundwork for future programmatic access and more targeted diagnostics, ultimately improving the reliability and performance of agent operations.

Highlights

  • New Coordination Error Taxonomy Classification Pipeline: Introduced a new engine/classification/ subpackage to classify coordination errors based on a defined taxonomy, running post-execution to avoid blocking agent work.
  • Four Heuristic-Based Detectors: Implemented four heuristic-based detectors for logical contradiction, numerical drift, context omission, and coordination failure to identify specific types of errors.
  • Non-Blocking and Isolated Execution: The classification pipeline runs asynchronously after agent execution and is designed with per-detector isolation, ensuring that a failure in one detector does not impact others.
  • New Data Models for Error Findings: Defined ErrorFinding and ClassificationResult models, including turn_range validation and AwareDatetime for timestamps, to structure and store classification results.
  • Enhanced Observability: Integrated debug logging at detector entry/exit points and added structured event constants for improved observability of the classification process.
  • Documentation Updates: Updated DESIGN_SPEC.md and CLAUDE.md to reflect the new project structure, current state of LLM call analytics, and logging examples related to the error taxonomy.
Changelog
  • CLAUDE.md
    • Updated the description of the engine/ subpackage to include coordination error classification.
    • Added CLASSIFICATION_START to the list of example event names for structured logging.
  • DESIGN_SPEC.md
    • Updated the 'Current state' for LLM Call Analytics to include the error taxonomy classification pipeline.
    • Added a new 'Current state' section under 'M4/M5: Coordination Error Taxonomy' detailing the implementation of the classification pipeline and its detectors.
    • Updated the project structure diagram to include the new classification/ subpackage within engine/.
  • src/ai_company/engine/__init__.py
    • Imported ClassificationResult, ErrorFinding, ErrorSeverity, and classify_execution_errors from the new classification subpackage.
    • Added the newly imported classification models and functions to the module's __all__ export list.
  • src/ai_company/engine/agent_engine.py
    • Imported classify_execution_errors from ai_company.engine.classification.pipeline.
    • Imported ErrorTaxonomyConfig for type checking.
    • Added error_taxonomy_config as an optional parameter to the AgentEngine constructor.
    • Updated the docstring for _post_execution_pipeline to include classification.
    • Integrated the call to classify_execution_errors within the _post_execution_pipeline if error_taxonomy_config is provided and enabled.
  • src/ai_company/engine/classification/__init__.py
    • Added new file for the coordination error classification pipeline.
    • Re-exported public API components: ClassificationResult, ErrorFinding, ErrorSeverity, and classify_execution_errors.
  • src/ai_company/engine/classification/detectors.py
    • Added new file implementing heuristic-based detectors for coordination errors.
    • Implemented detect_logical_contradictions to find assertion negation pairs.
    • Implemented detect_numerical_drift to identify significant changes in numerical values with context.
    • Implemented detect_context_omissions to flag entities introduced early but absent later.
    • Implemented detect_coordination_failures to detect tool execution errors and error finish reasons.
  • src/ai_company/engine/classification/models.py
    • Added new file defining data models for error classification.
    • Defined ErrorSeverity enum for severity levels (LOW, MEDIUM, HIGH).
    • Defined ErrorFinding model to represent individual detected errors, including category, severity, description, evidence, and turn_range with validation.
    • Defined ClassificationResult model to aggregate findings, execution metadata, and classification timestamp.
  • src/ai_company/engine/classification/pipeline.py
    • Added new file for orchestrating the error classification pipeline.
    • Implemented classify_execution_errors as the main entry point, handling configuration and logging.
    • Implemented _run_detectors to execute enabled detectors with per-detector isolation for fault tolerance.
    • Implemented _safe_detect to wrap detector calls, catching and logging exceptions while allowing MemoryError and RecursionError to propagate.
  • src/ai_company/observability/events/classification.py
    • Added new file defining structured logging event constants for the classification pipeline, including start, complete, finding, error, and skipped events, as well as detector-specific events.
  • tests/integration/engine/test_error_taxonomy_integration.py
    • Added new file for integration tests of the error taxonomy pipeline.
    • Included tests for detecting contradictions in realistic conversations.
    • Added tests for identifying coordination failures from tool errors and error finish reasons.
    • Verified that all categories run together and that the pipeline does not block execution.
    • Tested the behavior of a disabled taxonomy, ensuring it returns None quickly.
    • Included tests for numerical drift and context omission with realistic data scenarios.
  • tests/unit/engine/test_agent_engine.py
    • Imported ErrorTaxonomyConfig for testing.
    • Added unit tests for AgentEngine to verify that classification is skipped when no config is provided.
    • Added unit tests to confirm classify_execution_errors is invoked when a config is enabled.
    • Added unit tests to ensure MemoryError propagates from the classification pipeline.
  • tests/unit/engine/test_classification_detectors.py
    • Added new file for unit tests of individual coordination error detectors.
    • Tested detect_logical_contradictions for clean conversations, contradictions, and ignored messages.
    • Tested detect_numerical_drift for consistent numbers, detected drift, custom thresholds, and zero-value edge cases.
    • Tested detect_context_omissions for entities referenced, dropped entities, short conversations, and common capitalized words.
    • Tested detect_coordination_failures for no errors, tool execution errors, error finish reasons, and combined scenarios.
    • Verified all detectors handle empty conversations gracefully.
  • tests/unit/engine/test_classification_models.py
    • Added new file for unit tests of classification result models.
    • Tested ErrorSeverity enum values and member count.
    • Tested ErrorFinding construction, immutability, and validation for turn_range (non-negative, start <= end).
    • Tested ClassificationResult construction, computed fields for findings, and immutability.
  • tests/unit/engine/test_classification_pipeline.py
    • Added new file for unit tests of the error classification pipeline.
    • Tested that a disabled config returns None.
    • Tested that clean execution returns empty findings.
    • Verified that only enabled categories are run.
    • Tested exception handling, ensuring regular exceptions are caught while MemoryError and RecursionError propagate.
    • Confirmed coordination failure findings are always high severity.
    • Verified the result object correctly lists checked categories.
  • tests/unit/observability/test_events.py
    • Imported new classification event constants.
    • Updated the test_all_domain_modules_discovered to include the 'classification' module.
    • Added test_classification_events_exist to verify the new classification event constants.
Activity
  • 16 new tests added across unit and integration.
  • AgentEngine classification integration tested for no config, enabled config, and MemoryError propagation.
  • RecursionError propagation tested in the pipeline.
  • Zero-value drift edge cases (zero-to-nonzero, zero-to-zero) covered.
  • Common capitalized words filtering verified.
  • Empty conversation boundary tested for all four detectors.
  • Multiple contradictions and combined tool errors + finish reasons tested.
  • turn_range validation (negative indices, inverted range, valid range) confirmed.
  • All 4190 tests pass, 96.32% coverage.
  • Pre-reviewed by 8 agents, 24 findings addressed.
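The per-detector isolation summarized above can be sketched as follows. The function names echo the PR summary (_safe_detect), but the bodies here are an illustrative stand-in, not the project's code:

```python
import logging
from collections.abc import Callable, Sequence

logger = logging.getLogger("classification")


def _safe_detect(name: str, detector_fn: Callable[[], Sequence[str]]) -> Sequence[str]:
    """Run one detector; swallow ordinary errors, let fatal ones propagate."""
    try:
        return detector_fn()
    except (MemoryError, RecursionError):
        raise  # fatal conditions must reach the caller
    except Exception:
        logger.exception("detector_error", extra={"detector": name})
        return []  # a broken detector yields no findings; others still run


def run_detectors(detectors: dict[str, Callable[[], Sequence[str]]]) -> list[str]:
    findings: list[str] = []
    for name, fn in detectors.items():
        findings.extend(_safe_detect(name, fn))
    return findings
```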

@greptile-apps

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR introduces a coordination error taxonomy classification pipeline (engine/classification/) with four heuristic-based detectors (logical contradiction, numerical drift, context omission, coordination failure). The architecture is well-structured with non-blocking post-execution placement, per-detector fault isolation, structured event constants, and clean Pydantic models.

Critical blocker: The PR contains Python 2 exception syntax (except MemoryError, RecursionError:) at three locations across two files that will cause immediate SyntaxError on import in Python 3.14+:

  • src/ai_company/engine/agent_engine.py (line 293)
  • src/ai_company/engine/classification/pipeline.py (lines 94, 242)

This must be corrected to except (MemoryError, RecursionError): in all three locations before the PR can be merged. The module cannot be imported or tested without this fix.
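The fix is mechanical. A minimal sketch showing the correct Python 3 tuple form, and confirming that the comma form does not even compile under Python 3:

```python
def safe_call(fn):
    """Call fn, swallowing ordinary errors but letting fatal ones propagate."""
    try:
        return fn()
    except (MemoryError, RecursionError):  # Python 3 requires a tuple here
        raise
    except Exception:
        return None


# The Python 2 comma form is rejected at compile time in Python 3:
LEGACY = "try:\n    pass\nexcept MemoryError, RecursionError:\n    pass\n"
try:
    compile(LEGACY, "<legacy>", "exec")
    comma_form_compiles = True
except SyntaxError:
    comma_form_compiles = False
```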

Confidence Score: 0/5

  • Not safe to merge — the PR contains three Python 2 exception syntax errors that will cause immediate SyntaxError on module import in Python 3.14+, preventing the entire classification pipeline from being imported or tested.
  • The codebase requires Python ≥3.14 (per pyproject.toml), but uses Python 2's comma-based exception syntax (except MemoryError, RecursionError:) at three locations across two critical files. This is a hard blocker — the modules cannot be imported, tests cannot run, and the feature is completely non-functional until corrected. While the architecture design is sound and tests are comprehensive, these syntax errors must be fixed before any other review is meaningful.
  • src/ai_company/engine/agent_engine.py (line 293) and src/ai_company/engine/classification/pipeline.py (lines 94, 242) — all three must be corrected from except MemoryError, RecursionError: to except (MemoryError, RecursionError):

Last reviewed commit: 421be92

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-designed and thoroughly tested coordination error classification pipeline, including a new engine/classification subpackage and clean integration into AgentEngine. However, a high-severity Regular Expression Denial of Service (ReDoS) vulnerability was identified in the entity detection logic. Additionally, critical Python 2 style exception handling syntax (except A, B:) is used, which is invalid in Python 3 and will cause runtime SyntaxErrors, preventing the pipeline from executing. These issues must be addressed to ensure the security and reliability of the classification pipeline.

execution_id=execution_id,
config=config,
)
except MemoryError, RecursionError:

security-high high

This line uses Python 2 style except syntax (except MemoryError, RecursionError:), which is a SyntaxError in Python 3. This will prevent the module from loading and the pipeline from executing. The correct syntax for catching multiple exceptions is to enclose them in a tuple, e.g., except (Exception1, Exception2):. This issue is also present in other parts of the codebase, such as: src/ai_company/engine/classification/pipeline.py: line 231 and src/ai_company/engine/agent_engine.py: lines 188, 315, and 770.

    except (MemoryError, RecursionError):

return tuple(findings)


_ENTITY_PATTERN = re.compile(r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b")

security-high high

The regular expression _ENTITY_PATTERN is vulnerable to Regular Expression Denial of Service (ReDoS) due to nested quantifiers. The pattern r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b" contains a nested group (?:[A-Z][a-zA-Z]*)* where both the inner and outer quantifiers can match capital letters. This ambiguity causes exponential backtracking when the regex engine encounters a long string of capital letters that is not followed by a word boundary (e.g., "AbcAAAAAAAAAAAAAAAAAAAA_"). Since this regex processes LLM-generated content, it could be exploited to cause a Denial of Service.

To remediate this, simplify the regular expression to avoid nested quantifiers. Since the pattern is intended to match a single word starting with a capital letter and having at least 3 characters, the nested group is redundant.

Suggested change
_ENTITY_PATTERN = re.compile(r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b")
_ENTITY_PATTERN = re.compile(r"\b[A-Z][a-zA-Z]{2,}\b")
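The remediation can be sanity-checked quickly. The example text below is illustrative; note that the vulnerable pattern is deliberately not exercised on the adversarial string, since that is exactly the input that triggers exponential backtracking:

```python
import re

# Flagged pattern: nested quantifiers over overlapping character classes
VULNERABLE = re.compile(r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b")
# Suggested fix: a single quantifier, no ambiguity
SAFE = re.compile(r"\b[A-Z][a-zA-Z]{2,}\b")

# On ordinary prose both patterns extract the same entities.
text = "Alice asked DataPipeline to sync with Bob before the Review."
assert VULNERABLE.findall(text) == SAFE.findall(text)

# Adversarial input: a run of capitals ending in "_" (a word character,
# so the trailing \b can never succeed and the engine must backtrack).
# The safe pattern fails fast; the vulnerable one can take exponential
# time here, so only the simplified pattern is run.
adversarial = "Abc" + "A" * 40 + "_"
assert SAFE.search(adversarial) is None
```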

"""
try:
return detector_fn()
except MemoryError, RecursionError:

security-high high

This line contains a syntax error in Python 3. The correct syntax for catching multiple exceptions is except (Exception1, Exception2):. The current code except MemoryError, RecursionError: will raise a SyntaxError at runtime.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):


Copilot AI left a comment


Pull request overview

This PR adds a coordination error taxonomy classification pipeline (engine/classification/) that runs post-execution when opted in via AgentEngine(error_taxonomy_config=...). It implements §10.5 of the DESIGN_SPEC and closes issue #146. Four heuristic detectors (logical contradiction, numerical drift, context omission, coordination failure) analyze conversation histories after agent execution finishes. Results are currently log-only.

Changes:

  • New engine/classification/ subpackage with models.py, detectors.py, pipeline.py, and __init__.py
  • New observability/events/classification.py with 8 structured event constants; AgentEngine gains an error_taxonomy_config parameter that triggers post-execution classification
  • 16 new tests across 4 files (3 unit, 1 integration) and documentation updates to DESIGN_SPEC.md and CLAUDE.md

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/ai_company/engine/classification/__init__.py Package re-exports for the public classification API
src/ai_company/engine/classification/models.py ErrorSeverity enum, ErrorFinding and ClassificationResult Pydantic models
src/ai_company/engine/classification/detectors.py Four pure-function heuristic detectors for the four error categories
src/ai_company/engine/classification/pipeline.py classify_execution_errors async orchestrator with per-detector isolation
src/ai_company/engine/agent_engine.py Adds error_taxonomy_config parameter and invokes classification in _post_execution_pipeline
src/ai_company/engine/__init__.py Re-exports the new classification public API
src/ai_company/observability/events/classification.py Eight Final[str] event constants for structured logging
tests/unit/engine/test_classification_models.py Unit tests for ErrorSeverity, ErrorFinding, and ClassificationResult
tests/unit/engine/test_classification_detectors.py Unit tests for all four detector functions
tests/unit/engine/test_classification_pipeline.py Unit tests for the classify_execution_errors pipeline function
tests/unit/engine/test_agent_engine.py New TestAgentEngineClassification class testing engine integration
tests/integration/engine/test_error_taxonomy_integration.py End-to-end integration tests with realistic conversation patterns
tests/unit/observability/test_events.py Adds classification to expected domain modules and event constant assertions
DESIGN_SPEC.md Updates §10.5 current state and §15.3 project structure
CLAUDE.md Updates engine description and adds classification event example to logging guidelines

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +11
"""Integration tests for the error taxonomy pipeline.

Verifies end-to-end classification with realistic conversation
patterns and validates structured log events are emitted.
"""

import time
from datetime import date
from uuid import uuid4

import pytest

Copilot AI Mar 9, 2026


This integration test file is missing the module-level pytestmark declaration. All other integration test files in tests/integration/engine/ consistently set pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)] at module scope (see test_agent_engine_integration.py:48, test_crash_recovery.py:35, test_multi_agent_delegation.py:85). Without this, the tests in this file will not be tagged with the integration marker or protected by the 30-second timeout guard.

Comment on lines +53 to +56
turn_range: tuple[int, int] | None = Field(
default=None,
description="Turn index range (start, end) where error observed",
)

Copilot AI Mar 9, 2026


The turn_range field on ErrorFinding is documented as "Turn index range (start, end) where error observed", but three of the four detectors populate it with conversation message indices (from _extract_assistant_texts, which uses position in the full conversation tuple including system/user/tool messages), not turn numbers. By contrast, detect_coordination_failures sets turn_range to (turn.turn_number, turn.turn_number) which are 1-based turn numbers from TurnRecord. The result is that turn_range values carry different semantics depending on which detector produced the finding, making them incomparable. Either the field name and docstring should be updated to "message_index_range", or the detectors should be updated to use consistent turn numbers.

Comment on lines +426 to +431
def test_classification_events_exist(self) -> None:
assert CLASSIFICATION_START == "classification.start"
assert CLASSIFICATION_COMPLETE == "classification.complete"
assert CLASSIFICATION_FINDING == "classification.finding"
assert CLASSIFICATION_ERROR == "classification.error"
assert CLASSIFICATION_SKIPPED == "classification.skipped"

Copilot AI Mar 9, 2026


The test_classification_events_exist test only verifies 5 of the 8 constants defined in src/ai_company/observability/events/classification.py. The three per-detector lifecycle event constants (DETECTOR_START, DETECTOR_COMPLETE, DETECTOR_ERROR) are not asserted. Based on the convention in this same test class (e.g., test_conflict_events_exist checks all 21 conflict constants, test_workspace_events_exist checks all workspace constants), all defined event constants should be covered.

execution_id=execution_id,
config=config,
)
except MemoryError, RecursionError:

Copilot AI Mar 9, 2026

Both except MemoryError, RecursionError: clauses use Python 2 syntax that is a SyntaxError in Python 3. In Python 3, catching multiple exception types requires a tuple: except (MemoryError, RecursionError):. The codebase's correct usage is except (MemoryError, RecursionError) as exc: (see src/ai_company/tools/invoker.py:224 and src/ai_company/engine/parallel.py:290). As written, these lines will cause a SyntaxError at import time, making the entire module unimportable, so even the tests that mock _run_detectors will fail to load the module under test.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):
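For context, the tuple form is the portable spelling on every Python 3 version. A self-contained sketch of the per-detector isolation pattern the surrounding code implements (logging omitted; real code should log the contained failure rather than swallow it silently):

```python
def run_detector(detector_fn):
    """Run one detector in isolation, propagating unrecoverable errors."""
    try:
        return detector_fn()
    except (MemoryError, RecursionError):
        raise  # never mask unrecoverable interpreter errors
    except Exception:
        return None  # isolate this detector's failure (real code logs here)


def blow_up():
    raise RecursionError("synthetic")


propagated = False
try:
    run_detector(blow_up)
except RecursionError:
    propagated = True
assert propagated  # RecursionError escapes the isolation wrapper
assert run_detector(lambda: 1 / 0) is None  # ordinary errors are contained
```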

"""
try:
return detector_fn()
except MemoryError, RecursionError:

Copilot AI Mar 9, 2026

Same Python 2 syntax issue: except MemoryError, RecursionError: is a SyntaxError in Python 3. It must be written as except (MemoryError, RecursionError): to correctly catch both exception types.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ai_company/engine/agent_engine.py (1)

85-109: 🧹 Nitpick | 🔵 Trivial

Document error_taxonomy_config in the constructor contract.

The constructor now exposes a new public parameter, but the Args: block still ends at shutdown_checker, so the public API docs do not describe how classification is enabled.

As per coding guidelines, src/**/*.py: Use Google-style docstrings required on public classes and functions (enforced by ruff D rules).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 85 - 109, The docstring
Args section for the class constructor is missing documentation for the new
parameter error_taxonomy_config; update the constructor docstring (above def
__init__) to add an Args entry for error_taxonomy_config describing its type
(ErrorTaxonomyConfig | None), purpose (used to enable/configure error
classification), and default behavior (when None classification is disabled or
uses defaults), matching the style of the existing shutdown_checker line so the
public API docs correctly reflect the new parameter.
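A sketch of the requested docstring entry on a reduced constructor (the signature here is hypothetical, showing only the two parameters the comment mentions):

```python
class AgentEngine:
    """Hypothetical skeleton showing only the docstring addition."""

    def __init__(self, shutdown_checker=None, error_taxonomy_config=None):
        """Initialise the engine.

        Args:
            shutdown_checker: Optional callable polled between turns to
                request a graceful stop.
            error_taxonomy_config: Optional ErrorTaxonomyConfig that
                enables post-execution error classification; when None,
                classification is skipped entirely.
        """
        self._shutdown_checker = shutdown_checker
        self._error_taxonomy_config = error_taxonomy_config


engine = AgentEngine(error_taxonomy_config=None)
assert engine._error_taxonomy_config is None
assert "error_taxonomy_config" in AgentEngine.__init__.__doc__
```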
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/engine/classification/detectors.py`:
- Around line 79-87: The detectors are mixing raw conversation offsets and
TurnRecord.turn_number when building ErrorFinding.turn_range (e.g., in
_extract_assistant_texts and other detectors that currently emit indices like
enumerate(conversation) or hard-coded 0); normalize all turn_range values to use
TurnRecord.turn_number consistently: convert any conversation-index (from
functions like _extract_assistant_texts) to the corresponding
TurnRecord.turn_number before creating ErrorFinding.turn_range, and update all
detector sites (the blocks around the referenced locations) to look up the
TurnRecord for that message and use its turn_number instead of raw enumerate
indices or constants so downstream consumers always receive turn indices.

In `@src/ai_company/engine/classification/pipeline.py`:
- Around line 74-80: The code generates a new UUID for execution_id which breaks
correlation; instead use the run's existing execution id from
execution_result.context.execution_id when populating
ClassificationResult.execution_id and when logging. Replace the local creation
of execution_id (the str(uuid4()) assignment) and pass
execution_result.context.execution_id into logger.info for CLASSIFICATION_START
(and ensure ClassificationResult.execution_id is set from
execution_result.context.execution_id) so all logs and the ClassificationResult
share the same execution identifier.
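The correlation fix can be illustrated with reduced stand-ins for the models (names follow the comment, not the real classes):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    execution_id: str


@dataclass(frozen=True)
class ExecutionResult:
    context: Context


@dataclass(frozen=True)
class ClassificationResult:
    execution_id: str


def classify(execution_result: ExecutionResult) -> ClassificationResult:
    # Reuse the run's existing id rather than str(uuid4()) so the
    # CLASSIFICATION_START log line and the result share one key.
    return ClassificationResult(
        execution_id=execution_result.context.execution_id,
    )


result = classify(ExecutionResult(context=Context(execution_id="exec-123")))
assert result.execution_id == "exec-123"
```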

In `@tests/integration/engine/test_error_taxonomy_integration.py`:
- Around line 199-253: Remove the hard wall-clock assertions and instead assert
behavioral correctness: for test_disabled_taxonomy_returns_none_fast(), keep
config = ErrorTaxonomyConfig(enabled=False) and assert
classify_execution_errors(...) returns None, and also verify detectors were not
invoked by spying/mocking the detector functions used by
classify_execution_errors (or assert that the internal detector dispatch method
was not called); for test_pipeline_does_not_block_execution(), remove the
elapsed < 2.0 check and either assert result is not None and detectors produced
expected classifications or move the performance check into a separate slow test
decorated with pytest markers (e.g., `@pytest.mark.integration` and
`@pytest.mark.slow`) and a per-test timeout (pytest.mark.timeout(30)) so CI uses a
30s limit instead of brittle short wall-clock assertions; ensure you reference
classify_execution_errors, ErrorTaxonomyConfig, and the detector dispatch/spies
when adding the mocks.
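A self-contained sketch of the behavioural assertion the comment recommends, using stub detectors in place of the real ones (all names hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorTaxonomyConfig:
    enabled: bool = True


calls: list[str] = []


def make_detector(name: str):
    def detector(conversation):
        calls.append(name)  # records that this detector actually ran
        return []
    return detector


DETECTORS = {
    "drift": make_detector("drift"),
    "omission": make_detector("omission"),
}


def classify_execution_errors(conversation, config):
    if not config.enabled:
        return None  # skip all detectors entirely
    findings = []
    for detector in DETECTORS.values():
        findings.extend(detector(conversation))
    return findings


# Behavioural check replaces the brittle elapsed-time assertion:
assert classify_execution_errors((), ErrorTaxonomyConfig(enabled=False)) is None
assert calls == []  # detectors were never invoked while disabled
```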

In `@tests/unit/observability/test_events.py`:
- Around line 426-431: Extend the test_classification_events_exist test to also
assert the three detector constants (DETECTOR_START, DETECTOR_COMPLETE,
DETECTOR_ERROR) are defined and equal to their expected string values; locate
the constants in classification.py (referencing DETECTOR_START,
DETECTOR_COMPLETE, DETECTOR_ERROR) and add corresponding assertions alongside
the existing CLASSIFICATION_* assertions in the test_classification_events_exist
function to cover all eight event constants.

---

Outside diff comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 85-109: The docstring Args section for the class constructor is
missing documentation for the new parameter error_taxonomy_config; update the
constructor docstring (above def __init__) to add an Args entry for
error_taxonomy_config describing its type (ErrorTaxonomyConfig | None), purpose
(used to enable/configure error classification), and default behavior (when None
classification is disabled or uses defaults), matching the style of the existing
shutdown_checker line so the public API docs correctly reflect the new
parameter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: aa6b5103-bbce-49f7-867c-7aae34d0b435

📥 Commits

Reviewing files that changed from the base of the PR and between f753779 and b3f8a55.

📒 Files selected for processing (15)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/classification/__init__.py
  • src/ai_company/engine/classification/detectors.py
  • src/ai_company/engine/classification/models.py
  • src/ai_company/engine/classification/pipeline.py
  • src/ai_company/observability/events/classification.py
  • tests/integration/engine/test_error_taxonomy_integration.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_classification_detectors.py
  • tests/unit/engine/test_classification_models.py
  • tests/unit/engine/test_classification_pipeline.py
  • tests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14

Files:

  • src/ai_company/observability/events/classification.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/classification/__init__.py
  • tests/unit/engine/test_classification_detectors.py
  • src/ai_company/engine/classification/models.py
  • tests/unit/engine/test_classification_pipeline.py
  • tests/unit/engine/test_classification_models.py
  • tests/integration/engine/test_error_taxonomy_integration.py
  • src/ai_company/engine/classification/pipeline.py
  • src/ai_company/engine/classification/detectors.py
  • tests/unit/engine/test_agent_engine.py
  • src/ai_company/engine/agent_engine.py
  • tests/unit/observability/test_events.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/ai_company/observability/events/classification.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/classification/__init__.py
  • src/ai_company/engine/classification/models.py
  • src/ai_company/engine/classification/pipeline.py
  • src/ai_company/engine/classification/detectors.py
  • src/ai_company/engine/agent_engine.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code — use the structured logger from ai_company.observability
Always use variable name logger (not _logger, not log) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases

Files:

  • src/ai_company/observability/events/classification.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/classification/__init__.py
  • src/ai_company/engine/classification/models.py
  • src/ai_company/engine/classification/pipeline.py
  • src/ai_company/engine/classification/detectors.py
  • src/ai_company/engine/agent_engine.py
src/ai_company/{providers,engine}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains

Files:

  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/classification/__init__.py
  • src/ai_company/engine/classification/models.py
  • src/ai_company/engine/classification/pipeline.py
  • src/ai_company/engine/classification/detectors.py
  • src/ai_company/engine/agent_engine.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Configure asyncio_mode = 'auto' for pytest — no manual @pytest.mark.asyncio needed
Set test timeout to 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests

Files:

  • tests/unit/engine/test_classification_detectors.py
  • tests/unit/engine/test_classification_pipeline.py
  • tests/unit/engine/test_classification_models.py
  • tests/integration/engine/test_error_taxonomy_integration.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/observability/test_events.py
🧠 Learnings (7)
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT

Applied to files:

  • src/ai_company/observability/events/classification.py
  • CLAUDE.md
  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Never use `import logging`, `logging.getLogger()`, or `print()` in application code — use the structured logger from ai_company.observability

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use variable name `logger` (not `_logger`, not `log`) for the logger instance

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Pure data models, enums, and re-exports do NOT need logging

Applied to files:

  • CLAUDE.md
🧬 Code graph analysis (8)
src/ai_company/engine/__init__.py (2)
src/ai_company/engine/classification/models.py (3)
  • ClassificationResult (71-110)
  • ErrorFinding (32-68)
  • ErrorSeverity (24-29)
src/ai_company/engine/classification/pipeline.py (1)
  • classify_execution_errors (43-107)
src/ai_company/engine/classification/__init__.py (2)
src/ai_company/engine/classification/models.py (3)
  • ClassificationResult (71-110)
  • ErrorFinding (32-68)
  • ErrorSeverity (24-29)
src/ai_company/engine/classification/pipeline.py (1)
  • classify_execution_errors (43-107)
src/ai_company/engine/classification/models.py (1)
src/ai_company/budget/coordination_config.py (1)
  • ErrorCategory (23-29)
tests/unit/engine/test_classification_pipeline.py (8)
src/ai_company/budget/coordination_config.py (2)
  • ErrorCategory (23-29)
  • ErrorTaxonomyConfig (32-57)
src/ai_company/core/agent.py (2)
  • AgentIdentity (246-304)
  • ModelConfig (145-174)
src/ai_company/engine/classification/models.py (3)
  • ErrorSeverity (24-29)
  • finding_count (102-104)
  • has_findings (108-110)
src/ai_company/engine/classification/pipeline.py (1)
  • classify_execution_errors (43-107)
src/ai_company/engine/context.py (3)
  • AgentContext (87-307)
  • from_identity (140-171)
  • with_message (173-182)
src/ai_company/engine/loop_protocol.py (3)
  • ExecutionResult (78-135)
  • TerminationReason (28-35)
  • TurnRecord (38-75)
src/ai_company/providers/enums.py (2)
  • FinishReason (15-22)
  • MessageRole (6-12)
src/ai_company/providers/models.py (2)
  • ChatMessage (138-210)
  • ToolResult (122-135)
tests/unit/engine/test_classification_models.py (1)
src/ai_company/engine/classification/models.py (5)
  • ClassificationResult (71-110)
  • ErrorFinding (32-68)
  • ErrorSeverity (24-29)
  • finding_count (102-104)
  • has_findings (108-110)
src/ai_company/engine/classification/pipeline.py (4)
src/ai_company/budget/coordination_config.py (2)
  • ErrorCategory (23-29)
  • ErrorTaxonomyConfig (32-57)
src/ai_company/engine/classification/detectors.py (4)
  • detect_context_omissions (328-402)
  • detect_coordination_failures (405-470)
  • detect_logical_contradictions (222-270)
  • detect_numerical_drift (273-325)
src/ai_company/engine/classification/models.py (3)
  • ClassificationResult (71-110)
  • ErrorFinding (32-68)
  • finding_count (102-104)
src/ai_company/engine/loop_protocol.py (1)
  • ExecutionResult (78-135)
tests/unit/engine/test_agent_engine.py (2)
src/ai_company/budget/coordination_config.py (1)
  • ErrorTaxonomyConfig (32-57)
src/ai_company/engine/agent_engine.py (1)
  • run (124-207)
src/ai_company/engine/agent_engine.py (2)
src/ai_company/engine/classification/pipeline.py (1)
  • classify_execution_errors (43-107)
src/ai_company/budget/coordination_config.py (1)
  • ErrorTaxonomyConfig (32-57)
🪛 GitHub Check: CodeQL
src/ai_company/engine/classification/detectors.py

[failure] 207-207: Inefficient regular expression
This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'A'.
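The flagged class of pattern is typically a nested quantifier such as `(?:[A-Z][A-Za-z]*\s*)+`, where runs of capitalised words can be split many ways. A sketch of a linear-time rewrite with an unambiguous separator (the pattern here is illustrative, not the project's `_ENTITY_PATTERN`):

```python
import re

# One mandatory space between words removes the overlapping-split
# ambiguity that lets backtracking explode on adversarial input.
SAFE_ENTITY = re.compile(r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*")

text = "Reported by Alice Smith after the Quarterly Review meeting"
assert [m.group() for m in SAFE_ENTITY.finditer(text)] == [
    "Reported",
    "Alice Smith",
    "Quarterly Review",
]
```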

🪛 LanguageTool
CLAUDE.md

[style] ~86-~86: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

🔇 Additional comments (16)
src/ai_company/observability/events/classification.py (1)

1-14: LGTM!

Clean event constant definitions following the established pattern. The constants are properly typed with Final[str] and values follow the domain.subject.qualifier naming convention consistent with other event modules.

tests/unit/observability/test_events.py (1)

11-17: LGTM!

Imports correctly added for the classification event constants.

CLAUDE.md (2)

52-52: LGTM!

Engine description correctly updated to reflect the new coordination error classification capability.


86-86: LGTM!

Logging documentation appropriately extended with the new CLASSIFICATION_START event example, maintaining consistency with other domain event examples.

src/ai_company/engine/classification/__init__.py (1)

1-19: LGTM!

Clean package initializer with well-defined public API. The re-exports are properly documented in __all__ and the docstring accurately describes the module's purpose.

tests/unit/engine/test_agent_engine.py (2)

5-5: LGTM!

Import additions are appropriate for the new classification tests.

Also applies to: 9-9


773-856: LGTM!

Well-structured tests covering the three key scenarios for error taxonomy classification integration:

  1. No config → classification skipped
  2. Enabled config → classification invoked
  3. MemoryError → propagates unconditionally

The tests properly use AsyncMock and patch the correct module path.

src/ai_company/engine/__init__.py (2)

31-36: LGTM!

Correct import of classification public API entities from the new subpackage.


173-173: LGTM!

__all__ properly extended with classification exports, maintaining alphabetical ordering.

Also applies to: 188-189, 265-265

tests/unit/engine/test_classification_models.py (3)

1-13: LGTM!

Clean test file setup with proper imports and pytest marker.


50-51: Acceptable use of broad exception catch with noqa.

The pytest.raises(Exception) pattern with # noqa: B017, PT011 is acknowledged. While catching ValidationError would be more precise for Pydantic frozen model violations and ValueError for turn_range validation, the current approach avoids coupling tests to Pydantic internals.

Also applies to: 77-84, 86-93, 173-174


155-164: Good timestamp boundary test.

The test correctly validates that classified_at defaults to the current time by capturing before/after timestamps.

src/ai_company/engine/classification/models.py (4)

1-22: LGTM!

Clean module setup with appropriate imports. The # noqa: TC001 comments correctly indicate type-checking-only imports that are used in type annotations.


24-30: LGTM!

Simple and well-documented StrEnum for severity levels.


32-68: LGTM!

ErrorFinding model is well-designed:

  • Frozen for immutability as required
  • NotBlankStr for description field
  • Proper validation of turn_range ensuring non-negative indices and start ≤ end
  • Google-style docstring with attribute descriptions
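The turn_range rule reads concisely in plain Python; a reduced sketch of the validator logic (the project implements this as a Pydantic validator, this is only an illustration):

```python
def validate_turn_range(turn_range: tuple[int, int]) -> tuple[int, int]:
    """Reject negative indices and inverted ranges, as described above."""
    start, end = turn_range
    if start < 0 or end < 0:
        raise ValueError("turn_range indices must be non-negative")
    if start > end:
        raise ValueError("turn_range start must be <= end")
    return turn_range


assert validate_turn_range((0, 3)) == (0, 3)
for bad in ((-1, 2), (5, 1)):
    try:
        validate_turn_range(bad)
    except ValueError:
        pass
    else:
        raise AssertionError(f"{bad} should have been rejected")
```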

71-110: LGTM!

ClassificationResult model follows best practices:

  • Frozen for immutability
  • NotBlankStr for identifier fields per coding guidelines
  • AwareDatetime with UTC default for timezone-aware timestamps
  • @computed_field for derived values (finding_count, has_findings) as required by guidelines
  • Tuple types for immutable collections

Comment on lines +79 to +87
def _extract_assistant_texts(
conversation: tuple[ChatMessage, ...],
) -> list[tuple[int, str]]:
"""Extract (index, text) pairs from assistant messages."""
return [
(i, msg.content)
for i, msg in enumerate(conversation)
if msg.role == MessageRole.ASSISTANT and msg.content
]
⚠️ Potential issue | 🟠 Major

Normalize turn_range across all detectors.

These findings mix raw conversation offsets (and even a hard-coded 0) with TurnRecord.turn_number. ErrorFinding.turn_range is documented as turn indices, so downstream consumers will get incompatible coordinates depending on category.

Also applies to: 197-201, 257-261, 389-393, 442-458

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/classification/detectors.py` around lines 79 - 87, The
detectors are mixing raw conversation offsets and TurnRecord.turn_number when
building ErrorFinding.turn_range (e.g., in _extract_assistant_texts and other
detectors that currently emit indices like enumerate(conversation) or hard-coded
0); normalize all turn_range values to use TurnRecord.turn_number consistently:
convert any conversation-index (from functions like _extract_assistant_texts) to
the corresponding TurnRecord.turn_number before creating
ErrorFinding.turn_range, and update all detector sites (the blocks around the
referenced locations) to look up the TurnRecord for that message and use its
turn_number instead of raw enumerate indices or constants so downstream
consumers always receive turn indices.

Implement the error classification pipeline for coordination metrics
(DESIGN_SPEC §10.5). Four detector functions analyse conversation
histories for logical contradictions, numerical drift, context
omissions, and coordination failures. The pipeline integrates into
AgentEngine._post_execution_pipeline() and never blocks execution.

New files:
- engine/classification/ subpackage (models, detectors, pipeline)
- observability/events/classification.py (event constants)
- Unit tests: models, detectors, pipeline (42 tests)
- Integration tests: full pipeline scenarios (7 tests)

Modified:
- engine/agent_engine.py: error_taxonomy_config param + _classify_errors()
- engine/__init__.py: re-export classification types
- tests/unit/observability/test_events.py: register classification module
Pre-reviewed by 8 agents (code-reviewer, python-reviewer,
pr-test-analyzer, silent-failure-hunter, type-design-analyzer,
logging-audit, resilience-audit, docs-consistency), 24 findings
addressed:

Source improvements:
- Fix TYPE_CHECKING import ordering in pipeline.py
- Add per-detector isolation (one broken detector doesn't kill others)
- Add turn_range validation (start <= end, non-negative)
- Use AwareDatetime for classified_at (rejects naive datetimes)
- Remove dead except Exception in agent_engine (pipeline already catches)
- Inline _classify_errors to reduce agent_engine.py toward 800-line limit
- Add debug logging to all detector entry/exit points
- Reorder _compute_drift and _check_drift_in_group before their caller
- Refactor _check_drift_in_group to return tuple instead of mutating list
- Document _compute_drift zero-baseline edge case behavior
- Fix constant pseudo-docstring to use comment syntax

Test additions (16 new tests):
- AgentEngine classification integration (3 paths: no config, enabled, MemoryError)
- RecursionError propagation in pipeline
- Zero-value drift edge cases (zero-to-nonzero, zero-to-zero)
- Common capitalised words filtering in context omissions
- Empty conversation for all four detectors
- Multiple contradictions in one conversation
- Combined tool errors + error finish reasons
- turn_range validation (negative, inverted, valid)

Documentation updates:
- DESIGN_SPEC.md: add classification/ to §15.3 project structure
- DESIGN_SPEC.md: add Current state callout to §10.5 error taxonomy
- DESIGN_SPEC.md: update engine/ description with classification
- CLAUDE.md: update engine/ description and logging event examples
- Extract validation methods from agent_engine.py into validation.py (739 lines, under 800 limit)
- Fix ReDoS vulnerability in _ENTITY_PATTERN regex (linear-time matching)
- Fix turn_range semantic inconsistency (use message indices consistently)
- Add cross-field validator for ClassificationResult findings vs categories
- Use execution_result.context.execution_id instead of generating fresh UUIDs
- Add threshold_percent validation in detect_numerical_drift
- Improve classification isolation in engine (re-raise MemoryError/RecursionError)
- Add per-detector isolation, MemoryError propagation, and empty categories tests
- Add DETECTOR_START/COMPLETE/ERROR event assertions
- Remove wall-clock assertions from integration tests, add pytestmark
- Add coordination error taxonomy to README implemented section
- Fix dependency-review.yml inline YAML comment in allow-licenses
- Use NotBlankStr for evidence tuples and pipeline parameters
Copilot AI review requested due to automatic review settings March 9, 2026 08:53
@Aureliolo Aureliolo force-pushed the feat/coordination-error-taxonomy branch from 312d94a to 421be92 on March 9, 2026 08:53
@Aureliolo Aureliolo merged commit 70c7480 into main Mar 9, 2026
9 checks passed
@Aureliolo Aureliolo deleted the feat/coordination-error-taxonomy branch March 9, 2026 08:53
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 9, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Comment on lines +293 to +294
except MemoryError, RecursionError:
raise
Python 2 except syntax is a SyntaxError in Python 3

except MemoryError, RecursionError: is the old Python 2 form that bound the exception to a variable — it was completely removed in Python 3. In Python 3, catching a tuple of exception types requires parentheses:

Suggested change
except MemoryError, RecursionError:
raise
except (MemoryError, RecursionError):
raise

Note: The same syntax error appears in src/ai_company/engine/classification/pipeline.py at lines 94 and 242 and must be corrected there as well.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 293-294

Comment:
**Python 2 `except` syntax is a `SyntaxError` in Python 3**

`except MemoryError, RecursionError:` is the old Python 2 form that bound the exception to a variable — it was completely removed in Python 3. In Python 3, catching a tuple of exception types requires parentheses:

```suggestion
            except (MemoryError, RecursionError):
                raise
```

Note: The same syntax error appears in `src/ai_company/engine/classification/pipeline.py` at lines 94 and 242 and must be corrected there as well.

How can I resolve this? If you propose a fix, please make it concise.


Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.



Comment on lines +166 to +173
def test_frozen(self) -> None:
result = ClassificationResult(
execution_id="exec-005",
agent_id="agent-1",
task_id="task-1",
categories_checked=(),
)
with pytest.raises(Exception): # noqa: B017, PT011

Copilot AI Mar 9, 2026

The ClassificationResult._validate_findings_match_categories model validator (which raises ValueError when findings contain categories not in categories_checked) has no test coverage. Since ErrorFinding.turn_range validation is covered, this validator should be tested similarly — e.g., constructing a ClassificationResult with a finding whose category is not in categories_checked and asserting an exception is raised.
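A reduced illustration of the missing test, with the validator re-implemented on dataclasses so it runs standalone (the real model is Pydantic; names mirror the comment):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorFinding:
    category: str


@dataclass(frozen=True)
class ClassificationResult:
    categories_checked: tuple[str, ...]
    findings: tuple[ErrorFinding, ...] = ()

    def __post_init__(self):
        # Mirror of _validate_findings_match_categories: every finding's
        # category must appear in categories_checked.
        unknown = {f.category for f in self.findings} - set(self.categories_checked)
        if unknown:
            raise ValueError(f"findings reference unchecked categories: {unknown}")


raised = False
try:
    ClassificationResult(
        categories_checked=("numerical_drift",),
        findings=(ErrorFinding(category="context_omission"),),
    )
except ValueError:
    raised = True
assert raised  # mismatched category is rejected
```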

task_id,
config=self._error_taxonomy_config,
)
except MemoryError, RecursionError:

Copilot AI Mar 9, 2026

Python 3 syntax error: except MemoryError, RecursionError: is invalid in Python 3; the correct form is except (MemoryError, RecursionError):. This was introduced as new code in the _post_execution_pipeline method as part of this PR.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

"""Integration tests for the error taxonomy pipeline.

Verifies end-to-end classification with realistic conversation
patterns and validates structured log events are emitted.

Copilot AI Mar 9, 2026

The module-level docstring states it "validates structured log events are emitted", but none of the tests in the file actually capture, inspect, or assert on any structured log events. The docstring should be updated to accurately reflect what the tests verify (end-to-end classification with realistic conversation patterns), or log event verification should be added.

Suggested change
patterns and validates structured log events are emitted.
patterns.
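If log-event verification were added instead of narrowing the docstring, a stdlib-only sketch of capturing and asserting on an emitted event might look like this (the logger name and event string here are hypothetical; the project uses structlog, where `structlog.testing.capture_logs()` serves the same purpose):

```python
import logging

# Collect records so the test can assert on emitted events.
captured: list[logging.LogRecord] = []


class _ListHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        captured.append(record)


logger = logging.getLogger("engine.classification")
logger.setLevel(logging.INFO)
logger.addHandler(_ListHandler())


def run_pipeline() -> None:
    # Stand-in for the real pipeline: emits the structured event the
    # integration test would assert on.
    logger.info("classification.pipeline.completed")


run_pipeline()
events = [record.getMessage() for record in captured]
print(events)  # ['classification.pipeline.completed']
```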

f"Turn {turn.turn_number} (index {turn_idx}): "
f"finish_reason={turn.finish_reason.value}",
),
turn_range=(turn_idx, turn_idx),

Copilot AI Mar 9, 2026


Semantic inconsistency in turn_range usage within detect_coordination_failures: for tool execution errors (line 449), turn_range=(i, i) uses the message index from the conversation tuple (0-based position in the full conversation). For error finish reason findings (line 464), turn_range=(turn_idx, turn_idx) uses the index within the turns tuple, which is a separate index space with a different cardinality than the conversation. A consumer of ErrorFinding would have no way to know which index space applies. Since the ErrorFinding docstring describes turn_range as "Message index range (start, end) where error observed", the turn_idx usage is semantically incorrect. Either both usages should use conversation message indices (mapping each turn to its corresponding messages), or the field semantics should be clarified explicitly.

Suggested change
turn_range=(turn_idx, turn_idx),
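One way to resolve the mismatch the comment describes is to precompute, for each turn, its span of message indices in the flat conversation, so every finding reports indices in a single index space. A minimal sketch (the function name and its input shape are illustrative, not the project's API):

```python
def turn_message_spans(messages_per_turn: list[int]) -> list[tuple[int, int]]:
    """Map each turn to its inclusive (start, end) range of conversation
    message indices, so every turn_range uses conversation indices."""
    spans: list[tuple[int, int]] = []
    start = 0
    for count in messages_per_turn:
        spans.append((start, start + count - 1))
        start += count
    return spans


# A conversation of three turns holding 2, 1, and 3 messages respectively:
print(turn_message_spans([2, 1, 3]))  # [(0, 1), (2, 2), (3, 5)]
```

With such a mapping, the error-finish-reason finding could report the turn's message span instead of its index in the turns tuple.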

Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
This was referenced Mar 15, 2026


Development

Successfully merging this pull request may close these issues.

Implement coordination error taxonomy with opt-in classification pipeline (DESIGN_SPEC §10.5 M5)
