feat: add coordination error taxonomy classification pipeline (#146) #181
Conversation
⚠️ Review failed: the pull request is closed. (Run configuration: Organization UI; Review profile: ASSERTIVE; Plan: Pro; 18 files selected for processing.)
Summary by CodeRabbit

Walkthrough

Adds an opt-in post-execution coordination error taxonomy: new `engine.classification` package (models, detectors, pipeline), observability events, `AgentEngine` integration via `error_taxonomy_config`, and tests exercising detectors, pipeline behavior, and integration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant AE as AgentEngine
    participant Pipeline as Classification Pipeline
    participant Detectors as Detector Suite
    participant Obs as Observability
    AE->>AE: Execution completes (messages, turns collected)
    AE->>Pipeline: classify_execution_errors(execution_result, agent_id, task_id, config)
    alt Config disabled
        Pipeline->>Obs: log CLASSIFICATION_SKIPPED
        Pipeline-->>AE: return None
    else Config enabled
        Pipeline->>Obs: log CLASSIFICATION_START (agent_id, task_id, execution_id, categories)
        Pipeline->>Detectors: _run_detectors(enabled_categories)
        par Detector Execution
            Detectors->>Detectors: detect_logical_contradictions(conversation)
            Detectors->>Detectors: detect_numerical_drift(conversation, threshold)
            Detectors->>Detectors: detect_context_omissions(conversation)
            Detectors->>Detectors: detect_coordination_failures(conversation, turns)
        end
        Detectors->>Obs: log CLASSIFICATION_FINDING (per finding)
        Detectors-->>Pipeline: aggregate findings
        Pipeline->>Pipeline: Build ClassificationResult
        Pipeline->>Obs: log CLASSIFICATION_COMPLETE (execution_id, finding_count)
        Pipeline-->>AE: return ClassificationResult
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes (Gemini Code Assist)

This pull request significantly enhances the agent engine's analytical capabilities by introducing a robust coordination error taxonomy classification pipeline. This new system allows for the identification and categorization of various agent coordination issues, such as logical contradictions and numerical discrepancies, without interfering with the primary agent execution flow. By providing structured insights into potential errors, it lays the groundwork for future programmatic access and more targeted diagnostics, ultimately improving the reliability and performance of agent operations.
Greptile Summary

This PR introduces a coordination error taxonomy classification pipeline (`engine/classification/`). Critical blocker: the PR contains Python 2 exception syntax (`except MemoryError, RecursionError:`), which must be corrected to `except (MemoryError, RecursionError):`. Confidence Score: 0/5
Last reviewed commit: 421be92
Code Review
This pull request introduces a well-designed and thoroughly tested coordination error classification pipeline, including a new engine/classification subpackage and clean integration into AgentEngine. However, a high-severity Regular Expression Denial of Service (ReDoS) vulnerability was identified in the entity detection logic. Additionally, critical Python 2 style exception handling syntax (except A, B:) is used, which is invalid in Python 3 and will cause runtime SyntaxErrors, preventing the pipeline from executing. These issues must be addressed to ensure the security and reliability of the classification pipeline.
```python
        execution_id=execution_id,
        config=config,
    )
except MemoryError, RecursionError:
```
This line uses Python 2 style except syntax (except MemoryError, RecursionError:), which is a SyntaxError in Python 3. This will prevent the module from loading and the pipeline from executing. The correct syntax for catching multiple exceptions is to enclose them in a tuple, e.g., except (Exception1, Exception2):. This issue is also present in other parts of the codebase, such as: src/ai_company/engine/classification/pipeline.py: line 231 and src/ai_company/engine/agent_engine.py: lines 188, 315, and 770.
Suggested change:

```python
except (MemoryError, RecursionError):
    return tuple(findings)
```
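For illustration, a runnable sketch of the tuple-`except` isolation the reviewers are asking for. The function and detector names here are stand-ins, not the PR's real helpers; only the corrected exception syntax is taken from the review:

```python
from collections.abc import Callable


def run_detector_safely(detector_fn: Callable[[], tuple[str, ...]]) -> tuple[str, ...]:
    """Run one detector; return no findings if it exhausts memory or the stack."""
    try:
        return detector_fn()
    except (MemoryError, RecursionError):  # Python 3 tuple form, not `except A, B:`
        return ()


def broken_detector() -> tuple[str, ...]:
    raise RecursionError("simulated runaway recursion")


def healthy_detector() -> tuple[str, ...]:
    return ("finding-1",)


assert run_detector_safely(broken_detector) == ()
assert run_detector_safely(healthy_detector) == ("finding-1",)
```

With the Python 2 comma form, the module would not even import under Python 3, so neither branch above could ever run.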
```python
_ENTITY_PATTERN = re.compile(r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b")
```
The regular expression _ENTITY_PATTERN is vulnerable to Regular Expression Denial of Service (ReDoS) due to nested quantifiers. The pattern r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b" contains a nested group (?:[A-Z][a-zA-Z]*)* where both the inner and outer quantifiers can match capital letters. This ambiguity causes exponential backtracking when the regex engine encounters a long string of capital letters that is not followed by a word boundary (e.g., "AbcAAAAAAAAAAAAAAAAAAAA_"). Since this regex processes LLM-generated content, it could be exploited to cause a Denial of Service.
To remediate this, simplify the regular expression to avoid nested quantifiers. Since the pattern is intended to match a single word starting with a capital letter and having at least 3 characters, the nested group is redundant.
Suggested change:

```python
_ENTITY_PATTERN = re.compile(r"\b[A-Z][a-zA-Z]{2,}\b")
```
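A quick sketch comparing the two patterns on benign input (the entity strings are invented for illustration). Both extract the same capitalized tokens, so the simplification loses nothing while removing the nested quantifier:

```python
import re

vulnerable = re.compile(r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b")
safe = re.compile(r"\b[A-Z][a-zA-Z]{2,}\b")

text = "Alice asked BobSmith to review the Pipeline config."

# findall with a capture group returns the group; without one, the whole match.
assert vulnerable.findall(text) == ["Alice", "BobSmith", "Pipeline"]
assert safe.findall(text) == ["Alice", "BobSmith", "Pipeline"]

# An adversarial string like "Abc" + "A" * 30 + "_" forces the nested version
# into exponential backtracking (each "A" can match the inner or outer loop);
# the flat version rejects it in linear time. Deliberately not executed here.
```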
| """ | ||
| try: | ||
| return detector_fn() | ||
| except MemoryError, RecursionError: |
This line contains a syntax error in Python 3. The correct syntax for catching multiple exceptions is except (Exception1, Exception2):. The current code except MemoryError, RecursionError: will raise a SyntaxError at runtime.
Suggested change:

```python
    except (MemoryError, RecursionError):
```
Pull request overview
This PR adds a coordination error taxonomy classification pipeline (engine/classification/) that runs post-execution when opted in via AgentEngine(error_taxonomy_config=...). It implements §10.5 of the DESIGN_SPEC and closes issue #146. Four heuristic detectors (logical contradiction, numerical drift, context omission, coordination failure) analyze conversation histories after agent execution finishes. Results are currently log-only.
Changes:
- New `engine/classification/` subpackage with `models.py`, `detectors.py`, `pipeline.py`, and `__init__.py`
- New `observability/events/classification.py` with 8 structured event constants; `AgentEngine` gains an `error_taxonomy_config` parameter that triggers post-execution classification
- 16 new tests across 4 files (3 unit, 1 integration) and documentation updates to `DESIGN_SPEC.md` and `CLAUDE.md`
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| `src/ai_company/engine/classification/__init__.py` | Package re-exports for the public classification API |
| `src/ai_company/engine/classification/models.py` | `ErrorSeverity` enum, `ErrorFinding` and `ClassificationResult` Pydantic models |
| `src/ai_company/engine/classification/detectors.py` | Four pure-function heuristic detectors for the four error categories |
| `src/ai_company/engine/classification/pipeline.py` | `classify_execution_errors` async orchestrator with per-detector isolation |
| `src/ai_company/engine/agent_engine.py` | Adds `error_taxonomy_config` parameter and invokes classification in `_post_execution_pipeline` |
| `src/ai_company/engine/__init__.py` | Re-exports the new classification public API |
| `src/ai_company/observability/events/classification.py` | Eight `Final[str]` event constants for structured logging |
| `tests/unit/engine/test_classification_models.py` | Unit tests for `ErrorSeverity`, `ErrorFinding`, and `ClassificationResult` |
| `tests/unit/engine/test_classification_detectors.py` | Unit tests for all four detector functions |
| `tests/unit/engine/test_classification_pipeline.py` | Unit tests for the `classify_execution_errors` pipeline function |
| `tests/unit/engine/test_agent_engine.py` | New `TestAgentEngineClassification` class testing engine integration |
| `tests/integration/engine/test_error_taxonomy_integration.py` | End-to-end integration tests with realistic conversation patterns |
| `tests/unit/observability/test_events.py` | Adds classification to expected domain modules and event constant assertions |
| `DESIGN_SPEC.md` | Updates §10.5 current state and §15.3 project structure |
| `CLAUDE.md` | Updates engine description and adds classification event example to logging guidelines |
| """Integration tests for the error taxonomy pipeline. | ||
|
|
||
| Verifies end-to-end classification with realistic conversation | ||
| patterns and validates structured log events are emitted. | ||
| """ | ||
|
|
||
| import time | ||
| from datetime import date | ||
| from uuid import uuid4 | ||
|
|
||
| import pytest |
This integration test file is missing the module-level pytestmark declaration. All other integration test files in tests/integration/engine/ consistently set pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)] at module scope (see test_agent_engine_integration.py:48, test_crash_recovery.py:35, test_multi_agent_delegation.py:85). Without this, the tests in this file will not be tagged with the integration marker or protected by the 30-second timeout guard.
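The fix the comment describes is a two-line addition at module scope, sketched here with the marker list the cited sibling files reportedly use:

```python
import pytest

# Module-level markers: tag every test in this file as an integration test
# and guard it with a 30-second timeout (enforced by the pytest-timeout plugin).
pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)]
```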
```python
turn_range: tuple[int, int] | None = Field(
    default=None,
    description="Turn index range (start, end) where error observed",
)
```
The turn_range field on ErrorFinding is documented as "Turn index range (start, end) where error observed", but three of the four detectors populate it with conversation message indices (from _extract_assistant_texts, which uses position in the full conversation tuple including system/user/tool messages), not turn numbers. By contrast, detect_coordination_failures sets turn_range to (turn.turn_number, turn.turn_number) which are 1-based turn numbers from TurnRecord. The result is that turn_range values carry different semantics depending on which detector produced the finding, making them incomparable. Either the field name and docstring should be updated to "message_index_range", or the detectors should be updated to use consistent turn numbers.
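One way to make the semantics uniform is to translate message indices into turn numbers before constructing each finding. The helpers below are hypothetical (`build_turn_lookup` and `message_index_to_turn` are not in the PR), and the `(turn_number, first_index, last_index)` span tuples stand in for whatever `TurnRecord` actually stores:

```python
def build_turn_lookup(turn_spans: list[tuple[int, int, int]]) -> dict[int, int]:
    """Map each conversation message index to its 1-based turn number.

    turn_spans: (turn_number, first_message_index, last_message_index) per turn.
    """
    lookup: dict[int, int] = {}
    for turn_number, start, end in turn_spans:
        for idx in range(start, end + 1):
            lookup[idx] = turn_number
    return lookup


def message_index_to_turn(lookup: dict[int, int], index: int) -> int:
    """Translate a raw conversation index into a turn number."""
    return lookup[index]


# Example: turn 1 covers messages 0-2, turn 2 covers messages 3-5.
lookup = build_turn_lookup([(1, 0, 2), (2, 3, 5)])
assert message_index_to_turn(lookup, 4) == 2
```

Building the map once and translating at each detector site would let every `turn_range` carry turn numbers, matching what `detect_coordination_failures` already emits.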
```python
def test_classification_events_exist(self) -> None:
    assert CLASSIFICATION_START == "classification.start"
    assert CLASSIFICATION_COMPLETE == "classification.complete"
    assert CLASSIFICATION_FINDING == "classification.finding"
    assert CLASSIFICATION_ERROR == "classification.error"
    assert CLASSIFICATION_SKIPPED == "classification.skipped"
The test_classification_events_exist test only verifies 5 of the 8 constants defined in src/ai_company/observability/events/classification.py. The three per-detector lifecycle event constants (DETECTOR_START, DETECTOR_COMPLETE, DETECTOR_ERROR) are not asserted. Based on the convention in this same test class (e.g., test_conflict_events_exist checks all 21 conflict constants, test_workspace_events_exist checks all workspace constants), all defined event constants should be covered.
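A sketch of the extended assertions. The review names the three detector constants but not their string values, so the values below are assumptions chosen to fit the `domain.subject.qualifier` convention; check the real module before asserting:

```python
from typing import Final

# Assumed values for illustration only — not taken from the PR.
DETECTOR_START: Final[str] = "classification.detector.start"
DETECTOR_COMPLETE: Final[str] = "classification.detector.complete"
DETECTOR_ERROR: Final[str] = "classification.detector.error"


def test_detector_events_exist() -> None:
    assert DETECTOR_START == "classification.detector.start"
    assert DETECTOR_COMPLETE == "classification.detector.complete"
    assert DETECTOR_ERROR == "classification.detector.error"


test_detector_events_exist()
```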
```python
        execution_id=execution_id,
        config=config,
    )
except MemoryError, RecursionError:
```
Both except MemoryError, RecursionError: clauses use Python 2 syntax that is a SyntaxError in Python 3. In Python 3, catching multiple exception types requires a tuple: except (MemoryError, RecursionError):. The codebase's correct usage is except (MemoryError, RecursionError) as exc: (see src/ai_company/tools/invoker.py:224 and src/ai_company/engine/parallel.py:290). As written, these lines will cause a SyntaxError at import time, making the entire module unimportable and all the tests that mock _run_detectors will fail to even load the module under test.
Suggested change:

```python
except (MemoryError, RecursionError):
```
| """ | ||
| try: | ||
| return detector_fn() | ||
| except MemoryError, RecursionError: |
Same Python 2 syntax issue: except MemoryError, RecursionError: is a SyntaxError in Python 3. It must be written as except (MemoryError, RecursionError): to correctly catch both exception types.
Suggested change:

```python
    except (MemoryError, RecursionError):
```
Actionable comments posted: 5
⚠️ Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/engine/agent_engine.py (1)
85-109: 🧹 Nitpick | 🔵 Trivial

Document `error_taxonomy_config` in the constructor contract. The constructor now exposes a new public parameter, but the `Args:` block still ends at `shutdown_checker`, so the public API docs no longer describe how classification is enabled.

As per coding guidelines (`src/**/*.py`): use Google-style docstrings on public classes and functions (enforced by ruff D rules).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/agent_engine.py` around lines 85 - 109, The docstring Args section for the class constructor is missing documentation for the new parameter error_taxonomy_config; update the constructor docstring (above def __init__) to add an Args entry for error_taxonomy_config describing its type (ErrorTaxonomyConfig | None), purpose (used to enable/configure error classification), and default behavior (when None classification is disabled or uses defaults), matching the style of the existing shutdown_checker line so the public API docs correctly reflect the new parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/engine/classification/detectors.py`:
- Around line 79-87: The detectors are mixing raw conversation offsets and
TurnRecord.turn_number when building ErrorFinding.turn_range (e.g., in
_extract_assistant_texts and other detectors that currently emit indices like
enumerate(conversation) or hard-coded 0); normalize all turn_range values to use
TurnRecord.turn_number consistently: convert any conversation-index (from
functions like _extract_assistant_texts) to the corresponding
TurnRecord.turn_number before creating ErrorFinding.turn_range, and update all
detector sites (the blocks around the referenced locations) to look up the
TurnRecord for that message and use its turn_number instead of raw enumerate
indices or constants so downstream consumers always receive turn indices.
In `@src/ai_company/engine/classification/pipeline.py`:
- Around line 74-80: The code generates a new UUID for execution_id which breaks
correlation; instead use the run's existing execution id from
execution_result.context.execution_id when populating
ClassificationResult.execution_id and when logging. Replace the local creation
of execution_id (the str(uuid4()) assignment) and pass
execution_result.context.execution_id into logger.info for CLASSIFICATION_START
(and ensure ClassificationResult.execution_id is set from
execution_result.context.execution_id) so all logs and the ClassificationResult
share the same execution identifier.
In `@tests/integration/engine/test_error_taxonomy_integration.py`:
- Around line 199-253: Remove the hard wall-clock assertions and instead assert
behavioral correctness: for test_disabled_taxonomy_returns_none_fast(), keep
config = ErrorTaxonomyConfig(enabled=False) and assert
classify_execution_errors(...) returns None, and also verify detectors were not
invoked by spying/mocking the detector functions used by
classify_execution_errors (or assert that the internal detector dispatch method
was not called); for test_pipeline_does_not_block_execution(), remove the
elapsed < 2.0 check and either assert result is not None and detectors produced
expected classifications or move the performance check into a separate slow test
decorated with pytest markers (e.g., `@pytest.mark.integration` and
`@pytest.mark.slow`) and a per-test timeout (pytest.mark.timeout(30)) so CI uses a
30s limit instead of brittle short wall-clock assertions; ensure you reference
classify_execution_errors, ErrorTaxonomyConfig, and the detector dispatch/spies
when adding the mocks.
In `@tests/unit/observability/test_events.py`:
- Around line 426-431: Extend the test_classification_events_exist test to also
assert the three detector constants (DETECTOR_START, DETECTOR_COMPLETE,
DETECTOR_ERROR) are defined and equal to their expected string values; locate
the constants in classification.py (referencing DETECTOR_START,
DETECTOR_COMPLETE, DETECTOR_ERROR) and add corresponding assertions alongside
the existing CLASSIFICATION_* assertions in the test_classification_events_exist
function to cover all eight event constants.
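The integration-test prompt above reduces to "assert on behavior, not wall-clock time". A self-contained sketch of the spy pattern it suggests — the pipeline and detector here are simplified stand-ins, not the PR's real signatures:

```python
from unittest.mock import Mock


def classify_execution_errors(detector, *, enabled: bool):
    """Stand-in pipeline: skip every detector when the config is disabled."""
    if not enabled:
        return None  # disabled config short-circuits before any detector runs
    return detector()


spy = Mock(return_value=("finding",))

# Disabled config: returns None and provably never invokes a detector —
# no brittle `elapsed < 2.0` wall-clock assertion needed.
assert classify_execution_errors(spy, enabled=False) is None
spy.assert_not_called()

# Enabled config: the detector runs exactly once and its findings come back.
assert classify_execution_errors(spy, enabled=True) == ("finding",)
spy.assert_called_once()
```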
---
Outside diff comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 85-109: The docstring Args section for the class constructor is
missing documentation for the new parameter error_taxonomy_config; update the
constructor docstring (above def __init__) to add an Args entry for
error_taxonomy_config describing its type (ErrorTaxonomyConfig | None), purpose
(used to enable/configure error classification), and default behavior (when None
classification is disabled or uses defaults), matching the style of the existing
shutdown_checker line so the public API docs correctly reflect the new
parameter.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: aa6b5103-bbce-49f7-867c-7aae34d0b435
📒 Files selected for processing (15)
`CLAUDE.md`, `DESIGN_SPEC.md`, `src/ai_company/engine/__init__.py`, `src/ai_company/engine/agent_engine.py`, `src/ai_company/engine/classification/__init__.py`, `src/ai_company/engine/classification/detectors.py`, `src/ai_company/engine/classification/models.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/observability/events/classification.py`, `tests/integration/engine/test_error_taxonomy_integration.py`, `tests/unit/engine/test_agent_engine.py`, `tests/unit/engine/test_classification_detectors.py`, `tests/unit/engine/test_classification_models.py`, `tests/unit/engine/test_classification_pipeline.py`, `tests/unit/observability/test_events.py`
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use `from __future__ import annotations` — Python 3.14 has PEP 649
Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
Files:
`src/ai_company/observability/events/classification.py`, `src/ai_company/engine/__init__.py`, `src/ai_company/engine/classification/__init__.py`, `tests/unit/engine/test_classification_detectors.py`, `src/ai_company/engine/classification/models.py`, `tests/unit/engine/test_classification_pipeline.py`, `tests/unit/engine/test_classification_models.py`, `tests/integration/engine/test_error_taxonomy_integration.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/engine/classification/detectors.py`, `tests/unit/engine/test_agent_engine.py`, `src/ai_company/engine/agent_engine.py`, `tests/unit/observability/test_events.py`
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
`src/ai_company/observability/events/classification.py`, `src/ai_company/engine/__init__.py`, `src/ai_company/engine/classification/__init__.py`, `src/ai_company/engine/classification/models.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/engine/classification/detectors.py`, `src/ai_company/engine/agent_engine.py`
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(name)
Never use `import logging`, `logging.getLogger()`, or `print()` in application code — use the structured logger from ai_company.observability
Always use variable name `logger` (not `_logger`, not `log`) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Files:
`src/ai_company/observability/events/classification.py`, `src/ai_company/engine/__init__.py`, `src/ai_company/engine/classification/__init__.py`, `src/ai_company/engine/classification/models.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/engine/classification/detectors.py`, `src/ai_company/engine/agent_engine.py`
src/ai_company/{providers,engine}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains
Files:
`src/ai_company/engine/__init__.py`, `src/ai_company/engine/classification/__init__.py`, `src/ai_company/engine/classification/models.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/engine/classification/detectors.py`, `src/ai_company/engine/agent_engine.py`
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow` to categorize tests
Configure asyncio_mode = 'auto' for pytest — no manual `@pytest.mark.asyncio` needed
Set test timeout to 30 seconds per test
Prefer@pytest.mark.parametrizefor testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests
Files:
`tests/unit/engine/test_classification_detectors.py`, `tests/unit/engine/test_classification_pipeline.py`, `tests/unit/engine/test_classification_models.py`, `tests/integration/engine/test_error_taxonomy_integration.py`, `tests/unit/engine/test_agent_engine.py`, `tests/unit/observability/test_events.py`
🧠 Learnings (7)
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Applied to files:
`src/ai_company/observability/events/classification.py`, `CLAUDE.md`, `tests/unit/observability/test_events.py`
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Never use `import logging`, `logging.getLogger()`, or `print()` in application code — use the structured logger from ai_company.observability
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use variable name `logger` (not `_logger`, not `log`) for the logger instance
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Pure data models, enums, and re-exports do NOT need logging
Applied to files:
CLAUDE.md
🧬 Code graph analysis (8)

- `src/ai_company/engine/__init__.py` (2)
  - `src/ai_company/engine/classification/models.py` (3): `ClassificationResult` (71-110), `ErrorFinding` (32-68), `ErrorSeverity` (24-29)
  - `src/ai_company/engine/classification/pipeline.py` (1): `classify_execution_errors` (43-107)
- `src/ai_company/engine/classification/__init__.py` (2)
  - `src/ai_company/engine/classification/models.py` (3): `ClassificationResult` (71-110), `ErrorFinding` (32-68), `ErrorSeverity` (24-29)
  - `src/ai_company/engine/classification/pipeline.py` (1): `classify_execution_errors` (43-107)
- `src/ai_company/engine/classification/models.py` (1)
  - `src/ai_company/budget/coordination_config.py` (1): `ErrorCategory` (23-29)
- `tests/unit/engine/test_classification_pipeline.py` (8)
  - `src/ai_company/budget/coordination_config.py` (2): `ErrorCategory` (23-29), `ErrorTaxonomyConfig` (32-57)
  - `src/ai_company/core/agent.py` (2): `AgentIdentity` (246-304), `ModelConfig` (145-174)
  - `src/ai_company/engine/classification/models.py` (3): `ErrorSeverity` (24-29), `finding_count` (102-104), `has_findings` (108-110)
  - `src/ai_company/engine/classification/pipeline.py` (1): `classify_execution_errors` (43-107)
  - `src/ai_company/engine/context.py` (3): `AgentContext` (87-307), `from_identity` (140-171), `with_message` (173-182)
  - `src/ai_company/engine/loop_protocol.py` (3): `ExecutionResult` (78-135), `TerminationReason` (28-35), `TurnRecord` (38-75)
  - `src/ai_company/providers/enums.py` (2): `FinishReason` (15-22), `MessageRole` (6-12)
  - `src/ai_company/providers/models.py` (2): `ChatMessage` (138-210), `ToolResult` (122-135)
- `tests/unit/engine/test_classification_models.py` (1)
  - `src/ai_company/engine/classification/models.py` (5): `ClassificationResult` (71-110), `ErrorFinding` (32-68), `ErrorSeverity` (24-29), `finding_count` (102-104), `has_findings` (108-110)
- `src/ai_company/engine/classification/pipeline.py` (4)
  - `src/ai_company/budget/coordination_config.py` (2): `ErrorCategory` (23-29), `ErrorTaxonomyConfig` (32-57)
  - `src/ai_company/engine/classification/detectors.py` (4): `detect_context_omissions` (328-402), `detect_coordination_failures` (405-470), `detect_logical_contradictions` (222-270), `detect_numerical_drift` (273-325)
  - `src/ai_company/engine/classification/models.py` (3): `ClassificationResult` (71-110), `ErrorFinding` (32-68), `finding_count` (102-104)
  - `src/ai_company/engine/loop_protocol.py` (1): `ExecutionResult` (78-135)
- `tests/unit/engine/test_agent_engine.py` (2)
  - `src/ai_company/budget/coordination_config.py` (1): `ErrorTaxonomyConfig` (32-57)
  - `src/ai_company/engine/agent_engine.py` (1): `run` (124-207)
- `src/ai_company/engine/agent_engine.py` (2)
  - `src/ai_company/engine/classification/pipeline.py` (1): `classify_execution_errors` (43-107)
  - `src/ai_company/budget/coordination_config.py` (1): `ErrorTaxonomyConfig` (32-57)
🪛 GitHub Check: CodeQL
src/ai_company/engine/classification/detectors.py
[failure] 207-207: Inefficient regular expression
This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'A'.
🪛 LanguageTool
CLAUDE.md
[style] ~86-~86: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🔇 Additional comments (16)
src/ai_company/observability/events/classification.py (1)
- 1-14: LGTM! Clean event constant definitions following the established pattern. The constants are properly typed with `Final[str]` and values follow the `domain.subject.qualifier` naming convention consistent with other event modules.

`tests/unit/observability/test_events.py` (1)

- 11-17: LGTM! Imports correctly added for the classification event constants.
CLAUDE.md (2)
- 52-52: LGTM! Engine description correctly updated to reflect the new coordination error classification capability.
- 86-86: LGTM! Logging documentation appropriately extended with the new `CLASSIFICATION_START` event example, maintaining consistency with other domain event examples.

`src/ai_company/engine/classification/__init__.py` (1)

- 1-19: LGTM! Clean package initializer with well-defined public API. The re-exports are properly documented in `__all__` and the docstring accurately describes the module's purpose.

`tests/unit/engine/test_agent_engine.py` (2)
5-5: LGTM! Import additions are appropriate for the new classification tests.

Also applies to: 9-9

773-856: LGTM! Well-structured tests covering the three key scenarios for error taxonomy classification integration:

- No config → classification skipped
- Enabled config → classification invoked
- MemoryError → propagates unconditionally

The tests properly use `AsyncMock` and patch the correct module path.
AsyncMockand patch the correct module path.src/ai_company/engine/__init__.py (2)
31-36: LGTM!Correct import of classification public API entities from the new subpackage.
173-173: LGTM!
__all__properly extended with classification exports, maintaining alphabetical ordering.Also applies to: 188-189, 265-265
tests/unit/engine/test_classification_models.py (3)
1-13: LGTM! Clean test file setup with proper imports and pytest marker.

50-51: Acceptable use of broad exception catch with noqa. The `pytest.raises(Exception)` pattern with `# noqa: B017, PT011` is acknowledged. While catching `ValidationError` would be more precise for Pydantic frozen model violations and `ValueError` for turn_range validation, the current approach avoids coupling tests to Pydantic internals.

Also applies to: 77-84, 86-93, 173-174

155-164: Good timestamp boundary test. The test correctly validates that `classified_at` defaults to the current time by capturing before/after timestamps.

src/ai_company/engine/classification/models.py (4)
1-22: LGTM! Clean module setup with appropriate imports. The `# noqa: TC001` comments correctly indicate type-checking-only imports that are used in type annotations.

24-30: LGTM! Simple and well-documented `StrEnum` for severity levels.
32-68: LGTM! The `ErrorFinding` model is well-designed:

- Frozen for immutability as required
- `NotBlankStr` for description field
- Proper validation of `turn_range` ensuring non-negative indices and start ≤ end
- Google-style docstring with attribute descriptions
71-110: LGTM! The `ClassificationResult` model follows best practices:

- Frozen for immutability
- `NotBlankStr` for identifier fields per coding guidelines
- `AwareDatetime` with UTC default for timezone-aware timestamps
- `@computed_field` for derived values (`finding_count`, `has_findings`) as required by guidelines
- Tuple types for immutable collections
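The invariants praised above (frozen models, validated `turn_range`, a derived finding count, a timezone-aware default) can be sketched with stdlib stand-ins. The real models use Pydantic frozen models with `@computed_field`; the names and shapes below are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Stdlib sketch of the review's invariants; not the project's actual models.
@dataclass(frozen=True)
class Finding:
    description: str
    turn_range: tuple[int, int]

    def __post_init__(self) -> None:
        start, end = self.turn_range
        if start < 0 or end < 0 or start > end:
            raise ValueError("turn_range must be non-negative with start <= end")

@dataclass(frozen=True)
class Result:
    findings: tuple[Finding, ...] = ()
    classified_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # always tz-aware
    )

    @property  # derived value, mirroring @computed_field
    def finding_count(self) -> int:
        return len(self.findings)

r = Result(findings=(Finding("numerical drift", (1, 2)),))
print(r.finding_count)     # 1
try:
    Finding("bad", (3, 1))  # inverted range is rejected at construction
except ValueError as exc:
    print("rejected:", exc)
```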
```python
def _extract_assistant_texts(
    conversation: tuple[ChatMessage, ...],
) -> list[tuple[int, str]]:
    """Extract (index, text) pairs from assistant messages."""
    return [
        (i, msg.content)
        for i, msg in enumerate(conversation)
        if msg.role == MessageRole.ASSISTANT and msg.content
    ]
```
Normalize turn_range across all detectors.
These findings mix raw conversation offsets (and even a hard-coded 0) with TurnRecord.turn_number. ErrorFinding.turn_range is documented as turn indices, so downstream consumers will get incompatible coordinates depending on category.
Also applies to: 197-201, 257-261, 389-393, 442-458
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/classification/detectors.py` around lines 79 - 87, The
detectors are mixing raw conversation offsets and TurnRecord.turn_number when
building ErrorFinding.turn_range (e.g., in _extract_assistant_texts and other
detectors that currently emit indices like enumerate(conversation) or hard-coded
0); normalize all turn_range values to use TurnRecord.turn_number consistently:
convert any conversation-index (from functions like _extract_assistant_texts) to
the corresponding TurnRecord.turn_number before creating
ErrorFinding.turn_range, and update all detector sites (the blocks around the
referenced locations) to look up the TurnRecord for that message and use its
turn_number instead of raw enumerate indices or constants so downstream
consumers always receive turn indices.
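The normalization the review asks for amounts to mapping a raw conversation offset back to the owning turn before emitting a finding. A minimal sketch, assuming a hypothetical `message_indices` field recording which conversation offsets each turn produced (the real `TurnRecord` shape is not shown in this review):

```python
from dataclasses import dataclass

# Hypothetical shape: `message_indices` is an assumption for illustration,
# not the project's actual TurnRecord API.
@dataclass(frozen=True)
class TurnRecord:
    turn_number: int
    message_indices: tuple[int, ...]  # conversation offsets emitted in this turn

def turn_for_message(turns: tuple[TurnRecord, ...], msg_idx: int) -> int:
    """Map a raw conversation index to the owning turn's turn_number."""
    for turn in turns:
        if msg_idx in turn.message_indices:
            return turn.turn_number
    raise LookupError(f"no turn owns message index {msg_idx}")

turns = (TurnRecord(1, (0, 1)), TurnRecord(2, (2, 3, 4)))
print(turn_for_message(turns, 3))  # 2
```

With a lookup like this, every detector can emit `turn_range` in a single coordinate space regardless of whether it iterated messages or turns.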
Implement the error classification pipeline for coordination metrics (DESIGN_SPEC §10.5). Four detector functions analyse conversation histories for logical contradictions, numerical drift, context omissions, and coordination failures. The pipeline integrates into AgentEngine._post_execution_pipeline() and never blocks execution.

New files:
- engine/classification/ subpackage (models, detectors, pipeline)
- observability/events/classification.py (event constants)
- Unit tests: models, detectors, pipeline (42 tests)
- Integration tests: full pipeline scenarios (7 tests)

Modified:
- engine/agent_engine.py: error_taxonomy_config param + _classify_errors()
- engine/__init__.py: re-export classification types
- tests/unit/observability/test_events.py: register classification module
Pre-reviewed by 8 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, docs-consistency), 24 findings addressed.

Source improvements:
- Fix TYPE_CHECKING import ordering in pipeline.py
- Add per-detector isolation (one broken detector doesn't kill others)
- Add turn_range validation (start <= end, non-negative)
- Use AwareDatetime for classified_at (rejects naive datetimes)
- Remove dead except Exception in agent_engine (pipeline already catches)
- Inline _classify_errors to reduce agent_engine.py toward 800-line limit
- Add debug logging to all detector entry/exit points
- Reorder _compute_drift and _check_drift_in_group before their caller
- Refactor _check_drift_in_group to return tuple instead of mutating list
- Document _compute_drift zero-baseline edge case behavior
- Fix constant pseudo-docstring to use comment syntax

Test additions (16 new tests):
- AgentEngine classification integration (3 paths: no config, enabled, MemoryError)
- RecursionError propagation in pipeline
- Zero-value drift edge cases (zero-to-nonzero, zero-to-zero)
- Common capitalised words filtering in context omissions
- Empty conversation for all four detectors
- Multiple contradictions in one conversation
- Combined tool errors + error finish reasons
- turn_range validation (negative, inverted, valid)

Documentation updates:
- DESIGN_SPEC.md: add classification/ to §15.3 project structure
- DESIGN_SPEC.md: add Current state callout to §10.5 error taxonomy
- DESIGN_SPEC.md: update engine/ description with classification
- CLAUDE.md: update engine/ description and logging event examples
- Extract validation methods from agent_engine.py into validation.py (739 lines, under 800 limit)
- Fix ReDoS vulnerability in _ENTITY_PATTERN regex (linear-time matching)
- Fix turn_range semantic inconsistency (use message indices consistently)
- Add cross-field validator for ClassificationResult findings vs categories
- Use execution_result.context.execution_id instead of generating fresh UUIDs
- Add threshold_percent validation in detect_numerical_drift
- Improve classification isolation in engine (re-raise MemoryError/RecursionError)
- Add per-detector isolation, MemoryError propagation, and empty categories tests
- Add DETECTOR_START/COMPLETE/ERROR event assertions
- Remove wall-clock assertions from integration tests, add pytestmark
- Add coordination error taxonomy to README implemented section
- Fix dependency-review.yml inline YAML comment in allow-licenses
- Use NotBlankStr for evidence tuples and pipeline parameters
312d94a to 421be92 (Compare)
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
```python
        except MemoryError, RecursionError:
            raise
```
Python 2 except syntax is a SyntaxError in Python 3
except MemoryError, RecursionError: is the old Python 2 form that bound the exception to a variable — it was completely removed in Python 3. In Python 3, catching a tuple of exception types requires parentheses:
```diff
-        except MemoryError, RecursionError:
-            raise
+        except (MemoryError, RecursionError):
+            raise
```
Note: The same syntax error appears in src/ai_company/engine/classification/pipeline.py at lines 94 and 242 and must be corrected there as well.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 293-294
Comment:
**Python 2 `except` syntax is a `SyntaxError` in Python 3**
`except MemoryError, RecursionError:` is the old Python 2 form that bound the exception to a variable — it was completely removed in Python 3. In Python 3, catching a tuple of exception types requires parentheses:
```suggestion
except (MemoryError, RecursionError):
raise
```
Note: The same syntax error appears in `src/ai_company/engine/classification/pipeline.py` at lines 94 and 242 and must be corrected there as well.
How can I resolve this? If you propose a fix, please make it concise.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
```python
    def test_frozen(self) -> None:
        result = ClassificationResult(
            execution_id="exec-005",
            agent_id="agent-1",
            task_id="task-1",
            categories_checked=(),
        )
        with pytest.raises(Exception):  # noqa: B017, PT011
```
The ClassificationResult._validate_findings_match_categories model validator (which raises ValueError when findings contain categories not in categories_checked) has no test coverage. Since ErrorFinding.turn_range validation is covered, this validator should be tested similarly — e.g., constructing a ClassificationResult with a finding whose category is not in categories_checked and asserting an exception is raised.
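The invariant the missing test should cover can be sketched with a stand-in function. Names below are illustrative; the real validator lives on `ClassificationResult` and raises through Pydantic's validation machinery:

```python
# Stand-in for the cross-field invariant described above: every finding's
# category must appear among the categories that were actually checked.
def validate_findings_match_categories(
    finding_categories: tuple[str, ...],
    categories_checked: tuple[str, ...],
) -> None:
    extra = set(finding_categories) - set(categories_checked)
    if extra:
        raise ValueError(f"findings reference unchecked categories: {sorted(extra)}")

# Consistent inputs pass silently.
validate_findings_match_categories(
    ("numerical_drift",), ("numerical_drift", "context_omission")
)
# A finding outside categories_checked is rejected.
try:
    validate_findings_match_categories(("coordination_failure",), ("numerical_drift",))
except ValueError as exc:
    print(exc)
```

A test for the real model would construct a `ClassificationResult` with exactly this mismatch and assert an exception is raised.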
```python
                task_id,
                config=self._error_taxonomy_config,
            )
        except MemoryError, RecursionError:
```
Python 3 syntax error: except MemoryError, RecursionError: is invalid Python 3 syntax. The correct syntax is except (MemoryError, RecursionError):. This was introduced as new code in the _post_execution_pipeline method as part of this PR.
```diff
-        except MemoryError, RecursionError:
+        except (MemoryError, RecursionError):
```
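The corrected pattern behaves as the PR intends: unrecoverable errors propagate while everything else is contained. A minimal runnable sketch (function name and return value are illustrative, not the project's API):

```python
# Python 3 removed the Python 2 form `except MemoryError, RecursionError:`;
# catching several exception types requires a parenthesised tuple.
def reraise_fatal(exc: BaseException) -> str:
    try:
        raise exc
    except (MemoryError, RecursionError):
        raise  # unrecoverable: propagate to the caller
    except Exception:
        return "swallowed"

print(reraise_fatal(ValueError("boom")))  # swallowed
try:
    reraise_fatal(RecursionError("too deep"))
except RecursionError:
    print("propagated")
```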
| """Integration tests for the error taxonomy pipeline. | ||
|
|
||
| Verifies end-to-end classification with realistic conversation | ||
| patterns and validates structured log events are emitted. |
The module-level docstring states it "validates structured log events are emitted", but none of the tests in the file actually capture, inspect, or assert on any structured log events. The docstring should be updated to accurately reflect what the tests verify (end-to-end classification with realistic conversation patterns), or log event verification should be added.
```diff
-patterns and validates structured log events are emitted.
+patterns.
```
| f"Turn {turn.turn_number} (index {turn_idx}): " | ||
| f"finish_reason={turn.finish_reason.value}", | ||
| ), | ||
| turn_range=(turn_idx, turn_idx), |
Semantic inconsistency in turn_range usage within detect_coordination_failures: for tool execution errors (line 449), turn_range=(i, i) uses the message index from the conversation tuple (0-based position in the full conversation). For error finish reason findings (line 464), turn_range=(turn_idx, turn_idx) uses the index within the turns tuple, which is a separate index space with a different cardinality than the conversation. A consumer of ErrorFinding would have no way to know which index space applies. Since the ErrorFinding docstring describes turn_range as "Message index range (start, end) where error observed", the turn_idx usage is semantically incorrect. Either both usages should use conversation message indices (mapping each turn to its corresponding messages), or the field semantics should be clarified explicitly.
```diff
-                turn_range=(turn_idx, turn_idx),
```
Summary

- New `engine/classification/` subpackage with coordination error taxonomy classification pipeline (§10.5)
- Opt-in via `AgentEngine(error_taxonomy_config=...)`, never blocks agent work
- `ErrorFinding` model with `turn_range` validation (non-negative, start ≤ end) and `AwareDatetime` for timestamps

Test plan

- `AgentEngine` classification integration tested (no config, enabled config, MemoryError propagation)
- `RecursionError` propagation tested in pipeline
- `turn_range` validation (negative indices, inverted range, valid range)

Closes #146
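The "never blocks agent work" guarantee plus the per-detector isolation described throughout this PR can be sketched as follows. The registry shape and detector names below are illustrative, not the project's API; the real pipeline also emits DETECTOR_START/COMPLETE/ERROR events.

```python
from collections.abc import Callable

# Hedged sketch: one broken detector is logged and skipped so the rest still
# run, but unrecoverable errors are re-raised to the engine.
def run_detectors(detectors: dict[str, Callable[[], list[str]]]) -> list[str]:
    findings: list[str] = []
    for name, detect in detectors.items():
        try:
            findings.extend(detect())
        except (MemoryError, RecursionError):
            raise  # fatal: propagate past the isolation boundary
        except Exception as exc:
            print(f"detector {name} failed: {exc}")  # isolated and skipped
    return findings

def broken_detector() -> list[str]:
    raise RuntimeError("bug")

findings = run_detectors({
    "numerical_drift": lambda: ["drift at turn 3"],
    "broken": broken_detector,
    "context_omission": lambda: [],
})
print(findings)  # ['drift at turn 3']
```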