feat: add coordination error taxonomy classification pipeline (#146) #181
Conversation
⚠️ Review failed: the pull request is closed. (Run configuration: Organization UI; Review profile: ASSERTIVE; Plan: Pro; 18 files selected for processing.)
Summary by CodeRabbit

Walkthrough

Adds an opt-in post-execution coordination error taxonomy: new `engine.classification` package (models, detectors, pipeline), observability events, `AgentEngine` integration via `error_taxonomy_config`, and tests exercising detectors, pipeline behavior, and integration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant AE as AgentEngine
    participant Pipeline as Classification Pipeline
    participant Detectors as Detector Suite
    participant Obs as Observability
    AE->>AE: Execution completes (messages, turns collected)
    AE->>Pipeline: classify_execution_errors(execution_result, agent_id, task_id, config)
    alt Config disabled
        Pipeline->>Obs: log CLASSIFICATION_SKIPPED
        Pipeline-->>AE: return None
    else Config enabled
        Pipeline->>Obs: log CLASSIFICATION_START (agent_id, task_id, execution_id, categories)
        Pipeline->>Detectors: _run_detectors(enabled_categories)
        par Detector Execution
            Detectors->>Detectors: detect_logical_contradictions(conversation)
            Detectors->>Detectors: detect_numerical_drift(conversation, threshold)
            Detectors->>Detectors: detect_context_omissions(conversation)
            Detectors->>Detectors: detect_coordination_failures(conversation, turns)
        end
        Detectors->>Obs: log CLASSIFICATION_FINDING (per finding)
        Detectors-->>Pipeline: aggregate findings
        Pipeline->>Pipeline: Build ClassificationResult
        Pipeline->>Obs: log CLASSIFICATION_COMPLETE (execution_id, finding_count)
        Pipeline-->>AE: return ClassificationResult
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes (Gemini Code Assist)

This pull request significantly enhances the agent engine's analytical capabilities by introducing a robust coordination error taxonomy classification pipeline. This new system allows for the identification and categorization of various agent coordination issues, such as logical contradictions and numerical discrepancies, without interfering with the primary agent execution flow. By providing structured insights into potential errors, it lays the groundwork for future programmatic access and more targeted diagnostics, ultimately improving the reliability and performance of agent operations.
Greptile Summary

This PR introduces a coordination error taxonomy classification pipeline (`engine/classification/`). Critical blocker: the PR contains Python 2 exception syntax (`except MemoryError, RecursionError:`), which must be corrected to `except (MemoryError, RecursionError):`. Confidence Score: 0/5
Last reviewed commit: 421be92
Code Review
This pull request introduces a well-designed and thoroughly tested coordination error classification pipeline, including a new engine/classification subpackage and clean integration into AgentEngine. However, a high-severity Regular Expression Denial of Service (ReDoS) vulnerability was identified in the entity detection logic. Additionally, critical Python 2 style exception handling syntax (except A, B:) is used, which is invalid in Python 3 and will cause runtime SyntaxErrors, preventing the pipeline from executing. These issues must be addressed to ensure the security and reliability of the classification pipeline.
```python
        execution_id=execution_id,
        config=config,
    )
except MemoryError, RecursionError:
```
This line uses Python 2 style except syntax (except MemoryError, RecursionError:), which is a SyntaxError in Python 3. This will prevent the module from loading and the pipeline from executing. The correct syntax for catching multiple exceptions is to enclose them in a tuple, e.g., except (Exception1, Exception2):. This issue is also present in other parts of the codebase, such as: src/ai_company/engine/classification/pipeline.py: line 231 and src/ai_company/engine/agent_engine.py: lines 188, 315, and 770.
Suggested change:

```python
except (MemoryError, RecursionError):
    return tuple(findings)
```
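For illustration, a runnable sketch of the tuple-`except` isolation the reviewers are asking for. The function and detector names here are stand-ins, not the PR's real helpers; only the corrected exception syntax is taken from the review:

```python
from collections.abc import Callable


def run_detector_safely(detector_fn: Callable[[], tuple[str, ...]]) -> tuple[str, ...]:
    """Run one detector; return no findings if it exhausts memory or the stack."""
    try:
        return detector_fn()
    except (MemoryError, RecursionError):  # Python 3 tuple form, not `except A, B:`
        return ()


def broken_detector() -> tuple[str, ...]:
    raise RecursionError("simulated runaway recursion")


def healthy_detector() -> tuple[str, ...]:
    return ("finding-1",)


assert run_detector_safely(broken_detector) == ()
assert run_detector_safely(healthy_detector) == ("finding-1",)
```

With the Python 2 comma form, the module would not even import under Python 3, so neither branch above could ever run.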
```python
_ENTITY_PATTERN = re.compile(r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b")
```
The regular expression _ENTITY_PATTERN is vulnerable to Regular Expression Denial of Service (ReDoS) due to nested quantifiers. The pattern r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b" contains a nested group (?:[A-Z][a-zA-Z]*)* where both the inner and outer quantifiers can match capital letters. This ambiguity causes exponential backtracking when the regex engine encounters a long string of capital letters that is not followed by a word boundary (e.g., "AbcAAAAAAAAAAAAAAAAAAAA_"). Since this regex processes LLM-generated content, it could be exploited to cause a Denial of Service.
To remediate this, simplify the regular expression to avoid nested quantifiers. Since the pattern is intended to match a single word starting with a capital letter and having at least 3 characters, the nested group is redundant.
Suggested change:

```python
_ENTITY_PATTERN = re.compile(r"\b[A-Z][a-zA-Z]{2,}\b")
```
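A quick sketch comparing the two patterns on benign input (the entity strings are invented for illustration). Both extract the same capitalized tokens, so the simplification loses nothing while removing the nested quantifier:

```python
import re

vulnerable = re.compile(r"\b([A-Z][a-zA-Z]{2,}(?:[A-Z][a-zA-Z]*)*)\b")
safe = re.compile(r"\b[A-Z][a-zA-Z]{2,}\b")

text = "Alice asked BobSmith to review the Pipeline config."

# findall with a capture group returns the group; without one, the whole match.
assert vulnerable.findall(text) == ["Alice", "BobSmith", "Pipeline"]
assert safe.findall(text) == ["Alice", "BobSmith", "Pipeline"]

# An adversarial string like "Abc" + "A" * 30 + "_" forces the nested version
# into exponential backtracking (each "A" can match the inner or outer loop);
# the flat version rejects it in linear time. Deliberately not executed here.
```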
| """ | ||
| try: | ||
| return detector_fn() | ||
| except MemoryError, RecursionError: |
This line contains a syntax error in Python 3. The correct syntax for catching multiple exceptions is except (Exception1, Exception2):. The current code except MemoryError, RecursionError: will raise a SyntaxError at runtime.
Suggested change:

```python
    except (MemoryError, RecursionError):
```
Pull request overview
This PR adds a coordination error taxonomy classification pipeline (engine/classification/) that runs post-execution when opted in via AgentEngine(error_taxonomy_config=...). It implements §10.5 of the DESIGN_SPEC and closes issue #146. Four heuristic detectors (logical contradiction, numerical drift, context omission, coordination failure) analyze conversation histories after agent execution finishes. Results are currently log-only.
Changes:
- New `engine/classification/` subpackage with `models.py`, `detectors.py`, `pipeline.py`, and `__init__.py`
- New `observability/events/classification.py` with 8 structured event constants; `AgentEngine` gains an `error_taxonomy_config` parameter that triggers post-execution classification
- 16 new tests across 4 files (3 unit, 1 integration) and documentation updates to `DESIGN_SPEC.md` and `CLAUDE.md`
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| `src/ai_company/engine/classification/__init__.py` | Package re-exports for the public classification API |
| `src/ai_company/engine/classification/models.py` | `ErrorSeverity` enum, `ErrorFinding` and `ClassificationResult` Pydantic models |
| `src/ai_company/engine/classification/detectors.py` | Four pure-function heuristic detectors for the four error categories |
| `src/ai_company/engine/classification/pipeline.py` | `classify_execution_errors` async orchestrator with per-detector isolation |
| `src/ai_company/engine/agent_engine.py` | Adds `error_taxonomy_config` parameter and invokes classification in `_post_execution_pipeline` |
| `src/ai_company/engine/__init__.py` | Re-exports the new classification public API |
| `src/ai_company/observability/events/classification.py` | Eight `Final[str]` event constants for structured logging |
| `tests/unit/engine/test_classification_models.py` | Unit tests for `ErrorSeverity`, `ErrorFinding`, and `ClassificationResult` |
| `tests/unit/engine/test_classification_detectors.py` | Unit tests for all four detector functions |
| `tests/unit/engine/test_classification_pipeline.py` | Unit tests for the `classify_execution_errors` pipeline function |
| `tests/unit/engine/test_agent_engine.py` | New `TestAgentEngineClassification` class testing engine integration |
| `tests/integration/engine/test_error_taxonomy_integration.py` | End-to-end integration tests with realistic conversation patterns |
| `tests/unit/observability/test_events.py` | Adds classification to expected domain modules and event constant assertions |
| `DESIGN_SPEC.md` | Updates §10.5 current state and §15.3 project structure |
| `CLAUDE.md` | Updates engine description and adds classification event example to logging guidelines |
| """Integration tests for the error taxonomy pipeline. | ||
|
|
||
| Verifies end-to-end classification with realistic conversation | ||
| patterns and validates structured log events are emitted. | ||
| """ | ||
|
|
||
| import time | ||
| from datetime import date | ||
| from uuid import uuid4 | ||
|
|
||
| import pytest |
This integration test file is missing the module-level pytestmark declaration. All other integration test files in tests/integration/engine/ consistently set pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)] at module scope (see test_agent_engine_integration.py:48, test_crash_recovery.py:35, test_multi_agent_delegation.py:85). Without this, the tests in this file will not be tagged with the integration marker or protected by the 30-second timeout guard.
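The fix the comment describes is a two-line addition at module scope, sketched here with the marker list the cited sibling files reportedly use:

```python
import pytest

# Module-level markers: tag every test in this file as an integration test
# and guard it with a 30-second timeout (enforced by the pytest-timeout plugin).
pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)]
```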
```python
turn_range: tuple[int, int] | None = Field(
    default=None,
    description="Turn index range (start, end) where error observed",
)
```
The turn_range field on ErrorFinding is documented as "Turn index range (start, end) where error observed", but three of the four detectors populate it with conversation message indices (from _extract_assistant_texts, which uses position in the full conversation tuple including system/user/tool messages), not turn numbers. By contrast, detect_coordination_failures sets turn_range to (turn.turn_number, turn.turn_number) which are 1-based turn numbers from TurnRecord. The result is that turn_range values carry different semantics depending on which detector produced the finding, making them incomparable. Either the field name and docstring should be updated to "message_index_range", or the detectors should be updated to use consistent turn numbers.
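One way to make the semantics uniform is to translate message indices into turn numbers before constructing each finding. The helpers below are hypothetical (`build_turn_lookup` and `message_index_to_turn` are not in the PR), and the `(turn_number, first_index, last_index)` span tuples stand in for whatever `TurnRecord` actually stores:

```python
def build_turn_lookup(turn_spans: list[tuple[int, int, int]]) -> dict[int, int]:
    """Map each conversation message index to its 1-based turn number.

    turn_spans: (turn_number, first_message_index, last_message_index) per turn.
    """
    lookup: dict[int, int] = {}
    for turn_number, start, end in turn_spans:
        for idx in range(start, end + 1):
            lookup[idx] = turn_number
    return lookup


def message_index_to_turn(lookup: dict[int, int], index: int) -> int:
    """Translate a raw conversation index into a turn number."""
    return lookup[index]


# Example: turn 1 covers messages 0-2, turn 2 covers messages 3-5.
lookup = build_turn_lookup([(1, 0, 2), (2, 3, 5)])
assert message_index_to_turn(lookup, 4) == 2
```

Building the map once and translating at each detector site would let every `turn_range` carry turn numbers, matching what `detect_coordination_failures` already emits.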
```python
def test_classification_events_exist(self) -> None:
    assert CLASSIFICATION_START == "classification.start"
    assert CLASSIFICATION_COMPLETE == "classification.complete"
    assert CLASSIFICATION_FINDING == "classification.finding"
    assert CLASSIFICATION_ERROR == "classification.error"
    assert CLASSIFICATION_SKIPPED == "classification.skipped"
The test_classification_events_exist test only verifies 5 of the 8 constants defined in src/ai_company/observability/events/classification.py. The three per-detector lifecycle event constants (DETECTOR_START, DETECTOR_COMPLETE, DETECTOR_ERROR) are not asserted. Based on the convention in this same test class (e.g., test_conflict_events_exist checks all 21 conflict constants, test_workspace_events_exist checks all workspace constants), all defined event constants should be covered.
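A sketch of the extended assertions. The review names the three detector constants but not their string values, so the values below are assumptions chosen to fit the `domain.subject.qualifier` convention; check the real module before asserting:

```python
from typing import Final

# Assumed values for illustration only — not taken from the PR.
DETECTOR_START: Final[str] = "classification.detector.start"
DETECTOR_COMPLETE: Final[str] = "classification.detector.complete"
DETECTOR_ERROR: Final[str] = "classification.detector.error"


def test_detector_events_exist() -> None:
    assert DETECTOR_START == "classification.detector.start"
    assert DETECTOR_COMPLETE == "classification.detector.complete"
    assert DETECTOR_ERROR == "classification.detector.error"


test_detector_events_exist()
```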
```python
        execution_id=execution_id,
        config=config,
    )
except MemoryError, RecursionError:
```
Both except MemoryError, RecursionError: clauses use Python 2 syntax that is a SyntaxError in Python 3. In Python 3, catching multiple exception types requires a tuple: except (MemoryError, RecursionError):. The codebase's correct usage is except (MemoryError, RecursionError) as exc: (see src/ai_company/tools/invoker.py:224 and src/ai_company/engine/parallel.py:290). As written, these lines will cause a SyntaxError at import time, making the entire module unimportable and all the tests that mock _run_detectors will fail to even load the module under test.
Suggested change:

```python
except (MemoryError, RecursionError):
```
| """ | ||
| try: | ||
| return detector_fn() | ||
| except MemoryError, RecursionError: |
Same Python 2 syntax issue: except MemoryError, RecursionError: is a SyntaxError in Python 3. It must be written as except (MemoryError, RecursionError): to correctly catch both exception types.
Suggested change:

```python
    except (MemoryError, RecursionError):
```
Actionable comments posted: 5
⚠️ Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/engine/agent_engine.py (1)
85-109: 🧹 Nitpick | 🔵 Trivial

Document `error_taxonomy_config` in the constructor contract. The constructor now exposes a new public parameter, but the `Args:` block still ends at `shutdown_checker`, so the public API docs no longer describe how classification is enabled.

As per coding guidelines (`src/**/*.py`): use Google-style docstrings on public classes and functions (enforced by ruff D rules).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/agent_engine.py` around lines 85 - 109, The docstring Args section for the class constructor is missing documentation for the new parameter error_taxonomy_config; update the constructor docstring (above def __init__) to add an Args entry for error_taxonomy_config describing its type (ErrorTaxonomyConfig | None), purpose (used to enable/configure error classification), and default behavior (when None classification is disabled or uses defaults), matching the style of the existing shutdown_checker line so the public API docs correctly reflect the new parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/engine/classification/detectors.py`:
- Around line 79-87: The detectors are mixing raw conversation offsets and
TurnRecord.turn_number when building ErrorFinding.turn_range (e.g., in
_extract_assistant_texts and other detectors that currently emit indices like
enumerate(conversation) or hard-coded 0); normalize all turn_range values to use
TurnRecord.turn_number consistently: convert any conversation-index (from
functions like _extract_assistant_texts) to the corresponding
TurnRecord.turn_number before creating ErrorFinding.turn_range, and update all
detector sites (the blocks around the referenced locations) to look up the
TurnRecord for that message and use its turn_number instead of raw enumerate
indices or constants so downstream consumers always receive turn indices.
In `@src/ai_company/engine/classification/pipeline.py`:
- Around line 74-80: The code generates a new UUID for execution_id which breaks
correlation; instead use the run's existing execution id from
execution_result.context.execution_id when populating
ClassificationResult.execution_id and when logging. Replace the local creation
of execution_id (the str(uuid4()) assignment) and pass
execution_result.context.execution_id into logger.info for CLASSIFICATION_START
(and ensure ClassificationResult.execution_id is set from
execution_result.context.execution_id) so all logs and the ClassificationResult
share the same execution identifier.
In `@tests/integration/engine/test_error_taxonomy_integration.py`:
- Around line 199-253: Remove the hard wall-clock assertions and instead assert
behavioral correctness: for test_disabled_taxonomy_returns_none_fast(), keep
config = ErrorTaxonomyConfig(enabled=False) and assert
classify_execution_errors(...) returns None, and also verify detectors were not
invoked by spying/mocking the detector functions used by
classify_execution_errors (or assert that the internal detector dispatch method
was not called); for test_pipeline_does_not_block_execution(), remove the
elapsed < 2.0 check and either assert result is not None and detectors produced
expected classifications or move the performance check into a separate slow test
decorated with pytest markers (e.g., `@pytest.mark.integration` and
`@pytest.mark.slow`) and a per-test timeout (pytest.mark.timeout(30)) so CI uses a
30s limit instead of brittle short wall-clock assertions; ensure you reference
classify_execution_errors, ErrorTaxonomyConfig, and the detector dispatch/spies
when adding the mocks.
In `@tests/unit/observability/test_events.py`:
- Around line 426-431: Extend the test_classification_events_exist test to also
assert the three detector constants (DETECTOR_START, DETECTOR_COMPLETE,
DETECTOR_ERROR) are defined and equal to their expected string values; locate
the constants in classification.py (referencing DETECTOR_START,
DETECTOR_COMPLETE, DETECTOR_ERROR) and add corresponding assertions alongside
the existing CLASSIFICATION_* assertions in the test_classification_events_exist
function to cover all eight event constants.
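The integration-test prompt above reduces to "assert on behavior, not wall-clock time". A self-contained sketch of the spy pattern it suggests — the pipeline and detector here are simplified stand-ins, not the PR's real signatures:

```python
from unittest.mock import Mock


def classify_execution_errors(detector, *, enabled: bool):
    """Stand-in pipeline: skip every detector when the config is disabled."""
    if not enabled:
        return None  # disabled config short-circuits before any detector runs
    return detector()


spy = Mock(return_value=("finding",))

# Disabled config: returns None and provably never invokes a detector —
# no brittle `elapsed < 2.0` wall-clock assertion needed.
assert classify_execution_errors(spy, enabled=False) is None
spy.assert_not_called()

# Enabled config: the detector runs exactly once and its findings come back.
assert classify_execution_errors(spy, enabled=True) == ("finding",)
spy.assert_called_once()
```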
---
Outside diff comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 85-109: The docstring Args section for the class constructor is
missing documentation for the new parameter error_taxonomy_config; update the
constructor docstring (above def __init__) to add an Args entry for
error_taxonomy_config describing its type (ErrorTaxonomyConfig | None), purpose
(used to enable/configure error classification), and default behavior (when None
classification is disabled or uses defaults), matching the style of the existing
shutdown_checker line so the public API docs correctly reflect the new
parameter.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: aa6b5103-bbce-49f7-867c-7aae34d0b435
📒 Files selected for processing (15)
`CLAUDE.md`, `DESIGN_SPEC.md`, `src/ai_company/engine/__init__.py`, `src/ai_company/engine/agent_engine.py`, `src/ai_company/engine/classification/__init__.py`, `src/ai_company/engine/classification/detectors.py`, `src/ai_company/engine/classification/models.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/observability/events/classification.py`, `tests/integration/engine/test_error_taxonomy_integration.py`, `tests/unit/engine/test_agent_engine.py`, `tests/unit/engine/test_classification_detectors.py`, `tests/unit/engine/test_classification_models.py`, `tests/unit/engine/test_classification_pipeline.py`, `tests/unit/observability/test_events.py`
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use `from __future__ import annotations` — Python 3.14 has PEP 649
Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
Files:
`src/ai_company/observability/events/classification.py`, `src/ai_company/engine/__init__.py`, `src/ai_company/engine/classification/__init__.py`, `tests/unit/engine/test_classification_detectors.py`, `src/ai_company/engine/classification/models.py`, `tests/unit/engine/test_classification_pipeline.py`, `tests/unit/engine/test_classification_models.py`, `tests/integration/engine/test_error_taxonomy_integration.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/engine/classification/detectors.py`, `tests/unit/engine/test_agent_engine.py`, `src/ai_company/engine/agent_engine.py`, `tests/unit/observability/test_events.py`
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
`src/ai_company/observability/events/classification.py`, `src/ai_company/engine/__init__.py`, `src/ai_company/engine/classification/__init__.py`, `src/ai_company/engine/classification/models.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/engine/classification/detectors.py`, `src/ai_company/engine/agent_engine.py`
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(name)
Never use `import logging`, `logging.getLogger()`, or `print()` in application code — use the structured logger from ai_company.observability
Always use variable name `logger` (not `_logger`, not `log`) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Files:
`src/ai_company/observability/events/classification.py`, `src/ai_company/engine/__init__.py`, `src/ai_company/engine/classification/__init__.py`, `src/ai_company/engine/classification/models.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/engine/classification/detectors.py`, `src/ai_company/engine/agent_engine.py`
src/ai_company/{providers,engine}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains
Files:
`src/ai_company/engine/__init__.py`, `src/ai_company/engine/classification/__init__.py`, `src/ai_company/engine/classification/models.py`, `src/ai_company/engine/classification/pipeline.py`, `src/ai_company/engine/classification/detectors.py`, `src/ai_company/engine/agent_engine.py`
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow` to categorize tests
Configure asyncio_mode = 'auto' for pytest — no manual `@pytest.mark.asyncio` needed
Set test timeout to 30 seconds per test
Prefer@pytest.mark.parametrizefor testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests
Files:
`tests/unit/engine/test_classification_detectors.py`, `tests/unit/engine/test_classification_pipeline.py`, `tests/unit/engine/test_classification_models.py`, `tests/integration/engine/test_error_taxonomy_integration.py`, `tests/unit/engine/test_agent_engine.py`, `tests/unit/observability/test_events.py`
🧠 Learnings (7)
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Applied to files:
`src/ai_company/observability/events/classification.py`, `CLAUDE.md`, `tests/unit/observability/test_events.py`
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Never use `import logging`, `logging.getLogger()`, or `print()` in application code — use the structured logger from ai_company.observability
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use variable name `logger` (not `_logger`, not `log`) for the logger instance
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Pure data models, enums, and re-exports do NOT need logging
Applied to files:
CLAUDE.md
🧬 Code graph analysis (8)

- `src/ai_company/engine/__init__.py` (2)
  - `src/ai_company/engine/classification/models.py` (3): `ClassificationResult` (71-110), `ErrorFinding` (32-68), `ErrorSeverity` (24-29)
  - `src/ai_company/engine/classification/pipeline.py` (1): `classify_execution_errors` (43-107)
- `src/ai_company/engine/classification/__init__.py` (2)
  - `src/ai_company/engine/classification/models.py` (3): `ClassificationResult` (71-110), `ErrorFinding` (32-68), `ErrorSeverity` (24-29)
  - `src/ai_company/engine/classification/pipeline.py` (1): `classify_execution_errors` (43-107)
- `src/ai_company/engine/classification/models.py` (1)
  - `src/ai_company/budget/coordination_config.py` (1): `ErrorCategory` (23-29)
- `tests/unit/engine/test_classification_pipeline.py` (8)
  - `src/ai_company/budget/coordination_config.py` (2): `ErrorCategory` (23-29), `ErrorTaxonomyConfig` (32-57)
  - `src/ai_company/core/agent.py` (2): `AgentIdentity` (246-304), `ModelConfig` (145-174)
  - `src/ai_company/engine/classification/models.py` (3): `ErrorSeverity` (24-29), `finding_count` (102-104), `has_findings` (108-110)
  - `src/ai_company/engine/classification/pipeline.py` (1): `classify_execution_errors` (43-107)
  - `src/ai_company/engine/context.py` (3): `AgentContext` (87-307), `from_identity` (140-171), `with_message` (173-182)
  - `src/ai_company/engine/loop_protocol.py` (3): `ExecutionResult` (78-135), `TerminationReason` (28-35), `TurnRecord` (38-75)
  - `src/ai_company/providers/enums.py` (2): `FinishReason` (15-22), `MessageRole` (6-12)
  - `src/ai_company/providers/models.py` (2): `ChatMessage` (138-210), `ToolResult` (122-135)
- `tests/unit/engine/test_classification_models.py` (1)
  - `src/ai_company/engine/classification/models.py` (5): `ClassificationResult` (71-110), `ErrorFinding` (32-68), `ErrorSeverity` (24-29), `finding_count` (102-104), `has_findings` (108-110)
- `src/ai_company/engine/classification/pipeline.py` (4)
  - `src/ai_company/budget/coordination_config.py` (2): `ErrorCategory` (23-29), `ErrorTaxonomyConfig` (32-57)
  - `src/ai_company/engine/classification/detectors.py` (4): `detect_context_omissions` (328-402), `detect_coordination_failures` (405-470), `detect_logical_contradictions` (222-270), `detect_numerical_drift` (273-325)
  - `src/ai_company/engine/classification/models.py` (3): `ClassificationResult` (71-110), `ErrorFinding` (32-68), `finding_count` (102-104)
  - `src/ai_company/engine/loop_protocol.py` (1): `ExecutionResult` (78-135)
- `tests/unit/engine/test_agent_engine.py` (2)
  - `src/ai_company/budget/coordination_config.py` (1): `ErrorTaxonomyConfig` (32-57)
  - `src/ai_company/engine/agent_engine.py` (1): `run` (124-207)
- `src/ai_company/engine/agent_engine.py` (2)
  - `src/ai_company/engine/classification/pipeline.py` (1): `classify_execution_errors` (43-107)
  - `src/ai_company/budget/coordination_config.py` (1): `ErrorTaxonomyConfig` (32-57)
🪛 GitHub Check: CodeQL
src/ai_company/engine/classification/detectors.py
[failure] 207-207: Inefficient regular expression
This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'A'.
🪛 LanguageTool
CLAUDE.md
[style] ~86-~86: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
🔇 Additional comments (16)
src/ai_company/observability/events/classification.py (1)
- 1-14: LGTM! Clean event constant definitions following the established pattern. The constants are properly typed with `Final[str]` and values follow the `domain.subject.qualifier` naming convention consistent with other event modules.

`tests/unit/observability/test_events.py` (1)

- 11-17: LGTM! Imports correctly added for the classification event constants.
CLAUDE.md (2)
- 52-52: LGTM! Engine description correctly updated to reflect the new coordination error classification capability.
- 86-86: LGTM! Logging documentation appropriately extended with the new `CLASSIFICATION_START` event example, maintaining consistency with other domain event examples.

`src/ai_company/engine/classification/__init__.py` (1)

- 1-19: LGTM! Clean package initializer with well-defined public API. The re-exports are properly documented in `__all__` and the docstring accurately describes the module's purpose.

`tests/unit/engine/test_agent_engine.py` (2)
5-5: LGTM! Import additions are appropriate for the new classification tests.

Also applies to: 9-9

773-856: LGTM! Well-structured tests covering the three key scenarios for error taxonomy classification integration:

- No config → classification skipped
- Enabled config → classification invoked
- MemoryError → propagates unconditionally

The tests properly use `AsyncMock` and patch the correct module path.
AsyncMockand patch the correct module path.src/ai_company/engine/__init__.py (2)
31-36: LGTM!Correct import of classification public API entities from the new subpackage.
173-173: LGTM!
__all__properly extended with classification exports, maintaining alphabetical ordering.Also applies to: 188-189, 265-265
tests/unit/engine/test_classification_models.py (3)
1-13: LGTM! Clean test file setup with proper imports and pytest marker.

50-51: Acceptable use of broad exception catch with noqa. The `pytest.raises(Exception)` pattern with `# noqa: B017, PT011` is acknowledged. While catching `ValidationError` would be more precise for Pydantic frozen model violations and `ValueError` for turn_range validation, the current approach avoids coupling tests to Pydantic internals.

Also applies to: 77-84, 86-93, 173-174

155-164: Good timestamp boundary test. The test correctly validates that `classified_at` defaults to the current time by capturing before/after timestamps.

src/ai_company/engine/classification/models.py (4)
1-22: LGTM! Clean module setup with appropriate imports. The `# noqa: TC001` comments correctly indicate type-checking-only imports that are used in type annotations.

24-30: LGTM! Simple and well-documented `StrEnum` for severity levels.
32-68: LGTM! The `ErrorFinding` model is well-designed:

- Frozen for immutability as required
- `NotBlankStr` for description field
- Proper validation of `turn_range` ensuring non-negative indices and start ≤ end
- Google-style docstring with attribute descriptions
71-110: LGTM! The `ClassificationResult` model follows best practices:

- Frozen for immutability
- `NotBlankStr` for identifier fields per coding guidelines
- `AwareDatetime` with UTC default for timezone-aware timestamps
- `@computed_field` for derived values (`finding_count`, `has_findings`) as required by guidelines
- Tuple types for immutable collections
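The invariants praised above (frozen models, validated `turn_range`, a derived finding count, a timezone-aware default) can be sketched with stdlib stand-ins. The real models use Pydantic frozen models with `@computed_field`; the names and shapes below are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Stdlib sketch of the review's invariants; not the project's actual models.
@dataclass(frozen=True)
class Finding:
    description: str
    turn_range: tuple[int, int]

    def __post_init__(self) -> None:
        start, end = self.turn_range
        if start < 0 or end < 0 or start > end:
            raise ValueError("turn_range must be non-negative with start <= end")

@dataclass(frozen=True)
class Result:
    findings: tuple[Finding, ...] = ()
    classified_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # always tz-aware
    )

    @property  # derived value, mirroring @computed_field
    def finding_count(self) -> int:
        return len(self.findings)

r = Result(findings=(Finding("numerical drift", (1, 2)),))
print(r.finding_count)     # 1
try:
    Finding("bad", (3, 1))  # inverted range is rejected at construction
except ValueError as exc:
    print("rejected:", exc)
```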
```python
def _extract_assistant_texts(
    conversation: tuple[ChatMessage, ...],
) -> list[tuple[int, str]]:
    """Extract (index, text) pairs from assistant messages."""
    return [
        (i, msg.content)
        for i, msg in enumerate(conversation)
        if msg.role == MessageRole.ASSISTANT and msg.content
    ]
```
Normalize turn_range across all detectors.
These findings mix raw conversation offsets (and even a hard-coded 0) with TurnRecord.turn_number. ErrorFinding.turn_range is documented as turn indices, so downstream consumers will get incompatible coordinates depending on category.
Also applies to: 197-201, 257-261, 389-393, 442-458
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/classification/detectors.py` around lines 79 - 87, The
detectors are mixing raw conversation offsets and TurnRecord.turn_number when
building ErrorFinding.turn_range (e.g., in _extract_assistant_texts and other
detectors that currently emit indices like enumerate(conversation) or hard-coded
0); normalize all turn_range values to use TurnRecord.turn_number consistently:
convert any conversation-index (from functions like _extract_assistant_texts) to
the corresponding TurnRecord.turn_number before creating
ErrorFinding.turn_range, and update all detector sites (the blocks around the
referenced locations) to look up the TurnRecord for that message and use its
turn_number instead of raw enumerate indices or constants so downstream
consumers always receive turn indices.
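The normalization the review asks for amounts to mapping a raw conversation offset back to the owning turn before emitting a finding. A minimal sketch, assuming a hypothetical `message_indices` field recording which conversation offsets each turn produced (the real `TurnRecord` shape is not shown in this review):

```python
from dataclasses import dataclass

# Hypothetical shape: `message_indices` is an assumption for illustration,
# not the project's actual TurnRecord API.
@dataclass(frozen=True)
class TurnRecord:
    turn_number: int
    message_indices: tuple[int, ...]  # conversation offsets emitted in this turn

def turn_for_message(turns: tuple[TurnRecord, ...], msg_idx: int) -> int:
    """Map a raw conversation index to the owning turn's turn_number."""
    for turn in turns:
        if msg_idx in turn.message_indices:
            return turn.turn_number
    raise LookupError(f"no turn owns message index {msg_idx}")

turns = (TurnRecord(1, (0, 1)), TurnRecord(2, (2, 3, 4)))
print(turn_for_message(turns, 3))  # 2
```

With a lookup like this, every detector can emit `turn_range` in a single coordinate space regardless of whether it iterated messages or turns.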
Implement the error classification pipeline for coordination metrics (DESIGN_SPEC §10.5). Four detector functions analyse conversation histories for logical contradictions, numerical drift, context omissions, and coordination failures. The pipeline integrates into AgentEngine._post_execution_pipeline() and never blocks execution.

New files:
- engine/classification/ subpackage (models, detectors, pipeline)
- observability/events/classification.py (event constants)
- Unit tests: models, detectors, pipeline (42 tests)
- Integration tests: full pipeline scenarios (7 tests)

Modified:
- engine/agent_engine.py: error_taxonomy_config param + _classify_errors()
- engine/__init__.py: re-export classification types
- tests/unit/observability/test_events.py: register classification module
Pre-reviewed by 8 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, docs-consistency), 24 findings addressed.

Source improvements:
- Fix TYPE_CHECKING import ordering in pipeline.py
- Add per-detector isolation (one broken detector doesn't kill others)
- Add turn_range validation (start <= end, non-negative)
- Use AwareDatetime for classified_at (rejects naive datetimes)
- Remove dead except Exception in agent_engine (pipeline already catches)
- Inline _classify_errors to reduce agent_engine.py toward 800-line limit
- Add debug logging to all detector entry/exit points
- Reorder _compute_drift and _check_drift_in_group before their caller
- Refactor _check_drift_in_group to return tuple instead of mutating list
- Document _compute_drift zero-baseline edge case behavior
- Fix constant pseudo-docstring to use comment syntax

Test additions (16 new tests):
- AgentEngine classification integration (3 paths: no config, enabled, MemoryError)
- RecursionError propagation in pipeline
- Zero-value drift edge cases (zero-to-nonzero, zero-to-zero)
- Common capitalised words filtering in context omissions
- Empty conversation for all four detectors
- Multiple contradictions in one conversation
- Combined tool errors + error finish reasons
- turn_range validation (negative, inverted, valid)

Documentation updates:
- DESIGN_SPEC.md: add classification/ to §15.3 project structure
- DESIGN_SPEC.md: add Current state callout to §10.5 error taxonomy
- DESIGN_SPEC.md: update engine/ description with classification
- CLAUDE.md: update engine/ description and logging event examples
- Extract validation methods from agent_engine.py into validation.py (739 lines, under 800 limit)
- Fix ReDoS vulnerability in _ENTITY_PATTERN regex (linear-time matching)
- Fix turn_range semantic inconsistency (use message indices consistently)
- Add cross-field validator for ClassificationResult findings vs categories
- Use execution_result.context.execution_id instead of generating fresh UUIDs
- Add threshold_percent validation in detect_numerical_drift
- Improve classification isolation in engine (re-raise MemoryError/RecursionError)
- Add per-detector isolation, MemoryError propagation, and empty categories tests
- Add DETECTOR_START/COMPLETE/ERROR event assertions
- Remove wall-clock assertions from integration tests, add pytestmark
- Add coordination error taxonomy to README implemented section
- Fix dependency-review.yml inline YAML comment in allow-licenses
- Use NotBlankStr for evidence tuples and pipeline parameters
312d94a to 421be92 (Compare)
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
```python
        except MemoryError, RecursionError:
            raise
```
Python 2 except syntax is a SyntaxError in Python 3
except MemoryError, RecursionError: is the old Python 2 form that bound the exception to a variable — it was completely removed in Python 3. In Python 3, catching a tuple of exception types requires parentheses:
```diff
-        except MemoryError, RecursionError:
-            raise
+        except (MemoryError, RecursionError):
+            raise
```
Note: The same syntax error appears in src/ai_company/engine/classification/pipeline.py at lines 94 and 242 and must be corrected there as well.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 293-294
Comment:
**Python 2 `except` syntax is a `SyntaxError` in Python 3**
`except MemoryError, RecursionError:` is the old Python 2 form that bound the exception to a variable — it was completely removed in Python 3. In Python 3, catching a tuple of exception types requires parentheses:
```suggestion
except (MemoryError, RecursionError):
raise
```
Note: The same syntax error appears in `src/ai_company/engine/classification/pipeline.py` at lines 94 and 242 and must be corrected there as well.
How can I resolve this? If you propose a fix, please make it concise.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
```python
    def test_frozen(self) -> None:
        result = ClassificationResult(
            execution_id="exec-005",
            agent_id="agent-1",
            task_id="task-1",
            categories_checked=(),
        )
        with pytest.raises(Exception):  # noqa: B017, PT011
```
The ClassificationResult._validate_findings_match_categories model validator (which raises ValueError when findings contain categories not in categories_checked) has no test coverage. Since ErrorFinding.turn_range validation is covered, this validator should be tested similarly — e.g., constructing a ClassificationResult with a finding whose category is not in categories_checked and asserting an exception is raised.
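The invariant the missing test should cover can be sketched with a stand-in function. Names below are illustrative; the real validator lives on `ClassificationResult` and raises through Pydantic's validation machinery:

```python
# Stand-in for the cross-field invariant described above: every finding's
# category must appear among the categories that were actually checked.
def validate_findings_match_categories(
    finding_categories: tuple[str, ...],
    categories_checked: tuple[str, ...],
) -> None:
    extra = set(finding_categories) - set(categories_checked)
    if extra:
        raise ValueError(f"findings reference unchecked categories: {sorted(extra)}")

# Consistent inputs pass silently.
validate_findings_match_categories(
    ("numerical_drift",), ("numerical_drift", "context_omission")
)
# A finding outside categories_checked is rejected.
try:
    validate_findings_match_categories(("coordination_failure",), ("numerical_drift",))
except ValueError as exc:
    print(exc)
```

A test for the real model would construct a `ClassificationResult` with exactly this mismatch and assert an exception is raised.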
```python
                task_id,
                config=self._error_taxonomy_config,
            )
        except MemoryError, RecursionError:
```
Python 3 syntax error: except MemoryError, RecursionError: is invalid Python 3 syntax. The correct syntax is except (MemoryError, RecursionError):. This was introduced as new code in the _post_execution_pipeline method as part of this PR.
```diff
-        except MemoryError, RecursionError:
+        except (MemoryError, RecursionError):
```
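The corrected pattern behaves as the PR intends: unrecoverable errors propagate while everything else is contained. A minimal runnable sketch (function name and return value are illustrative, not the project's API):

```python
# Python 3 removed the Python 2 form `except MemoryError, RecursionError:`;
# catching several exception types requires a parenthesised tuple.
def reraise_fatal(exc: BaseException) -> str:
    try:
        raise exc
    except (MemoryError, RecursionError):
        raise  # unrecoverable: propagate to the caller
    except Exception:
        return "swallowed"

print(reraise_fatal(ValueError("boom")))  # swallowed
try:
    reraise_fatal(RecursionError("too deep"))
except RecursionError:
    print("propagated")
```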
| """Integration tests for the error taxonomy pipeline. | ||
|
|
||
| Verifies end-to-end classification with realistic conversation | ||
| patterns and validates structured log events are emitted. |
The module-level docstring states it "validates structured log events are emitted", but none of the tests in the file actually capture, inspect, or assert on any structured log events. The docstring should be updated to accurately reflect what the tests verify (end-to-end classification with realistic conversation patterns), or log event verification should be added.
```diff
-patterns and validates structured log events are emitted.
+patterns.
```
| f"Turn {turn.turn_number} (index {turn_idx}): " | ||
| f"finish_reason={turn.finish_reason.value}", | ||
| ), | ||
| turn_range=(turn_idx, turn_idx), |
Semantic inconsistency in turn_range usage within detect_coordination_failures: for tool execution errors (line 449), turn_range=(i, i) uses the message index from the conversation tuple (0-based position in the full conversation). For error finish reason findings (line 464), turn_range=(turn_idx, turn_idx) uses the index within the turns tuple, which is a separate index space with a different cardinality than the conversation. A consumer of ErrorFinding would have no way to know which index space applies. Since the ErrorFinding docstring describes turn_range as "Message index range (start, end) where error observed", the turn_idx usage is semantically incorrect. Either both usages should use conversation message indices (mapping each turn to its corresponding messages), or the field semantics should be clarified explicitly.
```diff
-                turn_range=(turn_idx, turn_idx),
```
Summary

- New `engine/classification/` subpackage with coordination error taxonomy classification pipeline (§10.5)
- Opt-in via `AgentEngine(error_taxonomy_config=...)`, never blocks agent work
- `ErrorFinding` model with `turn_range` validation (non-negative, start ≤ end) and `AwareDatetime` for timestamps

Test plan

- `AgentEngine` classification integration tested (no config, enabled config, MemoryError propagation)
- `RecursionError` propagation tested in pipeline
- `turn_range` validation (negative indices, inverted range, valid range)

Closes #146
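The "never blocks agent work" guarantee plus the per-detector isolation described throughout this PR can be sketched as follows. The registry shape and detector names below are illustrative, not the project's API; the real pipeline also emits DETECTOR_START/COMPLETE/ERROR events.

```python
from collections.abc import Callable

# Hedged sketch: one broken detector is logged and skipped so the rest still
# run, but unrecoverable errors are re-raised to the engine.
def run_detectors(detectors: dict[str, Callable[[], list[str]]]) -> list[str]:
    findings: list[str] = []
    for name, detect in detectors.items():
        try:
            findings.extend(detect())
        except (MemoryError, RecursionError):
            raise  # fatal: propagate past the isolation boundary
        except Exception as exc:
            print(f"detector {name} failed: {exc}")  # isolated and skipped
    return findings

def broken_detector() -> list[str]:
    raise RuntimeError("bug")

findings = run_detectors({
    "numerical_drift": lambda: ["drift at turn 3"],
    "broken": broken_detector,
    "context_omission": lambda: [],
})
print(findings)  # ['drift at turn 3']
```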