Skip to content

feat: wire all modules into observability system#97

Merged
Aureliolo merged 3 commits intomainfrom
feat/wire-observability-system
Mar 1, 2026
Merged

feat: wire all modules into observability system#97
Aureliolo merged 3 commits intomainfrom
feat/wire-observability-system

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Closes #91

  • Events: Created observability/events.py with ~30 structured event constants following domain.noun.verb pattern
  • Correlation: Added with_correlation_async() decorator mirroring the sync version for async functions
  • Sink routing: Implemented _LoggerNameFilter and _SINK_ROUTING dict routing security→audit.log, budget+providers→cost_usage.log, engine+core→agent_activity.log
  • Module migration: Migrated 5 modules from stdlib logging.getLogger to structlog get_logger (config/loader, templates/loader, templates/renderer, litellm_driver, mappers)
  • New logging: Added structured logging to 5 modules that had none (providers/base, providers/registry, core/task, core/task_transitions, core/role_catalog)
  • Bootstrap: Added bootstrap_logging() function to wire configure_logging() into the startup path
  • Tests: 40+ new tests across 8 new test files and 6 extended test files, including integration tests for sink routing
  • Test infra: Fixed structlog proxy caching issue by preventing cache_logger_on_first_use during tests
  • Docs: Added ## Logging section to CLAUDE.md with mandatory conventions, added logging-audit agent to PR review skill

Verification

  • Zero raw logging.getLogger in app code (only in observability/setup.py)
  • Zero print() calls in app code
  • All 1380 tests pass (1332 unit + 48 integration)
  • Lint, format, and mypy all clean

Test plan

  • uv run ruff check src/ tests/ — zero errors
  • uv run ruff format --check src/ tests/ — all formatted
  • uv run mypy src/ tests/ — zero errors
  • uv run pytest tests/ -m "unit or integration" --timeout=30 — 1380 passed
  • Verify rg "logging\.getLogger" src/ai_company/ --glob '!observability/setup.py' — 0 matches
  • CI pipeline passes (lint + mypy + test + coverage)

Migrate 5 modules from stdlib logging to structlog get_logger, add
structured logging to 5 more business-logic modules, implement sink
routing, add with_correlation_async, create event constants, and wire
bootstrap_logging into the startup path.

- Create events.py with ~30 structured event name constants
- Add with_correlation_async() decorator for async functions
- Implement _LoggerNameFilter and _SINK_ROUTING for audit/cost/agent sinks
- Migrate config/loader, templates/loader, templates/renderer,
  litellm_driver, mappers from stdlib to get_logger
- Add logging to providers/base, providers/registry, core/task,
  core/task_transitions, core/role_catalog
- Add bootstrap_logging() to config/loader and export from config/__init__
- Add syrupy test dependency
- Add 40+ new tests (events, correlation_async, sink_routing,
  bootstrap_logging, base_provider logging, integration sink routing)
- Add logging tests to existing test files (loader, task, transitions,
  registry, renderer)
- Fix structlog proxy caching issue in tests/conftest.py
- Add Logging section to CLAUDE.md with mandatory conventions
- Add logging-audit agent to PR review skill
Copilot AI review requested due to automatic review settings March 1, 2026 18:51
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 1, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 1, 2026

Warning

Rate limit exceeded

@Aureliolo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 46 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 1a304cb and f07de33.

📒 Files selected for processing (8)
  • src/ai_company/observability/correlation.py
  • src/ai_company/observability/sinks.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/templates/loader.py
  • tests/conftest.py
  • tests/integration/observability/conftest.py
  • tests/unit/observability/conftest.py
📝 Walkthrough

Walkthrough

This PR implements comprehensive observability infrastructure by introducing structured event constants, wiring observability across core modules, adding async correlation support, implementing logger-to-sink routing, and instrumenting 15+ modules with event-based logging. It includes bootstrap_logging initialization, new public APIs, extensive test coverage, and documentation updates.

Changes

Cohort / File(s) Summary
Observability Foundation
src/ai_company/observability/events.py, src/ai_company/observability/correlation.py, src/ai_company/observability/sinks.py, src/ai_company/observability/__init__.py
Introduces 50 event name constants (CONFIG_, PROVIDER_, TASK_, TEMPLATE_, ROLE_*) following domain.noun.verb pattern; adds async correlation decorator with_correlation_async for async functions with context binding/unbinding; implements logger-to-sink routing via _LoggerNameFilter and _SINK_ROUTING mapping to route specific logger namespaces to dedicated files.
Config System Integration
src/ai_company/config/loader.py, src/ai_company/config/__init__.py
Adds bootstrap_logging() function to initialize observability post-config validation; replaces stdlib logger with get_logger and instruments key paths (discovery, YAML parsing, validation, overrides) with 10 CONFIG_* events; exports bootstrap_logging in public API.
Core Domain Instrumentation
src/ai_company/core/task.py, src/ai_company/core/task_transitions.py, src/ai_company/core/role_catalog.py
Adds event-based logging to task status transitions (TASK_STATUS_CHANGED), transition validation (TASK_TRANSITION_INVALID, TASK_TRANSITION_CONFIG_ERROR with critical/warning levels), and role lookup misses (ROLE_LOOKUP_MISS).
Provider System Instrumentation
src/ai_company/providers/base.py, src/ai_company/providers/drivers/litellm_driver.py, src/ai_company/providers/drivers/mappers.py, src/ai_company/providers/registry.py
Comprehensively instruments provider lifecycle with 20+ PROVIDER_* events (call start/success/error, stream events, tool-call issues, auth/connection/rate-limit errors, model info availability); replaces stdlib logger with get_logger; updates _map_exception return type from Exception to errors.ProviderError; adjusts _parse_arguments signature to accept Any type.
Template System Instrumentation
src/ai_company/templates/loader.py, src/ai_company/templates/renderer.py
Replaces stdlib logging with get_logger; adds event-based logging for template load lifecycle (TEMPLATE_LOAD_START/SUCCESS/ERROR, TEMPLATE_LIST_SKIP_INVALID, TEMPLATE_BUILTIN_DEFECT, TEMPLATE_PASS1_FLOAT_FALLBACK) and render lifecycle (TEMPLATE_RENDER_START/SUCCESS/ERROR, variable/Jinja2/YAML/validation errors, unknown personality presets).
Documentation & Policy
.claude/skills/aurelio-review-pr/skill.md, CLAUDE.md
Adds logging-audit agent entry to PR review skill for checking logging compliance; adds mandatory logging policy blocks to CLAUDE.md covering ai_company.observability usage, prohibition of logging.getLogger() and print(), logger naming, structured logging with kwargs, and event-based logging on error paths and state transitions.
Testing Infrastructure
tests/conftest.py, tests/unit/observability/conftest.py, tests/integration/observability/conftest.py
Adds structlog proxy caching disable patch for test isolation; introduces captured_logs fixture for field-level log assertions; adds _reset_logging fixture with _clear_logging_state to isolate logging configuration across tests.
Unit Tests - Events & Correlation
tests/unit/observability/test_events.py, tests/unit/observability/test_correlation_async.py, tests/unit/observability/test_sink_routing.py
Validates 50 event constants for domain.noun.verb pattern, uniqueness, and minimum count; tests async correlation decorator for binding/unbinding request_id/task_id/agent_id with error path coverage; validates logger name filtering and sink routing configuration (audit, cost_usage, agent_activity files).
Unit Tests - Config & Core
tests/unit/config/test_bootstrap_logging.py, tests/unit/config/test_loader.py, tests/unit/core/test_task.py, tests/unit/core/test_task_transitions.py
Tests bootstrap_logging forwarding to configure_logging; validates CONFIG_LOADED/PARSE_FAILED/VALIDATION_FAILED events in loader; verifies TASK_STATUS_CHANGED events on transitions; tests TASK_TRANSITION_INVALID warning logging on invalid transitions.
Unit Tests - Providers & Templates
tests/unit/providers/test_base_provider.py, tests/unit/providers/test_registry.py, tests/unit/templates/test_renderer.py
Tests PROVIDER_CALL_START/SUCCESS/ERROR and PROVIDER_STREAM_START event emission; validates PROVIDER_REGISTRY_BUILT and PROVIDER_DRIVER_NOT_REGISTERED events; verifies TEMPLATE_RENDER_START/SUCCESS event capture.
Integration Tests
tests/integration/observability/test_sink_routing_integration.py
Adds four integration tests verifying logger-to-sink routing: security events route to audit_log, budget events to cost_usage.log, engine events to agent_activity.log, and errors.log captures only error-level messages.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes span 25+ files across multiple domains with heterogeneous modifications: new public APIs (bootstrap_logging, with_correlation_async, 50 event constants), async context binding logic, provider error mapping signature change, logger-to-sink routing with filtering, and dense event-based instrumentation across 10+ modules. While individual module changes follow consistent patterns, the interplay between async correlation context handling, sink routing configuration, event constant usage, and test isolation requires careful verification of correctness and consistency. The variety of affected modules and logic density (correlation binding/unbinding, logger name filtering, structured event emission across error paths) elevates complexity beyond simple pattern repetition.

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 100.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: wire all modules into observability system' clearly describes the main objective of the PR, which is integrating all modules into the observability/logging system as confirmed by the changeset.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, covering events, correlation, sink routing, module migration, bootstrap functionality, tests, and documentation updates that match the file changes.
Linked Issues check ✅ Passed The PR successfully addresses all major objectives from issue #91: structured event constants defined, async correlation decorator implemented, sink routing configured, five modules migrated to get_logger, ~10 modules instrumented with logging, bootstrap_logging wired into startup, comprehensive test coverage added, CLAUDE.md updated with logging conventions, and logging-audit agent added to PR review skill.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #91 objectives: event definitions, correlation/sink routing enhancements, logger migration and instrumentation, bootstrap wiring, test infrastructure, and governance documentation. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/wire-observability-system

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's observability by integrating a comprehensive structured logging system. It standardizes event naming, enables flexible log routing to different files based on module, and ensures correlation IDs are propagated across both synchronous and asynchronous operations. The changes involve migrating existing logging, adding new logging to previously unlogged modules, and bolstering the test suite to cover these new features, ultimately providing a more robust and debuggable system.

Highlights

  • Structured Event Constants: Introduced observability/events.py with approximately 30 structured event constants following a domain.noun.verb pattern to standardize log messages.
  • Asynchronous Correlation Decorator: Added with_correlation_async() to mirror the existing synchronous version, enabling correlation ID binding for async functions.
  • Configurable Log Sink Routing: Implemented _LoggerNameFilter and _SINK_ROUTING to route logs from specific modules (e.g., security, budget, engine) to dedicated log files.
  • Module Logging Migration: Migrated 5 existing modules (config/loader, templates/loader, templates/renderer, litellm_driver, mappers) from standard library logging.getLogger to structlog.get_logger.
  • New Structured Logging: Added structured logging to 5 modules that previously lacked it (providers/base, providers/registry, core/task, core/task_transitions, core/role_catalog).
  • Logging Bootstrap Function: Created bootstrap_logging() to integrate the logging configuration into the application's startup process.
  • Enhanced Test Coverage: Added over 40 new tests across 8 new test files and extended 6 existing test files, including integration tests for the new sink routing functionality.
  • Test Infrastructure Improvement: Fixed a structlog proxy caching issue in tests by preventing cache_logger_on_first_use to ensure accurate log capturing.
  • Documentation and Review Skill: Added a 'Logging' section to CLAUDE.md detailing mandatory conventions and introduced a logging-audit agent to the PR review skill.
Changelog
  • .claude/skills/aurelio-review-pr/skill.md
    • Added a logging-audit agent to the PR review toolkit to enforce new logging conventions.
  • CLAUDE.md
    • Added a new 'Logging' section detailing mandatory conventions for structured logging, logger naming, event constants, and error handling.
  • pyproject.toml
    • Added syrupy to the test dependencies.
  • src/ai_company/config/init.py
    • Exported the new bootstrap_logging function.
  • src/ai_company/config/loader.py
    • Removed standard logging import and replaced with ai_company.observability.get_logger.
    • Imported structured event constants for config lifecycle events.
    • Replaced unstructured log messages with structured logger.warning calls for parse and validation failures.
    • Added structured logger.debug for environment variable resolution and config overrides.
    • Added structured logger.info for config discovery and successful loading.
    • Introduced bootstrap_logging function to configure the observability pipeline after config loading.
  • src/ai_company/core/role_catalog.py
    • Imported get_logger and ROLE_LOOKUP_MISS event constant.
    • Initialized a structlog logger for the module.
    • Added structured logger.debug when a role lookup misses.
  • src/ai_company/core/task.py
    • Imported get_logger and TASK_STATUS_CHANGED event constant.
    • Initialized a structlog logger for the module.
    • Added structured logger.info when a task's status changes.
  • src/ai_company/core/task_transitions.py
    • Imported get_logger and event constants for task transition errors.
    • Initialized a structlog logger for the module.
    • Added structured logger.critical for configuration errors in task transitions.
    • Added structured logger.warning for invalid task transitions.
  • src/ai_company/observability/init.py
    • Exported the new with_correlation_async decorator.
  • src/ai_company/observability/correlation.py
    • Removed a TODO comment for with_correlation_async().
    • Updated type hints to include Coroutine.
    • Modified with_correlation() to raise TypeError if applied to an async function, directing users to with_correlation_async().
    • Implemented the with_correlation_async decorator for binding correlation IDs in asynchronous functions.
  • src/ai_company/observability/events.py
    • Added a new file defining structured event constants for various domains like config, provider, task, and template lifecycles.
  • src/ai_company/observability/sinks.py
    • Removed a TODO comment for logger name filters.
    • Defined _SINK_ROUTING dictionary to map log file paths to logger name prefixes for routing.
    • Implemented _LoggerNameFilter class to filter log records based on include/exclude prefixes.
    • Modified build_handler to attach _LoggerNameFilter to file handlers based on _SINK_ROUTING.
  • src/ai_company/providers/base.py
    • Imported get_logger and various PROVIDER event constants.
    • Initialized a structlog logger for the module.
    • Added structured logger.debug for provider call start and success.
    • Added structured logger.debug for stream start.
    • Added structured logger.error for invalid messages or blank models.
  • src/ai_company/providers/drivers/litellm_driver.py
    • Removed standard logging import and replaced with ai_company.observability.get_logger.
    • Imported various PROVIDER event constants.
    • Initialized a structlog logger for the module.
    • Added structured logger.debug for provider call start, success, and stream start/done.
    • Added structured logger.error for model not found errors.
    • Replaced unstructured debug log for stream chunk with structured event.
    • Added structured logger.warning for rate limit errors and logger.error for authentication errors, and logger.warning for connection errors.
    • Replaced unstructured debug log for retry-after parsing with structured event.
    • Replaced unstructured info/warning logs for LiteLLM model info with structured events.
    • Replaced unstructured warning logs for tool call argument truncation, incomplete tool calls, and argument parsing failures with structured events.
  • src/ai_company/providers/drivers/mappers.py
    • Removed standard logging import and replaced with ai_company.observability.get_logger.
    • Initialized a structlog logger for the module.
    • Replaced unstructured warning logs for unknown finish reasons, missing tool call functions, incomplete tool calls, and argument parsing failures with structured events.
  • src/ai_company/providers/registry.py
    • Imported get_logger and various PROVIDER event constants.
    • Initialized a structlog logger for the module.
    • Added structured logger.error when a provider driver is not registered.
    • Added structured logger.info when the provider registry is successfully built.
    • Added structured logger.error when a driver factory is missing.
    • Added structured logger.debug when a provider driver is instantiated.
  • src/ai_company/templates/loader.py
    • Removed standard logging import and replaced with ai_company.observability.get_logger.
    • Imported various TEMPLATE event constants.
    • Initialized a structlog logger for the module.
    • Replaced unstructured warning log for skipping invalid user templates with structured event.
    • Replaced unstructured exception log for built-in template defects with structured event.
    • Added structured logger.debug for template load start and success.
    • Added structured logger.error for unknown templates.
    • Replaced unstructured debug log for float conversion fallback with structured event.
  • src/ai_company/templates/renderer.py
    • Removed standard logging import and replaced with ai_company.observability.get_logger.
    • Imported various TEMPLATE event constants.
    • Initialized a structlog logger for the module.
    • Added structured logger.info for template render start and success.
    • Added structured logger.error for required template variables not provided.
    • Added structured logger.warning for Jinja2 rendering failures, YAML parsing errors, unknown personality presets, and validation errors.
  • tests/conftest.py
    • Added a patch to structlog.configure to force cache_logger_on_first_use=False during tests, addressing a caching issue with structlog proxies.
  • tests/integration/observability/test_sink_routing_integration.py
    • Added a new integration test file to verify log sink routing with real file handlers, including tests for security, budget/providers, engine/core, and error-level routing.
  • tests/unit/config/test_bootstrap_logging.py
    • Added a new unit test file for bootstrap_logging to ensure it correctly calls configure_logging with the appropriate configuration.
  • tests/unit/config/test_loader.py
    • Imported structlog and relevant CONFIG event constants.
    • Added unit tests to verify that CONFIG_LOADED, CONFIG_PARSE_FAILED, and CONFIG_VALIDATION_FAILED events are emitted correctly during config loading.
  • tests/unit/core/test_task.py
    • Imported structlog and TASK_STATUS_CHANGED event constant.
    • Added unit tests to verify that TASK_STATUS_CHANGED event is emitted on task status transitions.
  • tests/unit/core/test_task_transitions.py
    • Imported structlog and TASK_TRANSITION_INVALID event constant.
    • Added unit tests to verify that TASK_TRANSITION_INVALID event is emitted on invalid task status transitions.
  • tests/unit/observability/conftest.py
    • Imported MutableMapping for type hinting.
    • Added captured_logs fixture to capture structlog output for field-level assertions in tests.
  • tests/unit/observability/test_correlation_async.py
    • Added a new unit test file for with_correlation_async decorator, covering binding, unbinding, exception handling, return value preservation, and type checking.
  • tests/unit/observability/test_events.py
    • Added a new unit test file to validate the observability.events constants, ensuring they are strings, follow the domain.noun.verb pattern, are unique, and cover expected domains.
  • tests/unit/observability/test_sink_routing.py
    • Added a new unit test file for _LoggerNameFilter and _SINK_ROUTING, verifying correct filtering logic and routing table configuration.
  • tests/unit/providers/test_base_provider.py
    • Added a new unit test file for BaseCompletionProvider logging, verifying emission of PROVIDER_CALL_START, PROVIDER_CALL_SUCCESS, PROVIDER_STREAM_START, and PROVIDER_CALL_ERROR events.
  • tests/unit/providers/test_registry.py
    • Imported structlog and relevant PROVIDER event constants.
    • Added unit tests to verify that PROVIDER_REGISTRY_BUILT and PROVIDER_DRIVER_NOT_REGISTERED events are emitted correctly.
  • tests/unit/templates/test_renderer.py
    • Imported structlog and relevant TEMPLATE event constants.
    • Added unit tests to verify that TEMPLATE_RENDER_START and TEMPLATE_RENDER_SUCCESS events are emitted during template rendering.
  • uv.lock
    • Updated the uv.lock file to include the new syrupy dependency.
Activity
  • Aureliolo has implemented a comprehensive observability system, wiring all modules into structured logging.
  • Aureliolo has created observability/events.py with ~30 structured event constants.
  • Aureliolo has added with_correlation_async() decorator for async functions.
  • Aureliolo has implemented _LoggerNameFilter and _SINK_ROUTING for log routing.
  • Aureliolo has migrated 5 modules to structlog.get_logger.
  • Aureliolo has added structured logging to 5 modules that previously had none.
  • Aureliolo has added bootstrap_logging() function to wire configure_logging() into the startup path.
  • Aureliolo has added 40+ new tests across 8 new test files and 6 extended test files, including integration tests for sink routing.
  • Aureliolo has fixed a structlog proxy caching issue in test infrastructure.
  • Aureliolo has added a 'Logging' section to CLAUDE.md and a logging-audit agent to the PR review skill.
  • Aureliolo has verified zero raw logging.getLogger in app code (only in observability/setup.py).
  • Aureliolo has verified zero print() calls in app code.
  • Aureliolo has verified all 1380 tests pass (1332 unit + 48 integration).
  • Aureliolo has verified lint, format, and mypy are all clean.
  • Aureliolo has confirmed uv run ruff check src/ tests/ yields zero errors.
  • Aureliolo has confirmed uv run ruff format --check src/ tests/ shows all formatted.
  • Aureliolo has confirmed uv run mypy src/ tests/ yields zero errors.
  • Aureliolo has confirmed uv run pytest tests/ -m "unit or integration" --timeout=30 shows 1380 passed.
  • Aureliolo has verified rg "logging\.getLogger" src/ai_company/ --glob '!observability/setup.py' yields 0 matches.
  • Aureliolo is awaiting CI pipeline pass (lint + mypy + test + coverage).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/templates/test_renderer.py (1)

1-6: 🧹 Nitpick | 🔵 Trivial

Add file-level timeout marker.

The file is missing the pytest.mark.timeout(30) marker that other test files in this PR have. Per coding guidelines, tests should have a 30-second timeout.

🛠️ Proposed fix
 import pytest
 import structlog
 from pydantic import ValidationError
 
+pytestmark = pytest.mark.timeout(30)
+
 from ai_company.config.schema import RootConfig
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/templates/test_renderer.py` around lines 1 - 6, Add a file-level
pytest timeout marker to this test module by declaring a top-level pytestmark =
pytest.mark.timeout(30) so the entire file uses the 30-second timeout; the file
already imports pytest, so add the pytestmark assignment near the module
docstring/top imports in tests/unit/templates/test_renderer.py to apply to all
tests in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/config/loader.py`:
- Around line 206-209: Replace the hard-coded event name string in the
logger.warning call inside loader.py with a constant defined in events.py: add
CONFIG_LINE_MAP_COMPOSE_FAILED = "config.line_map.compose_failed" to
src/ai_company/observability/events.py and update the logger.warning call in
src/ai_company/config/loader.py to import and use
events.CONFIG_LINE_MAP_COMPOSE_FAILED (ensure the import is added and the
error=str(exc) kwarg remains unchanged).
- Around line 175-178: Add a constant CONFIG_YAML_NON_SCALAR_KEY =
"config.yaml.non_scalar_key" to ai_company.observability.events and update the
logger call in loader.py (the logger.debug invocation that currently passes the
hardcoded string) to use events.CONFIG_YAML_NON_SCALAR_KEY instead; ensure you
import the constant (or the events module) at the top of
src/ai_company/config/loader.py so the logger.debug call uses the constant name
rather than the literal string.

In `@src/ai_company/providers/drivers/litellm_driver.py`:
- Around line 137-142: The driver is re-emitting lifecycle events that
BaseCompletionProvider already emits; remove or stop emitting the duplicate
logger.debug calls that use PROVIDER_CALL_START, PROVIDER_CALL_SUCCESS, and
PROVIDER_STREAM_START in this driver (the debug call passing
provider=self._provider_name, model=model, message_count=len(messages) and the
similar calls at the other two locations referenced). Locate these logger.debug
invocations in the LiteLLM driver methods (the calls that include
PROVIDER_CALL_START/PROVIDER_CALL_SUCCESS/PROVIDER_STREAM_START symbols) and
either delete them or change them to non-lifecycle debug messages (or guard them
behind a driver-only flag) so only BaseCompletionProvider emits those lifecycle
events.

In `@src/ai_company/templates/loader.py`:
- Around line 434-435: Replace the string literal
"template.pass1.float_fallback" in loader.py with a centralized constant: add
TEMPLATE_PASS1_FLOAT_FALLBACK: str = "template.pass1.float_fallback" to
observability/events.py and import that constant into loader.py, then use
TEMPLATE_PASS1_FLOAT_FALLBACK where the literal is currently passed (preserve
the value=repr(value) usage); this keeps event names consistent and
audit-friendly across the codebase.

In `@src/ai_company/templates/renderer.py`:
- Around line 192-196: Replace the logger.warning calls used right before
exceptions are raised with logger.error so terminal render/validation failures
are logged at the correct severity; specifically change the call that logs
TEMPLATE_RENDER_JINJA2_ERROR (and the similar warning calls in the other
indicated branches) to logger.error with the same message parameters (e.g.,
source_name and error=str(exc)), and update all three occurrences mentioned
(including the blocks around TEMPLATE_RENDER_JINJA2_ERROR and the two other
analogous branches) so they consistently log errors before re-raising.

In `@tests/unit/config/test_bootstrap_logging.py`:
- Around line 35-41: Rename the test method test_is_idempotent to a name that
reflects its actual behavior (for example
test_forwards_each_call_to_configure_logging) or keep the name and add a
one-line docstring explaining it checks that bootstrap_logging forwards calls to
configure_logging; update references to the test method name in the file and
ensure it still patches ai_company.observability.configure_logging and asserts
mock_configure.call_count == 2 (symbols: bootstrap_logging, configure_logging,
test_is_idempotent).

In `@tests/unit/config/test_loader.py`:
- Around line 666-697: The TestLoaderLogging test class is missing the required
30s timeout marker; add the pytest timeout mark to the class (e.g., annotate
TestLoaderLogging with pytest.mark.timeout(30)) so all its methods
(test_config_loaded_event_on_success, test_config_parse_failed_event,
test_config_validation_failed_event) inherit the 30-second limit, ensuring each
test in that class uses the mandated timeout.

In `@tests/unit/observability/conftest.py`:
- Around line 72-76: The new captured_logs fixture is unused; update the tests
(test_task.py, test_renderer.py, test_task_transitions.py) that call
structlog.testing.capture_logs() directly to accept the captured_logs fixture
instead (replace local with parameter captured_logs) and use that list for
assertions, or if you prefer to keep current usage, add a brief
comment/docstring to the fixture explaining intended use so it isn’t mistaken
for dead code; refer to the captured_logs fixture name when making the
replacement.

In `@tests/unit/providers/test_registry.py`:
- Around line 291-313: Add a 30s per-test timeout for this test module by
applying pytest's timeout marker; for example, set a module-level pytestmark =
pytest.mark.timeout(30) (or decorate the TestRegistryLogging class with
`@pytest.mark.timeout`(30)) so both TestRegistryLogging.test_registry_built_event
and TestRegistryLogging.test_driver_not_registered_event run with a 30-second
timeout.

---

Outside diff comments:
In `@tests/unit/templates/test_renderer.py`:
- Around line 1-6: Add a file-level pytest timeout marker to this test module by
declaring a top-level pytestmark = pytest.mark.timeout(30) so the entire file
uses the 30-second timeout; the file already imports pytest, so add the
pytestmark assignment near the module docstring/top imports in
tests/unit/templates/test_renderer.py to apply to all tests in this file.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce4693c and 5fc2f9d.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (32)
  • .claude/skills/aurelio-review-pr/skill.md
  • CLAUDE.md
  • pyproject.toml
  • src/ai_company/config/__init__.py
  • src/ai_company/config/loader.py
  • src/ai_company/core/role_catalog.py
  • src/ai_company/core/task.py
  • src/ai_company/core/task_transitions.py
  • src/ai_company/observability/__init__.py
  • src/ai_company/observability/correlation.py
  • src/ai_company/observability/events.py
  • src/ai_company/observability/sinks.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/providers/registry.py
  • src/ai_company/templates/loader.py
  • src/ai_company/templates/renderer.py
  • tests/conftest.py
  • tests/integration/observability/__init__.py
  • tests/integration/observability/test_sink_routing_integration.py
  • tests/unit/config/test_bootstrap_logging.py
  • tests/unit/config/test_loader.py
  • tests/unit/core/test_task.py
  • tests/unit/core/test_task_transitions.py
  • tests/unit/observability/conftest.py
  • tests/unit/observability/test_correlation_async.py
  • tests/unit/observability/test_events.py
  • tests/unit/observability/test_sink_routing.py
  • tests/unit/providers/test_base_provider.py
  • tests/unit/providers/test_registry.py
  • tests/unit/templates/test_renderer.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
Add type hints to all public functions — mypy strict mode enforced
Use Google-style docstrings on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones — practice immutability
Use Pydantic v2 with BaseModel, model_validator, and ConfigDict for data models
Enforce 88 character line length — ruff configured
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly — never silently swallow exceptions
Validate at system boundaries — validate user input, external APIs, and config files explicitly

Files:

  • src/ai_company/observability/__init__.py
  • tests/unit/core/test_task.py
  • src/ai_company/config/__init__.py
  • tests/unit/observability/test_correlation_async.py
  • src/ai_company/core/task.py
  • tests/unit/observability/test_sink_routing.py
  • tests/integration/observability/test_sink_routing_integration.py
  • src/ai_company/templates/loader.py
  • src/ai_company/providers/registry.py
  • tests/unit/providers/test_base_provider.py
  • tests/unit/config/test_loader.py
  • tests/unit/templates/test_renderer.py
  • tests/unit/core/test_task_transitions.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/templates/renderer.py
  • src/ai_company/providers/base.py
  • src/ai_company/observability/sinks.py
  • tests/unit/observability/conftest.py
  • src/ai_company/config/loader.py
  • src/ai_company/providers/drivers/mappers.py
  • tests/unit/providers/test_registry.py
  • src/ai_company/core/role_catalog.py
  • src/ai_company/observability/events.py
  • src/ai_company/core/task_transitions.py
  • tests/conftest.py
  • tests/unit/config/test_bootstrap_logging.py
  • src/ai_company/observability/correlation.py
  • tests/unit/observability/test_events.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Maintain minimum 80% code coverage — enforced in CI

Files:

  • src/ai_company/observability/__init__.py
  • src/ai_company/config/__init__.py
  • src/ai_company/core/task.py
  • src/ai_company/templates/loader.py
  • src/ai_company/providers/registry.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/templates/renderer.py
  • src/ai_company/providers/base.py
  • src/ai_company/observability/sinks.py
  • src/ai_company/config/loader.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/core/role_catalog.py
  • src/ai_company/observability/events.py
  • src/ai_company/core/task_transitions.py
  • src/ai_company/observability/correlation.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow markers on all tests
Set test timeout to 30 seconds per test
Use parallel test execution with pytest-xdist via -n auto flag

Files:

  • tests/unit/core/test_task.py
  • tests/unit/observability/test_correlation_async.py
  • tests/unit/observability/test_sink_routing.py
  • tests/integration/observability/test_sink_routing_integration.py
  • tests/unit/providers/test_base_provider.py
  • tests/unit/config/test_loader.py
  • tests/unit/templates/test_renderer.py
  • tests/unit/core/test_task_transitions.py
  • tests/unit/observability/conftest.py
  • tests/unit/providers/test_registry.py
  • tests/conftest.py
  • tests/unit/config/test_bootstrap_logging.py
  • tests/unit/observability/test_events.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Pin all dependency versions using == in pyproject.toml
Organize dependencies into groups: test (pytest + plugins), dev (includes test + ruff, mypy, pre-commit, commitizen, pydantic) with uv sync installing all

Files:

  • pyproject.toml
🧠 Learnings (19)
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/integration/test_*.py : Place integration tests for component interactions in `tests/integration/` directory

Applied to files:

  • tests/integration/observability/test_sink_routing_integration.py
📚 Learning: 2026-03-01T18:03:07.755Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T18:03:07.755Z
Learning: Applies to pyproject.toml : Organize dependencies into groups: `test` (pytest + plugins), `dev` (includes test + ruff, mypy, pre-commit, commitizen, pydantic) with `uv sync` installing all

Applied to files:

  • pyproject.toml
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All functions and methods should have appropriate logging for debugging and traceability. Use `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/**/*.py : All functions and methods should have appropriate logging using `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Use logging levels: debug (routine), info (significant), warning (unexpected but recoverable), error (failures)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Logs must be written to `output/logs/story_factory.log` using `logger = logging.getLogger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Log all significant operations using Python logging with appropriate levels: debug for detailed info, info for successes, warning for retries, error for failures

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`), CI enforces this coverage requirement

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit. The CI enforces 100% coverage on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`).

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/observability/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/observability/conftest.py
  • tests/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/observability/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/observability/conftest.py
  • tests/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • tests/unit/observability/conftest.py
🧬 Code graph analysis (20)
src/ai_company/observability/__init__.py (1)
src/ai_company/observability/correlation.py (1)
  • with_correlation_async (154-211)
tests/unit/core/test_task.py (1)
src/ai_company/core/enums.py (1)
  • TaskStatus (122-143)
src/ai_company/config/__init__.py (1)
src/ai_company/config/loader.py (1)
  • bootstrap_logging (482-497)
tests/unit/observability/test_correlation_async.py (2)
src/ai_company/observability/correlation.py (1)
  • with_correlation_async (154-211)
src/ai_company/providers/drivers/litellm_driver.py (1)
  • update (644-663)
src/ai_company/core/task.py (1)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
tests/unit/observability/test_sink_routing.py (1)
src/ai_company/observability/sinks.py (2)
  • _LoggerNameFilter (34-66)
  • filter (57-66)
tests/integration/observability/test_sink_routing_integration.py (3)
src/ai_company/observability/config.py (2)
  • LogConfig (115-192)
  • SinkConfig (50-112)
src/ai_company/observability/enums.py (2)
  • LogLevel (6-17)
  • SinkType (32-41)
src/ai_company/observability/setup.py (1)
  • configure_logging (159-195)
src/ai_company/templates/loader.py (2)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/templates/errors.py (3)
  • TemplateNotFoundError (10-11)
  • TemplateRenderError (14-20)
  • TemplateValidationError (23-59)
src/ai_company/providers/registry.py (2)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/providers/errors.py (2)
  • DriverFactoryNotFoundError (159-162)
  • DriverNotRegisteredError (143-146)
tests/unit/providers/test_base_provider.py (2)
src/ai_company/providers/base.py (6)
  • BaseCompletionProvider (35-303)
  • _do_complete (148-166)
  • _do_stream (169-189)
  • _do_get_model_capabilities (192-201)
  • complete (51-91)
  • stream (93-128)
src/ai_company/providers/errors.py (1)
  • InvalidRequestError (113-116)
tests/unit/templates/test_renderer.py (1)
src/ai_company/templates/renderer.py (1)
  • render_template (49-104)
tests/unit/core/test_task_transitions.py (2)
src/ai_company/core/enums.py (1)
  • TaskStatus (122-143)
src/ai_company/core/task_transitions.py (1)
  • validate_transition (56-90)
src/ai_company/templates/renderer.py (1)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/providers/base.py (1)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/config/loader.py (3)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/config/schema.py (1)
  • RootConfig (204-328)
src/ai_company/observability/setup.py (1)
  • configure_logging (159-195)
src/ai_company/providers/drivers/mappers.py (3)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/providers/enums.py (2)
  • FinishReason (15-22)
  • MessageRole (6-12)
src/ai_company/providers/models.py (3)
  • ChatMessage (114-186)
  • ToolCall (73-95)
  • ToolDefinition (45-70)
tests/unit/providers/test_registry.py (2)
src/ai_company/config/schema.py (2)
  • ProviderConfig (52-95)
  • ProviderModelConfig (17-49)
src/ai_company/providers/errors.py (1)
  • DriverNotRegisteredError (143-146)
src/ai_company/core/role_catalog.py (2)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/providers/registry.py (1)
  • get (66-94)
src/ai_company/core/task_transitions.py (1)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
tests/unit/config/test_bootstrap_logging.py (3)
src/ai_company/config/loader.py (1)
  • bootstrap_logging (482-497)
src/ai_company/config/schema.py (1)
  • RootConfig (204-328)
src/ai_company/observability/config.py (1)
  • LogConfig (115-192)
🔇 Additional comments (31)
pyproject.toml (1)

43-43: Good test dependency addition with correct pinning.

syrupy==5.1.0 is correctly pinned and placed in the test dependency group, which aligns with the repository dependency conventions.

As per coding guidelines, pyproject.toml dependencies should be pinned with == and organized into test/dev groups.

tests/conftest.py (1)

6-27: LGTM! Effective workaround for structlog proxy caching in tests.

The approach of forcing cache_logger_on_first_use=False globally ensures capture_logs() works reliably across all tests. The default-argument pattern for capturing _original is a valid Python idiom.

Minor observation: the _original: object type hint is imprecise (a Callable type would be more accurate), but the # type: ignore comments appropriately handle mypy.

CLAUDE.md (1)

61-72: LGTM! Clear and actionable logging governance.

The new Logging section provides comprehensive guidance that aligns with the PR's observability goals:

  • Mandatory get_logger usage with consistent logger naming
  • Event constants from ai_company.observability.events
  • Structured kwargs pattern
  • Clear level guidelines (DEBUG/INFO/WARNING/ERROR)

This supersedes any prior guidance about stdlib logging.getLogger usage. Based on learnings, documentation updates like this are expected when making architectural changes.

src/ai_company/observability/sinks.py (2)

22-67: LGTM! Well-designed logger routing with flexible filtering.

The _LoggerNameFilter class correctly implements prefix-based filtering with exclude-before-include precedence. The _SINK_ROUTING mapping cleanly separates concerns:

  • audit.log ← security logs
  • cost_usage.log ← budget and provider logs
  • agent_activity.log ← engine and core logs

The exclude_prefixes parameter is unused currently but provides forward compatibility for more complex routing scenarios.


164-169: LGTM! Filter attachment integrated correctly.

The filter is appropriately attached only to FILE sinks that have routing configured, leaving other sinks (like console or catch-all files) unfiltered.

src/ai_company/config/loader.py (1)

482-497: LGTM! Clean bootstrap entry point.

The bootstrap_logging function provides a clean public API for initializing observability after config loading. The lazy import of configure_logging appropriately avoids potential circular import issues between config and observability modules.

src/ai_company/observability/__init__.py (1)

30-31: LGTM! Public API surface correctly extended.

The with_correlation_async decorator is properly imported and added to __all__, making it available for async correlation tracking across the codebase.

Also applies to: 52-52

src/ai_company/core/task.py (2)

12-15: LGTM! Proper observability setup.

Logger is correctly instantiated using the observability module with __name__ and standard logger variable name.


223-230: LGTM! State transition logging correctly implemented.

The TASK_STATUS_CHANGED event is logged at INFO level after successful validation, with all relevant context (task_id, from_status, to_status). This follows the CLAUDE.md guideline that "all state transitions must log at INFO".

src/ai_company/core/role_catalog.py (2)

14-17: LGTM! Proper observability setup.

Logger correctly instantiated with standard conventions.


423-426: LGTM! Appropriate DEBUG-level logging for lookup misses.

Logging ROLE_LOOKUP_MISS at DEBUG level is appropriate since returning None for an unknown role is expected behavior (not an error). This contrasts correctly with providers/registry.py which uses ERROR level because it raises an exception.

tests/unit/config/test_bootstrap_logging.py (1)

11-11: LGTM! Correct timeout configuration.

Module-level timeout marker correctly set to 30 seconds per coding guidelines.

src/ai_company/config/__init__.py (1)

32-32: LGTM!

The bootstrap_logging function is correctly imported and exposed in the public API. The __all__ list maintains alphabetical ordering.

Also applies to: 58-58

tests/unit/core/test_task.py (1)

472-482: LGTM!

The logging test correctly uses structlog.testing.capture_logs() to verify the TASK_STATUS_CHANGED event is emitted with the expected fields (task_id, from_status, to_status). Test marker and timeout are properly configured via the file-level pytestmark.

.claude/skills/aurelio-review-pr/skill.md (1)

95-103: LGTM!

The logging-audit agent is well-defined with clear trigger conditions and specific violation checks that align with the observability conventions established in this PR. The severity categorization (CRITICAL for hard violations like logging.getLogger/print(), MAJOR for style issues) is appropriate.

tests/unit/templates/test_renderer.py (1)

201-210: LGTM!

The logging test correctly verifies that render_template emits both TEMPLATE_RENDER_START and TEMPLATE_RENDER_SUCCESS events. The use of structlog.testing.capture_logs() is appropriate for this assertion pattern.

tests/unit/core/test_task_transitions.py (1)

142-153: LGTM!

The test correctly verifies that TASK_TRANSITION_INVALID is emitted when an invalid transition is attempted. The nested context manager pattern with structlog.testing.capture_logs() and pytest.raises() is clean and appropriate.

src/ai_company/core/task_transitions.py (2)

18-24: LGTM!

The module correctly uses get_logger(__name__) for the module-level logger and imports the appropriate event constants. This follows the observability conventions established in this PR.


68-84: LGTM!

The logging instrumentation uses appropriate log levels:

  • critical for TASK_TRANSITION_CONFIG_ERROR (indicates a bug in the transition map)
  • warning for TASK_TRANSITION_INVALID (indicates invalid user input, recoverable)

Structured kwargs (current_status, target_status, allowed) provide useful context for debugging.

src/ai_company/observability/correlation.py (2)

154-211: LGTM!

The with_correlation_async decorator correctly mirrors the synchronous with_correlation pattern:

  • Validates the decorated function is a coroutine function
  • Binds correlation IDs before execution via bound_contextvars context manager
  • Properly awaits the wrapped coroutine
  • IDs are automatically unbound when the context manager exits

The type signature using Callable[_P, Coroutine[object, object, _T]] is appropriate for async function decorators.


132-132: LGTM!

The error message update correctly guides users to use with_correlation_async() for async functions.

tests/unit/observability/conftest.py (1)

8-10: No issue — TYPE_CHECKING imports are correct for Python 3.14.

With Python 3.14's native PEP 649 lazy annotation evaluation, using Iterator and MutableMapping under TYPE_CHECKING while referencing them in type hints is the correct approach. Annotations are not evaluated at runtime, so names from TYPE_CHECKING blocks are safe to use. The coding guidelines explicitly endorse this pattern: "Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations."

Likely an incorrect or invalid review comment.

tests/unit/observability/test_sink_routing.py (1)

25-89: Good coverage for routing/filter behavior.

These tests exercise the key acceptance/rejection paths and expected sink-prefix mappings clearly.

tests/unit/providers/test_base_provider.py (1)

89-127: Logging behavior tests are well-targeted.

The new cases verify success/start/error event emission on key base-provider paths with clear assertions.

tests/unit/observability/test_events.py (1)

26-64: Strong guardrails for observability event taxonomy.

Pattern, uniqueness, and baseline-count checks provide good regression protection for event constants.

src/ai_company/providers/registry.py (1)

80-89: Registry observability wiring looks good.

The added events capture the critical lifecycle/error points with actionable context fields.

Also applies to: 155-160, 182-187, 221-226

tests/unit/observability/test_correlation_async.py (1)

11-70: Decorator contract coverage is solid.

The suite validates context binding lifecycle, exception cleanup, and sync-function rejection comprehensively.

src/ai_company/providers/drivers/mappers.py (1)

12-17: Logging migration and warning instrumentation look correct.

The new logger integration and structured warning payloads keep behavior stable while improving observability.

Also applies to: 118-121, 147-150, 158-162, 189-192

src/ai_company/providers/base.py (1)

13-19: Structured provider observability wiring looks good.

The new lifecycle and validation-error events are consistently emitted with useful context and preserve existing control flow.

Also applies to: 32-33, 76-91, 118-123, 276-296

tests/integration/observability/test_sink_routing_integration.py (1)

15-147: Good end-to-end sink routing coverage.

These integration cases validate namespace routing and sink-level severity filtering with real file handlers.

src/ai_company/observability/events.py (1)

1-63: Event catalog is well-structured and clear.

The grouped constants improve consistency and make log instrumentation easier to audit.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is an excellent and thorough pull request that introduces a robust observability system based on structlog. The use of a central events.py for event constants, sink routing for different log types, and correlation ID tracking are all best practices that will significantly improve the application's maintainability and debuggability. The migration of existing modules and the extensive test suite are also very well done.

My review focuses on ensuring full adherence to the new logging conventions you've introduced. I've found a few places where bare strings are still used for log event names instead of constants from events.py. Addressing these will make the implementation fully consistent with the new standards defined in CLAUDE.md.

Comment on lines 175 to 178
logger.debug(
"Skipping non-scalar YAML key: %s",
type(key_node).__name__,
"config.yaml.non_scalar_key",
key_type=type(key_node).__name__,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file has a couple of log calls using bare strings for event names, which goes against the new logging conventions. Please define constants in observability/events.py for these and use them instead.

  • "config.yaml.non_scalar_key" (L176)
  • "config.line_map.compose_failed" (L207)

For example, for the first case, you could add CONFIG_YAML_NON_SCALAR_KEY: str = "config.yaml.non_scalar_key" to events.py, import it, and change the call to:

logger.debug(
    CONFIG_YAML_NON_SCALAR_KEY,
    key_type=type(key_node).__name__,
)

Comment on lines 429 to 431
if delta is None:
_logger.debug("Stream chunk has choices but no delta, skipping")
logger.debug("provider.stream.chunk_no_delta")
return result
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file contains several log calls that use bare strings for event names, which violates the new logging convention to use constants from observability/events.py.

Please convert the following bare strings into constants in events.py and use them here:

  • provider.stream.chunk_no_delta (L430)
  • provider.retry_after.parse_failed (L541)
  • provider.model_info.unavailable (L562)
  • provider.model_info.unexpected_error (L568)
  • provider.tool_call.arguments_truncated (L658)
  • provider.tool_call.incomplete (L673)
  • provider.tool_call.arguments_parse_failed (L683)

For example, logger.debug("provider.stream.chunk_no_delta") should become logger.debug(PROVIDER_STREAM_CHUNK_NO_DELTA) after defining the constant.

Comment on lines +118 to 121
logger.warning(
"provider.finish_reason.unknown",
reason=reason,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file also has several log calls using bare strings instead of event constants. This is a good example of why constants are valuable, as some of these strings (e.g., provider.tool_call.incomplete) are also used in litellm_driver.py.

Please convert these to constants in observability/events.py:

  • provider.finish_reason.unknown (L119)
  • provider.tool_call.missing_function (L147)
  • provider.tool_call.incomplete (L158)
  • provider.tool_call.arguments_parse_failed (L189)

Using a shared constant like PROVIDER_TOOL_CALL_INCOMPLETE would ensure consistency across both modules.

Comment on lines 433 to 436
logger.debug(
"Cannot convert %r to float in Pass 1 "
"(may be a Jinja2 placeholder), using 0.0",
value,
"template.pass1.float_fallback",
value=repr(value),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This log call uses a bare string "template.pass1.float_fallback" for the event name. To follow the new logging conventions, this should be defined as a constant in observability/events.py.

For example, you could add TEMPLATE_PASS1_FLOAT_FALLBACK: str = "template.pass1.float_fallback" to events.py, import it, and then change this call to:

logger.debug(
    TEMPLATE_PASS1_FLOAT_FALLBACK,
    value=repr(value),
)

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR wires application modules into the existing ai_company.observability stack by standardizing on get_logger(__name__), introducing a shared catalog of structured event-name constants, adding async correlation support, and implementing sink routing based on logger-name prefixes—plus extensive test coverage to validate the behavior end-to-end.

Changes:

  • Added ai_company.observability.events constants and expanded instrumentation across config/templates/providers/core modules using structlog.
  • Implemented async correlation via with_correlation_async() and added logger-name-based sink routing (_LoggerNameFilter + _SINK_ROUTING).
  • Added/expanded unit + integration tests for event constants, correlation, sink routing, and newly introduced log emission.

Reviewed changes

Copilot reviewed 31 out of 33 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
uv.lock Adds syrupy to lockfile for test tooling.
pyproject.toml Adds syrupy to the test dependency group.
CLAUDE.md Introduces mandatory logging conventions for the repo.
.claude/skills/aurelio-review-pr/skill.md Updates PR review skill to include a logging-audit agent/checklist.
tests/conftest.py Patches structlog configure behavior during tests to avoid proxy caching issues.
tests/unit/templates/test_renderer.py Adds assertions for template renderer start/success log events.
tests/unit/providers/test_registry.py Adds assertions for provider registry build/error log events.
tests/unit/providers/test_base_provider.py New tests for BaseCompletionProvider logging behavior.
tests/unit/observability/test_sink_routing.py New unit tests for logger-name filters and routing table.
tests/unit/observability/test_events.py New tests enforcing event-name format, uniqueness, and presence.
tests/unit/observability/test_correlation_async.py New tests for async correlation decorator behavior.
tests/unit/observability/conftest.py Adds a captured_logs fixture for field-level structlog assertions.
tests/unit/core/test_task_transitions.py Adds test ensuring invalid transitions emit the expected log event.
tests/unit/core/test_task.py Adds test ensuring task status transitions emit the expected log event.
tests/unit/config/test_loader.py Adds tests asserting config loader emits success/parse/validation log events.
tests/unit/config/test_bootstrap_logging.py Adds tests for the new bootstrap_logging() startup hook.
tests/integration/observability/test_sink_routing_integration.py Integration tests validating routing behavior with real file handlers.
tests/integration/observability/init.py Marks/creates the integration test package.
src/ai_company/templates/renderer.py Migrates to get_logger + adds structured template render lifecycle logging.
src/ai_company/templates/loader.py Migrates to get_logger + adds structured template load/list logging.
src/ai_company/providers/registry.py Adds structured logging for registry build/lookup/factory failures.
src/ai_company/providers/drivers/mappers.py Migrates to get_logger and logs mapping edge cases in structured form.
src/ai_company/providers/drivers/litellm_driver.py Migrates to get_logger and adds provider call/stream/error structured logs.
src/ai_company/providers/base.py Adds baseline structured logging for provider call/stream entry + validation errors.
src/ai_company/observability/sinks.py Implements logger-name-based sink routing via handler filters.
src/ai_company/observability/events.py New module defining structured event-name constants.
src/ai_company/observability/correlation.py Adds with_correlation_async() and updates messaging for async misuse.
src/ai_company/observability/init.py Re-exports with_correlation_async.
src/ai_company/core/task_transitions.py Adds structured logging for invalid transitions and config errors.
src/ai_company/core/task.py Adds structured logging for task status transitions.
src/ai_company/core/role_catalog.py Adds structured logging for role lookup misses.
src/ai_company/config/loader.py Migrates to get_logger, adds structured config lifecycle logs, and adds bootstrap_logging().
src/ai_company/config/init.py Re-exports bootstrap_logging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 206 to 209
logger.warning(
"Failed to compose YAML AST for line mapping; "
"validation errors will lack line/column information: %s",
exc,
"config.line_map.compose_failed",
error=str(exc),
)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log call uses a string literal event name ("config.line_map.compose_failed") instead of an ai_company.observability.events constant, which conflicts with the logging conventions added in this PR. Consider adding an event constant for this case and using it here.

Copilot uses AI. Check for mistakes.
Comment on lines +181 to +186
logger.debug(
PROVIDER_STREAM_START,
provider=self._provider_name,
model=model,
message_count=len(messages),
)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same duplication issue as _do_complete(): BaseCompletionProvider.stream() now emits PROVIDER_STREAM_START, and this driver emits it again. Consider logging this event in only one place (or use a driver-specific event) to avoid duplicate entries.

Suggested change
logger.debug(
PROVIDER_STREAM_START,
provider=self._provider_name,
model=model,
message_count=len(messages),
)

Copilot uses AI. Check for mistakes.
Comment on lines +118 to 121
logger.warning(
"provider.finish_reason.unknown",
reason=reason,
)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module logs events using string literals (e.g. "provider.finish_reason.unknown") instead of constants from ai_company.observability.events, which is now documented as mandatory in CLAUDE.md. Consider adding event constants for these mapper warnings and using them here so event naming stays consistent across providers.

Copilot uses AI. Check for mistakes.
Comment on lines +434 to +435
"template.pass1.float_fallback",
value=repr(value),
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log call uses a string literal event name ("template.pass1.float_fallback") rather than an ai_company.observability.events constant, which conflicts with the logging conventions introduced in this PR. Consider adding a corresponding template event constant and referencing it here.

Copilot uses AI. Check for mistakes.
Comment on lines +6 to +9
def _patched_configure(
_original: object = structlog.configure,
**kwargs: object,
) -> None:
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The global monkeypatch of structlog.configure only accepts **kwargs, which can break callers that pass positional args to structlog.configure(...) (it’s part of structlog’s public API). Consider accepting *args, **kwargs and forwarding both to the original function, while still forcing cache_logger_on_first_use=False.

Copilot uses AI. Check for mistakes.
Comment on lines +72 to +76
@pytest.fixture
def captured_logs() -> Iterator[list[MutableMapping[str, Any]]]:
"""Capture structlog output as list of dicts for field-level assertions."""
with structlog.testing.capture_logs() as cap:
yield cap
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MutableMapping is imported only under TYPE_CHECKING, but it’s referenced in the runtime return annotation of captured_logs(). With Python 3.14 lazy annotations (and no __future__.annotations), evaluating annotations can raise NameError. Import MutableMapping at runtime (with # noqa: TC003 if needed) or use a string annotation.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +26
if TYPE_CHECKING:
from collections.abc import AsyncIterator

from ai_company.providers.capabilities import ModelCapabilities
from ai_company.providers.models import (
ChatMessage,
CompletionConfig,
CompletionResponse,
StreamChunk,
ToolDefinition,
)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Types used in annotations (AsyncIterator, ModelCapabilities, StreamChunk, etc.) are imported only under TYPE_CHECKING. In this repo (Python 3.14, no __future__.annotations), evaluating annotations can raise NameError. Prefer runtime imports with # noqa: TC003 (as used elsewhere in tests) or string annotations.

Copilot uses AI. Check for mistakes.
Comment on lines +72 to +73
return
yield # make it an async generator
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This async generator helper uses return before yield, leaving an unreachable yield solely to force generator-ness. This is hard to read and can trip “unreachable code” tooling. Consider using a clearer pattern (e.g., if False: yield ...) or yielding nothing without unreachable statements.

Suggested change
return
yield # make it an async generator
# This async generator intentionally yields no items.
if False:
yield # make it an async generator

Copilot uses AI. Check for mistakes.
Comment on lines 132 to 136
except TemplateRenderError, TemplateValidationError, OSError:
logger.exception(
"Built-in template %r is invalid (packaging defect)",
name,
TEMPLATE_BUILTIN_DEFECT,
template_name=name,
)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This except TemplateRenderError, TemplateValidationError, OSError: clause is invalid Python 3 syntax and will raise a SyntaxError at import time. Use an exception tuple (and consider capturing as exc if you want to log details) so list_templates() can run.

Copilot uses AI. Check for mistakes.
Comment on lines 175 to 178
logger.debug(
"Skipping non-scalar YAML key: %s",
type(key_node).__name__,
"config.yaml.non_scalar_key",
key_type=type(key_node).__name__,
)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These log calls use a bare string event name, but this PR’s logging conventions (CLAUDE.md ## Logging) require using constants from ai_company.observability.events. Add a corresponding constant (or reuse an existing one) and reference it here for consistency/auditability.

Copilot uses AI. Check for mistakes.
- Add 14 missing event constants to events.py and replace all bare
  string event names in litellm_driver, mappers, config/loader, and
  templates/loader
- Upgrade all event constants from str to Final[str]
- Remove duplicate lifecycle events from LiteLLMDriver that
  BaseCompletionProvider already emits
- Narrow _map_exception return type from Exception to ProviderError
- Add import-time exhaustiveness check for VALID_TRANSITIONS
- Fix outdated with_correlation docstring to reference
  with_correlation_async
- Add bootstrap_logging to config/__init__.py autosummary
- Elevate terminal render failures to logger.exception
- Add error logging in BaseCompletionProvider for _do_complete/_do_stream
- Make _SINK_ROUTING immutable with MappingProxyType
- Add input validation for _LoggerNameFilter prefixes
- Add timeout markers to TestLoaderLogging, TestRegistryLogging, and
  test_renderer.py
- Add integration test cleanup fixture for configure_logging isolation
- Rename test_is_idempotent to test_forwards_each_call_to_configure_logging
- Document captured_logs fixture with usage example
- Remove unused syrupy dependency
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ai_company/observability/correlation.py (1)

136-146: 🧹 Nitpick | 🔵 Trivial

Consider extracting the bindings logic to reduce duplication.

Both decorators build the same bindings dict. A small helper could DRY this up:

♻️ Optional refactor
def _build_bindings(
    request_id: str | None,
    task_id: str | None,
    agent_id: str | None,
) -> dict[str, str]:
    bindings: dict[str, str] = {}
    if request_id is not None:
        bindings["request_id"] = request_id
    if task_id is not None:
        bindings["task_id"] = task_id
    if agent_id is not None:
        bindings["agent_id"] = agent_id
    return bindings

Then both wrappers just call _build_bindings(request_id, task_id, agent_id).

Also applies to: 196-206

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/observability/correlation.py` around lines 136 - 146, The two
decorator inner functions (wrapper) duplicate the logic that constructs the
bindings dict before calling structlog.contextvars.bound_contextvars; extract
that logic into a small helper function named _build_bindings(request_id: str |
None, task_id: str | None, agent_id: str | None) -> dict[str, str] that returns
the bindings dict, then replace the duplicated block in each wrapper with a call
to _build_bindings(request_id, task_id, agent_id) and pass its result into
structlog.contextvars.bound_contextvars; keep the existing keys ("request_id",
"task_id", "agent_id") and the None checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/observability/sinks.py`:
- Around line 64-73: The PrefixFilter.filter correctly gives exclude precedence,
but build_handler is only passing include_prefixes and never wires up
exclude_prefixes; update build_handler to pass the exclude_prefixes argument
into PrefixFilter (or, if omission was intentional, add a brief comment in
build_handler explaining exclude_prefixes is intentionally unused for future
extensibility). Locate the PrefixFilter.filter implementation and the
build_handler function to either forward the exclude_prefixes list into
PrefixFilter(...) or annotate build_handler so readers know the omission is
intentional.

In `@src/ai_company/providers/drivers/litellm_driver.py`:
- Around line 669-674: The warning for PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED
currently includes an args_preview (self.arguments[:200]) which may leak
sensitive content; update the logger.warning call in litellm_driver (the call
site using logger.warning, PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED, self.name,
self.id, self.arguments) to remove args_preview entirely and only emit metadata
(e.g., tool_name=self.name and tool_id=self.id) or a non-sensitive boolean flag
indicating presence of arguments, ensuring no argument content is included in
the log payload.

In `@src/ai_company/providers/drivers/mappers.py`:
- Around line 153-156: The logger.warning call using
PROVIDER_TOOL_CALL_MISSING_FUNCTION currently logs raw payload via
item=repr(item), which can leak sensitive user/tool arguments; change both
occurrences (the call around PROVIDER_TOOL_CALL_MISSING_FUNCTION and the similar
logger usage in the 198-208 block) to log only structural metadata: replace
repr(item) with a small metadata object like {"type": type(item).__name__,
"length": len(item) if hasattr(item, "__len__") else None, "is_bytes":
isinstance(item, (bytes, bytearray))} and include a clear message via
logger.warning(PROVIDER_TOOL_CALL_MISSING_FUNCTION, item_meta=metadata) so logs
avoid content previews while preserving useful debugging info.

In `@src/ai_company/templates/loader.py`:
- Around line 163-164: The start-event is logged with the raw template name
(logger.debug(TEMPLATE_LOAD_START, template_name=name)) while success/failure
events use the normalized key (name_clean), causing split metrics; change the
start-event to log the normalized key by trimming and lowercasing the name first
(i.e., compute name_clean = name.strip().lower() before calling logger.debug for
TEMPLATE_LOAD_START) so all events use the same template_name value.

In `@tests/integration/observability/conftest.py`:
- Around line 13-32: The _clear_logging_state() helper and _reset_logging()
fixture are duplicated in tests/integration/observability/conftest.py and
tests/unit/observability/conftest.py; extract _clear_logging_state into a single
shared test helper (e.g., tests/conftest.py or tests/observability_helpers.py)
and update both observability conftest files to import and call that shared
_clear_logging_state, keeping the local _reset_logging() fixture as-is but
delegating to the imported helper to remove duplication.

---

Outside diff comments:
In `@src/ai_company/observability/correlation.py`:
- Around line 136-146: The two decorator inner functions (wrapper) duplicate the
logic that constructs the bindings dict before calling
structlog.contextvars.bound_contextvars; extract that logic into a small helper
function named _build_bindings(request_id: str | None, task_id: str | None,
agent_id: str | None) -> dict[str, str] that returns the bindings dict, then
replace the duplicated block in each wrapper with a call to
_build_bindings(request_id, task_id, agent_id) and pass its result into
structlog.contextvars.bound_contextvars; keep the existing keys ("request_id",
"task_id", "agent_id") and the None checks.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5fc2f9d and 1a304cb.

📒 Files selected for processing (19)
  • src/ai_company/config/__init__.py
  • src/ai_company/config/loader.py
  • src/ai_company/core/task_transitions.py
  • src/ai_company/observability/correlation.py
  • src/ai_company/observability/events.py
  • src/ai_company/observability/sinks.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/templates/loader.py
  • src/ai_company/templates/renderer.py
  • tests/conftest.py
  • tests/integration/observability/conftest.py
  • tests/unit/config/test_bootstrap_logging.py
  • tests/unit/config/test_loader.py
  • tests/unit/observability/conftest.py
  • tests/unit/providers/test_base_provider.py
  • tests/unit/providers/test_registry.py
  • tests/unit/templates/test_renderer.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
Add type hints to all public functions — mypy strict mode enforced
Use Google-style docstrings on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones — practice immutability
Use Pydantic v2 with BaseModel, model_validator, and ConfigDict for data models
Enforce 88 character line length — ruff configured
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly — never silently swallow exceptions
Validate at system boundaries — validate user input, external APIs, and config files explicitly

Files:

  • src/ai_company/observability/correlation.py
  • src/ai_company/templates/loader.py
  • src/ai_company/providers/base.py
  • tests/unit/providers/test_registry.py
  • tests/integration/observability/conftest.py
  • src/ai_company/config/loader.py
  • tests/unit/config/test_loader.py
  • src/ai_company/core/task_transitions.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/templates/renderer.py
  • tests/unit/providers/test_base_provider.py
  • tests/conftest.py
  • src/ai_company/observability/sinks.py
  • tests/unit/config/test_bootstrap_logging.py
  • tests/unit/observability/conftest.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/config/__init__.py
  • tests/unit/templates/test_renderer.py
  • src/ai_company/observability/events.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Maintain minimum 80% code coverage — enforced in CI

Files:

  • src/ai_company/observability/correlation.py
  • src/ai_company/templates/loader.py
  • src/ai_company/providers/base.py
  • src/ai_company/config/loader.py
  • src/ai_company/core/task_transitions.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/templates/renderer.py
  • src/ai_company/observability/sinks.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/config/__init__.py
  • src/ai_company/observability/events.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow markers on all tests
Set test timeout to 30 seconds per test
Use parallel test execution with pytest-xdist via -n auto flag

Files:

  • tests/unit/providers/test_registry.py
  • tests/integration/observability/conftest.py
  • tests/unit/config/test_loader.py
  • tests/unit/providers/test_base_provider.py
  • tests/conftest.py
  • tests/unit/config/test_bootstrap_logging.py
  • tests/unit/observability/conftest.py
  • tests/unit/templates/test_renderer.py
🧠 Learnings (14)
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All functions and methods should have appropriate logging for debugging and traceability. Use `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.

Applied to files:

  • src/ai_company/templates/loader.py
  • src/ai_company/config/loader.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/templates/renderer.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/**/*.py : All functions and methods should have appropriate logging using `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.

Applied to files:

  • src/ai_company/templates/loader.py
  • src/ai_company/config/loader.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/templates/renderer.py
📚 Learning: 2026-03-01T18:03:07.755Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T18:03:07.755Z
Learning: Applies to tests/**/*.py : Set test timeout to 30 seconds per test

Applied to files:

  • tests/unit/providers/test_registry.py
  • tests/unit/config/test_loader.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/integration/observability/conftest.py
  • tests/conftest.py
  • tests/unit/observability/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/integration/observability/conftest.py
  • tests/unit/observability/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/integration/observability/conftest.py
  • tests/unit/observability/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/integration/observability/conftest.py
  • tests/unit/observability/conftest.py
📚 Learning: 2026-03-01T18:03:07.755Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T18:03:07.755Z
Learning: Applies to tests/**/*.py : Use `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow` markers on all tests

Applied to files:

  • tests/unit/config/test_loader.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Use logging levels: debug (routine), info (significant), warning (unexpected but recoverable), error (failures)

Applied to files:

  • src/ai_company/templates/renderer.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to **/*.py : Never dismiss warnings from `ruff check`, `pytest`, or other tools. Investigate and fix them, even if they appear unrelated to the current task.

Applied to files:

  • src/ai_company/templates/renderer.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • src/ai_company/templates/renderer.py
📚 Learning: 2026-03-01T18:03:07.755Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T18:03:07.755Z
Learning: Applies to **/*.py : Do not use `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations

Applied to files:

  • tests/unit/providers/test_base_provider.py
  • tests/unit/observability/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : NEVER add `from __future__ import annotations` - Python 3.14+ has native support for deferred annotation evaluation and does not require this import

Applied to files:

  • tests/unit/providers/test_base_provider.py
  • tests/unit/observability/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • tests/unit/observability/conftest.py
🧬 Code graph analysis (11)
src/ai_company/templates/loader.py (3)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/templates/errors.py (3)
  • TemplateNotFoundError (10-11)
  • TemplateRenderError (14-20)
  • TemplateValidationError (23-59)
src/ai_company/templates/schema.py (1)
  • CompanyTemplate (175-268)
src/ai_company/providers/base.py (4)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
tests/unit/providers/test_base_provider.py (2)
  • _do_complete (42-61)
  • _do_stream (63-75)
tests/unit/providers/test_registry.py (2)
  • _do_complete (61-69)
  • _do_stream (71-79)
tests/unit/providers/test_protocol.py (5)
  • _do_complete (89-102)
  • _do_complete (133-147)
  • _do_complete (246-251)
  • _do_stream (104-119)
  • _do_stream (149-162)
src/ai_company/config/loader.py (2)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/observability/setup.py (1)
  • configure_logging (159-195)
tests/unit/config/test_loader.py (3)
tests/unit/config/conftest.py (2)
  • tmp_config_file (148-154)
  • ConfigFileFactory (24-27)
src/ai_company/config/loader.py (2)
  • _parse_yaml_string (96-147)
  • _validate_config_dict (220-269)
src/ai_company/config/errors.py (2)
  • ConfigParseError (66-67)
  • ConfigValidationError (70-106)
src/ai_company/core/task_transitions.py (2)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/core/enums.py (1)
  • TaskStatus (122-143)
src/ai_company/providers/drivers/litellm_driver.py (2)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/providers/errors.py (4)
  • ProviderError (22-69)
  • RateLimitError (78-104)
  • AuthenticationError (72-75)
  • ProviderConnectionError (131-134)
src/ai_company/templates/renderer.py (1)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
tests/unit/providers/test_base_provider.py (5)
src/ai_company/providers/base.py (2)
  • complete (51-95)
  • stream (97-136)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (10-83)
src/ai_company/providers/models.py (6)
  • ChatMessage (114-186)
  • CompletionConfig (189-230)
  • CompletionResponse (233-282)
  • StreamChunk (285-363)
  • ToolDefinition (45-70)
  • TokenUsage (12-42)
src/ai_company/providers/enums.py (2)
  • FinishReason (15-22)
  • MessageRole (6-12)
src/ai_company/providers/errors.py (1)
  • InvalidRequestError (113-116)
tests/unit/config/test_bootstrap_logging.py (3)
src/ai_company/config/loader.py (1)
  • bootstrap_logging (484-499)
src/ai_company/config/schema.py (1)
  • RootConfig (204-328)
src/ai_company/observability/config.py (1)
  • LogConfig (115-192)
src/ai_company/config/__init__.py (1)
src/ai_company/config/loader.py (1)
  • bootstrap_logging (484-499)
tests/unit/templates/test_renderer.py (2)
src/ai_company/templates/loader.py (1)
  • load_template (151-207)
src/ai_company/templates/renderer.py (1)
  • render_template (49-104)
🔇 Additional comments (42)
tests/conftest.py (1)

6-28: LGTM — effective solution for structlog proxy caching in tests.

The monkey-patch approach using a default argument to capture the original structlog.configure is a clean pattern. The docstring clearly explains the caching problem and solution.

Minor: The object type hints are overly broad. Consider using Callable[..., None] for _original to avoid the type: ignore comment, though this is purely cosmetic given the isolated test-infrastructure scope.

src/ai_company/core/task_transitions.py (3)

18-24: LGTM — proper observability integration.

Clean integration of structured logging with appropriate event constants. Module-level logger initialization follows the project pattern.


55-58: Good defensive validation at import time.

This ensures VALID_TRANSITIONS stays synchronized with TaskStatus enum members. Any missing entry will fail fast at module import rather than at runtime.


73-89: Appropriate log levels for transition errors.

  • logger.critical for configuration errors (internal defect) is correct.
  • logger.warning for invalid transitions (caller error) is appropriate.

Both log before raising, ensuring events are captured even if exceptions are swallowed upstream.

src/ai_company/observability/correlation.py (1)

153-210: LGTM — correct async correlation implementation.

The decorator properly validates async functions, binds IDs using bound_contextvars for automatic cleanup, and mirrors the sync version's behavior. Type annotations correctly use Coroutine[object, object, _T].

tests/unit/observability/conftest.py (1)

72-84: Well-implemented fixture with clear documentation.

The type annotation list[MutableMapping[str, Any]] accurately reflects structlog's capture format, and the docstring provides a helpful usage example.

src/ai_company/observability/sinks.py (3)

28-34: Good use of MappingProxyType for immutability.

The routing configuration is correctly protected against accidental mutation, aligning with coding guidelines.


50-62: Solid input validation for filter prefixes.

The validation prevents empty or whitespace-only prefixes that would cause unexpected matching behavior.


171-176: LGTM — defensive check before attaching filter.

Correctly guards against None file_path and only applies routing for configured sinks.

src/ai_company/observability/events.py (1)

1-81: LGTM — well-organized event constants with consistent naming.

The Final[str] typing prevents accidental reassignment, and the domain.noun.verb convention is consistently applied. The section comments improve navigability.

tests/unit/templates/test_renderer.py (2)

23-23: Good — module-level timeout marker covers all tests.

The pytestmark = pytest.mark.timeout(30) correctly applies the 30-second timeout to all tests in the module, per coding guidelines.


203-212: LGTM — focused logging verification test.

The test correctly verifies that render_template emits exactly one TEMPLATE_RENDER_START and one TEMPLATE_RENDER_SUCCESS event. The assertions are appropriately scoped.

Consider using the captured_logs fixture from tests/unit/observability/conftest.py for consistency with the broader test infrastructure.

src/ai_company/providers/base.py (4)

13-32: LGTM!

The observability integration is well-implemented. The module-level logger is correctly initialized using get_logger(__name__), and the event constants are properly imported from the centralized events module.


76-95: LGTM!

The logging around complete() follows the expected pattern: DEBUG for start/success events, ERROR with exc_info=True for failures. The try/except properly re-raises exceptions after logging.


122-136: Stream error handling only covers iterator creation, not iteration.

The try/except wraps only the await self._do_stream(...) call, which handles errors during iterator creation. However, errors that occur during async iteration (when the caller consumes chunks) won't be logged here. This appears intentional given the docstring states implementations must "return an AsyncIterator", but worth confirming this is the desired behavior.

If you want errors during iteration to also be logged, consider wrapping the returned iterator. Otherwise, document that iteration errors are the caller's responsibility.


284-285: Static method uses module-level logger.

_validate_messages is a @staticmethod but uses the module-level logger. This works correctly, but note that if this class is ever subclassed and someone expects per-instance logger customization, it won't apply here. The current design is acceptable for this use case.

src/ai_company/templates/renderer.py (7)

22-46: LGTM!

The observability imports and logger initialization follow the established pattern. Event constants are properly imported from the centralized module.


77-104: LGTM!

INFO-level logging for render start and success events is appropriate for significant operations. The structured fields (source_name) enable effective log filtering and tracing.


135-140: LGTM!

Using logger.error for missing required variables is appropriate since this is a terminal failure that raises an exception. Based on learnings: "Use logger.error() for failures."


192-196: LGTM — using logger.exception is appropriate here.

The code now uses logger.exception which logs at ERROR level and automatically includes the exception traceback. This addresses the concern from the past review about using the correct severity for terminal failures.


223-227: LGTM — logger.exception correctly used for YAML parse errors.

Consistent with the Jinja2 error handling pattern.


371-376: LGTM!

Using logger.warning for unknown personality presets is appropriate — this is recoverable (the agent is still created, just without the preset personality), which aligns with the guideline: "Use logger.warning() for unexpected but recoverable situations."


448-452: LGTM — logger.exception correctly used for validation errors.

Consistent error logging pattern for terminal failures.

tests/unit/providers/test_registry.py (2)

6-12: LGTM!

Imports for structlog and observability events are correctly added to support the new logging tests.


291-314: LGTM!

The logging tests are well-structured with proper markers (@pytest.mark.unit and @pytest.mark.timeout(30)). The tests correctly verify that:

  1. PROVIDER_REGISTRY_BUILT is emitted with provider_count when building from config
  2. PROVIDER_DRIVER_NOT_REGISTERED is emitted with the missing provider name when lookup fails

This addresses the past review comment about adding the timeout marker.

tests/unit/providers/test_base_provider.py (3)

1-36: LGTM!

The test module is well-structured with:

  • Proper module docstring
  • Module-level pytestmark = pytest.mark.timeout(30) ensuring all tests have the required timeout
  • Clean imports following the pattern (no from __future__ import annotations)

39-86: LGTM!

The _StubProvider is a clean minimal implementation for testing, and the _msg helper provides a convenient way to create test messages. The stub correctly implements all required abstract methods.


89-127: LGTM!

The test class comprehensively covers the logging behavior:

  • test_complete_emits_call_start_and_success: Verifies happy path logging
  • test_stream_emits_stream_start: Verifies stream-specific event
  • test_empty_messages_emits_error: Verifies error logging for validation failure
  • test_blank_model_emits_error: Verifies error logging for invalid model

All tests properly use structlog.testing.capture_logs() and have the required @pytest.mark.unit marker.

src/ai_company/config/loader.py (4)

20-33: LGTM!

The observability imports and logger initialization follow the established pattern. All event constants are properly imported from the centralized module.


177-180: LGTM — Event constant now used.

This addresses the past review comment. The hardcoded string has been replaced with CONFIG_YAML_NON_SCALAR_KEY constant.


208-211: LGTM — Event constant now used.

This addresses the past review comment (marked as addressed in commit 1a304cb). The hardcoded string has been replaced with CONFIG_LINE_MAP_COMPOSE_FAILED constant.


484-499: LGTM!

The bootstrap_logging function is well-implemented:

  • Proper type hints and Google-style docstring
  • Lazy import of configure_logging avoids circular imports at module load time
  • Correctly extracts config.logging when config is provided, or passes None for defaults
  • Clear documentation about when to call it (once at startup after load_config)
tests/unit/config/test_bootstrap_logging.py (1)

1-41: LGTM!

The test module is comprehensive and well-structured:

  • Module-level timeout marker ensures all tests have the required 30s limit
  • test_calls_configure_logging_with_none: Verifies None is correctly forwarded
  • test_calls_configure_logging_with_config_logging: Verifies config.logging extraction
  • test_forwards_each_call_to_configure_logging: Renamed from test_is_idempotent per past review feedback, now accurately describes the behavior being tested

The mock target "ai_company.observability.configure_logging" correctly patches at the import location.

src/ai_company/config/__init__.py (3)

9-9: LGTM!

The bootstrap_logging function is correctly added to the autosummary documentation.


33-33: LGTM!

Import added correctly.


59-59: LGTM!

bootstrap_logging is correctly added to __all__, maintaining alphabetical order.

tests/unit/config/test_loader.py (3)

7-7: LGTM!

The structlog import is correctly added to support the new logging tests.


29-33: LGTM!

The observability event constants are correctly imported for use in test assertions.


666-698: LGTM!

The logging tests are well-structured with proper markers (@pytest.mark.unit and @pytest.mark.timeout(30)). This addresses the past review comment about adding the timeout marker.

The tests comprehensively verify:

  1. CONFIG_LOADED event on successful load with config_path
  2. CONFIG_PARSE_FAILED event on YAML syntax error with source
  3. CONFIG_VALIDATION_FAILED event on validation failure with error_count
src/ai_company/templates/loader.py (1)

23-31: Structured template event wiring is consistent.

The migration to get_logger(__name__) plus centralized TEMPLATE_* event constants is clean and keeps template observability stable across warning/error/debug paths.

Also applies to: 39-39, 116-119, 135-137, 180-185, 190-195, 198-202, 435-437

src/ai_company/providers/drivers/mappers.py (1)

12-18: Observability migration in mapper paths looks good.

Event-based warnings and module-level get_logger(__name__) are applied consistently on non-happy paths.

Also applies to: 22-22, 124-127, 164-168

src/ai_company/providers/drivers/litellm_driver.py (1)

43-57: Provider observability hooks are well integrated.

The new structured events in model resolution, stream completion, exception mapping, and LiteLLM model-info fallbacks are coherent and actionable.

Also applies to: 85-85, 268-273, 387-391, 475-496, 548-557

Comment on lines +153 to 156
logger.warning(
PROVIDER_TOOL_CALL_MISSING_FUNCTION,
item=repr(item),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid logging raw tool-call payload fragments.

These fields can contain user/tool argument content and leak sensitive data into logs. Prefer structural metadata (type/length) over payload previews.

🛡️ Suggested patch
         if func is None:
             logger.warning(
                 PROVIDER_TOOL_CALL_MISSING_FUNCTION,
-                item=repr(item),
+                item_type=type(item).__name__,
             )
             continue
@@
         except json.JSONDecodeError, ValueError:
             logger.warning(
                 PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED,
-                args_preview=raw[:200],
+                args_len=len(raw),
             )
             return {}
@@
         logger.warning(
             PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED,
-            args_preview=raw[:200],
+            args_len=len(raw),
             parsed_type=type(parsed).__name__,
         )

Also applies to: 198-208

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/providers/drivers/mappers.py` around lines 153 - 156, The
logger.warning call using PROVIDER_TOOL_CALL_MISSING_FUNCTION currently logs raw
payload via item=repr(item), which can leak sensitive user/tool arguments;
change both occurrences (the call around PROVIDER_TOOL_CALL_MISSING_FUNCTION and
the similar logger usage in the 198-208 block) to log only structural metadata:
replace repr(item) with a small metadata object like {"type":
type(item).__name__, "length": len(item) if hasattr(item, "__len__") else None,
"is_bytes": isinstance(item, (bytes, bytearray))} and include a clear message
via logger.warning(PROVIDER_TOOL_CALL_MISSING_FUNCTION, item_meta=metadata) so
logs avoid content previews while preserving useful debugging info.

- Remove sensitive args_preview from tool-call warning logs (mappers.py,
  litellm_driver.py) — log args_length/item_type metadata instead
- Use normalized name_clean in TEMPLATE_LOAD_START event (loader.py)
- Consolidate duplicated _clear_logging_state into shared
  tests/conftest.py::clear_logging_state
Copilot AI review requested due to automatic review settings March 1, 2026 19:34
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 32 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (1)

src/ai_company/observability/sinks.py:16

  • Path is imported only under TYPE_CHECKING, but it’s referenced in runtime annotations (log_dir: Path in _build_file_handler/build_handler). With Python 3.14 (PEP 649) in this repo, annotations may be evaluated at runtime, so Path should be available at runtime to avoid NameError during annotation introspection. Prefer importing Path unconditionally (optionally with the project’s # noqa: TC00x convention) instead of gating it behind TYPE_CHECKING.
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pathlib import Path


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 133 to 137
except TemplateRenderError, TemplateValidationError, OSError:
logger.exception(
"Built-in template %r is invalid (packaging defect)",
name,
TEMPLATE_BUILTIN_DEFECT,
template_name=name,
)
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except clause uses Python 2-style comma syntax (except TemplateRenderError, TemplateValidationError, OSError:), which is a SyntaxError in Python 3. Wrap the exceptions in parentheses (tuple) and bind the exception as as exc if you need it, consistent with the earlier user-template block.

Copilot uses AI. Check for mistakes.
Comment on lines 195 to 202
try:
parsed = json.loads(raw)
except json.JSONDecodeError, ValueError:
_logger.warning(
"Failed to parse tool call arguments: %r",
raw[:200],
logger.warning(
PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED,
args_length=len(raw),
)
return {}
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This except clause uses invalid comma syntax (except json.JSONDecodeError, ValueError:), which will raise a SyntaxError in Python 3 and prevent the module from importing. Use except (json.JSONDecodeError, ValueError): instead.

Copilot uses AI. Check for mistakes.
Comment on lines 524 to 531
try:
return float(raw)
except ValueError, TypeError:
_logger.debug(
"Could not parse retry-after header as seconds: %r",
raw,
logger.debug(
PROVIDER_RETRY_AFTER_PARSE_FAILED,
raw_value=repr(raw),
)
return None
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This except clause uses invalid comma syntax (except ValueError, TypeError:), which is a SyntaxError in Python 3. Use except (ValueError, TypeError): (or separate except blocks) so this module can import.

Copilot uses AI. Check for mistakes.
Comment on lines 544 to 552
try:
raw = _litellm.get_model_info(model=litellm_model)
info: dict[str, Any] = dict(raw) if raw else {}
except KeyError, ValueError:
_logger.info(
"No LiteLLM metadata for model %r, using config defaults",
litellm_model,
logger.info(
PROVIDER_MODEL_INFO_UNAVAILABLE,
model=litellm_model,
)
return {}
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This except clause uses invalid comma syntax (except KeyError, ValueError:), which is a SyntaxError in Python 3. Use except (KeyError, ValueError): so exception handling works and the module remains importable.

Copilot uses AI. Check for mistakes.
Comment on lines 666 to 675
try:
parsed = json.loads(self.arguments) if self.arguments else {}
except json.JSONDecodeError, ValueError:
_logger.warning(
"Failed to parse tool call arguments for tool %r (id=%r): %r",
self.name,
self.id,
self.arguments[:200] if self.arguments else "",
logger.warning(
PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED,
tool_name=self.name,
tool_id=self.id,
args_length=len(self.arguments) if self.arguments else 0,
)
parsed = {}
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This except clause uses invalid comma syntax (except json.JSONDecodeError, ValueError:), which is a SyntaxError in Python 3. Change it to except (json.JSONDecodeError, ValueError): so streamed tool-call argument parsing failures are handled without breaking imports.

Copilot uses AI. Check for mistakes.
Comment on lines 9 to 29
from typing import TYPE_CHECKING, Self

from ai_company.observability import get_logger
from ai_company.observability.events import (
PROVIDER_DRIVER_FACTORY_MISSING,
PROVIDER_DRIVER_INSTANTIATED,
PROVIDER_DRIVER_NOT_REGISTERED,
PROVIDER_REGISTRY_BUILT,
)

from .base import BaseCompletionProvider
from .errors import (
DriverFactoryNotFoundError,
DriverNotRegisteredError,
)

logger = get_logger(__name__)

if TYPE_CHECKING:
from ai_company.config.schema import ProviderConfig

Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProviderConfig is imported only under TYPE_CHECKING, but it’s referenced in runtime annotations (from_config() and _build_driver() signatures). With this repo targeting Python 3.14 (PEP 649 lazy annotations), these annotation expressions may be evaluated at runtime (e.g., via typing.get_type_hints), which would raise NameError if ProviderConfig isn’t importable at runtime. Prefer a runtime import using the project’s # noqa: TC00x pattern instead of if TYPE_CHECKING: for types used in annotations.

Copilot uses AI. Check for mistakes.
Comment on lines 431 to 438
try:
return float(value)
except TypeError, ValueError:
logger.debug(
"Cannot convert %r to float in Pass 1 "
"(may be a Jinja2 placeholder), using 0.0",
value,
TEMPLATE_PASS1_FLOAT_FALLBACK,
value=repr(value),
)
return 0.0
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This except clause uses invalid comma syntax (except TypeError, ValueError:), which is a SyntaxError in Python 3. Use except (TypeError, ValueError): so the float-coercion fallback works without breaking imports.

Copilot uses AI. Check for mistakes.
Comment on lines 15 to 21
from typing import TYPE_CHECKING, ParamSpec, TypeVar

import structlog

if TYPE_CHECKING:
from collections.abc import Callable
from collections.abc import Callable, Coroutine

Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callable/Coroutine are only imported under TYPE_CHECKING, but they’re referenced in runtime annotations below (e.g., return types for with_correlation*). In this repo (Python 3.14 / PEP 649), annotations can be evaluated at runtime, so these names should be importable at runtime (similar to from collections.abc import AsyncIterator # noqa: TC003 in providers/base.py). Import them unconditionally (optionally with the project’s # noqa: TC00x pattern) to avoid NameError when annotations are introspected.

Copilot uses AI. Check for mistakes.
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](v0.0.0...v0.1.0)
(2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wire all modules into observability system and harden logging infrastructure

2 participants