feat: wire all modules into observability system#97
Conversation
Migrate 5 modules from stdlib logging to structlog get_logger, add structured logging to 5 more business-logic modules, implement sink routing, add with_correlation_async, create event constants, and wire bootstrap_logging into the startup path. - Create events.py with ~30 structured event name constants - Add with_correlation_async() decorator for async functions - Implement _LoggerNameFilter and _SINK_ROUTING for audit/cost/agent sinks - Migrate config/loader, templates/loader, templates/renderer, litellm_driver, mappers from stdlib to get_logger - Add logging to providers/base, providers/registry, core/task, core/task_transitions, core/role_catalog - Add bootstrap_logging() to config/loader and export from config/__init__ - Add syrupy test dependency - Add 40+ new tests (events, correlation_async, sink_routing, bootstrap_logging, base_provider logging, integration sink routing) - Add logging tests to existing test files (loader, task, transitions, registry, renderer) - Fix structlog proxy caching issue in tests/conftest.py - Add Logging section to CLAUDE.md with mandatory conventions - Add logging-audit agent to PR review skill
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughThis PR implements comprehensive observability infrastructure by introducing structured event constants, wiring observability across core modules, adding async correlation support, implementing logger-to-sink routing, and instrumenting 15+ modules with event-based logging. It includes bootstrap_logging initialization, new public APIs, extensive test coverage, and documentation updates. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes The changes span 25+ files across multiple domains with heterogeneous modifications: new public APIs (bootstrap_logging, with_correlation_async, 50 event constants), async context binding logic, provider error mapping signature change, logger-to-sink routing with filtering, and dense event-based instrumentation across 10+ modules. While individual module changes follow consistent patterns, the interplay between async correlation context handling, sink routing configuration, event constant usage, and test isolation requires careful verification of correctness and consistency. The variety of affected modules and logic density (correlation binding/unbinding, logger name filtering, structured event emission across error paths) elevates complexity beyond simple pattern repetition. Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the application's observability by integrating a comprehensive structured logging system. It standardizes event naming, enables flexible log routing to different files based on module, and ensures correlation IDs are propagated across both synchronous and asynchronous operations. The changes involve migrating existing logging, adding new logging to previously unlogged modules, and bolstering the test suite to cover these new features, ultimately providing a more robust and debuggable system. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/templates/test_renderer.py (1)
1-6: 🧹 Nitpick | 🔵 TrivialAdd file-level timeout marker.
The file is missing the
pytest.mark.timeout(30)marker that other test files in this PR have. Per coding guidelines, tests should have a 30-second timeout.🛠️ Proposed fix
import pytest import structlog from pydantic import ValidationError +pytestmark = pytest.mark.timeout(30) + from ai_company.config.schema import RootConfig🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/templates/test_renderer.py` around lines 1 - 6, Add a file-level pytest timeout marker to this test module by declaring a top-level pytestmark = pytest.mark.timeout(30) so the entire file uses the 30-second timeout; the file already imports pytest, so add the pytestmark assignment near the module docstring/top imports in tests/unit/templates/test_renderer.py to apply to all tests in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/config/loader.py`:
- Around line 206-209: Replace the hard-coded event name string in the
logger.warning call inside loader.py with a constant defined in events.py: add
CONFIG_LINE_MAP_COMPOSE_FAILED = "config.line_map.compose_failed" to
src/ai_company/observability/events.py and update the logger.warning call in
src/ai_company/config/loader.py to import and use
events.CONFIG_LINE_MAP_COMPOSE_FAILED (ensure the import is added and the
error=str(exc) kwarg remains unchanged).
- Around line 175-178: Add a constant CONFIG_YAML_NON_SCALAR_KEY =
"config.yaml.non_scalar_key" to ai_company.observability.events and update the
logger call in loader.py (the logger.debug invocation that currently passes the
hardcoded string) to use events.CONFIG_YAML_NON_SCALAR_KEY instead; ensure you
import the constant (or the events module) at the top of
src/ai_company/config/loader.py so the logger.debug call uses the constant name
rather than the literal string.
In `@src/ai_company/providers/drivers/litellm_driver.py`:
- Around line 137-142: The driver is re-emitting lifecycle events that
BaseCompletionProvider already emits; remove or stop emitting the duplicate
logger.debug calls that use PROVIDER_CALL_START, PROVIDER_CALL_SUCCESS, and
PROVIDER_STREAM_START in this driver (the debug call passing
provider=self._provider_name, model=model, message_count=len(messages) and the
similar calls at the other two locations referenced). Locate these logger.debug
invocations in the LiteLLM driver methods (the calls that include
PROVIDER_CALL_START/PROVIDER_CALL_SUCCESS/PROVIDER_STREAM_START symbols) and
either delete them or change them to non-lifecycle debug messages (or guard them
behind a driver-only flag) so only BaseCompletionProvider emits those lifecycle
events.
In `@src/ai_company/templates/loader.py`:
- Around line 434-435: Replace the string literal
"template.pass1.float_fallback" in loader.py with a centralized constant: add
TEMPLATE_PASS1_FLOAT_FALLBACK: str = "template.pass1.float_fallback" to
observability/events.py and import that constant into loader.py, then use
TEMPLATE_PASS1_FLOAT_FALLBACK where the literal is currently passed (preserve
the value=repr(value) usage); this keeps event names consistent and
audit-friendly across the codebase.
In `@src/ai_company/templates/renderer.py`:
- Around line 192-196: Replace the logger.warning calls used right before
exceptions are raised with logger.error so terminal render/validation failures
are logged at the correct severity; specifically change the call that logs
TEMPLATE_RENDER_JINJA2_ERROR (and the similar warning calls in the other
indicated branches) to logger.error with the same message parameters (e.g.,
source_name and error=str(exc)), and update all three occurrences mentioned
(including the blocks around TEMPLATE_RENDER_JINJA2_ERROR and the two other
analogous branches) so they consistently log errors before re-raising.
In `@tests/unit/config/test_bootstrap_logging.py`:
- Around line 35-41: Rename the test method test_is_idempotent to a name that
reflects its actual behavior (for example
test_forwards_each_call_to_configure_logging) or keep the name and add a
one-line docstring explaining it checks that bootstrap_logging forwards calls to
configure_logging; update references to the test method name in the file and
ensure it still patches ai_company.observability.configure_logging and asserts
mock_configure.call_count == 2 (symbols: bootstrap_logging, configure_logging,
test_is_idempotent).
In `@tests/unit/config/test_loader.py`:
- Around line 666-697: The TestLoaderLogging test class is missing the required
30s timeout marker; add the pytest timeout mark to the class (e.g., annotate
TestLoaderLogging with pytest.mark.timeout(30)) so all its methods
(test_config_loaded_event_on_success, test_config_parse_failed_event,
test_config_validation_failed_event) inherit the 30-second limit, ensuring each
test in that class uses the mandated timeout.
In `@tests/unit/observability/conftest.py`:
- Around line 72-76: The new captured_logs fixture is unused; update the tests
(test_task.py, test_renderer.py, test_task_transitions.py) that call
structlog.testing.capture_logs() directly to accept the captured_logs fixture
instead (replace local with parameter captured_logs) and use that list for
assertions, or if you prefer to keep current usage, add a brief
comment/docstring to the fixture explaining intended use so it isn’t mistaken
for dead code; refer to the captured_logs fixture name when making the
replacement.
In `@tests/unit/providers/test_registry.py`:
- Around line 291-313: Add a 30s per-test timeout for this test module by
applying pytest's timeout marker; for example, set a module-level pytestmark =
pytest.mark.timeout(30) (or decorate the TestRegistryLogging class with
`@pytest.mark.timeout`(30)) so both TestRegistryLogging.test_registry_built_event
and TestRegistryLogging.test_driver_not_registered_event run with a 30-second
timeout.
---
Outside diff comments:
In `@tests/unit/templates/test_renderer.py`:
- Around line 1-6: Add a file-level pytest timeout marker to this test module by
declaring a top-level pytestmark = pytest.mark.timeout(30) so the entire file
uses the 30-second timeout; the file already imports pytest, so add the
pytestmark assignment near the module docstring/top imports in
tests/unit/templates/test_renderer.py to apply to all tests in this file.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (32)
.claude/skills/aurelio-review-pr/skill.mdCLAUDE.mdpyproject.tomlsrc/ai_company/config/__init__.pysrc/ai_company/config/loader.pysrc/ai_company/core/role_catalog.pysrc/ai_company/core/task.pysrc/ai_company/core/task_transitions.pysrc/ai_company/observability/__init__.pysrc/ai_company/observability/correlation.pysrc/ai_company/observability/events.pysrc/ai_company/observability/sinks.pysrc/ai_company/providers/base.pysrc/ai_company/providers/drivers/litellm_driver.pysrc/ai_company/providers/drivers/mappers.pysrc/ai_company/providers/registry.pysrc/ai_company/templates/loader.pysrc/ai_company/templates/renderer.pytests/conftest.pytests/integration/observability/__init__.pytests/integration/observability/test_sink_routing_integration.pytests/unit/config/test_bootstrap_logging.pytests/unit/config/test_loader.pytests/unit/core/test_task.pytests/unit/core/test_task_transitions.pytests/unit/observability/conftest.pytests/unit/observability/test_correlation_async.pytests/unit/observability/test_events.pytests/unit/observability/test_sink_routing.pytests/unit/providers/test_base_provider.pytests/unit/providers/test_registry.pytests/unit/templates/test_renderer.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
Add type hints to all public functions — mypy strict mode enforced
Use Google-style docstrings on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones — practice immutability
Use Pydantic v2 withBaseModel,model_validator, andConfigDictfor data models
Enforce 88 character line length — ruff configured
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly — never silently swallow exceptions
Validate at system boundaries — validate user input, external APIs, and config files explicitly
Files:
src/ai_company/observability/__init__.pytests/unit/core/test_task.pysrc/ai_company/config/__init__.pytests/unit/observability/test_correlation_async.pysrc/ai_company/core/task.pytests/unit/observability/test_sink_routing.pytests/integration/observability/test_sink_routing_integration.pysrc/ai_company/templates/loader.pysrc/ai_company/providers/registry.pytests/unit/providers/test_base_provider.pytests/unit/config/test_loader.pytests/unit/templates/test_renderer.pytests/unit/core/test_task_transitions.pysrc/ai_company/providers/drivers/litellm_driver.pysrc/ai_company/templates/renderer.pysrc/ai_company/providers/base.pysrc/ai_company/observability/sinks.pytests/unit/observability/conftest.pysrc/ai_company/config/loader.pysrc/ai_company/providers/drivers/mappers.pytests/unit/providers/test_registry.pysrc/ai_company/core/role_catalog.pysrc/ai_company/observability/events.pysrc/ai_company/core/task_transitions.pytests/conftest.pytests/unit/config/test_bootstrap_logging.pysrc/ai_company/observability/correlation.pytests/unit/observability/test_events.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain minimum 80% code coverage — enforced in CI
Files:
src/ai_company/observability/__init__.pysrc/ai_company/config/__init__.pysrc/ai_company/core/task.pysrc/ai_company/templates/loader.pysrc/ai_company/providers/registry.pysrc/ai_company/providers/drivers/litellm_driver.pysrc/ai_company/templates/renderer.pysrc/ai_company/providers/base.pysrc/ai_company/observability/sinks.pysrc/ai_company/config/loader.pysrc/ai_company/providers/drivers/mappers.pysrc/ai_company/core/role_catalog.pysrc/ai_company/observability/events.pysrc/ai_company/core/task_transitions.pysrc/ai_company/observability/correlation.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slowmarkers on all tests
Set test timeout to 30 seconds per test
Use parallel test execution withpytest-xdistvia-n autoflag
Files:
tests/unit/core/test_task.pytests/unit/observability/test_correlation_async.pytests/unit/observability/test_sink_routing.pytests/integration/observability/test_sink_routing_integration.pytests/unit/providers/test_base_provider.pytests/unit/config/test_loader.pytests/unit/templates/test_renderer.pytests/unit/core/test_task_transitions.pytests/unit/observability/conftest.pytests/unit/providers/test_registry.pytests/conftest.pytests/unit/config/test_bootstrap_logging.pytests/unit/observability/test_events.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Pin all dependency versions using==inpyproject.toml
Organize dependencies into groups:test(pytest + plugins),dev(includes test + ruff, mypy, pre-commit, commitizen, pydantic) withuv syncinstalling all
Files:
pyproject.toml
🧠 Learnings (19)
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/integration/test_*.py : Place integration tests for component interactions in `tests/integration/` directory
Applied to files:
tests/integration/observability/test_sink_routing_integration.py
📚 Learning: 2026-03-01T18:03:07.755Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T18:03:07.755Z
Learning: Applies to pyproject.toml : Organize dependencies into groups: `test` (pytest + plugins), `dev` (includes test + ruff, mypy, pre-commit, commitizen, pydantic) with `uv sync` installing all
Applied to files:
pyproject.toml
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All functions and methods should have appropriate logging for debugging and traceability. Use `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/**/*.py : All functions and methods should have appropriate logging using `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Use logging levels: debug (routine), info (significant), warning (unexpected but recoverable), error (failures)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Logs must be written to `output/logs/story_factory.log` using `logger = logging.getLogger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Log all significant operations using Python logging with appropriate levels: debug for detailed info, info for successes, warning for retries, error for failures
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`), CI enforces this coverage requirement
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit. The CI enforces 100% coverage on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`).
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings
Applied to files:
tests/unit/observability/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`
Applied to files:
tests/unit/observability/conftest.pytests/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/unit/observability/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/unit/observability/conftest.pytests/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
tests/unit/observability/conftest.py
🧬 Code graph analysis (20)
src/ai_company/observability/__init__.py (1)
src/ai_company/observability/correlation.py (1)
with_correlation_async(154-211)
tests/unit/core/test_task.py (1)
src/ai_company/core/enums.py (1)
TaskStatus(122-143)
src/ai_company/config/__init__.py (1)
src/ai_company/config/loader.py (1)
bootstrap_logging(482-497)
tests/unit/observability/test_correlation_async.py (2)
src/ai_company/observability/correlation.py (1)
with_correlation_async(154-211)src/ai_company/providers/drivers/litellm_driver.py (1)
update(644-663)
src/ai_company/core/task.py (1)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
tests/unit/observability/test_sink_routing.py (1)
src/ai_company/observability/sinks.py (2)
_LoggerNameFilter(34-66)filter(57-66)
tests/integration/observability/test_sink_routing_integration.py (3)
src/ai_company/observability/config.py (2)
LogConfig(115-192)SinkConfig(50-112)src/ai_company/observability/enums.py (2)
LogLevel(6-17)SinkType(32-41)src/ai_company/observability/setup.py (1)
configure_logging(159-195)
src/ai_company/templates/loader.py (2)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/templates/errors.py (3)
TemplateNotFoundError(10-11)TemplateRenderError(14-20)TemplateValidationError(23-59)
src/ai_company/providers/registry.py (2)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/providers/errors.py (2)
DriverFactoryNotFoundError(159-162)DriverNotRegisteredError(143-146)
tests/unit/providers/test_base_provider.py (2)
src/ai_company/providers/base.py (6)
BaseCompletionProvider(35-303)_do_complete(148-166)_do_stream(169-189)_do_get_model_capabilities(192-201)complete(51-91)stream(93-128)src/ai_company/providers/errors.py (1)
InvalidRequestError(113-116)
tests/unit/templates/test_renderer.py (1)
src/ai_company/templates/renderer.py (1)
render_template(49-104)
tests/unit/core/test_task_transitions.py (2)
src/ai_company/core/enums.py (1)
TaskStatus(122-143)src/ai_company/core/task_transitions.py (1)
validate_transition(56-90)
src/ai_company/templates/renderer.py (1)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
src/ai_company/providers/base.py (1)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
src/ai_company/config/loader.py (3)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/config/schema.py (1)
RootConfig(204-328)src/ai_company/observability/setup.py (1)
configure_logging(159-195)
src/ai_company/providers/drivers/mappers.py (3)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/providers/enums.py (2)
FinishReason(15-22)MessageRole(6-12)src/ai_company/providers/models.py (3)
ChatMessage(114-186)ToolCall(73-95)ToolDefinition(45-70)
tests/unit/providers/test_registry.py (2)
src/ai_company/config/schema.py (2)
ProviderConfig(52-95)ProviderModelConfig(17-49)src/ai_company/providers/errors.py (1)
DriverNotRegisteredError(143-146)
src/ai_company/core/role_catalog.py (2)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/providers/registry.py (1)
get(66-94)
src/ai_company/core/task_transitions.py (1)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
tests/unit/config/test_bootstrap_logging.py (3)
src/ai_company/config/loader.py (1)
bootstrap_logging(482-497)src/ai_company/config/schema.py (1)
RootConfig(204-328)src/ai_company/observability/config.py (1)
LogConfig(115-192)
🔇 Additional comments (31)
pyproject.toml (1)
43-43: Good test dependency addition with correct pinning.
syrupy==5.1.0is correctly pinned and placed in thetestdependency group, which aligns with the repository dependency conventions.As per coding guidelines,
pyproject.tomldependencies should be pinned with==and organized intotest/devgroups.tests/conftest.py (1)
6-27: LGTM! Effective workaround for structlog proxy caching in tests.The approach of forcing
cache_logger_on_first_use=Falseglobally ensurescapture_logs()works reliably across all tests. The default-argument pattern for capturing_originalis a valid Python idiom.Minor observation: the
_original: objecttype hint is imprecise (aCallabletype would be more accurate), but the# type: ignorecomments appropriately handle mypy.CLAUDE.md (1)
61-72: LGTM! Clear and actionable logging governance.The new Logging section provides comprehensive guidance that aligns with the PR's observability goals:
- Mandatory
get_loggerusage with consistentloggernaming- Event constants from
ai_company.observability.events- Structured kwargs pattern
- Clear level guidelines (DEBUG/INFO/WARNING/ERROR)
This supersedes any prior guidance about stdlib
logging.getLoggerusage. Based on learnings, documentation updates like this are expected when making architectural changes.src/ai_company/observability/sinks.py (2)
22-67: LGTM! Well-designed logger routing with flexible filtering.The
_LoggerNameFilterclass correctly implements prefix-based filtering with exclude-before-include precedence. The_SINK_ROUTINGmapping cleanly separates concerns:
audit.log← security logscost_usage.log← budget and provider logsagent_activity.log← engine and core logsThe
exclude_prefixesparameter is unused currently but provides forward compatibility for more complex routing scenarios.
164-169: LGTM! Filter attachment integrated correctly.The filter is appropriately attached only to FILE sinks that have routing configured, leaving other sinks (like console or catch-all files) unfiltered.
src/ai_company/config/loader.py (1)
482-497: LGTM! Clean bootstrap entry point.The
bootstrap_loggingfunction provides a clean public API for initializing observability after config loading. The lazy import ofconfigure_loggingappropriately avoids potential circular import issues between config and observability modules.src/ai_company/observability/__init__.py (1)
30-31: LGTM! Public API surface correctly extended.The
with_correlation_asyncdecorator is properly imported and added to__all__, making it available for async correlation tracking across the codebase.Also applies to: 52-52
src/ai_company/core/task.py (2)
12-15: LGTM! Proper observability setup.Logger is correctly instantiated using the observability module with
__name__and standardloggervariable name.
223-230: LGTM! State transition logging correctly implemented.The
TASK_STATUS_CHANGEDevent is logged at INFO level after successful validation, with all relevant context (task_id,from_status,to_status). This follows the CLAUDE.md guideline that "all state transitions must log at INFO".src/ai_company/core/role_catalog.py (2)
14-17: LGTM! Proper observability setup.Logger correctly instantiated with standard conventions.
423-426: LGTM! Appropriate DEBUG-level logging for lookup misses.Logging
ROLE_LOOKUP_MISSat DEBUG level is appropriate since returningNonefor an unknown role is expected behavior (not an error). This contrasts correctly withproviders/registry.pywhich uses ERROR level because it raises an exception.tests/unit/config/test_bootstrap_logging.py (1)
11-11: LGTM! Correct timeout configuration.Module-level timeout marker correctly set to 30 seconds per coding guidelines.
src/ai_company/config/__init__.py (1)
32-32: LGTM!The
bootstrap_loggingfunction is correctly imported and exposed in the public API. The__all__list maintains alphabetical ordering.Also applies to: 58-58
tests/unit/core/test_task.py (1)
472-482: LGTM!The logging test correctly uses
structlog.testing.capture_logs()to verify theTASK_STATUS_CHANGEDevent is emitted with the expected fields (task_id,from_status,to_status). Test marker and timeout are properly configured via the file-levelpytestmark..claude/skills/aurelio-review-pr/skill.md (1)
95-103: LGTM!The logging-audit agent is well-defined with clear trigger conditions and specific violation checks that align with the observability conventions established in this PR. The severity categorization (CRITICAL for hard violations like
logging.getLogger/print(), MAJOR for style issues) is appropriate.tests/unit/templates/test_renderer.py (1)
201-210: LGTM!The logging test correctly verifies that
render_templateemits bothTEMPLATE_RENDER_STARTandTEMPLATE_RENDER_SUCCESSevents. The use ofstructlog.testing.capture_logs()is appropriate for this assertion pattern.tests/unit/core/test_task_transitions.py (1)
142-153: LGTM!The test correctly verifies that
TASK_TRANSITION_INVALIDis emitted when an invalid transition is attempted. The nested context manager pattern withstructlog.testing.capture_logs()andpytest.raises()is clean and appropriate.src/ai_company/core/task_transitions.py (2)
18-24: LGTM!The module correctly uses
get_logger(__name__)for the module-level logger and imports the appropriate event constants. This follows the observability conventions established in this PR.
68-84: LGTM!The logging instrumentation uses appropriate log levels:
criticalforTASK_TRANSITION_CONFIG_ERROR(indicates a bug in the transition map)warningforTASK_TRANSITION_INVALID(indicates invalid user input, recoverable)Structured kwargs (
current_status,target_status,allowed) provide useful context for debugging.src/ai_company/observability/correlation.py (2)
154-211: LGTM!The
with_correlation_asyncdecorator correctly mirrors the synchronouswith_correlationpattern:
- Validates the decorated function is a coroutine function
- Binds correlation IDs before execution via
bound_contextvarscontext manager- Properly awaits the wrapped coroutine
- IDs are automatically unbound when the context manager exits
The type signature using
Callable[_P, Coroutine[object, object, _T]]is appropriate for async function decorators.
132-132: LGTM!The error message update correctly guides users to use
with_correlation_async()for async functions.tests/unit/observability/conftest.py (1)
8-10: No issue — TYPE_CHECKING imports are correct for Python 3.14.With Python 3.14's native PEP 649 lazy annotation evaluation, using
IteratorandMutableMappingunderTYPE_CHECKINGwhile referencing them in type hints is the correct approach. Annotations are not evaluated at runtime, so names fromTYPE_CHECKINGblocks are safe to use. The coding guidelines explicitly endorse this pattern: "Do not usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations."Likely an incorrect or invalid review comment.
tests/unit/observability/test_sink_routing.py (1)
25-89: Good coverage for routing/filter behavior.These tests exercise the key acceptance/rejection paths and expected sink-prefix mappings clearly.
tests/unit/providers/test_base_provider.py (1)
89-127: Logging behavior tests are well-targeted.The new cases verify success/start/error event emission on key base-provider paths with clear assertions.
tests/unit/observability/test_events.py (1)
26-64: Strong guardrails for observability event taxonomy.Pattern, uniqueness, and baseline-count checks provide good regression protection for event constants.
src/ai_company/providers/registry.py (1)
80-89: Registry observability wiring looks good.The added events capture the critical lifecycle/error points with actionable context fields.
Also applies to: 155-160, 182-187, 221-226
tests/unit/observability/test_correlation_async.py (1)
11-70: Decorator contract coverage is solid.The suite validates context binding lifecycle, exception cleanup, and sync-function rejection comprehensively.
src/ai_company/providers/drivers/mappers.py (1)
12-17: Logging migration and warning instrumentation look correct.The new logger integration and structured warning payloads keep behavior stable while improving observability.
Also applies to: 118-121, 147-150, 158-162, 189-192
src/ai_company/providers/base.py (1)
13-19: Structured provider observability wiring looks good.The new lifecycle and validation-error events are consistently emitted with useful context and preserve existing control flow.
Also applies to: 32-33, 76-91, 118-123, 276-296
tests/integration/observability/test_sink_routing_integration.py (1)
15-147: Good end-to-end sink routing coverage.These integration cases validate namespace routing and sink-level severity filtering with real file handlers.
src/ai_company/observability/events.py (1)
1-63: Event catalog is well-structured and clear.The grouped constants improve consistency and make log instrumentation easier to audit.
There was a problem hiding this comment.
Code Review
This is an excellent and thorough pull request that introduces a robust observability system based on structlog. The use of a central events.py for event constants, sink routing for different log types, and correlation ID tracking are all best practices that will significantly improve the application's maintainability and debuggability. The migration of existing modules and the extensive test suite are also very well done.
My review focuses on ensuring full adherence to the new logging conventions you've introduced. I've found a few places where bare strings are still used for log event names instead of constants from events.py. Addressing these will make the implementation fully consistent with the new standards defined in CLAUDE.md.
| logger.debug( | ||
| "Skipping non-scalar YAML key: %s", | ||
| type(key_node).__name__, | ||
| "config.yaml.non_scalar_key", | ||
| key_type=type(key_node).__name__, | ||
| ) |
There was a problem hiding this comment.
This file has a couple of log calls using bare strings for event names, which goes against the new logging conventions. Please define constants in observability/events.py for these and use them instead.
"config.yaml.non_scalar_key"(L176)"config.line_map.compose_failed"(L207)
For example, for the first case, you could add CONFIG_YAML_NON_SCALAR_KEY: str = "config.yaml.non_scalar_key" to events.py, import it, and change the call to:
logger.debug(
CONFIG_YAML_NON_SCALAR_KEY,
key_type=type(key_node).__name__,
)| if delta is None: | ||
| _logger.debug("Stream chunk has choices but no delta, skipping") | ||
| logger.debug("provider.stream.chunk_no_delta") | ||
| return result |
There was a problem hiding this comment.
This file contains several log calls that use bare strings for event names, which violates the new logging convention to use constants from observability/events.py.
Please convert the following bare strings into constants in events.py and use them here:
provider.stream.chunk_no_delta(L430)provider.retry_after.parse_failed(L541)provider.model_info.unavailable(L562)provider.model_info.unexpected_error(L568)provider.tool_call.arguments_truncated(L658)provider.tool_call.incomplete(L673)provider.tool_call.arguments_parse_failed(L683)
For example, logger.debug("provider.stream.chunk_no_delta") should become logger.debug(PROVIDER_STREAM_CHUNK_NO_DELTA) after defining the constant.
| logger.warning( | ||
| "provider.finish_reason.unknown", | ||
| reason=reason, | ||
| ) |
There was a problem hiding this comment.
This file also has several log calls using bare strings instead of event constants. This is a good example of why constants are valuable, as some of these strings (e.g., provider.tool_call.incomplete) are also used in litellm_driver.py.
Please convert these to constants in observability/events.py:
provider.finish_reason.unknown(L119)provider.tool_call.missing_function(L147)provider.tool_call.incomplete(L158)provider.tool_call.arguments_parse_failed(L189)
Using a shared constant like PROVIDER_TOOL_CALL_INCOMPLETE would ensure consistency across both modules.
| logger.debug( | ||
| "Cannot convert %r to float in Pass 1 " | ||
| "(may be a Jinja2 placeholder), using 0.0", | ||
| value, | ||
| "template.pass1.float_fallback", | ||
| value=repr(value), | ||
| ) |
There was a problem hiding this comment.
This log call uses a bare string "template.pass1.float_fallback" for the event name. To follow the new logging conventions, this should be defined as a constant in observability/events.py.
For example, you could add TEMPLATE_PASS1_FLOAT_FALLBACK: str = "template.pass1.float_fallback" to events.py, import it, and then change this call to:
logger.debug(
TEMPLATE_PASS1_FLOAT_FALLBACK,
value=repr(value),
)There was a problem hiding this comment.
Pull request overview
This PR wires application modules into the existing ai_company.observability stack by standardizing on get_logger(__name__), introducing a shared catalog of structured event-name constants, adding async correlation support, and implementing sink routing based on logger-name prefixes—plus extensive test coverage to validate the behavior end-to-end.
Changes:
- Added
ai_company.observability.eventsconstants and expanded instrumentation across config/templates/providers/core modules using structlog. - Implemented async correlation via
with_correlation_async()and added logger-name-based sink routing (_LoggerNameFilter+_SINK_ROUTING). - Added/expanded unit + integration tests for event constants, correlation, sink routing, and newly introduced log emission.
Reviewed changes
Copilot reviewed 31 out of 33 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Adds syrupy to lockfile for test tooling. |
| pyproject.toml | Adds syrupy to the test dependency group. |
| CLAUDE.md | Introduces mandatory logging conventions for the repo. |
| .claude/skills/aurelio-review-pr/skill.md | Updates PR review skill to include a logging-audit agent/checklist. |
| tests/conftest.py | Patches structlog configure behavior during tests to avoid proxy caching issues. |
| tests/unit/templates/test_renderer.py | Adds assertions for template renderer start/success log events. |
| tests/unit/providers/test_registry.py | Adds assertions for provider registry build/error log events. |
| tests/unit/providers/test_base_provider.py | New tests for BaseCompletionProvider logging behavior. |
| tests/unit/observability/test_sink_routing.py | New unit tests for logger-name filters and routing table. |
| tests/unit/observability/test_events.py | New tests enforcing event-name format, uniqueness, and presence. |
| tests/unit/observability/test_correlation_async.py | New tests for async correlation decorator behavior. |
| tests/unit/observability/conftest.py | Adds a captured_logs fixture for field-level structlog assertions. |
| tests/unit/core/test_task_transitions.py | Adds test ensuring invalid transitions emit the expected log event. |
| tests/unit/core/test_task.py | Adds test ensuring task status transitions emit the expected log event. |
| tests/unit/config/test_loader.py | Adds tests asserting config loader emits success/parse/validation log events. |
| tests/unit/config/test_bootstrap_logging.py | Adds tests for the new bootstrap_logging() startup hook. |
| tests/integration/observability/test_sink_routing_integration.py | Integration tests validating routing behavior with real file handlers. |
| tests/integration/observability/init.py | Marks/creates the integration test package. |
| src/ai_company/templates/renderer.py | Migrates to get_logger + adds structured template render lifecycle logging. |
| src/ai_company/templates/loader.py | Migrates to get_logger + adds structured template load/list logging. |
| src/ai_company/providers/registry.py | Adds structured logging for registry build/lookup/factory failures. |
| src/ai_company/providers/drivers/mappers.py | Migrates to get_logger and logs mapping edge cases in structured form. |
| src/ai_company/providers/drivers/litellm_driver.py | Migrates to get_logger and adds provider call/stream/error structured logs. |
| src/ai_company/providers/base.py | Adds baseline structured logging for provider call/stream entry + validation errors. |
| src/ai_company/observability/sinks.py | Implements logger-name-based sink routing via handler filters. |
| src/ai_company/observability/events.py | New module defining structured event-name constants. |
| src/ai_company/observability/correlation.py | Adds with_correlation_async() and updates messaging for async misuse. |
| src/ai_company/observability/init.py | Re-exports with_correlation_async. |
| src/ai_company/core/task_transitions.py | Adds structured logging for invalid transitions and config errors. |
| src/ai_company/core/task.py | Adds structured logging for task status transitions. |
| src/ai_company/core/role_catalog.py | Adds structured logging for role lookup misses. |
| src/ai_company/config/loader.py | Migrates to get_logger, adds structured config lifecycle logs, and adds bootstrap_logging(). |
| src/ai_company/config/init.py | Re-exports bootstrap_logging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| logger.warning( | ||
| "Failed to compose YAML AST for line mapping; " | ||
| "validation errors will lack line/column information: %s", | ||
| exc, | ||
| "config.line_map.compose_failed", | ||
| error=str(exc), | ||
| ) |
There was a problem hiding this comment.
This log call uses a string literal event name ("config.line_map.compose_failed") instead of an ai_company.observability.events constant, which conflicts with the logging conventions added in this PR. Consider adding an event constant for this case and using it here.
| logger.debug( | ||
| PROVIDER_STREAM_START, | ||
| provider=self._provider_name, | ||
| model=model, | ||
| message_count=len(messages), | ||
| ) |
There was a problem hiding this comment.
Same duplication issue as _do_complete(): BaseCompletionProvider.stream() now emits PROVIDER_STREAM_START, and this driver emits it again. Consider logging this event in only one place (or use a driver-specific event) to avoid duplicate entries.
| logger.debug( | |
| PROVIDER_STREAM_START, | |
| provider=self._provider_name, | |
| model=model, | |
| message_count=len(messages), | |
| ) |
| logger.warning( | ||
| "provider.finish_reason.unknown", | ||
| reason=reason, | ||
| ) |
There was a problem hiding this comment.
This module logs events using string literals (e.g. "provider.finish_reason.unknown") instead of constants from ai_company.observability.events, which is now documented as mandatory in CLAUDE.md. Consider adding event constants for these mapper warnings and using them here so event naming stays consistent across providers.
src/ai_company/templates/loader.py
Outdated
| "template.pass1.float_fallback", | ||
| value=repr(value), |
There was a problem hiding this comment.
This log call uses a string literal event name ("template.pass1.float_fallback") rather than an ai_company.observability.events constant, which conflicts with the logging conventions introduced in this PR. Consider adding a corresponding template event constant and referencing it here.
| def _patched_configure( | ||
| _original: object = structlog.configure, | ||
| **kwargs: object, | ||
| ) -> None: |
There was a problem hiding this comment.
The global monkeypatch of structlog.configure only accepts **kwargs, which can break callers that pass positional args to structlog.configure(...) (it’s part of structlog’s public API). Consider accepting *args, **kwargs and forwarding both to the original function, while still forcing cache_logger_on_first_use=False.
| @pytest.fixture | ||
| def captured_logs() -> Iterator[list[MutableMapping[str, Any]]]: | ||
| """Capture structlog output as list of dicts for field-level assertions.""" | ||
| with structlog.testing.capture_logs() as cap: | ||
| yield cap |
There was a problem hiding this comment.
MutableMapping is imported only under TYPE_CHECKING, but it’s referenced in the runtime return annotation of captured_logs(). With Python 3.14 lazy annotations (and no __future__.annotations), evaluating annotations can raise NameError. Import MutableMapping at runtime (with # noqa: TC003 if needed) or use a string annotation.
| if TYPE_CHECKING: | ||
| from collections.abc import AsyncIterator | ||
|
|
||
| from ai_company.providers.capabilities import ModelCapabilities | ||
| from ai_company.providers.models import ( | ||
| ChatMessage, | ||
| CompletionConfig, | ||
| CompletionResponse, | ||
| StreamChunk, | ||
| ToolDefinition, | ||
| ) |
There was a problem hiding this comment.
Types used in annotations (AsyncIterator, ModelCapabilities, StreamChunk, etc.) are imported only under TYPE_CHECKING. In this repo (Python 3.14, no __future__.annotations), evaluating annotations can raise NameError. Prefer runtime imports with # noqa: TC003 (as used elsewhere in tests) or string annotations.
| return | ||
| yield # make it an async generator |
There was a problem hiding this comment.
This async generator helper uses return before yield, leaving an unreachable yield solely to force generator-ness. This is hard to read and can trip “unreachable code” tooling. Consider using a clearer pattern (e.g., if False: yield ...) or yielding nothing without unreachable statements.
| return | |
| yield # make it an async generator | |
| # This async generator intentionally yields no items. | |
| if False: | |
| yield # make it an async generator |
| except TemplateRenderError, TemplateValidationError, OSError: | ||
| logger.exception( | ||
| "Built-in template %r is invalid (packaging defect)", | ||
| name, | ||
| TEMPLATE_BUILTIN_DEFECT, | ||
| template_name=name, | ||
| ) |
There was a problem hiding this comment.
This except TemplateRenderError, TemplateValidationError, OSError: clause is invalid Python 3 syntax and will raise a SyntaxError at import time. Use an exception tuple (and consider capturing as exc if you want to log details) so list_templates() can run.
| logger.debug( | ||
| "Skipping non-scalar YAML key: %s", | ||
| type(key_node).__name__, | ||
| "config.yaml.non_scalar_key", | ||
| key_type=type(key_node).__name__, | ||
| ) |
There was a problem hiding this comment.
These log calls use a bare string event name, but this PR’s logging conventions (CLAUDE.md ## Logging) require using constants from ai_company.observability.events. Add a corresponding constant (or reuse an existing one) and reference it here for consistency/auditability.
- Add 14 missing event constants to events.py and replace all bare string event names in litellm_driver, mappers, config/loader, and templates/loader - Upgrade all event constants from str to Final[str] - Remove duplicate lifecycle events from LiteLLMDriver that BaseCompletionProvider already emits - Narrow _map_exception return type from Exception to ProviderError - Add import-time exhaustiveness check for VALID_TRANSITIONS - Fix outdated with_correlation docstring to reference with_correlation_async - Add bootstrap_logging to config/__init__.py autosummary - Elevate terminal render failures to logger.exception - Add error logging in BaseCompletionProvider for _do_complete/_do_stream - Make _SINK_ROUTING immutable with MappingProxyType - Add input validation for _LoggerNameFilter prefixes - Add timeout markers to TestLoaderLogging, TestRegistryLogging, and test_renderer.py - Add integration test cleanup fixture for configure_logging isolation - Rename test_is_idempotent to test_forwards_each_call_to_configure_logging - Document captured_logs fixture with usage example - Remove unused syrupy dependency
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/observability/correlation.py (1)
136-146: 🧹 Nitpick | 🔵 TrivialConsider extracting the bindings logic to reduce duplication.
Both decorators build the same
bindingsdict. A small helper could DRY this up:♻️ Optional refactor
def _build_bindings( request_id: str | None, task_id: str | None, agent_id: str | None, ) -> dict[str, str]: bindings: dict[str, str] = {} if request_id is not None: bindings["request_id"] = request_id if task_id is not None: bindings["task_id"] = task_id if agent_id is not None: bindings["agent_id"] = agent_id return bindingsThen both wrappers just call
_build_bindings(request_id, task_id, agent_id).Also applies to: 196-206
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/observability/correlation.py` around lines 136 - 146, The two decorator inner functions (wrapper) duplicate the logic that constructs the bindings dict before calling structlog.contextvars.bound_contextvars; extract that logic into a small helper function named _build_bindings(request_id: str | None, task_id: str | None, agent_id: str | None) -> dict[str, str] that returns the bindings dict, then replace the duplicated block in each wrapper with a call to _build_bindings(request_id, task_id, agent_id) and pass its result into structlog.contextvars.bound_contextvars; keep the existing keys ("request_id", "task_id", "agent_id") and the None checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/observability/sinks.py`:
- Around line 64-73: The PrefixFilter.filter correctly gives exclude precedence,
but build_handler is only passing include_prefixes and never wires up
exclude_prefixes; update build_handler to pass the exclude_prefixes argument
into PrefixFilter (or, if omission was intentional, add a brief comment in
build_handler explaining exclude_prefixes is intentionally unused for future
extensibility). Locate the PrefixFilter.filter implementation and the
build_handler function to either forward the exclude_prefixes list into
PrefixFilter(...) or annotate build_handler so readers know the omission is
intentional.
In `@src/ai_company/providers/drivers/litellm_driver.py`:
- Around line 669-674: The warning for PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED
currently includes an args_preview (self.arguments[:200]) which may leak
sensitive content; update the logger.warning call in litellm_driver (the call
site using logger.warning, PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED, self.name,
self.id, self.arguments) to remove args_preview entirely and only emit metadata
(e.g., tool_name=self.name and tool_id=self.id) or a non-sensitive boolean flag
indicating presence of arguments, ensuring no argument content is included in
the log payload.
In `@src/ai_company/providers/drivers/mappers.py`:
- Around line 153-156: The logger.warning call using
PROVIDER_TOOL_CALL_MISSING_FUNCTION currently logs raw payload via
item=repr(item), which can leak sensitive user/tool arguments; change both
occurrences (the call around PROVIDER_TOOL_CALL_MISSING_FUNCTION and the similar
logger usage in the 198-208 block) to log only structural metadata: replace
repr(item) with a small metadata object like {"type": type(item).__name__,
"length": len(item) if hasattr(item, "__len__") else None, "is_bytes":
isinstance(item, (bytes, bytearray))} and include a clear message via
logger.warning(PROVIDER_TOOL_CALL_MISSING_FUNCTION, item_meta=metadata) so logs
avoid content previews while preserving useful debugging info.
In `@src/ai_company/templates/loader.py`:
- Around line 163-164: The start-event is logged with the raw template name
(logger.debug(TEMPLATE_LOAD_START, template_name=name)) while success/failure
events use the normalized key (name_clean), causing split metrics; change the
start-event to log the normalized key by trimming and lowercasing the name first
(i.e., compute name_clean = name.strip().lower() before calling logger.debug for
TEMPLATE_LOAD_START) so all events use the same template_name value.
In `@tests/integration/observability/conftest.py`:
- Around line 13-32: The _clear_logging_state() helper and _reset_logging()
fixture are duplicated in tests/integration/observability/conftest.py and
tests/unit/observability/conftest.py; extract _clear_logging_state into a single
shared test helper (e.g., tests/conftest.py or tests/observability_helpers.py)
and update both observability conftest files to import and call that shared
_clear_logging_state, keeping the local _reset_logging() fixture as-is but
delegating to the imported helper to remove duplication.
---
Outside diff comments:
In `@src/ai_company/observability/correlation.py`:
- Around line 136-146: The two decorator inner functions (wrapper) duplicate the
logic that constructs the bindings dict before calling
structlog.contextvars.bound_contextvars; extract that logic into a small helper
function named _build_bindings(request_id: str | None, task_id: str | None,
agent_id: str | None) -> dict[str, str] that returns the bindings dict, then
replace the duplicated block in each wrapper with a call to
_build_bindings(request_id, task_id, agent_id) and pass its result into
structlog.contextvars.bound_contextvars; keep the existing keys ("request_id",
"task_id", "agent_id") and the None checks.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (19)
src/ai_company/config/__init__.pysrc/ai_company/config/loader.pysrc/ai_company/core/task_transitions.pysrc/ai_company/observability/correlation.pysrc/ai_company/observability/events.pysrc/ai_company/observability/sinks.pysrc/ai_company/providers/base.pysrc/ai_company/providers/drivers/litellm_driver.pysrc/ai_company/providers/drivers/mappers.pysrc/ai_company/templates/loader.pysrc/ai_company/templates/renderer.pytests/conftest.pytests/integration/observability/conftest.pytests/unit/config/test_bootstrap_logging.pytests/unit/config/test_loader.pytests/unit/observability/conftest.pytests/unit/providers/test_base_provider.pytests/unit/providers/test_registry.pytests/unit/templates/test_renderer.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax without parentheses (PEP 758) — ruff enforces this on Python 3.14
Add type hints to all public functions — mypy strict mode enforced
Use Google-style docstrings on all public classes and functions — enforced by ruff D rules
Create new objects instead of mutating existing ones — practice immutability
Use Pydantic v2 withBaseModel,model_validator, andConfigDictfor data models
Enforce 88 character line length — ruff configured
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly — never silently swallow exceptions
Validate at system boundaries — validate user input, external APIs, and config files explicitly
Files:
src/ai_company/observability/correlation.pysrc/ai_company/templates/loader.pysrc/ai_company/providers/base.pytests/unit/providers/test_registry.pytests/integration/observability/conftest.pysrc/ai_company/config/loader.pytests/unit/config/test_loader.pysrc/ai_company/core/task_transitions.pysrc/ai_company/providers/drivers/litellm_driver.pysrc/ai_company/templates/renderer.pytests/unit/providers/test_base_provider.pytests/conftest.pysrc/ai_company/observability/sinks.pytests/unit/config/test_bootstrap_logging.pytests/unit/observability/conftest.pysrc/ai_company/providers/drivers/mappers.pysrc/ai_company/config/__init__.pytests/unit/templates/test_renderer.pysrc/ai_company/observability/events.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain minimum 80% code coverage — enforced in CI
Files:
src/ai_company/observability/correlation.pysrc/ai_company/templates/loader.pysrc/ai_company/providers/base.pysrc/ai_company/config/loader.pysrc/ai_company/core/task_transitions.pysrc/ai_company/providers/drivers/litellm_driver.pysrc/ai_company/templates/renderer.pysrc/ai_company/observability/sinks.pysrc/ai_company/providers/drivers/mappers.pysrc/ai_company/config/__init__.pysrc/ai_company/observability/events.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slowmarkers on all tests
Set test timeout to 30 seconds per test
Use parallel test execution withpytest-xdistvia-n autoflag
Files:
tests/unit/providers/test_registry.pytests/integration/observability/conftest.pytests/unit/config/test_loader.pytests/unit/providers/test_base_provider.pytests/conftest.pytests/unit/config/test_bootstrap_logging.pytests/unit/observability/conftest.pytests/unit/templates/test_renderer.py
🧠 Learnings (14)
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All functions and methods should have appropriate logging for debugging and traceability. Use `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.
Applied to files:
src/ai_company/templates/loader.pysrc/ai_company/config/loader.pysrc/ai_company/providers/drivers/litellm_driver.pysrc/ai_company/templates/renderer.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/**/*.py : All functions and methods should have appropriate logging using `logger.debug()` for routine operations, `logger.info()` for significant events, `logger.warning()` for unexpected but recoverable situations, and `logger.error()` for failures.
Applied to files:
src/ai_company/templates/loader.pysrc/ai_company/config/loader.pysrc/ai_company/providers/drivers/litellm_driver.pysrc/ai_company/templates/renderer.py
📚 Learning: 2026-03-01T18:03:07.755Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T18:03:07.755Z
Learning: Applies to tests/**/*.py : Set test timeout to 30 seconds per test
Applied to files:
tests/unit/providers/test_registry.pytests/unit/config/test_loader.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`
Applied to files:
tests/integration/observability/conftest.pytests/conftest.pytests/unit/observability/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/integration/observability/conftest.pytests/unit/observability/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings
Applied to files:
tests/integration/observability/conftest.pytests/unit/observability/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/integration/observability/conftest.pytests/unit/observability/conftest.py
📚 Learning: 2026-03-01T18:03:07.755Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T18:03:07.755Z
Learning: Applies to tests/**/*.py : Use `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow` markers on all tests
Applied to files:
tests/unit/config/test_loader.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Use logging levels: debug (routine), info (significant), warning (unexpected but recoverable), error (failures)
Applied to files:
src/ai_company/templates/renderer.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to **/*.py : Never dismiss warnings from `ruff check`, `pytest`, or other tools. Investigate and fix them, even if they appear unrelated to the current task.
Applied to files:
src/ai_company/templates/renderer.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
src/ai_company/templates/renderer.py
📚 Learning: 2026-03-01T18:03:07.755Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T18:03:07.755Z
Learning: Applies to **/*.py : Do not use `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
Applied to files:
tests/unit/providers/test_base_provider.pytests/unit/observability/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : NEVER add `from __future__ import annotations` - Python 3.14+ has native support for deferred annotation evaluation and does not require this import
Applied to files:
tests/unit/providers/test_base_provider.pytests/unit/observability/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
tests/unit/observability/conftest.py
🧬 Code graph analysis (11)
src/ai_company/templates/loader.py (3)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/templates/errors.py (3)
TemplateNotFoundError(10-11)TemplateRenderError(14-20)TemplateValidationError(23-59)src/ai_company/templates/schema.py (1)
CompanyTemplate(175-268)
src/ai_company/providers/base.py (4)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)tests/unit/providers/test_base_provider.py (2)
_do_complete(42-61)_do_stream(63-75)tests/unit/providers/test_registry.py (2)
_do_complete(61-69)_do_stream(71-79)tests/unit/providers/test_protocol.py (5)
_do_complete(89-102)_do_complete(133-147)_do_complete(246-251)_do_stream(104-119)_do_stream(149-162)
src/ai_company/config/loader.py (2)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/observability/setup.py (1)
configure_logging(159-195)
tests/unit/config/test_loader.py (3)
tests/unit/config/conftest.py (2)
tmp_config_file(148-154)ConfigFileFactory(24-27)src/ai_company/config/loader.py (2)
_parse_yaml_string(96-147)_validate_config_dict(220-269)src/ai_company/config/errors.py (2)
ConfigParseError(66-67)ConfigValidationError(70-106)
src/ai_company/core/task_transitions.py (2)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/core/enums.py (1)
TaskStatus(122-143)
src/ai_company/providers/drivers/litellm_driver.py (2)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)src/ai_company/providers/errors.py (4)
ProviderError(22-69)RateLimitError(78-104)AuthenticationError(72-75)ProviderConnectionError(131-134)
src/ai_company/templates/renderer.py (1)
src/ai_company/observability/_logger.py (1)
get_logger(8-28)
tests/unit/providers/test_base_provider.py (5)
src/ai_company/providers/base.py (2)
complete(51-95)stream(97-136)src/ai_company/providers/capabilities.py (1)
ModelCapabilities(10-83)src/ai_company/providers/models.py (6)
ChatMessage(114-186)CompletionConfig(189-230)CompletionResponse(233-282)StreamChunk(285-363)ToolDefinition(45-70)TokenUsage(12-42)src/ai_company/providers/enums.py (2)
FinishReason(15-22)MessageRole(6-12)src/ai_company/providers/errors.py (1)
InvalidRequestError(113-116)
tests/unit/config/test_bootstrap_logging.py (3)
src/ai_company/config/loader.py (1)
bootstrap_logging(484-499)src/ai_company/config/schema.py (1)
RootConfig(204-328)src/ai_company/observability/config.py (1)
LogConfig(115-192)
src/ai_company/config/__init__.py (1)
src/ai_company/config/loader.py (1)
bootstrap_logging(484-499)
tests/unit/templates/test_renderer.py (2)
src/ai_company/templates/loader.py (1)
load_template(151-207)src/ai_company/templates/renderer.py (1)
render_template(49-104)
🔇 Additional comments (42)
tests/conftest.py (1)
6-28: LGTM — effective solution for structlog proxy caching in tests.The monkey-patch approach using a default argument to capture the original
structlog.configureis a clean pattern. The docstring clearly explains the caching problem and solution.Minor: The
objecttype hints are overly broad. Consider usingCallable[..., None]for_originalto avoid thetype: ignorecomment, though this is purely cosmetic given the isolated test-infrastructure scope.src/ai_company/core/task_transitions.py (3)
18-24: LGTM — proper observability integration.Clean integration of structured logging with appropriate event constants. Module-level logger initialization follows the project pattern.
55-58: Good defensive validation at import time.This ensures
VALID_TRANSITIONSstays synchronized withTaskStatusenum members. Any missing entry will fail fast at module import rather than at runtime.
73-89: Appropriate log levels for transition errors.
logger.criticalfor configuration errors (internal defect) is correct.logger.warningfor invalid transitions (caller error) is appropriate.Both log before raising, ensuring events are captured even if exceptions are swallowed upstream.
src/ai_company/observability/correlation.py (1)
153-210: LGTM — correct async correlation implementation.The decorator properly validates async functions, binds IDs using
bound_contextvarsfor automatic cleanup, and mirrors the sync version's behavior. Type annotations correctly useCoroutine[object, object, _T].tests/unit/observability/conftest.py (1)
72-84: Well-implemented fixture with clear documentation.The type annotation
list[MutableMapping[str, Any]]accurately reflects structlog's capture format, and the docstring provides a helpful usage example.src/ai_company/observability/sinks.py (3)
28-34: Good use ofMappingProxyTypefor immutability.The routing configuration is correctly protected against accidental mutation, aligning with coding guidelines.
50-62: Solid input validation for filter prefixes.The validation prevents empty or whitespace-only prefixes that would cause unexpected matching behavior.
171-176: LGTM — defensive check before attaching filter.Correctly guards against
Nonefile_path and only applies routing for configured sinks.src/ai_company/observability/events.py (1)
1-81: LGTM — well-organized event constants with consistent naming.The
Final[str]typing prevents accidental reassignment, and thedomain.noun.verbconvention is consistently applied. The section comments improve navigability.tests/unit/templates/test_renderer.py (2)
23-23: Good — module-level timeout marker covers all tests.The
pytestmark = pytest.mark.timeout(30)correctly applies the 30-second timeout to all tests in the module, per coding guidelines.
203-212: LGTM — focused logging verification test.The test correctly verifies that
render_templateemits exactly oneTEMPLATE_RENDER_STARTand oneTEMPLATE_RENDER_SUCCESSevent. The assertions are appropriately scoped.Consider using the
captured_logsfixture fromtests/unit/observability/conftest.pyfor consistency with the broader test infrastructure.src/ai_company/providers/base.py (4)
13-32: LGTM!The observability integration is well-implemented. The module-level logger is correctly initialized using
get_logger(__name__), and the event constants are properly imported from the centralized events module.
76-95: LGTM!The logging around
complete()follows the expected pattern: DEBUG for start/success events, ERROR withexc_info=Truefor failures. The try/except properly re-raises exceptions after logging.
122-136: Stream error handling only covers iterator creation, not iteration.The try/except wraps only the
await self._do_stream(...)call, which handles errors during iterator creation. However, errors that occur during async iteration (when the caller consumes chunks) won't be logged here. This appears intentional given the docstring states implementations must "return an AsyncIterator", but worth confirming this is the desired behavior.If you want errors during iteration to also be logged, consider wrapping the returned iterator. Otherwise, document that iteration errors are the caller's responsibility.
284-285: Static method uses module-level logger.
_validate_messagesis a@staticmethodbut uses the module-levellogger. This works correctly, but note that if this class is ever subclassed and someone expects per-instance logger customization, it won't apply here. The current design is acceptable for this use case.src/ai_company/templates/renderer.py (7)
22-46: LGTM!The observability imports and logger initialization follow the established pattern. Event constants are properly imported from the centralized module.
77-104: LGTM!INFO-level logging for render start and success events is appropriate for significant operations. The structured fields (
source_name) enable effective log filtering and tracing.
135-140: LGTM!Using
logger.errorfor missing required variables is appropriate since this is a terminal failure that raises an exception. Based on learnings: "Uselogger.error()for failures."
192-196: LGTM — usinglogger.exceptionis appropriate here.The code now uses
logger.exceptionwhich logs at ERROR level and automatically includes the exception traceback. This addresses the concern from the past review about using the correct severity for terminal failures.
223-227: LGTM —logger.exceptioncorrectly used for YAML parse errors.Consistent with the Jinja2 error handling pattern.
371-376: LGTM!Using
logger.warningfor unknown personality presets is appropriate — this is recoverable (the agent is still created, just without the preset personality), which aligns with the guideline: "Uselogger.warning()for unexpected but recoverable situations."
448-452: LGTM —logger.exceptioncorrectly used for validation errors.Consistent error logging pattern for terminal failures.
tests/unit/providers/test_registry.py (2)
6-12: LGTM!Imports for structlog and observability events are correctly added to support the new logging tests.
291-314: LGTM!The logging tests are well-structured with proper markers (
@pytest.mark.unitand@pytest.mark.timeout(30)). The tests correctly verify that:
PROVIDER_REGISTRY_BUILTis emitted withprovider_countwhen building from configPROVIDER_DRIVER_NOT_REGISTEREDis emitted with the missing providernamewhen lookup failsThis addresses the past review comment about adding the timeout marker.
tests/unit/providers/test_base_provider.py (3)
1-36: LGTM!The test module is well-structured with:
- Proper module docstring
- Module-level
pytestmark = pytest.mark.timeout(30)ensuring all tests have the required timeout- Clean imports following the pattern (no
from __future__ import annotations)
39-86: LGTM!The
_StubProvideris a clean minimal implementation for testing, and the_msghelper provides a convenient way to create test messages. The stub correctly implements all required abstract methods.
89-127: LGTM!The test class comprehensively covers the logging behavior:
test_complete_emits_call_start_and_success: Verifies happy path loggingtest_stream_emits_stream_start: Verifies stream-specific eventtest_empty_messages_emits_error: Verifies error logging for validation failuretest_blank_model_emits_error: Verifies error logging for invalid modelAll tests properly use
structlog.testing.capture_logs()and have the required@pytest.mark.unitmarker.src/ai_company/config/loader.py (4)
20-33: LGTM!The observability imports and logger initialization follow the established pattern. All event constants are properly imported from the centralized module.
177-180: LGTM — Event constant now used.This addresses the past review comment. The hardcoded string has been replaced with
CONFIG_YAML_NON_SCALAR_KEYconstant.
208-211: LGTM — Event constant now used.This addresses the past review comment (marked as addressed in commit 1a304cb). The hardcoded string has been replaced with
CONFIG_LINE_MAP_COMPOSE_FAILEDconstant.
484-499: LGTM!The
bootstrap_loggingfunction is well-implemented:
- Proper type hints and Google-style docstring
- Lazy import of
configure_loggingavoids circular imports at module load time- Correctly extracts
config.loggingwhen config is provided, or passesNonefor defaults- Clear documentation about when to call it (once at startup after
load_config)tests/unit/config/test_bootstrap_logging.py (1)
1-41: LGTM!The test module is comprehensive and well-structured:
- Module-level timeout marker ensures all tests have the required 30s limit
test_calls_configure_logging_with_none: VerifiesNoneis correctly forwardedtest_calls_configure_logging_with_config_logging: Verifiesconfig.loggingextractiontest_forwards_each_call_to_configure_logging: Renamed fromtest_is_idempotentper past review feedback, now accurately describes the behavior being testedThe mock target
"ai_company.observability.configure_logging"correctly patches at the import location.src/ai_company/config/__init__.py (3)
9-9: LGTM!The
bootstrap_loggingfunction is correctly added to the autosummary documentation.
33-33: LGTM!Import added correctly.
59-59: LGTM!
bootstrap_loggingis correctly added to__all__, maintaining alphabetical order.tests/unit/config/test_loader.py (3)
7-7: LGTM!The structlog import is correctly added to support the new logging tests.
29-33: LGTM!The observability event constants are correctly imported for use in test assertions.
666-698: LGTM!The logging tests are well-structured with proper markers (
@pytest.mark.unitand@pytest.mark.timeout(30)). This addresses the past review comment about adding the timeout marker.The tests comprehensively verify:
CONFIG_LOADEDevent on successful load withconfig_pathCONFIG_PARSE_FAILEDevent on YAML syntax error withsourceCONFIG_VALIDATION_FAILEDevent on validation failure witherror_countsrc/ai_company/templates/loader.py (1)
23-31: Structured template event wiring is consistent.The migration to
get_logger(__name__)plus centralizedTEMPLATE_*event constants is clean and keeps template observability stable across warning/error/debug paths.Also applies to: 39-39, 116-119, 135-137, 180-185, 190-195, 198-202, 435-437
src/ai_company/providers/drivers/mappers.py (1)
12-18: Observability migration in mapper paths looks good.Event-based warnings and module-level
get_logger(__name__)are applied consistently on non-happy paths.Also applies to: 22-22, 124-127, 164-168
src/ai_company/providers/drivers/litellm_driver.py (1)
43-57: Provider observability hooks are well integrated.The new structured events in model resolution, stream completion, exception mapping, and LiteLLM model-info fallbacks are coherent and actionable.
Also applies to: 85-85, 268-273, 387-391, 475-496, 548-557
| logger.warning( | ||
| PROVIDER_TOOL_CALL_MISSING_FUNCTION, | ||
| item=repr(item), | ||
| ) |
There was a problem hiding this comment.
Avoid logging raw tool-call payload fragments.
These fields can contain user/tool argument content and leak sensitive data into logs. Prefer structural metadata (type/length) over payload previews.
🛡️ Suggested patch
if func is None:
logger.warning(
PROVIDER_TOOL_CALL_MISSING_FUNCTION,
- item=repr(item),
+ item_type=type(item).__name__,
)
continue
@@
except json.JSONDecodeError, ValueError:
logger.warning(
PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED,
- args_preview=raw[:200],
+ args_len=len(raw),
)
return {}
@@
logger.warning(
PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED,
- args_preview=raw[:200],
+ args_len=len(raw),
parsed_type=type(parsed).__name__,
)Also applies to: 198-208
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/providers/drivers/mappers.py` around lines 153 - 156, The
logger.warning call using PROVIDER_TOOL_CALL_MISSING_FUNCTION currently logs raw
payload via item=repr(item), which can leak sensitive user/tool arguments;
change both occurrences (the call around PROVIDER_TOOL_CALL_MISSING_FUNCTION and
the similar logger usage in the 198-208 block) to log only structural metadata:
replace repr(item) with a small metadata object like {"type":
type(item).__name__, "length": len(item) if hasattr(item, "__len__") else None,
"is_bytes": isinstance(item, (bytes, bytearray))} and include a clear message
via logger.warning(PROVIDER_TOOL_CALL_MISSING_FUNCTION, item_meta=metadata) so
logs avoid content previews while preserving useful debugging info.
- Remove sensitive args_preview from tool-call warning logs (mappers.py, litellm_driver.py) — log args_length/item_type metadata instead - Use normalized name_clean in TEMPLATE_LOAD_START event (loader.py) - Consolidate duplicated _clear_logging_state into shared tests/conftest.py::clear_logging_state
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 32 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
src/ai_company/observability/sinks.py:16
Pathis imported only underTYPE_CHECKING, but it’s referenced in runtime annotations (log_dir: Pathin_build_file_handler/build_handler). With Python 3.14 (PEP 649) in this repo, annotations may be evaluated at runtime, soPathshould be available at runtime to avoidNameErrorduring annotation introspection. Prefer importingPathunconditionally (optionally with the project’s# noqa: TC00xconvention) instead of gating it behindTYPE_CHECKING.
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from pathlib import Path
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except TemplateRenderError, TemplateValidationError, OSError: | ||
| logger.exception( | ||
| "Built-in template %r is invalid (packaging defect)", | ||
| name, | ||
| TEMPLATE_BUILTIN_DEFECT, | ||
| template_name=name, | ||
| ) |
There was a problem hiding this comment.
The except clause uses Python 2-style comma syntax (except TemplateRenderError, TemplateValidationError, OSError:), which is a SyntaxError in Python 3. Wrap the exceptions in parentheses (tuple) and bind the exception as as exc if you need it, consistent with the earlier user-template block.
| try: | ||
| parsed = json.loads(raw) | ||
| except json.JSONDecodeError, ValueError: | ||
| _logger.warning( | ||
| "Failed to parse tool call arguments: %r", | ||
| raw[:200], | ||
| logger.warning( | ||
| PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED, | ||
| args_length=len(raw), | ||
| ) | ||
| return {} |
There was a problem hiding this comment.
This except clause uses invalid comma syntax (except json.JSONDecodeError, ValueError:), which will raise a SyntaxError in Python 3 and prevent the module from importing. Use except (json.JSONDecodeError, ValueError): instead.
| try: | ||
| return float(raw) | ||
| except ValueError, TypeError: | ||
| _logger.debug( | ||
| "Could not parse retry-after header as seconds: %r", | ||
| raw, | ||
| logger.debug( | ||
| PROVIDER_RETRY_AFTER_PARSE_FAILED, | ||
| raw_value=repr(raw), | ||
| ) | ||
| return None |
There was a problem hiding this comment.
This except clause uses invalid comma syntax (except ValueError, TypeError:), which is a SyntaxError in Python 3. Use except (ValueError, TypeError): (or separate except blocks) so this module can import.
| try: | ||
| raw = _litellm.get_model_info(model=litellm_model) | ||
| info: dict[str, Any] = dict(raw) if raw else {} | ||
| except KeyError, ValueError: | ||
| _logger.info( | ||
| "No LiteLLM metadata for model %r, using config defaults", | ||
| litellm_model, | ||
| logger.info( | ||
| PROVIDER_MODEL_INFO_UNAVAILABLE, | ||
| model=litellm_model, | ||
| ) | ||
| return {} |
There was a problem hiding this comment.
This except clause uses invalid comma syntax (except KeyError, ValueError:), which is a SyntaxError in Python 3. Use except (KeyError, ValueError): so exception handling works and the module remains importable.
| try: | ||
| parsed = json.loads(self.arguments) if self.arguments else {} | ||
| except json.JSONDecodeError, ValueError: | ||
| _logger.warning( | ||
| "Failed to parse tool call arguments for tool %r (id=%r): %r", | ||
| self.name, | ||
| self.id, | ||
| self.arguments[:200] if self.arguments else "", | ||
| logger.warning( | ||
| PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED, | ||
| tool_name=self.name, | ||
| tool_id=self.id, | ||
| args_length=len(self.arguments) if self.arguments else 0, | ||
| ) | ||
| parsed = {} |
There was a problem hiding this comment.
This except clause uses invalid comma syntax (except json.JSONDecodeError, ValueError:), which is a SyntaxError in Python 3. Change it to except (json.JSONDecodeError, ValueError): so streamed tool-call argument parsing failures are handled without breaking imports.
| from typing import TYPE_CHECKING, Self | ||
|
|
||
| from ai_company.observability import get_logger | ||
| from ai_company.observability.events import ( | ||
| PROVIDER_DRIVER_FACTORY_MISSING, | ||
| PROVIDER_DRIVER_INSTANTIATED, | ||
| PROVIDER_DRIVER_NOT_REGISTERED, | ||
| PROVIDER_REGISTRY_BUILT, | ||
| ) | ||
|
|
||
| from .base import BaseCompletionProvider | ||
| from .errors import ( | ||
| DriverFactoryNotFoundError, | ||
| DriverNotRegisteredError, | ||
| ) | ||
|
|
||
| logger = get_logger(__name__) | ||
|
|
||
| if TYPE_CHECKING: | ||
| from ai_company.config.schema import ProviderConfig | ||
|
|
There was a problem hiding this comment.
ProviderConfig is imported only under TYPE_CHECKING, but it’s referenced in runtime annotations (from_config() and _build_driver() signatures). With this repo targeting Python 3.14 (PEP 649 lazy annotations), these annotation expressions may be evaluated at runtime (e.g., via typing.get_type_hints), which would raise NameError if ProviderConfig isn’t importable at runtime. Prefer a runtime import using the project’s # noqa: TC00x pattern instead of if TYPE_CHECKING: for types used in annotations.
| try: | ||
| return float(value) | ||
| except TypeError, ValueError: | ||
| logger.debug( | ||
| "Cannot convert %r to float in Pass 1 " | ||
| "(may be a Jinja2 placeholder), using 0.0", | ||
| value, | ||
| TEMPLATE_PASS1_FLOAT_FALLBACK, | ||
| value=repr(value), | ||
| ) | ||
| return 0.0 |
There was a problem hiding this comment.
This except clause uses invalid comma syntax (except TypeError, ValueError:), which is a SyntaxError in Python 3. Use except (TypeError, ValueError): so the float-coercion fallback works without breaking imports.
| from typing import TYPE_CHECKING, ParamSpec, TypeVar | ||
|
|
||
| import structlog | ||
|
|
||
| if TYPE_CHECKING: | ||
| from collections.abc import Callable | ||
| from collections.abc import Callable, Coroutine | ||
|
|
There was a problem hiding this comment.
Callable/Coroutine are only imported under TYPE_CHECKING, but they’re referenced in runtime annotations below (e.g., return types for with_correlation*). In this repo (Python 3.14 / PEP 649), annotations can be evaluated at runtime, so these names should be importable at runtime (similar to from collections.abc import AsyncIterator # noqa: TC003 in providers/base.py). Import them unconditionally (optionally with the project’s # noqa: TC00x pattern) to avoid NameError when annotations are introspected.
🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
Closes #91
observability/events.pywith ~30 structured event constants followingdomain.noun.verbpatternwith_correlation_async()decorator mirroring the sync version for async functions_LoggerNameFilterand_SINK_ROUTINGdict routing security→audit.log, budget+providers→cost_usage.log, engine+core→agent_activity.loglogging.getLoggerto structlogget_logger(config/loader, templates/loader, templates/renderer, litellm_driver, mappers)bootstrap_logging()function to wireconfigure_logging()into the startup pathcache_logger_on_first_useduring tests## Loggingsection to CLAUDE.md with mandatory conventions, added logging-audit agent to PR review skillVerification
logging.getLoggerin app code (only inobservability/setup.py)print()calls in app codeTest plan
uv run ruff check src/ tests/— zero errorsuv run ruff format --check src/ tests/— all formatteduv run mypy src/ tests/— zero errorsuv run pytest tests/ -m "unit or integration" --timeout=30— 1380 passedrg "logging\.getLogger" src/ai_company/ --glob '!observability/setup.py'— 0 matches