Wire all modules into observability system and harden logging infrastructure #91
Description
Context
Issue #72 (PR #73) built a comprehensive observability system in src/ai_company/observability/ — structlog integration, 7-sink layout, correlation tracking, sensitive field sanitization, rotation, and per-logger level configuration. The system is well-tested (65+ unit tests, 80%+ coverage) and architecturally sound.
However, nothing actually uses it.
The get_logger(__name__) API was designed as the single entry point for all application logging. In practice:
- 0 modules use `get_logger` from observability
- 5 modules use raw `logging.getLogger(__name__)` (stdlib), bypassing structlog, correlation IDs, and field sanitization
- ~57 modules have no logging at all, including critical business logic (agent lifecycle, task state, config loading, budget tracking, provider calls)
- CLAUDE.md has no enforcement rules: issue #72 (Implement enterprise logging system with comprehensive configuration) specified a post-merge deliverable to add mandatory logging rules, but this was never done
Additionally, PR #73 left a TODO(#59): Integrate LogConfig with central config/ YAML loading. Issue #59 (YAML config loader) has since been merged. The structural wiring exists — RootConfig.logging: LogConfig | None is in config/schema.py — but configure_logging() is never actually called from the config loader pipeline. So YAML config can hold logging settings, but nothing activates them at startup.
This issue rectifies all of that — but starts with a deep investigation phase before making any changes.
Part 0: Deep investigation — validate the current approach (MUST DO FIRST)
Before wiring anything in, critically evaluate whether our current observability stack is the right foundation to build on. Issue #72 made technology decisions (structlog, 7-sink file layout, custom correlation tracking) that were never stress-tested against real usage. This phase is a safety gate.
0a. Evaluate structlog as the logging backbone
- Is structlog still the best choice for structured logging in Python 3.14+? Compare against alternatives:
- stdlib logging + python-json-logger — simpler, no extra dependency, native async support?
- loguru — batteries-included, less boilerplate, but less configurable?
- structlog (current) — powerful processors pipeline, but adds complexity
- OpenTelemetry Logs SDK — if we plan OTel for M7+, should we just use their logging SDK now?
- Evaluate: does structlog's processor pipeline actually buy us anything we couldn't achieve with stdlib + a formatter?
- Check: are there known structlog issues with Python 3.14 / PEP 649?
0b. Evaluate the 7-sink architecture
- Is file-based sink routing (7 separate log files) the right approach for this project?
- Consider alternatives:
- Single structured log file + post-hoc filtering (simpler ops, grep/jq friendly)
- stdout/stderr only (12-factor app style, let the deployment platform handle routing)
- Hybrid: structured stdout for dev, sink routing for production
- For each sink, ask: will this actually be consumed by a human or tool? If not, it's dead weight.
- Evaluate rotation config: are the defaults sensible? Are they tested?
0c. Evaluate correlation tracking
- Is our custom ContextVar-based correlation tracking sufficient, or should we adopt OpenTelemetry's trace_id/span_id from the start?
- If OTel is planned for M7+, is it cheaper to adopt opentelemetry-api (zero-cost if no SDK installed) now rather than building custom and migrating later?
- Check: does the async correlation decorator (with_correlation_async) need to integrate with Python's TaskGroup / asyncio.gather for proper context propagation?
0d. Evaluate sensitive field sanitization
- Is the current sanitization approach (regex on field names) robust enough?
- Are there edge cases it misses (nested dicts, Pydantic model dumps, binary data)?
- Should we use an allowlist instead of a blocklist for fields that appear in logs?
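To make the edge cases above concrete, here is a minimal sketch of a blocklist sanitizer that descends into nested dicts and lists and summarizes binary values. The pattern, the `REDACTED` marker, and the `sanitize` name are assumptions for illustration; the existing implementation may differ. Pydantic models are not handled here and would presumably need to be dumped to a dict first.

```python
import re
from typing import Any

# Hypothetical blocklist pattern; the project's real pattern may differ.
SENSITIVE = re.compile(r"password|secret|token|api_?key|authorization", re.IGNORECASE)
REDACTED = "**REDACTED**"


def sanitize(value: Any, key: str = "") -> Any:
    """Redact values whose field name matches SENSITIVE, descending into containers."""
    if key and SENSITIVE.search(key):
        return REDACTED
    if isinstance(value, dict):
        return {k: sanitize(v, str(k)) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # List items carry no field name, but dicts inside them are still checked.
        return [sanitize(item) for item in value]
    if isinstance(value, (bytes, bytearray)):
        # Binary payloads are summarized rather than logged verbatim.
        return f"<{len(value)} bytes>"
    return value
```

An allowlist variant would invert the first check: keep only keys that match a known-safe set and redact everything else.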
0e. Produce a recommendation document
Write findings to a brief decision record (either in the PR description or a comment on this issue):
- Keep as-is: list what's validated and ready
- Change before wiring: list any changes needed before we commit to wiring 57+ modules into this system
- Defer: list anything that should wait for a later milestone
Only proceed to Parts 1-5 after the investigation is reviewed and approved.
Part 1: Migrate existing stdlib loggers to get_logger
Replace logging.getLogger(__name__) with get_logger(__name__) in these 5 files:
| File | Current | Impact |
|---|---|---|
| `config/loader.py` | `logging.getLogger` | Bypasses structlog pipeline |
| `templates/loader.py` | `logging.getLogger` | No correlation tracking |
| `templates/renderer.py` | `logging.getLogger` | No field sanitization |
| `providers/drivers/litellm_driver.py` | `logging.getLogger` | LLM calls not in structured format |
| `providers/drivers/mappers.py` | `logging.getLogger` | Warnings not captured by sinks |
Each migration:
- Replace `import logging` with `from ai_company.observability import get_logger`
- Replace `logging.getLogger(__name__)` with `get_logger(__name__)`
- Normalize variable naming to `logger` (the convention per issue #72)
- Verify existing log calls still work: structlog's bound loggers expose the same method names as stdlib (`debug`, `info`, `warning`, `error`), but calls that rely on `%`-style positional arguments or `extra=` should be checked individually
Part 1b: Wire configure_logging() into config loader pipeline
PR #73's TODO(#59) — the config loader must call configure_logging(log_config) after loading RootConfig so that YAML-defined log settings (levels, sinks, rotation) are actually activated at startup. Currently the LogConfig is parsed and stored but never consumed.
- In `config/loader.py`, after `RootConfig` is validated, call `configure_logging(config.logging)` if `config.logging` is not `None`
- Add a sensible default so logging works even without explicit YAML config (e.g. INFO to console)
- Ensure `configure_logging()` is idempotent (safe to call multiple times in tests)
Part 2: Instrument all unlogged modules
Add structured logging to every module that performs work worth observing. Follow the principle from DESIGN_SPEC Section 1.2: "Every agent action, communication, and decision is logged and visible."
Modules that MUST have logging added
Core domain (ai_company.core):
- `agent.py`: Agent creation, status changes, validation
- `company.py`: Company/department structure changes
- `role.py`: Role assignment, authority validation
- `role_catalog.py`: Catalog lookups, missing roles
- `task.py`: Task creation, status transitions
- `task_transitions.py`: State machine transitions (critical for debugging)
- `project.py`: Project lifecycle events

Budget (ai_company.budget):
- `hierarchy.py`: Budget allocation changes, limit checks
- `spending_summary.py`: Spending aggregation events
- `cost_record.py`: Cost record creation

Communication (ai_company.communication):
- `channel.py`: Channel operations
- `message.py`: Message creation/validation

Config (ai_company.config):
- `utils.py`: Env var substitution (log what was substituted, without values)
- `errors.py`: Config validation error details

Providers (ai_company.providers):
- `base.py`: Input validation, hook delegation
- `registry.py`: Provider registration, lookups, factory resolution
- `errors.py`: Error creation with context

Templates (ai_company.templates):
- `schema.py`: Template validation
- `errors.py`: Template error details
Logging level guidelines
| Event type | Level | Examples |
|---|---|---|
| Object creation | DEBUG | Agent created, task instantiated |
| State transitions | INFO | Task status change, agent status change |
| Validation failures | WARNING | Invalid config, unknown model |
| Error recovery | WARNING | Fallback used, retry triggered |
| Unrecoverable errors | ERROR | Provider failure, config load failure |
| Security events | INFO via audit logger | Permission checks, approval gates |
Part 3: Harden and extend the observability system
3a. Implement logger-to-sink routing (TODO in sinks.py)
Currently all sinks receive all log events. The audit sink should only receive security-related logs, cost sink should only receive cost events, etc. Implement logger name filters:
- `ai_company.security.*` routes to `audit.log`
- `ai_company.budget.*` + `ai_company.providers.*` (cost events) routes to `cost_usage.log`
- `ai_company.engine.*` + `ai_company.core.*` (agent/task events) routes to `agent_activity.log`
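The routing rules above can be sketched with a stdlib `logging.Filter` attached to each sink's handler. The class and helper names are hypothetical, and this filters on logger name only; restricting `ai_company.providers.*` to cost events specifically would need an additional check on the event itself.

```python
import logging


class LoggerNameFilter(logging.Filter):
    """Pass only records whose logger name falls under one of the given prefixes."""

    def __init__(self, *prefixes: str) -> None:
        super().__init__()
        self._prefixes = prefixes

    def filter(self, record: logging.LogRecord) -> bool:
        # Match the package itself or any dotted child, but not look-alike names.
        return any(
            record.name == p or record.name.startswith(p + ".")
            for p in self._prefixes
        )


# Routing table taken from the list above.
SINK_ROUTES = {
    "audit.log": ("ai_company.security",),
    "cost_usage.log": ("ai_company.budget", "ai_company.providers"),
    "agent_activity.log": ("ai_company.engine", "ai_company.core"),
}


def attach_route(handler: logging.Handler, sink: str) -> logging.Handler:
    """Restrict `handler` to the logger names routed to `sink`."""
    handler.addFilter(LoggerNameFilter(*SINK_ROUTES[sink]))
    return handler
```

Sinks without an entry in the table would keep receiving everything, which matches the current behaviour for the remaining files.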
3b. Implement async correlation decorator (TODO in correlation.py)
Add with_correlation_async() for async functions — needed by the engine and API layers in M3/M6. The sync with_correlation() context manager exists but async equivalents are missing.
3c. Add structured log event constants
Define standard event names as constants to prevent typos and enable grep-ability:
```python
# ai_company/observability/events.py
TASK_CREATED = "task.created"
TASK_STATUS_CHANGED = "task.status_changed"
AGENT_CREATED = "agent.created"
PROVIDER_CALL_START = "provider.call.start"
PROVIDER_CALL_COMPLETE = "provider.call.complete"
CONFIG_LOADED = "config.loaded"
```
3d. Add log testing utilities
Add a pytest fixture that captures structured log output for asserting on log fields (not just message text). Issue #72 specified this but it wasn't implemented:
```python
import pytest


@pytest.fixture
def captured_logs():
    """Capture structlog output for test assertions."""
    ...
```
3e. Future-proof: add OpenTelemetry-ready span hooks
Add optional span context injection so when OpenTelemetry is added later (M7+), existing log entries automatically include trace/span IDs. This is just a processor stub — no OpenTelemetry dependency needed now.
Part 4: Enforce in CLAUDE.md and PR review pipeline
4a. Add to CLAUDE.md
Add a ## Logging section with rules:
- Every module must use: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
- Never use `print()`, `logging.getLogger()`, or raw `logging`; always use `ai_company.observability`
- All error paths must log at WARNING or ERROR with context
- All state transitions must log at INFO
- Debug-level logs for object creation and internal flow
4b. Add logging-audit agent to PR review skill
Add a new sub-agent to .claude/skills/aurelio-review-pr/skill.md that:
- Checks all files touched by the PR for `import logging` (stdlib) and flags each as a violation
- Checks all new/modified functions for appropriate log statements
- Verifies `get_logger(__name__)` is present in every module with business logic
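The first check is mechanical, so the sub-agent could be backed by (or cross-checked against) a small script. The sketch below is one hypothetical way to do that with the stdlib `ast` module; the function name and report format are invented, and it also covers the `print()` rule from 4a.

```python
import ast


def find_logging_violations(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return (line, description) pairs for stdlib logging imports and print() calls."""
    violations: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            # Covers `import logging` and `import logging.handlers`.
            if any(alias.name.split(".")[0] == "logging" for alias in node.names):
                violations.append((node.lineno, "import logging"))
        elif isinstance(node, ast.ImportFrom):
            # level == 0 excludes relative imports such as `from .logging import x`.
            if node.level == 0 and (node.module or "").split(".")[0] == "logging":
                violations.append((node.lineno, "from logging import ..."))
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id == "print":
                violations.append((node.lineno, "print() call"))
    return sorted(violations)
```

The observability package itself would need to be excluded, since it legitimately imports stdlib `logging`.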
Part 5: Test everything
- Unit tests for all new log statements (assert on structured fields)
- Integration test: configure logging -> execute a workflow -> verify all sinks received correct events
- Test logger-to-sink routing filters
- Test async correlation decorator
- Maintain 80%+ coverage
Acceptance Criteria
- Part 0 investigation complete: recommendation document reviewed and approved before proceeding
- `configure_logging()` called from config loader pipeline (resolves `TODO(#59)`)
- All 5 stdlib logger modules migrated to `get_logger`
- All ~57 unlogged modules instrumented with appropriate log statements
- Logger-to-sink routing implemented (audit, cost, agent activity)
- Async correlation decorator implemented
- Structured event constants defined
- Log testing fixture available for pytest
- CLAUDE.md updated with mandatory logging rules
- PR review skill updated with logging-audit agent
- All new logging tested (unit + integration)
- 80%+ coverage maintained
- Zero uses of raw `logging.getLogger` in application code
- Zero uses of `print()` in application code (except observability setup fallback)
Dependencies
- Issue #72 (Implement enterprise logging system with comprehensive configuration) and PR #73 (feat: implement enterprise logging system with structlog): observability system (merged)
- Should be done before or alongside #7 (Implement per-call cost tracking and usage logging) and #9 (Implement retry logic, rate limiting, and provider error handling), since both need proper logging
References
- DESIGN_SPEC Section 1.2: Observable design principle
- DESIGN_SPEC Section 5.5: Loop detection logging
- DESIGN_SPEC Section 12.3: Security audit logging
- DESIGN_SPEC Section 13.2: API middleware logging
- DESIGN_SPEC Section 15.3: audit.py module
- Issue #72 (Implement enterprise logging system with comprehensive configuration): Original logging system spec (deliverables partially incomplete)
- PR #73 (feat: implement enterprise logging system with structlog): Observability system implementation