Wire all modules into observability system and harden logging infrastructure #91

@Aureliolo

Context

Issue #72 (PR #73) built a comprehensive observability system in src/ai_company/observability/ — structlog integration, 7-sink layout, correlation tracking, sensitive field sanitization, rotation, and per-logger level configuration. The system is well-tested (65+ unit tests, 80%+ coverage) and architecturally sound.

However, nothing actually uses it.

The get_logger(__name__) API was designed as the single entry point for all application logging. In practice:

  • 0 modules use get_logger from observability
  • 5 modules use raw logging.getLogger(__name__) (stdlib), bypassing structlog, correlation IDs, and field sanitization
  • ~57 modules have no logging at all — including critical business logic (agent lifecycle, task state, config loading, budget tracking, provider calls)
  • CLAUDE.md has no enforcement rules: issue #72 (Implement enterprise logging system with comprehensive configuration) specified a post-merge deliverable to add mandatory logging rules, but this was never done

Additionally, PR #73 left a TODO(#59): Integrate LogConfig with central config/ YAML loading. Issue #59 (YAML config loader) has since been merged. The structural wiring exists — RootConfig.logging: LogConfig | None is in config/schema.py — but configure_logging() is never actually called from the config loader pipeline. So YAML config can hold logging settings, but nothing activates them at startup.

This issue rectifies all of that — but starts with a deep investigation phase before making any changes.

Part 0: Deep investigation — validate the current approach (MUST DO FIRST)

Before wiring anything in, critically evaluate whether our current observability stack is the right foundation to build on. Issue #72 made technology decisions (structlog, 7-sink file layout, custom correlation tracking) that were never stress-tested against real usage. This phase is a safety gate.

0a. Evaluate structlog as the logging backbone

  • Is structlog still the best choice for structured logging in Python 3.14+? Compare against alternatives:
    • stdlib logging + python-json-logger: simpler, only one small formatter dependency, native async support?
    • loguru — batteries-included, less boilerplate, but less configurable?
    • structlog (current) — powerful processors pipeline, but adds complexity
    • OpenTelemetry Logs SDK — if we plan OTel for M7+, should we just use their logging SDK now?
  • Evaluate: does structlog's processor pipeline actually buy us anything we couldn't achieve with stdlib + a formatter?
  • Check: are there known structlog issues with Python 3.14 / PEP 649?

0b. Evaluate the 7-sink architecture

  • Is file-based sink routing (7 separate log files) the right approach for this project?
  • Consider alternatives:
    • Single structured log file + post-hoc filtering (simpler ops, grep/jq friendly)
    • stdout/stderr only (12-factor app style, let the deployment platform handle routing)
    • Hybrid: structured stdout for dev, sink routing for production
  • For each sink, ask: will this actually be consumed by a human or tool? If not, it's dead weight.
  • Evaluate rotation config: are the defaults sensible? Are they tested?

0c. Evaluate correlation tracking

  • Is our custom ContextVar-based correlation tracking sufficient, or should we adopt OpenTelemetry's trace_id/span_id from the start?
  • If OTel is planned for M7+, is it cheaper to adopt opentelemetry-api (zero-cost if no SDK installed) now rather than building custom and migrating later?
  • Check: does the planned async correlation decorator (with_correlation_async, see Part 3b) need to integrate with Python's TaskGroup / asyncio.gather for proper context propagation?

0d. Evaluate sensitive field sanitization

  • Is the current sanitization approach (regex on field names) robust enough?
  • Are there edge cases it misses (nested dicts, Pydantic model dumps, binary data)?
  • Should we use an allowlist instead of a blocklist for fields that appear in logs?
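
To make the nested-data edge cases concrete, the following is a minimal sketch of a recursive blocklist sanitizer. The key pattern, the `REDACTED` marker, and the `sanitize` name are assumptions for illustration, not the project's current implementation. A Pydantic v2 `model_dump()` returns plain dicts, so it would pass through the dict branch.

```python
import re
from typing import Any

# Assumed blocklist pattern; the project's real pattern may differ.
SENSITIVE_KEY = re.compile(r"(password|secret|token|api_?key|authorization)", re.IGNORECASE)
REDACTED = "**REDACTED**"


def sanitize(value: Any) -> Any:
    """Return a copy of value with sensitive keys redacted at any depth."""
    if isinstance(value, dict):
        return {
            key: REDACTED
            if isinstance(key, str) and SENSITIVE_KEY.search(key)
            else sanitize(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    if isinstance(value, tuple):
        return tuple(sanitize(item) for item in value)
    if isinstance(value, (bytes, bytearray)):
        # Binary payloads are summarized instead of being dumped into the log.
        return f"<{len(value)} bytes>"
    return value


event = {
    "event": "provider.call.start",
    "api_key": "sk-1",
    "request": {"headers": {"Authorization": "Bearer x"}, "body": b"\x00\x01"},
    "items": [{"token": "t"}, {"name": "ok"}],
}
clean = sanitize(event)
```

A blocklist like this still passes any sensitive value stored under an unexpected key, which is the argument for evaluating an allowlist.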

0e. Produce a recommendation document

Write findings to a brief decision record (either in the PR description or a comment on this issue):

  • Keep as-is: list what's validated and ready
  • Change before wiring: list any changes needed before we commit to wiring 57+ modules into this system
  • Defer: list anything that should wait for a later milestone

Only proceed to Parts 1-5 after the investigation is reviewed and approved.

Part 1: Migrate existing stdlib loggers to get_logger

Replace logging.getLogger(__name__) with get_logger(__name__) in these 5 files:

| File | Current | Impact |
| --- | --- | --- |
| config/loader.py | logging.getLogger | Bypasses structlog pipeline |
| templates/loader.py | logging.getLogger | No correlation tracking |
| templates/renderer.py | logging.getLogger | No field sanitization |
| providers/drivers/litellm_driver.py | logging.getLogger | LLM calls not in structured format |
| providers/drivers/mappers.py | logging.getLogger | Warnings not captured by sinks |

Each migration:

  1. Replace import logging with from ai_company.observability import get_logger
  2. Replace logging.getLogger(__name__) with get_logger(__name__)
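  3. Normalize variable naming to logger (the convention per issue #72, Implement enterprise logging system with comprehensive configuration)
  4. Verify existing log calls still work (structlog's stdlib-style bound logger keeps the debug/info/warning/error/exception method names, but calls that pass %-style positional arguments or extra= should be checked individually)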

Part 1b: Wire configure_logging() into config loader pipeline

This resolves PR #73's TODO(#59): the config loader must call configure_logging(log_config) after loading RootConfig so that YAML-defined log settings (levels, sinks, rotation) are actually activated at startup. Currently the LogConfig is parsed and stored but never consumed.

  1. In config/loader.py, after RootConfig is validated, call configure_logging(config.logging) if config.logging is not None
  2. Add a sensible default so logging works even without explicit YAML config (e.g. INFO to console)
  3. Ensure configure_logging() is idempotent (safe to call multiple times in tests)

Part 2: Instrument all unlogged modules

Add structured logging to every module that performs work worth observing. Follow the principle from DESIGN_SPEC Section 1.2: "Every agent action, communication, and decision is logged and visible."

Modules that MUST have logging added

Core domain (ai_company.core):

  • agent.py — Agent creation, status changes, validation
  • company.py — Company/department structure changes
  • role.py — Role assignment, authority validation
  • role_catalog.py — Catalog lookups, missing roles
  • task.py — Task creation, status transitions
  • task_transitions.py — State machine transitions (critical for debugging)
  • project.py — Project lifecycle events

Budget (ai_company.budget):

  • hierarchy.py — Budget allocation changes, limit checks
  • spending_summary.py — Spending aggregation events
  • cost_record.py — Cost record creation

Communication (ai_company.communication):

  • channel.py — Channel operations
  • message.py — Message creation/validation

Config (ai_company.config):

  • utils.py — Env var substitution (log what was substituted, without values)
  • errors.py — Config validation error details

Providers (ai_company.providers):

  • base.py — Input validation, hook delegation
  • registry.py — Provider registration, lookups, factory resolution
  • errors.py — Error creation with context

Templates (ai_company.templates):

  • schema.py — Template validation
  • errors.py — Template error details

Logging level guidelines

| Event type | Level | Examples |
| --- | --- | --- |
| Object creation | DEBUG | Agent created, task instantiated |
| State transitions | INFO | Task status change, agent status change |
| Validation failures | WARNING | Invalid config, unknown model |
| Error recovery | WARNING | Fallback used, retry triggered |
| Unrecoverable errors | ERROR | Provider failure, config load failure |
| Security events | INFO via audit logger | Permission checks, approval gates |

Part 3: Harden and extend the observability system

3a. Implement logger-to-sink routing (TODO in sinks.py)

Currently all sinks receive all log events. The audit sink should receive only security-related logs, the cost sink only cost events, and so on. Implement logger name filters:

  • ai_company.security.* routes to audit.log
  • ai_company.budget.* + ai_company.providers.* (cost events) routes to cost_usage.log
  • ai_company.engine.* + ai_company.core.* (agent/task events) routes to agent_activity.log
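
One way to express these routes is a stdlib `logging.Filter` attached to each sink's handler. The sketch below is an assumption about how sinks.py could be structured, not a description of it. `PrefixFilter`, `ListHandler`, and `attach_sink` are hypothetical names, and the in-memory handler stands in for the real file sinks.

```python
import logging


class PrefixFilter(logging.Filter):
    """Pass only records whose logger name falls under one of the given prefixes."""

    def __init__(self, *prefixes: str) -> None:
        super().__init__()
        self.prefixes = prefixes

    def filter(self, record: logging.LogRecord) -> bool:
        return any(
            record.name == prefix or record.name.startswith(prefix + ".")
            for prefix in self.prefixes
        )


class ListHandler(logging.Handler):
    """In-memory stand-in for a file sink, so the sketch performs no I/O."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def attach_sink(parent: logging.Logger, *prefixes: str) -> ListHandler:
    handler = ListHandler()
    handler.addFilter(PrefixFilter(*prefixes))
    parent.addHandler(handler)
    return handler


package_logger = logging.getLogger("ai_company")
package_logger.setLevel(logging.INFO)
package_logger.propagate = False  # keep the demo away from the process-wide root logger

audit = attach_sink(package_logger, "ai_company.security")
cost = attach_sink(package_logger, "ai_company.budget", "ai_company.providers")
activity = attach_sink(package_logger, "ai_company.engine", "ai_company.core")

logging.getLogger("ai_company.security.approvals").info("approval.granted")
logging.getLogger("ai_company.budget.hierarchy").info("budget.limit_checked")
logging.getLogger("ai_company.core.task").info("task.status_changed")
```

Note that this routes every record from ai_company.providers.* to the cost sink. Restricting that route to cost events only would need an additional check on the event name or a record attribute.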

3b. Implement async correlation decorator (TODO in correlation.py)

Add with_correlation_async() for async functions; the engine and API layers need it in M3/M6. The sync with_correlation() context manager exists, but it has no async equivalent.

3c. Add structured log event constants

Define standard event names as constants to prevent typos and enable grep-ability:

# ai_company/observability/events.py
TASK_CREATED = "task.created"
TASK_STATUS_CHANGED = "task.status_changed"
AGENT_CREATED = "agent.created"
PROVIDER_CALL_START = "provider.call.start"
PROVIDER_CALL_COMPLETE = "provider.call.complete"
CONFIG_LOADED = "config.loaded"

3d. Add log testing utilities

Add a pytest fixture that captures structured log output for asserting on log fields (not just message text). Issue #72 specified this but it wasn't implemented:

@pytest.fixture
def captured_logs():
    """Capture structlog output for test assertions."""
    ...

3e. Future-proof: add OpenTelemetry-ready span hooks

Add optional span context injection so when OpenTelemetry is added later (M7+), existing log entries automatically include trace/span IDs. This is just a processor stub — no OpenTelemetry dependency needed now.

Part 4: Enforce in CLAUDE.md and PR review pipeline

4a. Add to CLAUDE.md

Add a ## Logging section with rules:

  • Every module must use: from ai_company.observability import get_logger then logger = get_logger(__name__)
  • Never use print(), logging.getLogger(), or raw logging — always use ai_company.observability
  • All error paths must log at WARNING or ERROR with context
  • All state transitions must log at INFO
  • Debug-level logs for object creation and internal flow

4b. Add logging-audit agent to PR review skill

Add a new sub-agent to .claude/skills/aurelio-review-pr/skill.md that:

  • Checks all files touched by the PR for import logging (stdlib) — flag as violation
  • Checks all new/modified functions for appropriate log statements
  • Verifies get_logger(__name__) is present in every module with business logic
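
The first check could also run as a deterministic script alongside the review sub-agent. This is a sketch under that assumption: `find_logging_violations` is a hypothetical helper that uses the stdlib `ast` module to flag stdlib logging imports, `logging.getLogger()` calls, and `print()` calls in a source string. Judging whether log statements are appropriate would still fall to the reviewer.

```python
import ast


def find_logging_violations(source: str, filename: str = "<module>") -> list[tuple[int, str]]:
    """Return (line, message) pairs for stdlib logging imports, getLogger, and print."""
    violations: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            if any(alias.name == "logging" for alias in node.names):
                violations.append((node.lineno, "import logging"))
        elif isinstance(node, ast.ImportFrom):
            if node.module == "logging":
                violations.append((node.lineno, "from logging import ..."))
        elif isinstance(node, ast.Call):
            func = node.func
            if (
                isinstance(func, ast.Attribute)
                and func.attr == "getLogger"
                and isinstance(func.value, ast.Name)
                and func.value.id == "logging"
            ):
                violations.append((node.lineno, "logging.getLogger()"))
            elif isinstance(func, ast.Name) and func.id == "print":
                violations.append((node.lineno, "print()"))
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(violations)


bad = "import logging\n\nlog = logging.getLogger(__name__)\nprint('x')\n"
good = "from ai_company.observability import get_logger\n\nlogger = get_logger(__name__)\n"
bad_report = find_logging_violations(bad)
good_report = find_logging_violations(good)
```

The source is only parsed, never imported, so the check can run on files whose dependencies are not installed.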

Part 5: Test everything

  • Unit tests for all new log statements (assert on structured fields)
  • Integration test: configure logging -> execute a workflow -> verify all sinks received correct events
  • Test logger-to-sink routing filters
  • Test async correlation decorator
  • Maintain 80%+ coverage

Acceptance Criteria

  • Part 0 investigation complete — recommendation document reviewed and approved before proceeding
  • configure_logging() called from config loader pipeline (resolves TODO(#59))
  • All 5 stdlib logger modules migrated to get_logger
  • All ~57 unlogged modules instrumented with appropriate log statements
  • Logger-to-sink routing implemented (audit, cost, agent activity)
  • Async correlation decorator implemented
  • Structured event constants defined
  • Log testing fixture available for pytest
  • CLAUDE.md updated with mandatory logging rules
  • PR review skill updated with logging-audit agent
  • All new logging tested (unit + integration)
  • 80%+ coverage maintained
  • Zero uses of raw logging.getLogger in application code
  • Zero uses of print() in application code (except observability setup fallback)

Labels

  • prio:critical (Blocks other work, must do first)
  • prio:high (Important, should be prioritized)
  • scope:large (3+ days of work)
  • spec:agent-system (DESIGN_SPEC Section 3 - Agent System)
  • spec:architecture (DESIGN_SPEC Section 15 - Technical Architecture)
  • spec:budget (DESIGN_SPEC Section 10 - Cost & Budget Management)
  • spec:providers (DESIGN_SPEC Section 9 - Model Provider Layer)
  • type:feature (New feature implementation)
  • type:infra (CI/CD, tooling, project setup)
  • type:refactor (Code restructuring, cleanup)
  • type:test (Test coverage, test infrastructure)
