Wire all modules into observability system and harden logging infrastructure #91

## Context

Issue #72 (PR #73) built a comprehensive observability system in `src/ai_company/observability/` — structlog integration, 7-sink layout, correlation tracking, sensitive field sanitization, rotation, and per-logger level configuration. The system is well-tested (65+ unit tests, 80%+ coverage) and architecturally sound.

**However, nothing actually uses it.**

The `get_logger(__name__)` API was designed as the single entry point for all application logging. In practice:
- **0 modules** use `get_logger` from observability
- **5 modules** use raw `logging.getLogger(__name__)` (stdlib), bypassing structlog, correlation IDs, and field sanitization
- **~57 modules** have **no logging at all** — including critical business logic (agent lifecycle, task state, config loading, budget tracking, provider calls)
- **CLAUDE.md** has no enforcement rules — issue #72 specified a post-merge deliverable to add mandatory logging rules, but this was never done

Additionally, PR #73 left a `TODO(#59): Integrate LogConfig with central config/ YAML loading`. Issue #59 (YAML config loader) has since been merged. The **structural** wiring exists — `RootConfig.logging: LogConfig | None` is in `config/schema.py` — but `configure_logging()` is never actually called from the config loader pipeline. So YAML config can *hold* logging settings, but nothing *activates* them at startup.

This issue rectifies all of that — but **starts with a deep investigation phase** before making any changes.

## Part 0: Deep investigation — validate the current approach (MUST DO FIRST)

Before wiring anything in, critically evaluate whether our current observability stack is the right foundation to build on. Issue #72 made technology decisions (structlog, 7-sink file layout, custom correlation tracking) that were never stress-tested against real usage. This phase is a safety gate.

### 0a. Evaluate structlog as the logging backbone

- Is structlog still the best choice for structured logging in Python 3.14+? Compare against alternatives:
  - **stdlib logging + python-json-logger** — simpler, one small dependency instead of structlog; can `logging.handlers.QueueHandler` cover our async needs?
  - **loguru** — batteries-included, less boilerplate, but less configurable?
  - **structlog** (current) — powerful processors pipeline, but adds complexity
  - **OpenTelemetry Logs SDK** — if we plan OTel for M7+, should we just use their logging SDK now?
- Evaluate: does structlog's processor pipeline actually buy us anything we couldn't achieve with stdlib + a formatter?
- Check: are there known structlog issues with Python 3.14 / PEP 649?
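To make the stdlib comparison in 0a concrete, here is a minimal sketch of stdlib logging plus a hand-rolled JSON formatter (python-json-logger packages a more complete version of the same idea). `JsonFormatter` and `_RESERVED` are illustrative names, not existing project code. Fields passed via `extra=` become top-level JSON keys. What this does not give you is structlog's contextvar binding or processor chain, which would have to be rebuilt with `logging.Filter`s.

```python
import json
import logging
from datetime import datetime, timezone

# Attribute names every LogRecord carries by default; anything else on a
# record was supplied through `extra=` and is treated as a structured field.
_RESERVED = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render a LogRecord as one JSON object per line (illustrative sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Copy user-supplied fields (from `extra=`) into the payload.
        payload.update(
            {key: value for key, value in vars(record).items() if key not in _RESERVED}
        )
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON-serializable values from raising.
        return json.dumps(payload, default=str)
```

Usage would be `handler.setFormatter(JsonFormatter())` followed by calls such as `logger.info("task.created", extra={"task_id": "t-1"})`.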

### 0b. Evaluate the 7-sink architecture

- Is file-based sink routing (7 separate log files) the right approach for this project?
- Consider alternatives:
  - **Single structured log file** + post-hoc filtering (simpler ops, grep/jq friendly)
  - **stdout/stderr only** (12-factor app style, let the deployment platform handle routing)
  - **Hybrid**: structured stdout for dev, sink routing for production
- For each sink, ask: will this actually be consumed by a human or tool? If not, it's dead weight.
- Evaluate rotation config: are the defaults sensible? Are they tested?
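On the question of whether rotation is tested, here is a minimal sketch of a test that would pin the behavior down. It assumes the file sinks are built on the stdlib `logging.handlers.RotatingFileHandler`; that is an assumption, so check `sinks.py` for the actual handler and defaults. The sketch writes far more than `max_bytes` and checks that no more than `backup_count` rotated files survive. `rotated_files` is a hypothetical helper.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def rotated_files(max_bytes: int, backup_count: int, n_records: int) -> list[str]:
    """Write n_records lines through a RotatingFileHandler; return surviving file names."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "app.log"
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
        logger = logging.getLogger("rotation_check")
        logger.propagate = False  # keep test output away from the root logger
        logger.setLevel(logging.INFO)
        logger.addHandler(handler)
        try:
            for i in range(n_records):
                logger.info("record %04d %s", i, "x" * 20)
        finally:
            logger.removeHandler(handler)
            handler.close()  # release the file before the directory is deleted
        return sorted(p.name for p in Path(tmp).iterdir())
```

With `rotated_files(max_bytes=200, backup_count=2, n_records=100)` the expected survivors are `app.log`, `app.log.1` and `app.log.2`. The same shape of test can be pointed at the real default sizes.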

### 0c. Evaluate correlation tracking

- Is our custom `ContextVar`-based correlation tracking sufficient, or should we adopt OpenTelemetry's `trace_id`/`span_id` from the start?
- If OTel is planned for M7+, is it cheaper to adopt `opentelemetry-api` now (its no-op implementations add negligible overhead when no SDK is installed) rather than build a custom solution and migrate later?
- Check: does the planned async correlation decorator (`with_correlation_async`, see 3b) need explicit integration with `asyncio.TaskGroup` / `asyncio.gather` for proper context propagation, or is asyncio's per-task context copy sufficient?
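For the propagation question, a quick experiment is cheap. The sketch below is stdlib only; `correlation_id`, `child` and `parent` are illustrative names. It relies on asyncio copying the current `contextvars` context when each task is created, which `asyncio.gather` does for the coroutines it wraps. Children see the parent's ID, but values they set do not leak back. `asyncio.TaskGroup.create_task` follows the same copy-on-create rule. `loop.run_in_executor` does not copy the context, while `asyncio.to_thread` does, so executor offloading is the edge case worth auditing.

```python
from __future__ import annotations

import asyncio
import contextvars

# Illustrative correlation variable; correlation.py may name it differently.
correlation_id: contextvars.ContextVar[str | None] = contextvars.ContextVar(
    "correlation_id", default=None
)


async def child(name: str) -> tuple[str, str | None]:
    seen = correlation_id.get()  # value inherited from the parent's context copy
    correlation_id.set(f"set-by-{name}")  # affects only this task's copy
    await asyncio.sleep(0)
    return name, seen


async def parent() -> tuple[list[tuple[str, str | None]], str | None]:
    correlation_id.set("req-123")  # bound before the child tasks are created
    results = await asyncio.gather(child("a"), child("b"))
    return list(results), correlation_id.get()
```

Running `asyncio.run(parent())` should show both children reading `"req-123"` and the parent still holding `"req-123"` afterwards.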

### 0d. Evaluate sensitive field sanitization

- Is the current sanitization approach (regex on field names) robust enough?
- Are there edge cases it misses (nested dicts, Pydantic model dumps, binary data)?
- Should we use an allowlist instead of a blocklist for fields that appear in logs?
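To probe the edge cases listed above, a recursive key-based redactor is a useful baseline to test the current implementation against. This is a sketch, not the existing sanitizer: `SENSITIVE_KEY`, `REDACTED` and `sanitize` are hypothetical names, and the pattern is illustrative. The function walks nested mappings and sequences. It dumps Pydantic-style objects via `model_dump()` (duck-typed, so Pydantic is not imported) and replaces binary payloads with their length.

```python
import re
from collections.abc import Mapping
from typing import Any

# Illustrative blocklist of sensitive key fragments (case-insensitive).
SENSITIVE_KEY = re.compile(r"pass(word)?|secret|token|api[_-]?key|authorization", re.IGNORECASE)
REDACTED = "**REDACTED**"


def sanitize(value: Any, _depth: int = 0, max_depth: int = 10) -> Any:
    """Return a copy of `value` with sensitive keys redacted at any nesting level."""
    if _depth > max_depth:
        return "<max depth exceeded>"  # guard against cycles and pathological nesting
    if hasattr(value, "model_dump") and callable(value.model_dump):
        # Pydantic v2 models (duck-typed): sanitize their dict form.
        return sanitize(value.model_dump(), _depth + 1, max_depth)
    if isinstance(value, Mapping):
        return {
            key: REDACTED
            if isinstance(key, str) and SENSITIVE_KEY.search(key)
            else sanitize(item, _depth + 1, max_depth)
            for key, item in value.items()
        }
    if isinstance(value, (bytes, bytearray)):
        return f"<{len(value)} bytes>"  # never log raw binary payloads
    if isinstance(value, (list, tuple, set, frozenset)):
        return [sanitize(item, _depth + 1, max_depth) for item in value]
    return value
```

Note the false positive this exposes. Because `token` is on the blocklist, a harmless `max_tokens` field is redacted too. That is one concrete data point for the allowlist-vs-blocklist question.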

### 0e. Produce a recommendation document

Write findings to a brief decision record (either in the PR description or a comment on this issue):
- **Keep as-is**: list what's validated and ready
- **Change before wiring**: list any changes needed before we commit to wiring 57+ modules into this system
- **Defer**: list anything that should wait for a later milestone

**Only proceed to Parts 1-5 after the investigation is reviewed and approved.**

## Part 1: Migrate existing stdlib loggers to get_logger

Replace `logging.getLogger(__name__)` with `get_logger(__name__)` in these 5 files:

| File | Current | Impact |
|------|---------|--------|
| `config/loader.py` | `logging.getLogger` | Bypasses structlog pipeline |
| `templates/loader.py` | `logging.getLogger` | No correlation tracking |
| `templates/renderer.py` | `logging.getLogger` | No field sanitization |
| `providers/drivers/litellm_driver.py` | `logging.getLogger` | LLM calls not in structured format |
| `providers/drivers/mappers.py` | `logging.getLogger` | Warnings not captured by sinks |

Each migration:
1. Replace `import logging` with `from ai_company.observability import get_logger`
2. Replace `logging.getLogger(__name__)` with `get_logger(__name__)`
3. Normalize variable naming to `logger` (the convention per issue #72)
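4. Verify existing log calls still work. structlog is largely, not fully, API-compatible with stdlib: with `structlog.stdlib.BoundLogger`, %-style positional arguments (e.g. `logger.warning("Unknown model %s", name)`) are only interpolated if `structlog.stdlib.PositionalArgumentsFormatter` is in the processor chain, so check such calls or convert them to keyword fields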

## Part 1b: Wire configure_logging() into config loader pipeline

PR #73's `TODO(#59)` — the config loader must call `configure_logging(log_config)` after loading `RootConfig` so that YAML-defined log settings (levels, sinks, rotation) are actually activated at startup. Currently the `LogConfig` is parsed and stored but never consumed.

1. In `config/loader.py`, after `RootConfig` is validated, call `configure_logging(config.logging)` if `config.logging` is not `None`
2. Add a sensible default so logging works even without explicit YAML config (e.g. INFO to console)
3. Ensure `configure_logging()` is idempotent (safe to call multiple times in tests)

## Part 2: Instrument all unlogged modules

Add structured logging to every module that performs work worth observing. Follow the principle from DESIGN_SPEC Section 1.2: *"Every agent action, communication, and decision is logged and visible."*

### Modules that MUST have logging added

**Core domain** (`ai_company.core`):
- `agent.py` — Agent creation, status changes, validation
- `company.py` — Company/department structure changes
- `role.py` — Role assignment, authority validation
- `role_catalog.py` — Catalog lookups, missing roles
- `task.py` — Task creation, status transitions
- `task_transitions.py` — State machine transitions (critical for debugging)
- `project.py` — Project lifecycle events

**Budget** (`ai_company.budget`):
- `hierarchy.py` — Budget allocation changes, limit checks
- `spending_summary.py` — Spending aggregation events
- `cost_record.py` — Cost record creation

**Communication** (`ai_company.communication`):
- `channel.py` — Channel operations
- `message.py` — Message creation/validation

**Config** (`ai_company.config`):
- `utils.py` — Env var substitution (log which variables were substituted, never their values)
- `errors.py` — Config validation error details

**Providers** (`ai_company.providers`):
- `base.py` — Input validation, hook delegation
- `registry.py` — Provider registration, lookups, factory resolution
- `errors.py` — Error creation with context

**Templates** (`ai_company.templates`):
- `schema.py` — Template validation
- `errors.py` — Template error details

### Logging level guidelines

| Event type | Level | Examples |
|------------|-------|---------|
| Object creation | `DEBUG` | Agent created, task instantiated |
| State transitions | `INFO` | Task status change, agent status change |
| Validation failures | `WARNING` | Invalid config, unknown model |
| Error recovery | `WARNING` | Fallback used, retry triggered |
| Unrecoverable errors | `ERROR` | Provider failure, config load failure |
| Security events | `INFO` via audit logger | Permission checks, approval gates |
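To illustrate the table applied to `task_transitions.py`, here is a sketch that uses stdlib `logging` as a stand-in so it runs on its own. With the project's `get_logger`, the `extra={...}` dicts would become keyword arguments (e.g. `logger.info(TASK_STATUS_CHANGED, task_id=...)`). `TaskStatus`, `ALLOWED` and both functions are hypothetical simplifications, not the real state machine.

```python
import logging
from enum import Enum

logger = logging.getLogger("ai_company.core.task_transitions")


class TaskStatus(str, Enum):
    """Hypothetical, trimmed-down task lifecycle."""

    CREATED = "created"
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


ALLOWED: dict[TaskStatus, set[TaskStatus]] = {
    TaskStatus.CREATED: {TaskStatus.ASSIGNED},
    TaskStatus.ASSIGNED: {TaskStatus.IN_PROGRESS},
    TaskStatus.IN_PROGRESS: {TaskStatus.COMPLETED},
    TaskStatus.COMPLETED: set(),
}


def create_task(task_id: str) -> TaskStatus:
    # Object creation -> DEBUG
    logger.debug("task.created", extra={"task_id": task_id})
    return TaskStatus.CREATED


def transition(task_id: str, current: TaskStatus, target: TaskStatus) -> TaskStatus:
    fields = {"task_id": task_id, "from_status": current.value, "to_status": target.value}
    if target not in ALLOWED[current]:
        # Validation failure -> WARNING, with enough context to debug it
        logger.warning("task.transition_rejected", extra=fields)
        raise ValueError(f"invalid transition {current.value} -> {target.value}")
    # State transition -> INFO
    logger.info("task.status_changed", extra=fields)
    return target
```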

## Part 3: Harden and extend the observability system

### 3a. Implement logger-to-sink routing (TODO in sinks.py)

Currently every sink receives every log event. The audit sink should receive only security-related logs, the cost sink only cost events, and so on. Implement logger-name filters:

- `ai_company.security.*` routes to `audit.log`
- `ai_company.budget.*` + `ai_company.providers.*` (cost events) routes to `cost_usage.log`
- `ai_company.engine.*` + `ai_company.core.*` (agent/task events) routes to `agent_activity.log`
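Here is a minimal sketch of the routing filters, assuming the file sinks are stdlib `logging.Handler`s (for example fed through `structlog.stdlib.ProcessorFormatter`). The stdlib `logging.Filter(name)` already passes a logger and its children, but only for a single name, so a small subclass handles the multi-prefix routes. `LoggerNameFilter` and `SINK_ROUTES` are hypothetical names. Name-based routing alone sends all `ai_company.providers.*` records to the cost sink; restricting it to cost events would additionally need a check on the event name.

```python
import logging
from collections.abc import Iterable


class LoggerNameFilter(logging.Filter):
    """Pass only records whose logger is one of `prefixes` or a child of one."""

    def __init__(self, prefixes: Iterable[str]) -> None:
        super().__init__()
        self._prefixes = tuple(prefixes)

    def filter(self, record: logging.LogRecord) -> bool:
        name = record.name
        # Match "a.b" and "a.b.c", but not "a.bc".
        return any(name == p or name.startswith(p + ".") for p in self._prefixes)


# Hypothetical routing table mirroring the bullets above.
SINK_ROUTES: dict[str, tuple[str, ...]] = {
    "audit.log": ("ai_company.security",),
    "cost_usage.log": ("ai_company.budget", "ai_company.providers"),
    "agent_activity.log": ("ai_company.engine", "ai_company.core"),
}
```

Each sink handler would then get `handler.addFilter(LoggerNameFilter(SINK_ROUTES["audit.log"]))`, and likewise for the other routes.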

### 3b. Implement async correlation decorator (TODO in correlation.py)

Add `with_correlation_async()` for async functions — needed by the engine and API layers in M3/M6. The sync `with_correlation()` context manager exists but async equivalents are missing.
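Here is a minimal sketch of the async variant, assuming the correlation ID lives in a `contextvars.ContextVar`. The names `correlation_id`, `with_correlation_async` and `handle_request` below are illustrative; the real module's variable names and signature may differ. The decorator reuses an ID already bound by a caller and otherwise generates one. It always restores the previous value, so IDs do not leak between requests.

```python
from __future__ import annotations

import asyncio
import contextvars
import functools
import uuid
from collections.abc import Awaitable, Callable
from typing import Any, TypeVar

R = TypeVar("R")

# Illustrative stand-in for the correlation ContextVar in correlation.py.
correlation_id: contextvars.ContextVar[str | None] = contextvars.ContextVar(
    "correlation_id", default=None
)


def with_correlation_async(
    func: Callable[..., Awaitable[R]],
) -> Callable[..., Awaitable[R]]:
    """Bind a correlation ID for the duration of an async call."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> R:
        # Keep the caller's ID if one is bound; otherwise start a new one.
        token = correlation_id.set(correlation_id.get() or uuid.uuid4().hex)
        try:
            return await func(*args, **kwargs)
        finally:
            correlation_id.reset(token)  # restore whatever was bound before

    return wrapper


@with_correlation_async
async def handle_request() -> str | None:
    await asyncio.sleep(0)  # stand-in for real async work
    return correlation_id.get()
```

Because asyncio copies the context into each new task, any tasks `handle_request` spawns after the ID is bound will inherit it.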

### 3c. Add structured log event constants

Define standard event names as constants to prevent typos and enable grep-ability:

```python
# ai_company/observability/events.py
from typing import Final  # type checkers flag accidental reassignment

TASK_CREATED: Final = "task.created"
TASK_STATUS_CHANGED: Final = "task.status_changed"
AGENT_CREATED: Final = "agent.created"
PROVIDER_CALL_START: Final = "provider.call.start"
PROVIDER_CALL_COMPLETE: Final = "provider.call.complete"
CONFIG_LOADED: Final = "config.loaded"
```

### 3d. Add log testing utilities

Add a pytest fixture that captures structured log output for asserting on log fields (not just message text). Issue #72 specified this but it wasn't implemented:

```python
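import pytest
from structlog.testing import capture_logs  # one option: structlog's built-in helper

@pytest.fixture
def captured_logs():
    """Capture structlog event dicts so tests can assert on structured fields."""
    # Caveat: does not see loggers cached via cache_logger_on_first_use=True.
    with capture_logs() as logs:
        yield logs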
```

### 3e. Future-proof: add OpenTelemetry-ready span hooks

Add optional span context injection so when OpenTelemetry is added later (M7+), existing log entries automatically include trace/span IDs. This is just a processor stub — no OpenTelemetry dependency needed now.
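Here is a sketch of such a processor, written with structlog's standard processor signature `(logger, method_name, event_dict) -> event_dict`. It imports `opentelemetry.trace` optionally and becomes a no-op when the package is absent or no span is active, so it can sit in the chain today at no cost. The function name `add_otel_span_ids` is hypothetical. `trace.get_current_span()`, `get_span_context()` and `is_valid` are part of the OpenTelemetry Python API.

```python
from collections.abc import MutableMapping
from typing import Any

try:  # optional dependency: absent until OpenTelemetry is adopted (M7+)
    from opentelemetry import trace
except ImportError:
    trace = None


def add_otel_span_ids(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """structlog processor: add trace/span IDs when an OTel span is active."""
    if trace is None:
        return event_dict
    ctx = trace.get_current_span().get_span_context()
    if ctx.is_valid:
        # W3C trace-context hex encoding: 32 hex chars for trace, 16 for span.
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict
```

It would be added to the shared processor list ahead of the renderer. With no active span the event dict passes through unchanged.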

## Part 4: Enforce in CLAUDE.md and PR review pipeline

### 4a. Add to CLAUDE.md

Add a `## Logging` section with rules:
- **Every module** must use: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
- **Never** use `print()`, `logging.getLogger()`, or raw `logging` — always use `ai_company.observability`
- **All error paths** must log at WARNING or ERROR with context
- **All state transitions** must log at INFO
- **Debug-level** logs for object creation and internal flow

### 4b. Add logging-audit agent to PR review skill

Add a new sub-agent to `.claude/skills/aurelio-review-pr/skill.md` that:
- Checks all files touched by the PR for `import logging` (stdlib) — flag as violation
- Checks all new/modified functions for appropriate log statements
- Verifies `get_logger(__name__)` is present in every module with business logic

## Part 5: Test everything

- Unit tests for all new log statements (assert on structured fields)
- Integration test: configure logging -> execute a workflow -> verify all sinks received correct events
- Test logger-to-sink routing filters
- Test async correlation decorator
- Maintain 80%+ coverage

## Acceptance Criteria

- [ ] **Part 0 investigation complete** — recommendation document reviewed and approved before proceeding
- [ ] `configure_logging()` called from config loader pipeline (resolves `TODO(#59)`)
- [ ] All 5 stdlib logger modules migrated to `get_logger`
- [ ] All ~57 unlogged modules instrumented with appropriate log statements
- [ ] Logger-to-sink routing implemented (audit, cost, agent activity)
- [ ] Async correlation decorator implemented
- [ ] Structured event constants defined
- [ ] Log testing fixture available for pytest
- [ ] CLAUDE.md updated with mandatory logging rules
- [ ] PR review skill updated with logging-audit agent
- [ ] All new logging tested (unit + integration)
- [ ] 80%+ coverage maintained
- [ ] Zero uses of raw `logging.getLogger` in application code
- [ ] Zero uses of `print()` in application code (except observability setup fallback)

## Dependencies

- Issue #72 (PR #73) — observability system (merged)
- Should be done before or alongside #7 (cost tracking) and #9 (retry logic), since both need proper logging

## References

- DESIGN_SPEC Section 1.2: Observable design principle
- DESIGN_SPEC Section 5.5: Loop detection logging
- DESIGN_SPEC Section 12.3: Security audit logging
- DESIGN_SPEC Section 13.2: API middleware logging
- DESIGN_SPEC Section 15.3: audit.py module
- Issue #72: Original logging system spec (deliverables partially incomplete)
- PR #73: Observability system implementation

