Implement enterprise logging system with comprehensive configuration #72
Description
Summary
Implement a foundational enterprise-grade logging system that all subsequent code must use. This is a core infrastructure requirement — once merged, CLAUDE.md and contributing guidelines must mandate its use in every module.
The design spec lists Observable as a core design principle: "Every agent action, communication, and decision is logged and visible." Currently the codebase has zero logging — no imports, no loggers, no configuration. This issue fills that gap before M2+ modules start building on top.
Requirements
1. Structured Logging Library
- Use structlog for structured, key-value logging (JSON in production, colored console in dev)
- Wrap stdlib `logging` so third-party libraries (uvicorn, FastAPI, httpx, etc.) are captured too
- All log entries must be machine-parseable (JSON lines format for files/streams)
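To make the "machine-parseable" requirement concrete, here is a minimal sketch of JSON lines output built on the stdlib only. It is a stand-in for structlog's JSON renderer, not the proposed implementation; `JsonLinesFormatter`, `build_json_logger`, and the `extra={"fields": ...}` convention are hypothetical names chosen for illustration.

```python
import io
import json
import logging


class JsonLinesFormatter(logging.Formatter):
    """Render each record as one compact JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Assumed convention: structured fields arrive via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, default=str)


def build_json_logger(name: str, stream: io.StringIO) -> logging.Logger:
    """Attach a JSON-lines handler to the named stdlib logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLinesFormatter())
    logger.handlers = [handler]
    return logger


stream = io.StringIO()
log = build_json_logger("ai_company.demo", stream)
log.info("task_started", extra={"fields": {"task_id": "t-1"}})

# Each output line parses independently, which is what JSON lines requires.
parsed = json.loads(stream.getvalue().splitlines()[0])
```

Because the formatter sits on a stdlib handler, any library that logs through `logging` would be rendered the same way once the handler is attached at the root.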
2. Log Categories (Named Loggers)
Every subsystem gets its own named logger with independent configuration:
| Logger Name | Purpose | Default Level |
|---|---|---|
| `ai_company.core` | Domain model lifecycle events | INFO |
| `ai_company.engine` | Agent execution, task lifecycle | DEBUG |
| `ai_company.communication` | Inter-agent messages, bus events | INFO |
| `ai_company.providers` | LLM API calls, tokens, latency | INFO |
| `ai_company.budget` | Cost tracking, limit alerts, spending | INFO |
| `ai_company.security` | Audit trail, approvals, denials | INFO |
| `ai_company.memory` | Memory reads/writes, consolidation | DEBUG |
| `ai_company.tools` | Tool invocations, results, permissions | INFO |
| `ai_company.api` | HTTP requests, responses, middleware | INFO |
| `ai_company.cli` | CLI command invocations | INFO |
| `ai_company.config` | Config loading, validation, changes | INFO |
| `ai_company.templates` | Template loading, company building | INFO |
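The defaults in the table could be applied through the stdlib logger hierarchy, roughly as sketched below. `DEFAULT_LEVELS` mirrors the table; `apply_logger_levels` is a hypothetical helper, and the override shown is only an example of per-logger configuration.

```python
import logging

# Default levels copied from the table above.
DEFAULT_LEVELS = {
    "ai_company.core": "INFO",
    "ai_company.engine": "DEBUG",
    "ai_company.communication": "INFO",
    "ai_company.providers": "INFO",
    "ai_company.budget": "INFO",
    "ai_company.security": "INFO",
    "ai_company.memory": "DEBUG",
    "ai_company.tools": "INFO",
    "ai_company.api": "INFO",
    "ai_company.cli": "INFO",
    "ai_company.config": "INFO",
    "ai_company.templates": "INFO",
}


def apply_logger_levels(defaults, overrides=None):
    """Set each named logger's level; overrides win over defaults."""
    merged = {**defaults, **(overrides or {})}
    for name, level in merged.items():
        # Logger.setLevel accepts level names as strings.
        logging.getLogger(name).setLevel(level)
    return merged


applied = apply_logger_levels(
    DEFAULT_LEVELS, overrides={"ai_company.providers": "WARNING"}
)
```

Child loggers such as `ai_company.engine.loop` inherit the level of their nearest configured ancestor, so modules that call `get_logger(__name__)` would pick up the subsystem default without extra configuration.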
3. Multiple Output Sinks (Handlers)
All configurable, all toggleable independently:
| Sink | Format | Purpose |
|---|---|---|
| Console (stderr) | Colored, human-readable | Development / interactive use |
| Main log file | JSON lines | General application log |
| Audit log file | JSON lines | Security events only (approvals, denials, permission checks) |
| Error log file | JSON lines | WARNING+ only, for quick triage |
| Agent activity log | JSON lines | Per-agent actions, task state changes |
| Cost/usage log | JSON lines | LLM calls, token counts, costs (feeds analytics) |
| Debug log file | JSON lines | ALL levels including DEBUG/TRACE (rotated aggressively) |
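One way to route records to independent sinks is to attach several handlers to the package root and give each its own level or filter. The sketch below uses in-memory handlers in place of files and covers only three of the sinks; `ListHandler` and `build_sinks` are hypothetical names, and mapping the audit sink to the `ai_company.security` logger is an assumption.

```python
import logging


class ListHandler(logging.Handler):
    """In-memory sink used here instead of a file, for illustration."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def build_sinks(root_name="ai_company"):
    root = logging.getLogger(root_name)
    root.setLevel(logging.DEBUG)
    root.propagate = False

    main_sink = ListHandler()                  # general application log
    error_sink = ListHandler(logging.WARNING)  # WARNING+ only
    audit_sink = ListHandler()                 # security events only
    # logging.Filter(name) passes that logger and its children, nothing else.
    audit_sink.addFilter(logging.Filter(root_name + ".security"))

    root.handlers = [main_sink, error_sink, audit_sink]
    return main_sink, error_sink, audit_sink


main_sink, error_sink, audit_sink = build_sinks()
logging.getLogger("ai_company.security").info("approval_granted")
logging.getLogger("ai_company.engine").warning("retry_scheduled")
logging.getLogger("ai_company.engine").debug("step_started")
```

Toggling a sink then amounts to including or omitting its handler when the handler list is built.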
4. Configuration System
Logging must be fully configurable via:
a) Python API (programmatic)
    from ai_company.logging import configure_logging, LogConfig

    configure_logging(LogConfig(
        level="DEBUG",
        console_enabled=True,
        console_format="colored",  # or "json"
        file_enabled=True,
        file_path="logs/ai_company.log",
        file_rotation="10 MB",
        file_retention=30,  # days
        audit_log_enabled=True,
        audit_log_path="logs/audit.log",
        error_log_enabled=True,
        cost_log_enabled=True,
        per_logger_levels={
            "ai_company.engine": "DEBUG",
            "ai_company.providers": "WARNING",
        },
        json_indent=None,  # compact in production
        include_caller=True,  # file:line info
        include_timestamp=True,
        timestamp_format="iso",  # or "unix" or custom
        correlation_id_enabled=True,  # request/task correlation
    ))

b) YAML configuration file
    logging:
      level: INFO
      console:
        enabled: true
        format: colored  # colored | json | plain
        level: null  # inherits from root
      files:
        main:
          enabled: true
          path: logs/ai_company.log
          format: json
          rotation: "10 MB"
          retention_days: 30
          level: null
        audit:
          enabled: true
          path: logs/audit.log
          format: json
          rotation: "50 MB"
          retention_days: 365
        errors:
          enabled: true
          path: logs/errors.log
          format: json
          level: WARNING
        debug:
          enabled: false
          path: logs/debug.log
          format: json
          rotation: "50 MB"
          retention_days: 7
        cost:
          enabled: true
          path: logs/cost.log
          format: json
      loggers:
        ai_company.engine: DEBUG
        ai_company.security: INFO
        ai_company.providers: WARNING
      context:
        include_caller: true
        include_timestamp: true
        timestamp_format: iso
        correlation_id: true

c) Environment variable overrides

    AI_COMPANY_LOG_LEVEL=DEBUG
    AI_COMPANY_LOG_CONSOLE_FORMAT=json
    AI_COMPANY_LOG_FILE_ENABLED=true
    AI_COMPANY_LOG_FILE_PATH=logs/app.log
    AI_COMPANY_LOG_AUDIT_ENABLED=true
    AI_COMPANY_LOG_PER_LOGGER="engine=DEBUG,providers=WARNING"

5. Contextual Enrichment (Automatic Fields)
Every log entry automatically includes (when available):
- `timestamp`: ISO 8601
- `level`: DEBUG/INFO/WARNING/ERROR/CRITICAL
- `logger`: dotted logger name
- `event`: the log message
- `correlation_id`: ties related operations together (e.g., a single task execution)
- `agent_id`: which agent produced this log
- `task_id`: which task is being worked on
- `caller`: `file:line` of the log call
- `thread` / `async_task`: concurrency context
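Automatic enrichment of this kind is commonly built on `contextvars`, which keeps values separate per thread and per asyncio task. The following is a stdlib-only sketch of the idea; structlog offers its own `structlog.contextvars` helpers for the same purpose. `ContextEnricher` and the three variable names are hypothetical.

```python
import contextvars
import io
import logging

# One context variable per automatic field; None means "not available".
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)
agent_id_var = contextvars.ContextVar("agent_id", default=None)
task_id_var = contextvars.ContextVar("task_id", default=None)


class ContextEnricher(logging.Filter):
    """Copy the current context values onto every record passing through."""

    def filter(self, record):
        record.correlation_id = correlation_id_var.get()
        record.agent_id = agent_id_var.get()
        record.task_id = task_id_var.get()
        return True  # never drops a record, only annotates it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(ContextEnricher())
handler.setFormatter(logging.Formatter(
    "%(levelname)s %(name)s %(message)s "
    "correlation_id=%(correlation_id)s agent_id=%(agent_id)s task_id=%(task_id)s"
))

logger = logging.getLogger("ai_company.context_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.handlers = [handler]

token = correlation_id_var.set("req-42")  # e.g. at the start of a task
agent_id_var.set("agent-7")
logger.info("task_assigned")
correlation_id_var.reset(token)           # e.g. when the task finishes
logger.info("idle")

lines = stream.getvalue().splitlines()
```

The filter is attached to the handler rather than to a logger so that records from child loggers are enriched as well.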
6. Log Rotation & Retention
- File rotation by size (default 10 MB) and/or time
- Configurable retention period per sink
- Compressed archived logs (`.gz`)
- Audit logs: long retention (365 days default)
- Debug logs: short retention (7 days default)
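Size-based rotation with gzip-compressed archives can be done with the stdlib `RotatingFileHandler` by setting its `namer` and `rotator` hooks. This sketch assumes "10 MB" means 10 × 1024 × 1024 bytes and uses a tiny limit for the demo; `build_rotating_handler` is a hypothetical helper. Note that `backupCount` limits the number of archives, so retention by age in days would need separate cleanup logic that is not shown here.

```python
import gzip
import logging
import logging.handlers
import os
import shutil
import tempfile


def gzip_namer(name: str) -> str:
    """Archive names get a .gz suffix, e.g. ai_company.log.1.gz."""
    return name + ".gz"


def gzip_rotator(source: str, dest: str) -> None:
    """Compress the finished log file into dest, then delete the original."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def build_rotating_handler(path, max_bytes=10 * 1024 * 1024, backups=5):
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.namer = gzip_namer
    handler.rotator = gzip_rotator
    return handler


# Demo with a deliberately small size limit so rotation happens quickly.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "ai_company.log")
handler = build_rotating_handler(log_path, max_bytes=200, backups=3)

logger = logging.getLogger("ai_company.rotation_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.handlers = [handler]

for i in range(20):
    logger.info("event number %d with some padding text", i)
handler.close()

archives = sorted(f for f in os.listdir(log_dir) if f.endswith(".gz"))
```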
7. Performance Requirements
- Logging must not block the main event loop (async-safe)
- File I/O on background thread (via `QueueHandler` or structlog async)
- Minimal overhead when a log level is disabled (lazy evaluation)
- Support for sampling high-volume debug logs in production
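The first three points above can be illustrated with the stdlib `QueueHandler` / `QueueListener` pair: the calling thread only enqueues records, while a background thread runs the real handlers. The sketch also shows a level guard that skips expensive work when DEBUG is disabled. `ListHandler`, `start_background_logging`, and `expensive_summary` are hypothetical names, and sampling is not covered.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Stands in for a slow file handler; stores records in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def start_background_logging(logger, sink):
    """Route the logger through a queue drained by a background thread."""
    log_queue = queue.Queue(-1)  # unbounded, so enqueueing never blocks
    logger.handlers = [logging.handlers.QueueHandler(log_queue)]
    listener = logging.handlers.QueueListener(
        log_queue, sink, respect_handler_level=True
    )
    listener.start()
    return listener


logger = logging.getLogger("ai_company.queue_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
sink = ListHandler()
listener = start_background_logging(logger, sink)

expensive_calls = 0


def expensive_summary():
    global expensive_calls
    expensive_calls += 1
    return "large state dump"


# Lazy evaluation: the expensive call is skipped because DEBUG is disabled.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("state=%s", expensive_summary())

logger.info("task_completed")
listener.stop()  # processes everything still queued, then joins the thread
```

Calling `listener.stop()` at shutdown matters: records still in the queue are otherwise lost when the process exits.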
8. Testing & Development Support
- `caplog` / `log_capture` fixture for testing log output in pytest
- Ability to assert on structured log fields (not just message text)
- `configure_logging(testing=True)` preset that captures to list
- Log output in test runs configurable via `--log-cli-level`
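A "captures to list" preset might behave like the sketch below, which collects each entry as a dictionary so tests can assert on individual fields. This is a stdlib-only illustration with hypothetical names (`CapturingHandler`, `capture_logs`, and the `extra={"fields": ...}` convention); structlog ships a comparable helper in `structlog.testing`, and pytest's built-in `caplog` fixture exposes captured stdlib records.

```python
import contextlib
import logging


class CapturingHandler(logging.Handler):
    """Store every record as a dict of structured fields."""

    def __init__(self):
        super().__init__()
        self.entries = []

    def emit(self, record):
        entry = {
            "event": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }
        # Assumed convention: structured fields arrive via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        self.entries.append(entry)


@contextlib.contextmanager
def capture_logs(logger_name="ai_company"):
    """Temporarily capture everything logged under logger_name."""
    logger = logging.getLogger(logger_name)
    handler = CapturingHandler()
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    try:
        yield handler.entries
    finally:
        # Restore the logger so other tests are unaffected.
        logger.removeHandler(handler)
        logger.setLevel(previous_level)


with capture_logs() as entries:
    logging.getLogger("ai_company.budget").warning(
        "limit_reached", extra={"fields": {"agent_id": "a-7", "spent": 12.5}}
    )
```

A test can then check `entries[0]["agent_id"]` directly instead of parsing message text.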
Deliverables
- `src/ai_company/logging/` module with all components
- Pydantic models for `LogConfig` and sub-configs
- `configure_logging()` function (programmatic API)
- YAML config integration (loads from company config file)
- Environment variable override support
- All 6+ output sinks implemented and toggleable
- Per-logger level configuration
- Contextual enrichment (correlation ID, agent ID, task ID, caller info)
- File rotation and retention
- Async-safe implementation
- Pytest fixtures for log testing
- Unit tests with 80%+ coverage
- Update CLAUDE.md: all new code must use the logging system (no bare `print()`, no raw `logging`)
- Update contributing guide with logging usage examples
- Backfill logging into existing M1 domain models (core/, budget/)
Post-Implementation Enforcement
Once merged, add to CLAUDE.md:
## Logging (MANDATORY)
- **Every module** must obtain a logger: `logger = get_logger(__name__)`
- **Never** use `print()` or raw `logging.getLogger()` — always use `ai_company.logging`
- **All public functions** must log entry/exit at DEBUG level
- **All error paths** must log at ERROR/WARNING with context
- **All state transitions** (task status, agent lifecycle) must log at INFO
- **Security events** (approvals, denials, permission checks) must use the audit logger
- **Cost events** (LLM calls, token usage) must include cost metadata

Dependencies
- `structlog`: structured logging
- Potentially `python-json-logger`, or rely on structlog's JSON renderer
- No other external deps needed (rotation via stdlib `logging.handlers`)
References
- Design Spec Section 1.2: Observable design principle
- Design Spec Section 5.5: Loop detection logging
- Design Spec Section 12.3: Security audit logging
- Design Spec Section 13.2: API middleware logging
- Design Spec Section 15.3: `audit.py` module
- Issue #7 (Implement per-call cost tracking and usage logging): Per-call cost tracking (depends on this logging system)
- Issue #40 (Implement Security Operations agent (action validation, audit logging)): SecOps audit logging (depends on this logging system)