chore: post-audit cleanup — code quality, test coverage, and prevention hooks

## Context

Full codebase audit of M0–M2.5 + closed M3 issues (8 parallel agents, all 39 closed issues checked). Overall health is excellent — 95.36% coverage, 0 mypy errors, 0 ruff errors, 0 test failures, zero DESIGN_SPEC deviations. This issue captures all decided fixes + prevention mechanisms.

**Update (2026-03-07):** Cross-referenced with 3 pending branches (`feat/file-system-tools`, `feat/git-tools`, `feat/crash-recovery`). Added violations found in those branches below. Line numbers reference the branch code — they may shift after merge. When tackling each section, **re-scan the full codebase** for any additional violations introduced by these or other branches merged since this audit.

**Update 2 (2026-03-07):** All 3 branches now merged (PRs #149, #150, #151). Re-scanned full codebase — updated counts, added 17 previously-missed long functions, 3 new low-coverage files, and 6 post-merge reviewer findings. Line numbers now reference `main`.

## Checklist

### Code Fixes

- [ ] **PEP 758 except syntax** — fix 12 known violations (`except (A, B):` → `except A, B:`). **Re-scan after future merges for any new violations.**
  - `config/utils.py:25`
  - `engine/agent_engine.py:566`
  - `templates/loader.py:114,365`
  - `tools/file_system/edit_file.py:186,220`
  - `tools/file_system/read_file.py:166,209`
  - `tools/invoker.py:224,312,338,408`

- [ ] **Refactor 32 long functions** (all >50 lines, CLAUDE.md limit) — extract helper methods. **Re-scan after future merges for any new long functions.**
  - `tools/file_system/edit_file.py:129` — `execute()` ~126 lines
  - `tools/file_system/list_directory.py:163` — `execute()` ~112 lines
  - `tools/file_system/read_file.py:122` — `execute()` ~112 lines
  - `engine/react_loop.py:57` — `execute()` ~106 lines
  - `engine/agent_engine.py:109` — `run()` ~94 lines
  - `engine/prompt.py:143` — `build_system_prompt()` ~94 lines
  - `tools/_git_base.py:166` — `_run_git()` ~94 lines
  - `tools/file_system/write_file.py:119` — `execute()` ~85 lines
  - `config/loader.py:399` — `load_config()` ~78 lines *(new)*
  - `engine/prompt.py:446` — `_trim_sections()` ~72 lines
  - `tools/file_system/list_directory.py:30` — `_list_sync()` ~71 lines *(new)*
  - `engine/agent_engine.py:662` — `_handle_fatal_error()` ~68 lines
  - `templates/renderer.py:251` — `_build_config_dict()` ~68 lines *(new)*
  - `engine/agent_engine.py:503` — `_apply_post_execution_transitions()` ~67 lines
  - `engine/agent_engine.py:214` — `_execute()` ~66 lines
  - `engine/prompt.py:531` — `_render_with_trimming()` ~63 lines *(new)*
  - `tools/invoker.py:411` — `invoke_all()` ~63 lines *(new)*
  - `engine/react_loop.py:339` — `_execute_tool_calls()` ~60 lines *(new)*
  - `tools/invoker.py:299` — `_execute_tool()` ~59 lines *(new)*
  - `providers/registry.py:163` — `_build_driver()` ~58 lines *(new)*
  - `providers/resilience/retry.py:42` — `execute()` ~57 lines *(new)*
  - `tools/file_system/delete_file.py:81` — `execute()` ~57 lines
  - `templates/loader.py:151` — `load_template()` ~56 lines *(new)*
  - `templates/renderer.py:326` — `_expand_agents()` ~56 lines *(new)*
  - `budget/tracker.py:171` — `build_summary()` ~54 lines *(new)*
  - `templates/loader.py:312` — `_parse_template_yaml()` ~54 lines *(new)*
  - `engine/task_execution.py:135` — `with_transition()` ~53 lines
  - `providers/routing/_strategy_helpers.py:193` — `_fastest_within_budget()` ~53 lines *(new)*
  - `templates/loader.py:87` — `list_templates()` ~52 lines *(new)*
  - `templates/renderer.py:49` — `render_template()` ~52 lines *(new)*
  - `tools/git_tools.py:129` — `__init__()` ~52 lines *(new)*
  - `providers/base.py:279` — `_rate_limited_call()` ~51 lines *(new)*

- [ ] **Add loggers to 3 business modules** + WARNING/ERROR logs before raises. **Re-scan for any new modules without loggers.**
  - `templates/presets.py` — `get_personality_preset()` raises `KeyError` without logging
  - `observability/correlation.py` — raises `TypeError` without logging
  - `config/utils.py` — no logger at all

- [ ] **NotBlankStr audit + fix** — change 2 known `str` fields to `NotBlankStr`. **Re-scan after merging pending branches for any new `str` identifier fields.**
  - `templates/schema.py:212` — `CompanyTemplate.workflow`
  - `templates/schema.py:216` — `CompanyTemplate.communication`
  - Then sweep entire codebase for any other `str` fields that should be `NotBlankStr`

- [ ] **Sync ruff version** — `.pre-commit-config.yaml` pins v0.15.4, `pyproject.toml` pins ==0.15.5. Investigate making Dependabot update both files together (grouped update config or post-bump automation).

### Post-Merge Reviewer Findings (NEW)

Findings from external reviewers (greptile) that arrived **after** PRs #149, #150, #151 were merged. All are valid and should be fixed.

- [ ] **`raise exc from build_exc` reverses exception causality** (PR #149, `agent_engine.py:741`) — `exc` is the original error, `build_exc` is secondary. Should be `raise exc from None` to suppress the misleading chain. The `build_exc` is already captured in the structured log on line 738.

- [ ] **Unknown `action` silently falls through to `delete`** in `GitBranchTool` (PR #150, `git_tools.py:449-454`) — the final `if/elif` chain has no `else` guard for the delete branch. If schema validation is bypassed, any unrecognised action would silently execute `git branch -d/-D <name>`. Add a defensive guard before the delete path.

- [ ] **`ref2` accepted without `ref1` produces misleading diff** (PR #150, `git_tools.py:310-317`) — when only `ref2` is supplied, git interprets it as `git diff <ref2>` (diff against working tree), not a two-ref comparison. Return an error when `ref2` is provided without `ref1`.

- [ ] **Clone URL credentials logged in plaintext** (PR #150, `_git_base.py:192-197`) — security issue. The full `args` list including clone URLs is logged at debug and warning levels. URLs with embedded credentials (`user:password@host`) are written verbatim. Sanitise URLs before logging, pass originals only to subprocess.

- [ ] **`_UNSAFE_GLOB_RE` misses trailing `..` without separator** (PR #151, `list_directory.py:27`) — regex `(^|[/\\])\.\.[/\\]|^\.\.$` doesn't catch `subdir/..` (trailing `..`). Fix: change to `(^|[/\\])\.\.([/\\]|$)|^\.\.$` to anchor on `$` as well. Without this, a `ValueError` from pathlib escapes the tool boundary.

- [ ] **`recursive=False` doesn't prevent `**` glob patterns** (PR #151, `list_directory.py:54-60`) — when `pattern` contains `**`, `resolved.glob("**/*.py")` still recurses via pathlib regardless of the `recursive` flag. Detect and reject `**` in pattern when `recursive=False`.

### Test Coverage

- [ ] **`templates/errors.py` (29% → 90%+)** — create `tests/unit/templates/test_errors.py`, test all `__str__()` methods and `ConfigLocation` formatting

- [ ] **`templates/renderer.py` (75% → 90%+)** — add tests for type-guard branches, YAML parse errors, department building, `_validate_as_root_config` error path

- [ ] **`templates/loader.py` (78% → 90%+)** — add tests for `list_templates()` user-dir scanning, pass-1 float fallback, defective builtin detection, error paths

- [ ] **`tools/file_system/_base_fs_tool.py` (66% → 90%+)** — test base class logic *(new — from merged feat/file-system-tools)*

- [ ] **`tools/file_system/list_directory.py` (76% → 90%+)** — test glob edge cases, symlink handling, error paths *(new — from merged feat/file-system-tools)*

- [ ] **`tools/file_system/write_file.py` (82% → 90%+)** — test write error paths, permission handling *(new — from merged feat/file-system-tools)*

- [ ] **Module-level safety guards** — test via monkeypatch in all 3 files:
  - `engine/prompt_template.py` (67%) — `AUTONOMY_INSTRUCTIONS` exhaustiveness guard
  - `core/task_transitions.py` (72%) — `VALID_TRANSITIONS` exhaustiveness guard + `validate_transition` defensive branch
  - `core/role_catalog.py` (87%) — duplicate/missing guards (lines 398-411)

- [ ] **`observability/sinks.py` (87% → 95%+)** — add OS-agnostic mocked tests for file handler creation, `OSError` paths, `WatchedFileHandler` branch

- [ ] **`agent_engine.py` defensive paths** — test all 3. **Re-scan — line numbers may have shifted and new defensive paths may exist.**
  - Lines 284-287: `try/except` around `_log_completion` (error swallowing)
  - Lines 456-466: task-assigned-to-different-agent check
  - Line 329: timeout code path

- [ ] **`react_loop.py:185` budget checker exception** — test that the loop continues when budget checker throws, warning is logged

- [ ] **Refactor repetitive tests to use `@pytest.mark.parametrize`** — convert the most repetitive test files. **Re-scan after future merges for any new repetitive patterns.**
  - `test_task_transitions.py` — 32 individual transition tests → parametrized (0 parametrize currently)
  - `test_permissions.py` — 43 per-access-level tests → parametrized (0 parametrize currently)
  - `test_role_catalog.py` — 27 per-role validation tests → parametrized (2 parametrize currently)
  - `test_task.py` — 63 tests, 0 parametrized (520 lines) — `TestTaskAssignmentConsistency` and `TestTaskStringValidation` are prime candidates
  - Add CLAUDE.md note: "Prefer `@pytest.mark.parametrize` for testing similar cases"

### Prevention (Hookify Rules)

- [ ] **Hookify rule: parenthesized except** — warn on `except (` pattern in src/ files (PostToolUse on Write/Edit)
- [ ] **Hookify rule: missing get_logger** — warn when creating/editing .py files in `src/ai_company/` that contain function/class definitions but no `get_logger` import (exclude `__init__.py`, `enums.py`, `errors.py`, `types.py`, `models.py`, `protocol.py`)
- [ ] **Hookify rule: function length** — warn when a function body exceeds 50 lines in src/ files

### NOT in scope (tracked elsewhere)

- E2E tests — covered by open issue #24
- DESIGN_SPEC deviations — none found, nothing to fix

## Audit Evidence

- 8 parallel audit agents verified all 39 closed issues (M0: 8, M1: 9, M2: 8, M2.5: 8, M3-closed: 6)
- Full test suite: 2054 tests passed, 0 failures (now 2194 after branch merges)
- Coverage: 95.36% (now 94.54% — slight dip from new code in merged branches)
- mypy: 0 errors in 227 files
- ruff: all checks passed
- DESIGN_SPEC alignment: zero breaking deviations
- *(2026-03-07)* 3 pending branches audited: `feat/file-system-tools`, `feat/git-tools`, `feat/crash-recovery` — findings integrated above
- *(2026-03-07, update 2)* All 3 branches merged. Full re-scan done. 6 post-merge reviewer findings added. 17 additional long functions added. 3 new low-coverage files added.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: post-audit cleanup — code quality, test coverage, and prevention hooks #148

Context

Checklist

Code Fixes

Post-Merge Reviewer Findings (NEW)

Test Coverage

Prevention (Hookify Rules)

NOT in scope (tracked elsewhere)

Audit Evidence

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

chore: post-audit cleanup — code quality, test coverage, and prevention hooks #148

Description

Context

Checklist

Code Fixes

Post-Merge Reviewer Findings (NEW)

Test Coverage

Prevention (Hookify Rules)

NOT in scope (tracked elsewhere)

Audit Evidence

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions