Skip to content

chore: post-audit cleanup — code quality, test coverage, and prevention hooks #148

@Aureliolo

Description

@Aureliolo

Context

Full codebase audit of M0–M2.5 + closed M3 issues (8 parallel agents, all 39 closed issues checked). Overall health is excellent — 95.36% coverage, 0 mypy errors, 0 ruff errors, 0 test failures, zero DESIGN_SPEC deviations. This issue captures all decided fixes + prevention mechanisms.

Update (2026-03-07): Cross-referenced with 3 pending branches (feat/file-system-tools, feat/git-tools, feat/crash-recovery). Added violations found in those branches below. Line numbers reference the branch code — they may shift after merge. When tackling each section, re-scan the full codebase for any additional violations introduced by these or other branches merged since this audit.

Update 2 (2026-03-07): All 3 branches now merged (PRs #149, #150, #151). Re-scanned full codebase — updated counts, added 17 previously-missed long functions, 3 new low-coverage files, and 6 post-merge reviewer findings. Line numbers now reference main.

Checklist

Code Fixes

  • PEP 758 except syntax — fix 12 known violations (except (A, B):except A, B:). Re-scan after future merges for any new violations.

    • config/utils.py:25
    • engine/agent_engine.py:566
    • templates/loader.py:114,365
    • tools/file_system/edit_file.py:186,220
    • tools/file_system/read_file.py:166,209
    • tools/invoker.py:224,312,338,408
  • Refactor 32 long functions (all >50 lines, CLAUDE.md limit) — extract helper methods. Re-scan after future merges for any new long functions.

    • tools/file_system/edit_file.py:129execute() ~126 lines
    • tools/file_system/list_directory.py:163execute() ~112 lines
    • tools/file_system/read_file.py:122execute() ~112 lines
    • engine/react_loop.py:57execute() ~106 lines
    • engine/agent_engine.py:109run() ~94 lines
    • engine/prompt.py:143build_system_prompt() ~94 lines
    • tools/_git_base.py:166_run_git() ~94 lines
    • tools/file_system/write_file.py:119execute() ~85 lines
    • config/loader.py:399load_config() ~78 lines (new)
    • engine/prompt.py:446_trim_sections() ~72 lines
    • tools/file_system/list_directory.py:30_list_sync() ~71 lines (new)
    • engine/agent_engine.py:662_handle_fatal_error() ~68 lines
    • templates/renderer.py:251_build_config_dict() ~68 lines (new)
    • engine/agent_engine.py:503_apply_post_execution_transitions() ~67 lines
    • engine/agent_engine.py:214_execute() ~66 lines
    • engine/prompt.py:531_render_with_trimming() ~63 lines (new)
    • tools/invoker.py:411invoke_all() ~63 lines (new)
    • engine/react_loop.py:339_execute_tool_calls() ~60 lines (new)
    • tools/invoker.py:299_execute_tool() ~59 lines (new)
    • providers/registry.py:163_build_driver() ~58 lines (new)
    • providers/resilience/retry.py:42execute() ~57 lines (new)
    • tools/file_system/delete_file.py:81execute() ~57 lines
    • templates/loader.py:151load_template() ~56 lines (new)
    • templates/renderer.py:326_expand_agents() ~56 lines (new)
    • budget/tracker.py:171build_summary() ~54 lines (new)
    • templates/loader.py:312_parse_template_yaml() ~54 lines (new)
    • engine/task_execution.py:135with_transition() ~53 lines
    • providers/routing/_strategy_helpers.py:193_fastest_within_budget() ~53 lines (new)
    • templates/loader.py:87list_templates() ~52 lines (new)
    • templates/renderer.py:49render_template() ~52 lines (new)
    • tools/git_tools.py:129__init__() ~52 lines (new)
    • providers/base.py:279_rate_limited_call() ~51 lines (new)
  • Add loggers to 3 business modules + WARNING/ERROR logs before raises. Re-scan for any new modules without loggers.

    • templates/presets.pyget_personality_preset() raises KeyError without logging
    • observability/correlation.py — raises TypeError without logging
    • config/utils.py — no logger at all
  • NotBlankStr audit + fix — change 2 known str fields to NotBlankStr. Re-scan after merging pending branches for any new str identifier fields.

    • templates/schema.py:212CompanyTemplate.workflow
    • templates/schema.py:216CompanyTemplate.communication
    • Then sweep entire codebase for any other str fields that should be NotBlankStr
  • Sync ruff version.pre-commit-config.yaml pins v0.15.4, pyproject.toml pins ==0.15.5. Investigate making Dependabot update both files together (grouped update config or post-bump automation).

Post-Merge Reviewer Findings (NEW)

Findings from external reviewers (greptile) that arrived after PRs #149, #150, #151 were merged. All are valid and should be fixed.

  • raise exc from build_exc reverses exception causality (PR feat: implement crash recovery with fail-and-reassign strategy #149, agent_engine.py:741) — exc is the original error, build_exc is secondary. Should be raise exc from None to suppress the misleading chain. The build_exc is already captured in the structured log on line 738.

  • Unknown action silently falls through to delete in GitBranchTool (PR feat: implement built-in git tools with security hardening #150, git_tools.py:449-454) — the final if/elif chain has no else guard for the delete branch. If schema validation is bypassed, any unrecognised action would silently execute git branch -d/-D <name>. Add a defensive guard before the delete path.

  • ref2 accepted without ref1 produces misleading diff (PR feat: implement built-in git tools with security hardening #150, git_tools.py:310-317) — when only ref2 is supplied, git interprets it as git diff <ref2> (diff against working tree), not a two-ref comparison. Return an error when ref2 is provided without ref1.

  • Clone URL credentials logged in plaintext (PR feat: implement built-in git tools with security hardening #150, _git_base.py:192-197) — security issue. The full args list including clone URLs is logged at debug and warning levels. URLs with embedded credentials (user:password@host) are written verbatim. Sanitise URLs before logging, pass originals only to subprocess.

  • _UNSAFE_GLOB_RE misses trailing .. without separator (PR feat: implement built-in file system tools (#18) #151, list_directory.py:27) — regex (^|[/\\])\.\.[/\\]|^\.\.$ doesn't catch subdir/.. (trailing ..). Fix: change to (^|[/\\])\.\.([/\\]|$)|^\.\.$ to anchor on $ as well. Without this, a ValueError from pathlib escapes the tool boundary.

  • recursive=False doesn't prevent ** glob patterns (PR feat: implement built-in file system tools (#18) #151, list_directory.py:54-60) — when pattern contains **, resolved.glob("**/*.py") still recurses via pathlib regardless of the recursive flag. Detect and reject ** in pattern when recursive=False.

Test Coverage

  • templates/errors.py (29% → 90%+) — create tests/unit/templates/test_errors.py, test all __str__() methods and ConfigLocation formatting

  • templates/renderer.py (75% → 90%+) — add tests for type-guard branches, YAML parse errors, department building, _validate_as_root_config error path

  • templates/loader.py (78% → 90%+) — add tests for list_templates() user-dir scanning, pass-1 float fallback, defective builtin detection, error paths

  • tools/file_system/_base_fs_tool.py (66% → 90%+) — test base class logic (new — from merged feat/file-system-tools)

  • tools/file_system/list_directory.py (76% → 90%+) — test glob edge cases, symlink handling, error paths (new — from merged feat/file-system-tools)

  • tools/file_system/write_file.py (82% → 90%+) — test write error paths, permission handling (new — from merged feat/file-system-tools)

  • Module-level safety guards — test via monkeypatch in all 3 files:

    • engine/prompt_template.py (67%) — AUTONOMY_INSTRUCTIONS exhaustiveness guard
    • core/task_transitions.py (72%) — VALID_TRANSITIONS exhaustiveness guard + validate_transition defensive branch
    • core/role_catalog.py (87%) — duplicate/missing guards (lines 398-411)
  • observability/sinks.py (87% → 95%+) — add OS-agnostic mocked tests for file handler creation, OSError paths, WatchedFileHandler branch

  • agent_engine.py defensive paths — test all 3. Re-scan — line numbers may have shifted and new defensive paths may exist.

    • Lines 284-287: try/except around _log_completion (error swallowing)
    • Lines 456-466: task-assigned-to-different-agent check
    • Line 329: timeout code path
  • react_loop.py:185 budget checker exception — test that the loop continues when budget checker throws, warning is logged

  • Refactor repetitive tests to use @pytest.mark.parametrize — convert the most repetitive test files. Re-scan after future merges for any new repetitive patterns.

    • test_task_transitions.py — 32 individual transition tests → parametrized (0 parametrize currently)
    • test_permissions.py — 43 per-access-level tests → parametrized (0 parametrize currently)
    • test_role_catalog.py — 27 per-role validation tests → parametrized (2 parametrize currently)
    • test_task.py — 63 tests, 0 parametrized (520 lines) — TestTaskAssignmentConsistency and TestTaskStringValidation are prime candidates
    • Add CLAUDE.md note: "Prefer @pytest.mark.parametrize for testing similar cases"

Prevention (Hookify Rules)

  • Hookify rule: parenthesized except — warn on except ( pattern in src/ files (PostToolUse on Write/Edit)
  • Hookify rule: missing get_logger — warn when creating/editing .py files in src/ai_company/ that contain function/class definitions but no get_logger import (exclude __init__.py, enums.py, errors.py, types.py, models.py, protocol.py)
  • Hookify rule: function length — warn when a function body exceeds 50 lines in src/ files

NOT in scope (tracked elsewhere)

Audit Evidence

  • 8 parallel audit agents verified all 39 closed issues (M0: 8, M1: 9, M2: 8, M2.5: 8, M3-closed: 6)
  • Full test suite: 2054 tests passed, 0 failures (now 2194 after branch merges)
  • Coverage: 95.36% (now 94.54% — slight dip from new code in merged branches)
  • mypy: 0 errors in 227 files
  • ruff: all checks passed
  • DESIGN_SPEC alignment: zero breaking deviations
  • (2026-03-07) 3 pending branches audited: feat/file-system-tools, feat/git-tools, feat/crash-recovery — findings integrated above
  • (2026-03-07, update 2) All 3 branches merged. Full re-scan done. 6 post-merge reviewer findings added. 17 additional long functions added. 3 new low-coverage files added.

Metadata

Metadata

Assignees

No one assigned

    Labels

    prio:highImportant, should be prioritizedscope:large3+ days of workspec:architectureDESIGN_SPEC Section 15 - Technical Architecturetype:infraCI/CD, tooling, project setuptype:refactorCode restructuring, cleanuptype:testTest coverage, test infrastructure

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions