-
Notifications
You must be signed in to change notification settings - Fork 0
chore: post-audit cleanup — code quality, test coverage, and prevention hooks #148
Description
Context
Full codebase audit of M0–M2.5 + closed M3 issues (8 parallel agents, all 39 closed issues checked). Overall health is excellent — 95.36% coverage, 0 mypy errors, 0 ruff errors, 0 test failures, zero DESIGN_SPEC deviations. This issue captures all decided fixes + prevention mechanisms.
Update (2026-03-07): Cross-referenced with 3 pending branches (feat/file-system-tools, feat/git-tools, feat/crash-recovery). Added violations found in those branches below. Line numbers reference the branch code — they may shift after merge. When tackling each section, re-scan the full codebase for any additional violations introduced by these or other branches merged since this audit.
Update 2 (2026-03-07): All 3 branches now merged (PRs #149, #150, #151). Re-scanned full codebase — updated counts, added 17 previously-missed long functions, 3 new low-coverage files, and 6 post-merge reviewer findings. Line numbers now reference main.
Checklist
Code Fixes
-
PEP 758 except syntax — fix 12 known violations (
except (A, B):→except A, B:). Re-scan after future merges for any new violations.config/utils.py:25engine/agent_engine.py:566templates/loader.py:114,365tools/file_system/edit_file.py:186,220tools/file_system/read_file.py:166,209tools/invoker.py:224,312,338,408
-
Refactor 32 long functions (all >50 lines, CLAUDE.md limit) — extract helper methods. Re-scan after future merges for any new long functions.
tools/file_system/edit_file.py:129—execute()~126 linestools/file_system/list_directory.py:163—execute()~112 linestools/file_system/read_file.py:122—execute()~112 linesengine/react_loop.py:57—execute()~106 linesengine/agent_engine.py:109—run()~94 linesengine/prompt.py:143—build_system_prompt()~94 linestools/_git_base.py:166—_run_git()~94 linestools/file_system/write_file.py:119—execute()~85 linesconfig/loader.py:399—load_config()~78 lines (new)engine/prompt.py:446—_trim_sections()~72 linestools/file_system/list_directory.py:30—_list_sync()~71 lines (new)engine/agent_engine.py:662—_handle_fatal_error()~68 linestemplates/renderer.py:251—_build_config_dict()~68 lines (new)engine/agent_engine.py:503—_apply_post_execution_transitions()~67 linesengine/agent_engine.py:214—_execute()~66 linesengine/prompt.py:531—_render_with_trimming()~63 lines (new)tools/invoker.py:411—invoke_all()~63 lines (new)engine/react_loop.py:339—_execute_tool_calls()~60 lines (new)tools/invoker.py:299—_execute_tool()~59 lines (new)providers/registry.py:163—_build_driver()~58 lines (new)providers/resilience/retry.py:42—execute()~57 lines (new)tools/file_system/delete_file.py:81—execute()~57 linestemplates/loader.py:151—load_template()~56 lines (new)templates/renderer.py:326—_expand_agents()~56 lines (new)budget/tracker.py:171—build_summary()~54 lines (new)templates/loader.py:312—_parse_template_yaml()~54 lines (new)engine/task_execution.py:135—with_transition()~53 linesproviders/routing/_strategy_helpers.py:193—_fastest_within_budget()~53 lines (new)templates/loader.py:87—list_templates()~52 lines (new)templates/renderer.py:49—render_template()~52 lines (new)tools/git_tools.py:129—__init__()~52 lines (new)providers/base.py:279—_rate_limited_call()~51 lines (new)
-
Add loggers to 3 business modules + WARNING/ERROR logs before raises. Re-scan for any new modules without loggers.
templates/presets.py—get_personality_preset()raisesKeyErrorwithout loggingobservability/correlation.py— raisesTypeErrorwithout loggingconfig/utils.py— no logger at all
-
NotBlankStr audit + fix — change 2 known
strfields toNotBlankStr. Re-scan after merging pending branches for any newstridentifier fields.templates/schema.py:212—CompanyTemplate.workflowtemplates/schema.py:216—CompanyTemplate.communication- Then sweep entire codebase for any other
strfields that should beNotBlankStr
-
Sync ruff version —
.pre-commit-config.yamlpins v0.15.4,pyproject.tomlpins ==0.15.5. Investigate making Dependabot update both files together (grouped update config or post-bump automation).
Post-Merge Reviewer Findings (NEW)
Findings from external reviewers (greptile) that arrived after PRs #149, #150, #151 were merged. All are valid and should be fixed.
-
raise exc from build_excreverses exception causality (PR feat: implement crash recovery with fail-and-reassign strategy #149,agent_engine.py:741) —excis the original error,build_excis secondary. Should beraise exc from Noneto suppress the misleading chain. Thebuild_excis already captured in the structured log on line 738. -
Unknown
actionsilently falls through todeleteinGitBranchTool(PR feat: implement built-in git tools with security hardening #150,git_tools.py:449-454) — the finalif/elifchain has noelseguard for the delete branch. If schema validation is bypassed, any unrecognised action would silently executegit branch -d/-D <name>. Add a defensive guard before the delete path. -
ref2accepted withoutref1produces misleading diff (PR feat: implement built-in git tools with security hardening #150,git_tools.py:310-317) — when onlyref2is supplied, git interprets it asgit diff <ref2>(diff against working tree), not a two-ref comparison. Return an error whenref2is provided withoutref1. -
Clone URL credentials logged in plaintext (PR feat: implement built-in git tools with security hardening #150,
_git_base.py:192-197) — security issue. The fullargslist including clone URLs is logged at debug and warning levels. URLs with embedded credentials (user:password@host) are written verbatim. Sanitise URLs before logging, pass originals only to subprocess. -
_UNSAFE_GLOB_REmisses trailing..without separator (PR feat: implement built-in file system tools (#18) #151,list_directory.py:27) — regex(^|[/\\])\.\.[/\\]|^\.\.$doesn't catchsubdir/..(trailing..). Fix: change to(^|[/\\])\.\.([/\\]|$)|^\.\.$to anchor on$as well. Without this, aValueErrorfrom pathlib escapes the tool boundary. -
recursive=Falsedoesn't prevent**glob patterns (PR feat: implement built-in file system tools (#18) #151,list_directory.py:54-60) — whenpatterncontains**,resolved.glob("**/*.py")still recurses via pathlib regardless of therecursiveflag. Detect and reject**in pattern whenrecursive=False.
Test Coverage
-
templates/errors.py(29% → 90%+) — createtests/unit/templates/test_errors.py, test all__str__()methods andConfigLocationformatting -
templates/renderer.py(75% → 90%+) — add tests for type-guard branches, YAML parse errors, department building,_validate_as_root_configerror path -
templates/loader.py(78% → 90%+) — add tests forlist_templates()user-dir scanning, pass-1 float fallback, defective builtin detection, error paths -
tools/file_system/_base_fs_tool.py(66% → 90%+) — test base class logic (new — from merged feat/file-system-tools) -
tools/file_system/list_directory.py(76% → 90%+) — test glob edge cases, symlink handling, error paths (new — from merged feat/file-system-tools) -
tools/file_system/write_file.py(82% → 90%+) — test write error paths, permission handling (new — from merged feat/file-system-tools) -
Module-level safety guards — test via monkeypatch in all 3 files:
engine/prompt_template.py(67%) —AUTONOMY_INSTRUCTIONSexhaustiveness guardcore/task_transitions.py(72%) —VALID_TRANSITIONSexhaustiveness guard +validate_transitiondefensive branchcore/role_catalog.py(87%) — duplicate/missing guards (lines 398-411)
-
observability/sinks.py(87% → 95%+) — add OS-agnostic mocked tests for file handler creation,OSErrorpaths,WatchedFileHandlerbranch -
agent_engine.pydefensive paths — test all 3. Re-scan — line numbers may have shifted and new defensive paths may exist.- Lines 284-287:
try/exceptaround_log_completion(error swallowing) - Lines 456-466: task-assigned-to-different-agent check
- Line 329: timeout code path
- Lines 284-287:
-
react_loop.py:185budget checker exception — test that the loop continues when budget checker throws, warning is logged -
Refactor repetitive tests to use
@pytest.mark.parametrize— convert the most repetitive test files. Re-scan after future merges for any new repetitive patterns.test_task_transitions.py— 32 individual transition tests → parametrized (0 parametrize currently)test_permissions.py— 43 per-access-level tests → parametrized (0 parametrize currently)test_role_catalog.py— 27 per-role validation tests → parametrized (2 parametrize currently)test_task.py— 63 tests, 0 parametrized (520 lines) —TestTaskAssignmentConsistencyandTestTaskStringValidationare prime candidates- Add CLAUDE.md note: "Prefer
@pytest.mark.parametrizefor testing similar cases"
Prevention (Hookify Rules)
- Hookify rule: parenthesized except — warn on
except (pattern in src/ files (PostToolUse on Write/Edit) - Hookify rule: missing get_logger — warn when creating/editing .py files in
src/ai_company/that contain function/class definitions but noget_loggerimport (exclude__init__.py,enums.py,errors.py,types.py,models.py,protocol.py) - Hookify rule: function length — warn when a function body exceeds 50 lines in src/ files
NOT in scope (tracked elsewhere)
- E2E tests — covered by open issue End-to-end integration test: single agent receives and completes a task #24
- DESIGN_SPEC deviations — none found, nothing to fix
Audit Evidence
- 8 parallel audit agents verified all 39 closed issues (M0: 8, M1: 9, M2: 8, M2.5: 8, M3-closed: 6)
- Full test suite: 2054 tests passed, 0 failures (now 2194 after branch merges)
- Coverage: 95.36% (now 94.54% — slight dip from new code in merged branches)
- mypy: 0 errors in 227 files
- ruff: all checks passed
- DESIGN_SPEC alignment: zero breaking deviations
- (2026-03-07) 3 pending branches audited:
feat/file-system-tools,feat/git-tools,feat/crash-recovery— findings integrated above - (2026-03-07, update 2) All 3 branches merged. Full re-scan done. 6 post-merge reviewer findings added. 17 additional long functions added. 3 new low-coverage files added.