Skip to content

chore: post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules#152

Merged
Aureliolo merged 3 commits intomainfrom
chore/post-audit-cleanup
Mar 7, 2026
Merged

chore: post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules#152
Aureliolo merged 3 commits intomainfrom
chore/post-audit-cleanup

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

@Aureliolo Aureliolo commented Mar 7, 2026

Summary

Comprehensive post-audit cleanup across 49 files (+2195/-1099 lines):

  • PEP 758 except syntax: Converted parenthesized except (A, B): to except A, B: throughout (where no as binding)
  • Missing loggers: Added logger = get_logger(__name__) to config/utils.py, observability/correlation.py, templates/presets.py + new events/correlation.py constants
  • Bug fixes: Unsafe glob regex (trailing ..), ref2 without ref1 in GitDiffTool, unknown branch action fallback, credential sanitization in git command logging, ** without recursive=True guard
  • Long function refactoring: 32+ functions decomposed into focused helpers (all < 50 lines per CLAUDE.md)
  • Test improvements: Repetitive tests consolidated with @pytest.mark.parametrize, new edge-case coverage (template errors, file system helpers, git tools, guard logic), 3 new test files
  • Hookify rules: Added function-length, missing-logger, pep758-except prevention rules
  • Security: Flag-injection guard on --author/--since/--until in GitLogTool, credential redaction in git logging
  • Type tightening: TemplateAgentConfig.model and CompanyTemplate.workflow/communication upgraded to NotBlankStr
  • Docs: DESIGN_SPEC.md updated (added events/correlation.py to §15.3, crash recovery status → "Adopted" in §15.5)
  • Metadata fix: list_directory total_entries now reflects pre-truncation count with truncated flag

Closes #148

Pre-PR Review

Pre-reviewed by 10 specialized agents in parallel:

  • code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency

Findings: 16 total (4 MAJOR, 8 MEDIUM, 4 MINOR) → 13 implemented, 3 skipped (intentional design choices)

Test Plan

  • uv run ruff check src/ tests/ — all checks passed
  • uv run ruff format src/ tests/ — all formatted
  • uv run mypy src/ tests/ — no issues in 257 source files
  • uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80 — 2324 passed, 4 skipped, 96.58% coverage

Aureliolo and others added 2 commits March 7, 2026 10:58
… tests, hookify rules (#148)

Phase 1: Sync ruff rev to v0.15.5, add NotBlankStr to template schema fields,
add parametrize convention to CLAUDE.md.

Phase 2: Add missing loggers to config/utils.py, templates/presets.py, and
observability/correlation.py. Create events/correlation.py module.

Phase 3: Fix exception causality in agent_engine (raise from None), add
GitBranchTool unknown-action guard, add ref2-without-ref1 validation in
GitDiffTool, add git credential sanitization in log output, fix _UNSAFE_GLOB_RE
trailing .. regex, block ** glob without recursive flag.

Phase 4: Refactor 32 long functions (>50 lines) by extracting private helpers.
No behavioral changes — same class/module, same public API.

Phase 5: Add 3 new test files (test_errors, test_base_fs_tool,
test_prompt_template), expand 9 existing test files with 46+ new tests,
parametrize 4 test files to reduce duplication. Coverage: 96.52%.

Phase 6: Add 3 hookify prevention rules (PEP 758 except syntax, missing
logger detection, function length reminder).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 10 agents, 13 findings addressed:

- Remove redundant isinstance check in react_loop._process_turn_response
- Add MemoryError/RecursionError to Raises docs (run, execute, invoke_all)
- Add `as exc` to personality preset KeyError catch in renderer
- Add _check_ref guard on --author/--since/--until in GitLogTool
- Extract _build_filter_args helper to reduce complexity
- Fix _handle_retryable_error docstring to clarify return semantics
- Change TemplateAgentConfig.model from str to NotBlankStr
- Add events/correlation.py to DESIGN_SPEC.md §15.3
- Update crash recovery status to "Adopted" in DESIGN_SPEC.md §15.5
- Add `as exc` to built-in template except block in loader
- Restore discovery failure info in load_config Raises docstring
- Fix total_entries metadata to reflect pre-truncation count
- Add truncated flag to listing metadata
Copilot AI review requested due to automatic review settings March 7, 2026 10:29
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 7, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8df532e4-158f-4969-8ef6-e752f58e33ba

📥 Commits

Reviewing files that changed from the base of the PR and between ac24f02 and 61ab402.

📒 Files selected for processing (19)
  • .claude/hookify.function-length.md
  • .claude/hookify.missing-logger.md
  • .claude/hookify.pep758-except.md
  • DESIGN_SPEC.md
  • src/ai_company/observability/correlation.py
  • src/ai_company/providers/registry.py
  • src/ai_company/templates/loader.py
  • src/ai_company/templates/presets.py
  • src/ai_company/templates/renderer.py
  • src/ai_company/templates/schema.py
  • src/ai_company/tools/_git_base.py
  • src/ai_company/tools/file_system/list_directory.py
  • src/ai_company/tools/file_system/read_file.py
  • tests/unit/core/test_role_catalog.py
  • tests/unit/core/test_task.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/templates/test_loader.py
  • tests/unit/tools/file_system/test_write_file.py
  • tests/unit/tools/git/test_git_tools.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added repository hooks for code conventions, credential redaction for git commands, and correlation logging for decorator misuse.
  • Bug Fixes
    • Improved error handling across git and filesystem tools; tightened validation for templates and config loading.
  • Refactor
    • Modularized engine execution, template loading, file operations, and provider initialization for clearer flows.
  • Documentation
    • Updated testing guidance and streamlined docstrings.
  • Tests
    • Expanded unit coverage with parameterized tests and additional edge-case guards.

Walkthrough

Adds file-event hook configs; introduces new observability event constants and logging in several modules; refactors engine, prompt, react loop, provider, and tool code into smaller helpers; modularizes file-system tools; tightens template schema types; expands and parametrizes tests; bumps ruff pre-commit pin.

Changes

Cohort / File(s) Summary
Hook configurations
​.claude/hookify.function-length.md, ​.claude/hookify.missing-logger.md, ​.claude/hookify.pep758-except.md
Add three file-event hooks: function-length reminder, missing-logger warning, and PEP-758 except-pattern rule.
Observability events & wiring
src/ai_company/observability/events/correlation.py, src/ai_company/observability/events/config.py, src/ai_company/observability/correlation.py, src/ai_company/observability/sinks.py
Add correlation event constants and CONFIG_CONVERSION_ERROR; add logger usage and emit specific correlation misuse events in decorator error paths; expose internal file handler builder for tests.
Engine & control-flow refactors
src/ai_company/engine/agent_engine.py, src/ai_company/engine/react_loop.py, src/ai_company/engine/prompt.py, src/ai_company/engine/task_execution.py
Extract post-execution pipeline, result-building, transition helpers, react-loop preparation and per-turn processing, max-token validation and prompt-building helpers, and transition update helper. Significant control-flow restructuring.
Providers & routing
src/ai_company/providers/registry.py, src/ai_company/providers/base.py, src/ai_company/providers/resilience/retry.py, src/ai_company/providers/routing/_strategy_helpers.py
Centralize factory resolution (_resolve_factory), add retryable-error helper, minor docstring changes and small routing doc edits.
Templates: schema, loader, renderer, presets
src/ai_company/templates/schema.py, src/ai_company/templates/loader.py, src/ai_company/templates/renderer.py, src/ai_company/templates/presets.py
Change several schema fields to NotBlankStr; add/centralize user-template collection and validation helpers; split renderer into validation/expansion helpers; add module logger.
Config utils & loader
src/ai_company/config/utils.py, src/ai_company/config/loader.py
Add logging for numeric conversion errors; refactor load_config flow to apply defaults/overrides/env substitution and finalize via helpers.
File-system tools (read/write/edit/delete/list)
src/ai_company/tools/file_system/read_file.py, .../write_file.py, .../edit_file.py, .../delete_file.py, .../list_directory.py, tools/_base_fs_tool.py
Modularize each tool into validation, preflight, and perform helpers; centralize OS-error handling and metadata population; tighten glob safety regex and workspace checks.
Git & tool orchestration
src/ai_company/tools/_git_base.py, src/ai_company/tools/git_tools.py, src/ai_company/tools/invoker.py
Add credential redaction and hardened env for git subprocesses; extract git process helpers; add git filter arg builder and flag-injection checks; centralize safe deepcopy and fatal-error re-raise handling in invoker.
Other modules & docs
src/ai_company/budget/tracker.py, src/ai_company/tools/git_tools.py, CLAUDE.md, DESIGN_SPEC.md, .pre-commit-config.yaml
Minor docstring adjustments, add design note for correlation constants, and bump ruff pre-commit revision.
Tests (many modules)
tests/unit/... (core, engine, templates, observability, tools, file_system, git, permissions)
Add and expand unit tests: parametrize repetitive cases, add edge-case tests for templates, file-system tools, git sanitization/flag checks, observability sinks, and engine error/guard cases. Several tests expose internal helpers (e.g., _to_float, _build_file_handler).

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AgentEngine
    participant ReactLoop
    participant PostPipeline
    participant ErrorHandler

    Client->>AgentEngine: run(identity, task, ctx)
    AgentEngine->>ReactLoop: execute/react loop (turns, tool invocations)
    ReactLoop-->>AgentEngine: ExecutionResult

    alt success
        AgentEngine->>PostPipeline: _post_execution_pipeline(result, identity, agent_id, task_id)
        PostPipeline->>PostPipeline: record costs & apply transitions
        PostPipeline->>AgentEngine: _build_and_log_result -> AgentRunResult
        AgentEngine-->>Client: AgentRunResult
    else fatal error
        AgentEngine->>ErrorHandler: _handle_fatal_error(exc)
        ErrorHandler->>ErrorHandler: _build_error_execution / _build_error_prompt
        ErrorHandler-->>AgentEngine: error ExecutionResult
        AgentEngine->>PostPipeline: _post_execution_pipeline(error_result, ...)
        PostPipeline-->>AgentEngine: AgentRunResult (error)
        AgentEngine-->>Client: AgentRunResult (error)
    end
Loading
sequenceDiagram
    participant User
    participant TemplateLoader
    participant Builtins
    participant UserStore
    participant Validator

    User->>TemplateLoader: load_template(name)
    TemplateLoader->>Builtins: _load_builtin(name)
    alt builtin found
        Builtins-->>TemplateLoader: Template
    else not found
        TemplateLoader->>UserStore: _collect_user_templates / _load_from_file
        UserStore->>Validator: _validate_template_structure(data, source)
        Validator-->>UserStore: validated dict
        UserStore-->>TemplateLoader: Template
    end
    TemplateLoader-->>User: TemplateInfo
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: post-audit cleanup covering PEP 758 syntax, loggers, bug fixes, refactoring, tests, and hookify rules.
Description check ✅ Passed The description comprehensively details all major changes, including PEP 758 conversion, added loggers, bug fixes, refactoring, test improvements, security enhancements, and type tightening, directly relating to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore/post-audit-cleanup
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch chore/post-audit-cleanup

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 7, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a comprehensive cleanup and enhancement of the codebase following a recent audit. The changes focus on improving code quality through extensive refactoring, bolstering security measures, and expanding test coverage to ensure robustness. It also introduces new development guardrails and refines logging practices for better system understanding.

Highlights

  • Code Quality & Refactoring: Refactored over 30 functions into smaller, more focused helpers (all under 50 lines) to improve readability and maintainability across various modules, including config loading, agent engine execution, prompt building, ReAct loop, and task execution.
  • New Hookify Rules: Introduced three new hookify rules: function-length to enforce function size limits, missing-logger to ensure proper logger instantiation, and pep758-except to prevent incorrect exception syntax.
  • Bug Fixes & Security Enhancements: Addressed several bugs, including an unsafe glob regex in file system tools, ref2 usage without ref1 in GitDiffTool, and an unknown branch action fallback. Enhanced security by implementing flag-injection guards for GitLogTool parameters and credential redaction in git command logging.
  • Logging Improvements: Integrated loggers into config/utils.py, observability/correlation.py, and templates/presets.py, along with new correlation event constants, to provide better observability and error reporting.
  • Test Suite Expansion: Consolidated repetitive tests using @pytest.mark.parametrize, added new edge-case coverage for template errors, file system helpers, git tools, and guard logic, and introduced three new test files.
  • Type Tightening: Upgraded TemplateAgentConfig.model and CompanyTemplate.workflow/communication fields to NotBlankStr for stricter type validation.
Changelog
  • .claude/hookify.function-length.md
    • Added a new hookify rule to warn about functions exceeding 50 lines.
  • .claude/hookify.missing-logger.md
    • Added a new hookify rule to warn about Python modules missing a logger instance.
  • .claude/hookify.pep758-except.md
    • Added a new hookify rule to warn about PEP 758 violations in except statements.
  • .pre-commit-config.yaml
    • Updated the ruff pre-commit hook version to v0.15.5.
  • CLAUDE.md
    • Added a guideline to prefer @pytest.mark.parametrize for testing similar cases.
  • DESIGN_SPEC.md
    • Added correlation.py to the events package in the file structure documentation.
    • Updated the status of 'Crash recovery' from 'Planned (M3)' to 'Adopted (M3)' in the design specification.
  • src/ai_company/budget/tracker.py
    • Simplified the docstring for build_summary function.
  • src/ai_company/config/loader.py
    • Refactored the load_config function by extracting logic into _load_and_merge_overrides and _finalize_config helper functions.
    • Simplified the docstring for load_config.
  • src/ai_company/config/utils.py
    • Added a logger instance to the module.
    • Integrated logging for CONFIG_CONVERSION_ERROR when coercing values to float fails.
  • src/ai_company/engine/agent_engine.py
    • Simplified docstrings for run and _execute methods.
    • Refactored _execute into _post_execution_pipeline and _build_and_log_result helper methods.
    • Extracted error handling logic from _handle_fatal_error into _build_error_execution and _build_error_prompt helper functions.
    • Modified _handle_fatal_error to re-raise exceptions without chaining the original cause when building error results fails.
  • src/ai_company/engine/prompt.py
    • Refactored build_system_prompt into _validate_max_tokens and _log_and_return helper functions.
    • Simplified docstrings for build_system_prompt, _trim_sections, and _render_with_trimming.
    • Extracted logging for prompt trimming results into _log_trim_results.
    • Extracted system prompt result assembly into _build_prompt_result.
  • src/ai_company/engine/react_loop.py
    • Simplified the main execute method's docstring.
    • Refactored execute into _prepare_loop and _process_turn_response helper methods.
    • Extracted error handling for missing tool invoker into _missing_invoker_error.
  • src/ai_company/engine/task_execution.py
    • Simplified the docstring for with_transition.
    • Extracted the logic for building transition updates into _build_transition_updates helper function.
  • src/ai_company/observability/correlation.py
    • Added a logger instance to the module.
    • Integrated logging for CORRELATION_SYNC_DECORATOR_MISUSE and CORRELATION_ASYNC_DECORATOR_MISUSE events.
  • src/ai_company/observability/events/config.py
    • Added CONFIG_CONVERSION_ERROR constant for configuration conversion errors.
  • src/ai_company/observability/events/correlation.py
    • Added CORRELATION_SYNC_DECORATOR_MISUSE and CORRELATION_ASYNC_DECORATOR_MISUSE event constants.
  • src/ai_company/providers/base.py
    • Simplified the docstring for _rate_limited_call.
  • src/ai_company/providers/registry.py
    • Refactored _build_driver into a _resolve_factory helper function to separate factory resolution from driver instantiation.
  • src/ai_company/providers/resilience/retry.py
    • Extracted retryable error handling logic into _handle_retryable_error helper method.
  • src/ai_company/providers/routing/_strategy_helpers.py
    • Simplified the docstring for _fastest_within_budget.
  • src/ai_company/templates/loader.py
    • Simplified the docstring for list_templates.
    • Extracted user template collection logic into _collect_user_templates helper function.
    • Extracted _parse_template_yaml structure validation into _validate_template_structure helper function.
  • src/ai_company/templates/presets.py
    • Added a logger instance to the module.
    • Integrated logging for TEMPLATE_PERSONALITY_PRESET_UNKNOWN when an unknown personality preset is requested.
  • src/ai_company/templates/renderer.py
    • Simplified the docstring for render_template.
    • Refactored _build_config_dict into _validate_list and _extract_numeric_config helper functions.
    • Extracted single agent expansion logic into _expand_single_agent helper function.
  • src/ai_company/templates/schema.py
    • Changed the type of TemplateAgentConfig.model from str to NotBlankStr.
    • Changed the type of CompanyTemplate.workflow and CompanyTemplate.communication from str to NotBlankStr.
  • src/ai_company/tools/_git_base.py
    • Added _sanitize_command function to redact credentials from git command arguments for logging.
    • Refactored _run_git into _build_git_env, _start_git_process, _await_git_process, and _process_git_output helper methods.
  • src/ai_company/tools/file_system/delete_file.py
    • Extracted OS error handling for file deletion into _handle_delete_error static method.
  • src/ai_company/tools/file_system/edit_file.py
    • Refactored execute into _validate_edit_args, _preflight_check_file, and _perform_edit helper methods.
  • src/ai_company/tools/file_system/list_directory.py
    • Corrected the _UNSAFE_GLOB_RE regex to properly handle trailing .. patterns.
    • Refactored _list_sync and execute into _classify_entry, _validate_list_args, _resolve_and_check_dir, and _format_listing_result helper methods.
    • Updated _format_listing_result to include total_entries and truncated flags in metadata.
  • src/ai_company/tools/file_system/read_file.py
    • Refactored execute into _validate_read_args, _preflight_check_file, and _perform_read helper methods.
  • src/ai_company/tools/file_system/write_file.py
    • Refactored execute into _validate_write_args, _resolve_write_path, and _perform_write helper methods.
  • src/ai_company/tools/git_tools.py
    • Added _GIT_LOG_SCHEMA constant for GitLogTool parameters.
    • Implemented flag injection guards for author, since, and until parameters in GitLogTool.
    • Added validation to GitDiffTool to ensure ref2 is not used without ref1.
    • Added a fallback error for unknown branch actions in GitBranchTool.
  • src/ai_company/tools/invoker.py
    • Refactored _execute_tool into _safe_deepcopy_args helper method.
    • Extracted fatal error re-raising logic into _raise_fatal_errors static method.
    • Simplified docstrings for invoke_all.
  • tests/unit/core/test_role_catalog.py
    • Added tests to verify uniqueness of role names and seniority levels, and coverage of all SeniorityLevel enum values.
  • tests/unit/core/test_task.py
    • Consolidated multiple tests for whitespace field rejection into a single parameterized test.
    • Consolidated multiple tests for active status requiring assigned_to into a single parameterized test.
    • Consolidated multiple tests for non-active status allowing/disallowing assigned_to into parameterized tests.
  • tests/unit/core/test_task_transitions.py
    • Consolidated multiple tests for valid task status transitions into a single parameterized test.
    • Consolidated multiple tests for invalid task status transitions into a single parameterized test.
    • Added tests for module-level guard logic detecting missing task statuses.
  • tests/unit/engine/test_agent_engine.py
    • Added a test case to ensure an ExecutionStateError is raised if a task is assigned to a different agent.
  • tests/unit/engine/test_agent_engine_errors.py
    • Updated assertion in test_handle_fatal_error_secondary_failure_raises_original to check __cause__ is None.
  • tests/unit/engine/test_prompt_template.py
    • Added a new test file to verify AUTONOMY_INSTRUCTIONS coverage and content, and DEFAULT_TEMPLATE non-emptiness.
  • tests/unit/observability/test_events.py
    • Updated the expected set of discovered domain modules to include 'correlation'.
  • tests/unit/observability/test_sinks.py
    • Added tests for _build_file_handler to cover OSError during directory creation and file opening.
    • Added a test to verify WatchedFileHandler is used for external rotation strategy.
    • Added a test for ValueError when file_path is missing for a FILE sink.
  • tests/unit/templates/test_errors.py
    • Added a new test file to verify the string representation and construction of TemplateValidationError, TemplateNotFoundError, and TemplateRenderError.
  • tests/unit/templates/test_loader.py
    • Added tests for list_templates to ensure unreadable or invalid user templates are skipped.
    • Added a test to ensure defective built-in templates are skipped without crashing.
    • Added tests to reject path traversal attempts in load_template.
    • Added parameterized tests for _to_float coercion behavior.
  • tests/unit/templates/test_renderer.py
    • Added tests for _parse_rendered_yaml to handle non-dict top-level, missing 'template' key, and non-dict 'template' value scenarios.
    • Added a test for _build_departments to handle invalid budget percentages.
    • Added a test for _validate_as_root_config to ensure TemplateValidationError is raised on validation failure.
    • Added a test for _collect_variables to ensure extra user variables are passed through.
  • tests/unit/templates/test_schema.py
    • Added tests to reject blank or whitespace values for TemplateAgentConfig.model, CompanyTemplate.workflow, and CompanyTemplate.communication.
  • tests/unit/tools/file_system/test_base_fs_tool.py
    • Added a new test file to verify the _map_os_error helper function correctly maps OS errors to log keys and messages.
  • tests/unit/tools/file_system/test_list_directory.py
    • Added tests for unsafe glob patterns, including trailing .. and ** without recursive=True.
    • Added tests for doublestar glob patterns with and without recursive flag.
    • Added tests for OSError during directory iteration and absolute glob pattern rejection.
    • Added a test to ensure non-symlink entries resolving outside the workspace are excluded from listings.
  • tests/unit/tools/file_system/test_write_file.py
    • Added tests for _perform_write to handle IsADirectoryError and generic OSError from _write_sync.
    • Added a test to ensure temporary files are cleaned up on write failures and original files remain unchanged.
  • tests/unit/tools/git/test_git_tools.py
    • Added parameterized tests for flag injection vulnerabilities in GitLogTool's filter parameters (author, since, until).
    • Added a test to ensure GitDiffTool returns an error if ref2 is provided without ref1.
    • Added a test to ensure GitBranchTool returns an error for unknown branch actions.
    • Added tests for _sanitize_command to verify credential redaction in git command arguments.
  • tests/unit/tools/test_permissions.py
    • Consolidated multiple tests for ToolAccessLevel categories (Sandboxed, Restricted, Standard) into a single parameterized test.
Activity
  • The pull request description indicates a comprehensive pre-audit cleanup across 49 files, involving 10 specialized agents for review.
  • The pre-review process identified 16 findings (4 MAJOR, 8 MEDIUM, 4 MINOR), with 13 implemented and 3 intentionally skipped.
  • The author has confirmed that ruff check, ruff format, and mypy checks passed successfully.
  • The test plan includes pytest execution with automatic parallelism and 96.58% code coverage, with 2324 tests passing and 4 skipped.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request performs a wide-ranging cleanup and refactoring across the codebase, including converting except syntax to follow PEP 758, adding missing loggers, fixing bugs related to unsafe glob patterns and git command handling, and refactoring long functions. The test suite has been substantially improved by consolidating repetitive tests using @pytest.mark.parametrize and adding new tests, and new hookify rules have been introduced. While the PR aims to enhance security through general credential sanitization and redaction of credentials in logs, a critical security issue was identified: authentication tokens in URLs may still be leaked to logs if they do not follow a specific 'user:pass' format within the git command logging sanitization logic. This vulnerability requires immediate attention. Additionally, the adoption of Python 3.13 syntax for exception handling (PEP 758) is a deliberate design choice but may impact compatibility with older Python versions.


_DEFAULT_TIMEOUT: Final[float] = 30.0

_CREDENTIAL_RE = re.compile(r"(https?://)[^@/:]+:[^@/:]+@")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The regular expression used for redacting credentials in git command logs is too restrictive. It specifically looks for a colon between the username and password/token ([^@/:]+:[^@/:]+@). However, many git hosting services (like GitHub) allow using a token as the username without a password (e.g., https://<token>@github.com). In such cases, the regex will fail to match, and the full URL including the sensitive token will be logged in plain text. This poses a risk of credential leakage in application logs.

Suggested change
_CREDENTIAL_RE = re.compile(r"(https?://)[^@/:]+:[^@/:]+@")
_CREDENTIAL_RE = re.compile(r"(https?://)[^@/]+@")

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 7, 2026

Greptile Summary

Large-scale post-audit cleanup across 49 files touching the engine, tools, templates, providers, and observability layers. The changes are primarily structural (32+ long functions decomposed into focused helpers), with targeted bug fixes (unsafe glob regex, ref2-without-ref1, branch action fallback, credential sanitization) and security hardening (flag-injection guards on --author/--since/--until in GitLogTool). Test coverage is meaningfully extended with parametrized cases and three new test files.

Key changes and observations:

  • Behavior change in _expand_single_agent (renderer.py): Unknown personality preset names now raise TemplateRenderError instead of logging a warning and continuing without a personality. This is a fail-fast improvement, but it's a breaking semantic change — any deployed template with a misspelled or removed preset name will now fail to render rather than degrade gracefully. Worth confirming this is intentional and documented for callers.
  • [DIR] output format changed (list_directory.py): _classify_entry returns [DIR] name/ (one space) instead of the original [DIR] name/ (two spaces), which was intentional padding to align the content column with [FILE] entries. Consumers or tests parsing the formatted output may be affected.
  • total_entries vs directories + files inconsistency (list_directory.py): total_entries is now correctly the pre-truncation count, but when truncated it is capped at MAX_ENTRIES + 1 = 1001 (not the actual directory size), while directories and files are computed from the post-truncation slice (at most MAX_ENTRIES = 1000). The two sides of the metadata therefore do not add up when truncated=True.
  • _check_ref reused for author/date validation (git_tools.py): The function name and docstring describe it as a guard for "ref or branch name strings," but it now also validates --author, --since, and --until values. The injection protection is correct, but the mismatch in naming and docs will confuse future maintainers.
  • raise exc from None (agent_engine.py): Suppresses the chained exception context in the error-result build path; build_exc is logged but not preserved in the exception chain.
  • All other refactoring (engine, config loader, provider registry, retry, react loop, prompt builder) is clean with no logic changes.

Confidence Score: 4/5

  • Safe to merge after confirming the personality-preset fail-fast behavior change is intentional and the minor formatting/metadata issues are acceptable.
  • The vast majority of changes are clean structural refactors with no logic changes, backed by 2324 passing tests at 96.58% coverage. Security fixes (credential redaction, flag-injection guards, glob regex) are correct. The score is not 5 due to a notable breaking behavior change (unknown preset → raises vs. warns), an output format change to list_directory that may affect consumers, and a metadata inconsistency in total_entries vs directories+files counts when truncated.
  • src/ai_company/templates/renderer.py (personality preset behavior change) and src/ai_company/tools/file_system/list_directory.py (formatting and metadata consistency).

Important Files Changed

Filename Overview
src/ai_company/tools/git_tools.py Good security improvements: ref2-without-ref1 guard, flag-injection checks for --author/--since/--until, unknown branch action fallback. _check_ref reuse for non-ref values is slightly misleading.
src/ai_company/tools/_git_base.py Clean refactor of _run_git into _build_git_env, _start_git_process, _await_git_process, and _process_git_output. Credential sanitization added correctly throughout.
src/ai_company/tools/file_system/list_directory.py Unsafe glob regex fixed, ** guard added, truncated flag added to metadata. However, [DIR] output alignment changed (double-space to single-space), and total_entries vs directories+files counts are inconsistent when truncated.
src/ai_company/templates/renderer.py Good refactoring of _expand_agents and _build_config_dict. Breaking behavior change: unknown personality preset now raises TemplateRenderError instead of logging a warning and continuing silently.
src/ai_company/engine/agent_engine.py Good decomposition into _post_execution_pipeline, _build_and_log_result, _transition_to_complete, and _build_error_execution. raise exc from None suppresses the chained exception context unnecessarily.
src/ai_company/templates/loader.py Path traversal detection improved with explicit separator/.. check; _collect_user_templates extracted cleanly; builtin template exception now captures exc for logging. All changes look correct.
src/ai_company/engine/react_loop.py Clean extraction of _prepare_loop, _process_turn_response, and _missing_invoker_error. No logic changes; purely structural refactoring.
src/ai_company/providers/registry.py Factory resolution extracted into _resolve_factory. Instantiation failures now log with PROVIDER_DRIVER_FACTORY_MISSING event, which is slightly inaccurate (the factory was found; instantiation failed), but functionally acceptable.
src/ai_company/providers/resilience/retry.py _handle_retryable_error extracted cleanly. Returning None for non-retryable and the error for retryable is a clear contract; no logic changes.
src/ai_company/observability/correlation.py Logger added and decorator misuse cases now log structured warnings before raising. New events/correlation.py constants correctly introduced.
src/ai_company/templates/schema.py Type tightening: TemplateAgentConfig.model, CompanyTemplate.workflow/communication, and TemplateMetadata.tags upgraded to NotBlankStr. Clean and correct.
src/ai_company/config/loader.py Override application extracted to _load_and_merge_overrides and final validation to _finalize_config. Pure structural refactor with no behavior changes.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant _run_git
    participant _build_git_env
    participant _start_git_process
    participant _await_git_process
    participant _process_git_output

    Caller->>_run_git: args, cwd, deadline
    _run_git->>_build_git_env: ()
    _build_git_env-->>_run_git: hardened env dict
    _run_git->>_start_git_process: args, work_dir, env
    alt OSError
        _start_git_process-->>_run_git: ToolExecutionResult(is_error=True)
        _run_git-->>Caller: error result
    else success
        _start_git_process-->>_run_git: asyncio.Process
        _run_git->>_await_git_process: proc, args, deadline
        alt TimeoutError
            _await_git_process-->>_run_git: ToolExecutionResult(is_error=True)
            _run_git-->>Caller: error result
        else success
            _await_git_process-->>_run_git: (stdout_bytes, stderr_bytes)
            _run_git->>_process_git_output: args, returncode, stdout, stderr
            _process_git_output-->>_run_git: ToolExecutionResult
            _run_git-->>Caller: result
        end
    end
Loading

Comments Outside Diff (3)

  1. src/ai_company/templates/renderer.py, line 1143-1146 (link)

    Breaking behavior change: unknown preset now raises instead of warning

    The old code logged a warning and continued (the agent got no personality), but the new code raises a TemplateRenderError, which completely aborts template rendering. Any existing template that references a misspelled or deleted preset name will now fail to render rather than silently degrade.

    This is intentional fail-fast behavior, but it is a semantic breaking change that's not called out explicitly in the PR description (it's listed as "fixing duplicate logging"). Any callers relying on graceful degradation for unknown presets will now receive an exception instead of a rendered config.

    Consider confirming this behavior change is intentional and documenting it, or providing a strict=False option to preserve the old warn-and-skip path.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/ai_company/templates/renderer.py
    Line: 1143-1146
    
    Comment:
    **Breaking behavior change: unknown preset now raises instead of warning**
    
    The old code logged a warning and continued (the agent got no personality), but the new code raises a `TemplateRenderError`, which completely aborts template rendering. Any existing template that references a misspelled or deleted preset name will now fail to render rather than silently degrade.
    
    This is intentional fail-fast behavior, but it is a semantic breaking change that's not called out explicitly in the PR description (it's listed as "fixing duplicate logging"). Any callers relying on graceful degradation for unknown presets will now receive an exception instead of a rendered config.
    
    Consider confirming this behavior change is intentional and documenting it, or providing a `strict=False` option to preserve the old warn-and-skip path.
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. src/ai_company/tools/file_system/list_directory.py, line 444-466 (link)

    total_entries vs directories + files inconsistency when truncated

    total is computed before truncating lines, and its max value is MAX_ENTRIES + 1 = 1001 (because _list_sync caps the iterator via islice). However, dir_count and file_count are computed from the already-truncated lines (max 1000 entries). When truncated=True, the metadata contains total_entries=1001 but directories + files <= 1000, so the two sets of counts are inconsistent — one entry is "counted" in total_entries but not reflected in either directories or files.

    The actual total count of entries in the directory also remains unknowable when truncated (it could be 1001 or 100000). The PR description says this "now reflects pre-truncation count", which is technically correct but the practical utility is limited since the true total is still not surfaced.

    This is an existing architectural limitation of the islice-based approach, but the new code makes the inconsistency slightly more visible. A comment in _format_listing_result or the docstring clarifying that total_entries means "entries scanned, up to MAX_ENTRIES + 1" would help consumers interpret this field correctly.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/ai_company/tools/file_system/list_directory.py
    Line: 444-466
    
    Comment:
    **`total_entries` vs `directories + files` inconsistency when truncated**
    
    `total` is computed before truncating `lines`, and its max value is `MAX_ENTRIES + 1 = 1001` (because `_list_sync` caps the iterator via `islice`). However, `dir_count` and `file_count` are computed from the already-truncated `lines` (max 1000 entries). When `truncated=True`, the metadata contains `total_entries=1001` but `directories + files <= 1000`, so the two sets of counts are inconsistent — one entry is "counted" in `total_entries` but not reflected in either `directories` or `files`.
    
    The actual total count of entries in the directory also remains unknowable when truncated (it could be 1001 or 100000). The PR description says this "now reflects pre-truncation count", which is technically correct but the practical utility is limited since the true total is still not surfaced.
    
    This is an existing architectural limitation of the `islice`-based approach, but the new code makes the inconsistency slightly more visible. A comment in `_format_listing_result` or the docstring clarifying that `total_entries` means "entries scanned, up to `MAX_ENTRIES + 1`" would help consumers interpret this field correctly.
    
    How can I resolve this? If you propose a fix, please make it concise.
  3. src/ai_company/tools/git_tools.py, line 819-838 (link)

    _check_ref repurposed for author and date validation

    _check_ref is documented in _git_base.py as a guard for "ref or branch name strings" — its docstring explicitly says "The ref or branch name string to validate." It is now also used here to validate author, since, and until strings, which are author patterns and date strings, not refs.

    The injection protection itself is appropriate: values are passed as flag-assignment arguments via exec (no shell), and rejecting leading - prevents flag-injection in all three cases. However, reusing a function named _check_ref for non-ref inputs is misleading and its documentation now contradicts its actual scope.

    A small clarifying rename or an updated docstring on _check_ref (e.g., "Reject any value starting with - to prevent flag injection in git arguments") would accurately reflect the broader usage and prevent future confusion.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/ai_company/tools/git_tools.py
    Line: 819-838
    
    Comment:
    **`_check_ref` repurposed for author and date validation**
    
    `_check_ref` is documented in `_git_base.py` as a guard for "ref or branch name strings" — its docstring explicitly says "The ref or branch name string to validate." It is now also used here to validate `author`, `since`, and `until` strings, which are author patterns and date strings, not refs.
    
    The injection protection itself is appropriate: values are passed as flag-assignment arguments via `exec` (no shell), and rejecting leading `-` prevents flag-injection in all three cases. However, reusing a function named `_check_ref` for non-ref inputs is misleading and its documentation now contradicts its actual scope.
    
    A small clarifying rename or an updated docstring on `_check_ref` (e.g., "Reject any value starting with `-` to prevent flag injection in git arguments") would accurately reflect the broader usage and prevent future confusion.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Last reviewed commit: 61ab402

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This is a comprehensive post-audit cleanup PR touching ~49 files with refactoring, bug fixes, security improvements, and test consolidation. The changes stem from a structured review by 10 specialized agents and systematically address code quality issues across the codebase.

Changes:

  • Bug fixes & security: Fixes unsafe glob regex (trailing ..), ref2 without ref1 validation in GitDiffTool, unknown branch action fallback, credential sanitization in git command logging, ** without recursive=True guard, and flag-injection guards on --author/--since/--until in GitLogTool.
  • Function decomposition: Long functions (32+) decomposed into focused helpers under 50 lines per CLAUDE.md convention across tools, engine, templates, providers, and config modules.
  • Test improvements: Repetitive tests consolidated with @pytest.mark.parametrize, new edge-case coverage added (3 new test files), and test assertions tightened (e.g., total_entries metadata, __cause__ is None).

Reviewed changes

Copilot reviewed 49 out of 49 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/ai_company/tools/_git_base.py Credential sanitization via _sanitize_command(), decomposed _run_git() into helpers
src/ai_company/tools/git_tools.py Flag-injection guard, ref2/ref1 validation, unknown branch action fallback, schema extraction
src/ai_company/tools/file_system/list_directory.py Fixed glob regex, added **/recursive guard, extracted _classify_entry, _format_listing_result
src/ai_company/tools/file_system/write_file.py Decomposed execute() into _validate_write_args, _resolve_write_path, _perform_write
src/ai_company/tools/file_system/read_file.py Decomposed execute() into _validate_read_args, _preflight_check_file, _perform_read
src/ai_company/tools/file_system/edit_file.py Decomposed execute() into _validate_edit_args, _preflight_check_file, _perform_edit
src/ai_company/tools/file_system/delete_file.py Extracted _handle_delete_error from execute
src/ai_company/tools/invoker.py Extracted _safe_deepcopy_args, _raise_fatal_errors; docstring cleanup
src/ai_company/templates/schema.py model, workflow, communication upgraded to NotBlankStr
src/ai_company/templates/renderer.py Extracted _validate_list, _extract_numeric_config, _expand_single_agent
src/ai_company/templates/presets.py Added logger and warning log in get_personality_preset
src/ai_company/templates/loader.py Fixed PEP 758 except syntax bug, extracted _collect_user_templates, _validate_template_structure
src/ai_company/providers/registry.py Extracted _resolve_factory from _build_driver
src/ai_company/providers/resilience/retry.py Extracted _handle_retryable_error
src/ai_company/providers/base.py Docstring trimming
src/ai_company/providers/routing/_strategy_helpers.py Docstring trimming
src/ai_company/observability/events/correlation.py New module with correlation event constants
src/ai_company/observability/events/config.py Added CONFIG_CONVERSION_ERROR constant
src/ai_company/observability/correlation.py Added logger and structured event logging for decorator misuse
src/ai_company/engine/agent_engine.py Decomposed functions, raise exc from None fix, extracted _build_error_prompt
src/ai_company/engine/react_loop.py Extracted _prepare_loop, _process_turn_response, _missing_invoker_error
src/ai_company/engine/prompt.py Extracted _validate_max_tokens, _log_and_return, _build_prompt_result, _log_trim_results
src/ai_company/engine/task_execution.py Extracted _build_transition_updates
src/ai_company/config/utils.py Added logger and warning logs for conversion errors
src/ai_company/config/loader.py Decomposed load_config into _load_and_merge_overrides, _finalize_config
src/ai_company/budget/tracker.py Docstring trimming
DESIGN_SPEC.md Added events/correlation.py to tree, crash recovery status → "Adopted"
CLAUDE.md Added parametrize testing convention
.pre-commit-config.yaml Ruff version bump v0.15.4 → v0.15.5
.claude/hookify.*.md Three new hookify prevention rules
tests/unit/tools/test_permissions.py Consolidated 3 test classes into parametrized test
tests/unit/tools/git/test_git_tools.py New tests for flag injection, ref2 without ref1, unknown action, credential sanitization
tests/unit/tools/file_system/test_list_directory.py New tests for glob edge cases, ** guard, OSError, absolute patterns
tests/unit/tools/file_system/test_write_file.py New tests for write error handling, temp file cleanup
tests/unit/tools/file_system/test_base_fs_tool.py New file testing _map_os_error
tests/unit/templates/test_schema.py Tests for blank model, workflow, communication rejection
tests/unit/templates/test_renderer.py Tests for _parse_rendered_yaml, _build_departments, _validate_as_root_config, _collect_variables
tests/unit/templates/test_loader.py Tests for user template errors, path traversal, _to_float
tests/unit/templates/test_errors.py New file testing template error formatting
tests/unit/observability/test_sinks.py Tests for file handler error paths
tests/unit/observability/test_events.py Added correlation module to discovery test
tests/unit/engine/test_prompt_template.py New file testing autonomy instructions and default template
tests/unit/engine/test_agent_engine_errors.py Tightened assertion for __cause__ is None
tests/unit/engine/test_agent_engine.py New test for task assigned to different agent
tests/unit/core/test_task_transitions.py Consolidated tests with parametrize, added guard edge cases
tests/unit/core/test_task.py Consolidated whitespace/assignment tests with parametrize
tests/unit/core/test_role_catalog.py Added guard/invariant tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +373 to +380
except KeyError as exc:
logger.warning(
TEMPLATE_PERSONALITY_PRESET_UNKNOWN,
preset_name=preset_name,
agent_name=name,
available=sorted(PERSONALITY_PRESETS),
error=str(exc),
)
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When get_personality_preset() is called with an unknown name, the warning TEMPLATE_PERSONALITY_PRESET_UNKNOWN is now logged twice: once inside get_personality_preset() at presets.py:90-94 (newly added), and again in the except KeyError handler here in the renderer. This produces duplicate log entries for the same event. Either the logging in presets.py should be removed (since the caller already logs it with more context like agent_name), or this logging here should be removed since presets.py now handles it.

Suggested change
except KeyError as exc:
logger.warning(
TEMPLATE_PERSONALITY_PRESET_UNKNOWN,
preset_name=preset_name,
agent_name=name,
available=sorted(PERSONALITY_PRESETS),
error=str(exc),
)
except KeyError:
# Unknown personality preset: logging is handled inside get_personality_preset().
# Swallow the error so the agent is created without a personality.
pass

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 21

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
tests/unit/tools/git/test_git_tools.py (1)

56-69: ⚠️ Potential issue | 🟡 Minor

Fix: Pipeline failure due to outside_dir already existing.

The pipeline error indicates FileExistsError: [Errno 17] File exists: '/tmp/pytest-of-runner/pytest-0/popen-gw0/outside_dir'. This occurs because outside.mkdir() fails when running tests in parallel (pytest-xdist) and another test has already created this directory.

Use exist_ok=True or generate a unique directory name to avoid collisions in parallel test runs.

🛠️ Proposed fix
     async def test_symlink_escape_blocked(self, git_repo: Path) -> None:
-        outside = git_repo.parent / "outside_dir"
-        outside.mkdir()
+        outside = git_repo.parent / f"outside_dir_{id(self)}"
+        outside.mkdir(exist_ok=True)
         (outside / "secret.txt").write_text("secret")

Alternatively, use tmp_path_factory with a unique name or tempfile.mkdtemp() for guaranteed isolation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/tools/git/test_git_tools.py` around lines 56 - 69, The test
test_symlink_escape_blocked creates outside = git_repo.parent / "outside_dir"
and calls outside.mkdir() which can raise FileExistsError in parallel runs;
change the mkdir call to use exist_ok=True (i.e., outside.mkdir(exist_ok=True))
or generate a unique temp dir (e.g., via tmp_path_factory or tempfile.mkdtemp())
so the setup for outside_dir is robust under pytest-xdist, leaving the rest of
the test (link.symlink_to and invoking GitDiffTool.execute) unchanged.
src/ai_company/tools/file_system/edit_file.py (1)

193-249: 🧹 Nitpick | 🔵 Trivial

Consider splitting _perform_edit to meet the 50-line guideline.

This method spans 57 lines, exceeding the 50-line limit. While the logic is cohesive and readable, you could extract the result-construction (lines 218-249) into a small helper like _build_edit_result(user_path, count) to bring this under the threshold.

♻️ Suggested refactor to extract result building
+    def _build_edit_result(
+        self,
+        user_path: str,
+        count: int,
+    ) -> ToolExecutionResult:
+        """Build the result after a successful edit operation."""
+        if count == 0:
+            logger.warning(
+                TOOL_FS_EDIT_NOT_FOUND,
+                path=user_path,
+                old_text_len=len(old_text),
+            )
+            return ToolExecutionResult(
+                content=f"Text not found in {user_path}.",
+                is_error=True,
+                metadata={
+                    "path": user_path,
+                    "occurrences_found": 0,
+                    "occurrences_replaced": 0,
+                },
+            )
+        msg = f"Replaced 1 occurrence in {user_path}"
+        if count > 1:
+            msg += f" (warning: {count} total occurrences found, only first replaced)"
+        logger.info(
+            TOOL_FS_EDIT,
+            path=user_path,
+            occurrences_found=count,
+            occurrences_replaced=1,
+        )
+        return ToolExecutionResult(
+            content=msg,
+            metadata={
+                "path": user_path,
+                "occurrences_found": count,
+                "occurrences_replaced": 1,
+            },
+        )
+
     async def _perform_edit(
         self,
         user_path: str,
         resolved: Path,
         old_text: str,
         new_text: str,
     ) -> ToolExecutionResult:
         """Run the edit and return the result."""
         try:
             count = await asyncio.to_thread(
                 _edit_sync,
                 resolved,
                 old_text,
                 new_text,
             )
         except UnicodeDecodeError:
             logger.warning(TOOL_FS_ERROR, path=user_path, error="binary")
             return ToolExecutionResult(
                 content=f"Cannot edit binary file: {user_path}",
                 is_error=True,
             )
         except (FileNotFoundError, IsADirectoryError, PermissionError, OSError) as exc:
             log_key, msg = _map_os_error(exc, user_path, "editing")
             logger.warning(TOOL_FS_ERROR, path=user_path, error=log_key)
             return ToolExecutionResult(content=msg, is_error=True)
-        if count == 0:
-            logger.warning(
-                TOOL_FS_EDIT_NOT_FOUND,
-                path=user_path,
-                old_text_len=len(old_text),
-            )
-            return ToolExecutionResult(
-                content=f"Text not found in {user_path}.",
-                is_error=True,
-                metadata={
-                    "path": user_path,
-                    "occurrences_found": 0,
-                    "occurrences_replaced": 0,
-                },
-            )
-        msg = f"Replaced 1 occurrence in {user_path}"
-        if count > 1:
-            msg += f" (warning: {count} total occurrences found, only first replaced)"
-        logger.info(
-            TOOL_FS_EDIT,
-            path=user_path,
-            occurrences_found=count,
-            occurrences_replaced=1,
-        )
-        return ToolExecutionResult(
-            content=msg,
-            metadata={
-                "path": user_path,
-                "occurrences_found": count,
-                "occurrences_replaced": 1,
-            },
-        )
+        return self._build_edit_result(user_path, count, old_text)

As per coding guidelines: "Functions must be less than 50 lines".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/file_system/edit_file.py` around lines 193 - 249, The
method _perform_edit exceeds the 50-line limit; extract the result-building and
logging block that handles count==0 vs count>0 into a new helper (e.g.
_build_edit_result(user_path: str, count: int, old_text: str) ->
ToolExecutionResult) so _perform_edit only performs the edit and delegates
construction of the ToolExecutionResult and related logger calls
(TOOL_FS_EDIT_NOT_FOUND, TOOL_FS_EDIT) to that helper; ensure the helper returns
the same content, metadata keys
("path","occurrences_found","occurrences_replaced") and logging behavior so
callers of _perform_edit still receive identical results.
src/ai_company/tools/file_system/write_file.py (1)

119-140: ⚠️ Potential issue | 🟠 Major

Fix the size validation to account for text-mode newline expansion on Windows.

Line 125 counts UTF-8 bytes, but _write_sync() writes in text mode (line 55: os.fdopen(fd, "w", encoding="utf-8")). On Windows, text mode expands \n to \r\n, so content passing the preflight check can exceed MAX_WRITE_SIZE_BYTES when persisted, defeating the size limit. Use binary mode ("wb") or disable newline translation (newline="") to ensure the preflight check matches the actual file size.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/tools/file_system/write_file.py` around lines 119 - 140, The
size preflight in _validate_write_args measures UTF-8 bytes but _write_sync
opens files in text mode which expands newlines on Windows; update _write_sync
to open the file without newline translation (e.g., use os.fdopen(fd, "w",
encoding="utf-8", newline="") or switch to binary mode "wb") so the actual bytes
written will match the UTF-8 byte count used by _validate_write_args and keep
MAX_WRITE_SIZE_BYTES enforcement consistent; verify references to
_validate_write_args, _write_sync, and MAX_WRITE_SIZE_BYTES when making the
change.
src/ai_company/templates/loader.py (1)

169-176: ⚠️ Potential issue | 🔴 Critical

Path traversal check fails to detect Windows-style separators on Unix.

The path traversal validation uses Path(name_clean).name which relies on OS-specific path parsing. On Unix systems, backslashes (\) are valid filename characters, not path separators. So Path("..\etc\passwd").name returns "..\etc\passwd" unchanged, allowing the check to pass.

This explains the pipeline failure: the test passes '..\etc\passwd' expecting rejection, but the code proceeds to the "Unknown template" error at Line 206 instead of the "Invalid template name" error at Line 172.

🐛 Proposed fix to handle both Unix and Windows path separators
     # Sanitize to prevent path traversal.
-    safe_name = Path(name_clean).name
-    if safe_name != name_clean:
+    if "/" in name_clean or "\\" in name_clean or ".." in name_clean:
         msg = f"Invalid template name {name!r}: must not contain path separators"
         raise TemplateNotFoundError(
             msg,
             locations=(ConfigLocation(file_path=f"<template:{name}>"),),
         )
+    safe_name = name_clean
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/templates/loader.py` around lines 169 - 176, The current
path-traversal check using Path(name_clean).name is OS-dependent and misses
Windows separators; update the validation in loader.py to explicitly reject any
template names containing either '/' or '\\' before further processing: check
name_clean for these characters and raise the existing TemplateNotFoundError
(with the same ConfigLocation(file_path=f"<template:{name}>")) when found,
rather than relying solely on Path(name_clean).name/safe_name; keep the existing
error message and use the same variables (name_clean, safe_name, name,
TemplateNotFoundError, ConfigLocation) so behavior and signatures remain
consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/hookify.function-length.md:
- Around line 9-12: The current matcher (field: new_text, operator: regex_match,
value) only matches "def" at column 0 or four spaces and ignores async functions
and deeper indentation; update the regex in the value to accept arbitrary
leading whitespace and an optional "async" token (e.g., match patterns like
"^\s*(?:async\s+)?def\s") so nested and async function definitions trigger the
rule and enforce the "Functions must be less than 50 lines" reminder
consistently.

In @.claude/hookify.missing-logger.md:
- Around line 27-32: The hook's conditions on field `new_text` are too narrow:
they only match top-level "def " or "class " and only search the diff hunk for
the literal "get_logger", causing false positives when a module already
imports/initializes logger or uses "async def" or indented members. Update the
rule to (1) match function/class declarations including leading whitespace and
optional "async" (e.g., regex that accepts optional leading whitespace and
"async "), and (2) check the entire file content (or a broader context) for the
presence of both the import symbol "from ai_company.observability import
get_logger" and the initialization pattern "logger = get_logger(__name__)"
instead of only scanning the diff hunk; ensure the operator uses a file-level
search or adds conditions that verify both symbols are present to suppress the
hook when a module is already compliant.

In @.claude/hookify.pep758-except.md:
- Around line 5-11: The rule in .claude/hookify.pep758-except.md currently flags
any parenthesized except via the "pattern: except\s*\(" entry, which wrongly
warns valid "except (A, B) as exc:" forms; update the pattern so it only warns
when a parenthesized except is NOT followed by an "as" clause (i.e., exclude
cases where the closing parenthesis is immediately followed by "as" and an
identifier), keeping the existing warning/action behavior and preserving the
note about the allowed "except (A, B) as exc:" form.

In `@DESIGN_SPEC.md`:
- Line 2472: Update the table row for Crash recovery to avoid overstating
automatic reassignment: change the description for M3/FailAndReassignStrategy so
it says the engine will "mark FAILED / eligible for reassignment" (or similar)
rather than asserting automated reassignment is performed; keep references to
the pluggable RecoveryStrategy and FailAndReassignStrategy, and preserve the
note about model_copy and future CheckpointStrategy/AgentContext checkpointing
so the wording aligns with §6.6.

In `@src/ai_company/engine/agent_engine.py`:
- Around line 229-242: When post-processing fails after
_run_loop_with_timeout(), the fallback path currently rebuilds from the stale
pre-loop ctx from run(), losing the true final state in
execution_result.context; update the error handling in the blocks that call
_post_execution_pipeline and _build_and_log_result (including the other
occurrences around the second and third locations) to detect a failed
post-processing and pass the final execution_result.context (or collapse into
execution_result) into _handle_fatal_error (or into whatever fallback builder
currently uses ctx) so the fallback rebuild uses execution_result.context
(conversation, turns, accumulated_cost, task state) rather than the pre-loop
ctx; ensure _handle_fatal_error signature or the fallback builder is adjusted to
accept and prefer execution_result.context when present.

In `@src/ai_company/engine/react_loop.py`:
- Around line 82-88: The loop condition is checking the bound method object
instead of its boolean result; change the while condition to call the method
(use ctx.has_turns_remaining()) so the loop respects the AgentContext max turns.
Locate the while loop in react_loop.py (after _prepare_loop and before
_check_budget) and replace the current bare reference to has_turns_remaining
with a call to has_turns_remaining() on ctx.

In `@src/ai_company/observability/correlation.py`:
- Around line 19-25: Replace the private import of get_logger with the public
package import: stop importing from ai_company.observability._logger and instead
import get_logger from ai_company.observability, then keep the existing logger =
get_logger(__name__) line; update the import statement that currently references
ai_company.observability._logger to use ai_company.observability and ensure
CORRELATION_ASYNC_DECORATOR_MISUSE and CORRELATION_SYNC_DECORATOR_MISUSE imports
remain unchanged.

In `@src/ai_company/providers/registry.py`:
- Around line 178-194: Add logging calls before raising in the error branches of
_build_driver and the non-callable branch in _resolve_factory: log the exception
when factory(name, config) fails (use logger.error or logger.warning with
context including provider name, driver_type and exc) and log a warning/error
when the produced driver is not an instance of BaseCompletionProvider (include
provider and driver_type). Also log when factory is not callable in
_resolve_factory with the same contextual fields. Ensure you reference the same
variables (driver_type, name, config, factory, driver) and keep the raise of
DriverFactoryNotFoundError unchanged after the log.
- Around line 178-194: The code mistakenly raises DriverFactoryNotFoundError for
runtime factory failures and wrong return types; change these paths to raise a
distinct error (e.g., DriverFactoryBrokenError or ProviderInitializationError)
instead of DriverFactoryNotFoundError, preserve the original exception as the
__cause__ (use "from exc") and include the original exception text in the
context, and do the same for the other occurrence around the
factory-result/type-check block so callers can distinguish "missing factory"
(DriverFactoryNotFoundError) from "factory exists but is broken/returned wrong
type" (new error) while still referencing DriverFactoryNotFoundError, factory,
driver, and BaseCompletionProvider to locate the code to change.

In `@src/ai_company/templates/renderer.py`:
- Around line 288-299: The _validate_list helper currently only checks that
rendered_data[key] is a list but not that each element is a dict, which leads to
AttributeError later in _expand_single_agent and _build_departments; update
_validate_list to iterate over the returned list and ensure each item is a dict
(or raise TemplateRenderError with a clear message mentioning the key and the
offending index/type) so callers like _expand_single_agent and
_build_departments receive validated dict items and get a clean
TemplateRenderError on bad input.
- Around line 369-380: The code currently catches a KeyError from
get_personality_preset(preset_name) and only logs a warning, allowing a template
with an unknown personality_preset to proceed silently; change this to reject
the invalid preset by re-raising a clear exception (or raising a ValueError)
after logging so template rendering fails fast. Locate the block using
agent.get("personality_preset"), get_personality_preset, and
agent_dict["personality"], keep or log the TEMPLATE_PERSONALITY_PRESET_UNKNOWN
diagnostic with preset_name/agent_name/available/error, and then raise an
exception (including the preset_name and agent name) instead of swallowing the
KeyError so callers know the template is invalid.

In `@src/ai_company/tools/file_system/list_directory.py`:
- Around line 50-65: The output prefixes are misaligned because entry.is_dir()
returns "[DIR]  {display}/" with two spaces while entry.is_symlink() and the
file branch use a single space; update the directory branch (the return in the
entry.is_dir() check) to use a single space like the others (e.g., "[DIR]
{display}/") so the prefixes for entry.is_symlink(), entry.is_dir(), and the
file return lines are consistently formatted.

In `@tests/unit/core/test_role_catalog.py`:
- Around line 252-265: The two new tests test_no_duplicate_seniority_levels and
test_no_missing_seniority_levels in TestCatalogGuards duplicate existing
assertions in TestSeniorityInfo (test_no_duplicate_levels and
test_covers_all_seniority_levels); remove these duplicate methods from
TestCatalogGuards to avoid redundancy, or alternatively replace them with
guard-specific tests that mock/patch SENIORITY_INFO (or the lookup function used
by TestCatalogGuards such as the mapping access) to assert the guard behavior
(like test_raises_lookup_error_for_missing_level) rather than re-checking the
same set/coverage logic.

In `@tests/unit/core/test_task_transitions.py`:
- Around line 151-158: The test test_module_level_guard_detects_missing_status
does not exercise the module-level import-time guard because it runs after
import; either remove this duplicate completeness check, or replace it with an
isolated import-time test that actually triggers the guard: write a new test
that spawns a fresh Python interpreter (e.g., via subprocess.run) which imports
the module that defines TaskStatus and VALID_TRANSITIONS and assert that the
import fails with ValueError (or that stderr contains the guard message); this
ensures the module-level guard is exercised at import time rather than checking
sets post-import. Include references to TaskStatus, VALID_TRANSITIONS and the
test name when making the change.

In `@tests/unit/core/test_task.py`:
- Around line 266-274: The test test_non_active_allows_assigned_to currently
only checks task.assigned_to; update it to also assert that the constructed task
retains the provided status by adding an assertion like assert task.status ==
status inside the same test (locate the test_non_active_allows_assigned_to
function and the _make_task call that uses TaskStatus values), ensuring the
parametrized TaskStatus cases (BLOCKED, CANCELLED, FAILED) are verified for both
assigned_to and status.

In `@tests/unit/engine/test_prompt_template.py`:
- Around line 44-50: The test currently recomputes the missing levels locally
instead of exercising the real guard in ai_company.engine.prompt_template;
change the test to invoke the actual guard path by either (A) importing and
calling a guard helper (e.g., validate_autonomy_instructions) from
ai_company.engine.prompt_template with the mutated AUTONOMY_INSTRUCTIONS and
asserting it raises the expected exception, or (B) monkeypatching
ai_company.engine.prompt_template.AUTONOMY_INSTRUCTIONS to the incomplete dict
and using importlib.reload(prompt_template) to trigger the module-level guard,
then assert the guard reports the removed SeniorityLevel; reference
AUTONOMY_INSTRUCTIONS, SeniorityLevel, ai_company.engine.prompt_template, and
importlib.reload or validate_autonomy_instructions to locate the code.

In `@tests/unit/observability/test_sinks.py`:
- Around line 205-219: The test test_file_open_oserror_raises_runtime_error
currently patches logging.handlers.RotatingFileHandler.__init__ which prevents
the handler object from being constructed and is brittle; change the mock to
target the actual file-open call (e.g., patch logging.FileHandler.__init__ or
the built-in open used by logging) so that _build_file_handler executes its
normal constructor path and raises OSError at the file-open step, ensuring the
RuntimeError path in _build_file_handler/SinkConfig handling is exercised
reliably.

In `@tests/unit/templates/test_loader.py`:
- Around line 252-260: The test test_defective_builtin_skipped currently only
checks type and can miss builtin entries; update it to assert that builtins were
actually dropped by either asserting templates == () (since no user templates
exist) or iterating templates from list_templates() and asserting no item has
source == "builtin"; keep the same patch of _load_builtin raising
TemplateRenderError and reference list_templates and _load_builtin in the
updated assertion to ensure the defective built-in was skipped.
- Around line 274-278: The Windows-style backslash case slips past the current
Path(name_clean).name normalization on POSIX; update load_template to explicitly
reject any template name containing either "/" or "\\" before using
Path(name_clean).name (e.g., check if "/" in name_clean or "\\" in name_clean
and raise TemplateNotFoundError("must not contain path separators")), or
alternatively normalize backslashes to the native separator first; ensure the
check references the existing name_clean variable and the load_template function
so the traversal rejection triggers consistently across platforms.

In `@tests/unit/tools/file_system/test_base_fs_tool.py`:
- Around line 37-44: The public test function test_map_os_error is missing the
required Google-style docstring; add a concise docstring above test_map_os_error
that describes the test purpose (verifies _map_os_error maps an OSError to the
expected key and message fragment), and include Google-style Args for exc,
expected_key, and expected_msg_fragment and an Optional Returns/Raises line if
applicable so it satisfies the ruff D rule.

In `@tests/unit/tools/file_system/test_write_file.py`:
- Around line 149-163: The test patches
ai_company.tools.file_system.write_file._write_sync which throws before
mkstemp() runs, so it never exercises the temp-file cleanup/rollback path;
update the test to force failure after the temp file is created by patching a
symbol executed after mkstemp(), e.g. patch pathlib.Path.replace or os.fsync (or
the replace call inside write_file) to raise OSError during the commit/rename
step while still calling write_tool.execute(arguments={"path":
"fail_target.txt", "content": "new"}), then assert no .tmp files remain and the
original file content is unchanged to verify cleanup.

---

Outside diff comments:
In `@src/ai_company/templates/loader.py`:
- Around line 169-176: The current path-traversal check using
Path(name_clean).name is OS-dependent and misses Windows separators; update the
validation in loader.py to explicitly reject any template names containing
either '/' or '\\' before further processing: check name_clean for these
characters and raise the existing TemplateNotFoundError (with the same
ConfigLocation(file_path=f"<template:{name}>")) when found, rather than relying
solely on Path(name_clean).name/safe_name; keep the existing error message and
use the same variables (name_clean, safe_name, name, TemplateNotFoundError,
ConfigLocation) so behavior and signatures remain consistent.

In `@src/ai_company/tools/file_system/edit_file.py`:
- Around line 193-249: The method _perform_edit exceeds the 50-line limit;
extract the result-building and logging block that handles count==0 vs count>0
into a new helper (e.g. _build_edit_result(user_path: str, count: int, old_text:
str) -> ToolExecutionResult) so _perform_edit only performs the edit and
delegates construction of the ToolExecutionResult and related logger calls
(TOOL_FS_EDIT_NOT_FOUND, TOOL_FS_EDIT) to that helper; ensure the helper returns
the same content, metadata keys
("path","occurrences_found","occurrences_replaced") and logging behavior so
callers of _perform_edit still receive identical results.

In `@src/ai_company/tools/file_system/write_file.py`:
- Around line 119-140: The size preflight in _validate_write_args measures UTF-8
bytes but _write_sync opens files in text mode which expands newlines on
Windows; update _write_sync to open the file without newline translation (e.g.,
use os.fdopen(fd, "w", encoding="utf-8", newline="") or switch to binary mode
"wb") so the actual bytes written will match the UTF-8 byte count used by
_validate_write_args and keep MAX_WRITE_SIZE_BYTES enforcement consistent;
verify references to _validate_write_args, _write_sync, and MAX_WRITE_SIZE_BYTES
when making the change.

In `@tests/unit/tools/git/test_git_tools.py`:
- Around line 56-69: The test test_symlink_escape_blocked creates outside =
git_repo.parent / "outside_dir" and calls outside.mkdir() which can raise
FileExistsError in parallel runs; change the mkdir call to use exist_ok=True
(i.e., outside.mkdir(exist_ok=True)) or generate a unique temp dir (e.g., via
tmp_path_factory or tempfile.mkdtemp()) so the setup for outside_dir is robust
under pytest-xdist, leaving the rest of the test (link.symlink_to and invoking
GitDiffTool.execute) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a25c05c5-cc72-4677-a883-83a6a7c4c507

📥 Commits

Reviewing files that changed from the base of the PR and between 325ef98 and ac24f02.

📒 Files selected for processing (49)
  • .claude/hookify.function-length.md
  • .claude/hookify.missing-logger.md
  • .claude/hookify.pep758-except.md
  • .pre-commit-config.yaml
  • CLAUDE.md
  • DESIGN_SPEC.md
  • src/ai_company/budget/tracker.py
  • src/ai_company/config/loader.py
  • src/ai_company/config/utils.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/engine/task_execution.py
  • src/ai_company/observability/correlation.py
  • src/ai_company/observability/events/config.py
  • src/ai_company/observability/events/correlation.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/registry.py
  • src/ai_company/providers/resilience/retry.py
  • src/ai_company/providers/routing/_strategy_helpers.py
  • src/ai_company/templates/loader.py
  • src/ai_company/templates/presets.py
  • src/ai_company/templates/renderer.py
  • src/ai_company/templates/schema.py
  • src/ai_company/tools/_git_base.py
  • src/ai_company/tools/file_system/delete_file.py
  • src/ai_company/tools/file_system/edit_file.py
  • src/ai_company/tools/file_system/list_directory.py
  • src/ai_company/tools/file_system/read_file.py
  • src/ai_company/tools/file_system/write_file.py
  • src/ai_company/tools/git_tools.py
  • src/ai_company/tools/invoker.py
  • tests/unit/core/test_role_catalog.py
  • tests/unit/core/test_task.py
  • tests/unit/core/test_task_transitions.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_agent_engine_errors.py
  • tests/unit/engine/test_prompt_template.py
  • tests/unit/observability/test_events.py
  • tests/unit/observability/test_sinks.py
  • tests/unit/templates/test_errors.py
  • tests/unit/templates/test_loader.py
  • tests/unit/templates/test_renderer.py
  • tests/unit/templates/test_schema.py
  • tests/unit/tools/file_system/test_base_fs_tool.py
  • tests/unit/tools/file_system/test_list_directory.py
  • tests/unit/tools/file_system/test_write_file.py
  • tests/unit/tools/git/test_git_tools.py
  • tests/unit/tools/test_permissions.py

Comment on lines +27 to +32
- field: new_text
operator: regex_match
value: "^(def |class )"
- field: new_text
operator: not_contains
value: "get_logger"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

new_text makes this hook false-positive on already compliant modules.

The last condition only checks the diff hunk for get_logger, so editing a function in a file that already imports and initializes logger will still warn unless those top-of-file lines are part of the same patch. It also misses async def and indented members for the same reason. As per coding guidelines "Every module with business logic MUST import: from ai_company.observability import get_logger then logger = get_logger(name)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/hookify.missing-logger.md around lines 27 - 32, The hook's
conditions on field `new_text` are too narrow: they only match top-level "def "
or "class " and only search the diff hunk for the literal "get_logger", causing
false positives when a module already imports/initializes logger or uses "async
def" or indented members. Update the rule to (1) match function/class
declarations including leading whitespace and optional "async" (e.g., regex that
accepts optional leading whitespace and "async "), and (2) check the entire file
content (or a broader context) for the presence of both the import symbol "from
ai_company.observability import get_logger" and the initialization pattern
"logger = get_logger(__name__)" instead of only scanning the diff hunk; ensure
the operator uses a file-level search or adds conditions that verify both
symbols are present to suppress the hook when a module is already compliant.

Comment on lines +229 to +242
execution_result = await self._post_execution_pipeline(
execution_result,
identity,
agent_id,
task_id,
)

return self._build_and_log_result(
execution_result,
system_prompt,
start,
agent_id,
task_id,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve the loop’s final context when post-processing fails.

Once _run_loop_with_timeout() returns, failures in _post_execution_pipeline() or _build_and_log_result() still go through _handle_fatal_error(), but that path can only rebuild from the stale pre-loop ctx from run(). The fallback ExecutionResult therefore loses the real final conversation, turns, accumulated_cost, and task state from execution_result.context.

🧯 Proposed fix
-        execution_result = await self._post_execution_pipeline(
-            execution_result,
-            identity,
-            agent_id,
-            task_id,
-        )
-
-        return self._build_and_log_result(
-            execution_result,
-            system_prompt,
-            start,
-            agent_id,
-            task_id,
-        )
+        try:
+            execution_result = await self._post_execution_pipeline(
+                execution_result,
+                identity,
+                agent_id,
+                task_id,
+            )
+            return self._build_and_log_result(
+                execution_result,
+                system_prompt,
+                start,
+                agent_id,
+                task_id,
+            )
+        except MemoryError, RecursionError:
+            raise
+        except Exception as exc:
+            return await self._handle_fatal_error(
+                exc=exc,
+                identity=identity,
+                task=task,
+                agent_id=agent_id,
+                task_id=task_id,
+                duration_seconds=time.monotonic() - start,
+                ctx=execution_result.context,
+                execution_result=execution_result,
+                system_prompt=system_prompt,
+            )
     async def _handle_fatal_error(
         self,
         *,
         exc: Exception,
         identity: AgentIdentity,
         task: Task,
         agent_id: str,
         task_id: str,
         duration_seconds: float,
         ctx: AgentContext | None = None,
+        execution_result: ExecutionResult | None = None,
         system_prompt: SystemPrompt | None = None,
     ) -> AgentRunResult:
@@
             error_execution = await self._build_error_execution(
                 identity,
                 task,
                 agent_id,
                 task_id,
                 error_msg,
                 ctx,
+                execution_result,
             )
@@
     async def _build_error_execution(
         self,
         identity: AgentIdentity,
         task: Task,
         agent_id: str,
         task_id: str,
         error_msg: str,
         ctx: AgentContext | None,
+        execution_result: ExecutionResult | None,
     ) -> ExecutionResult:
         """Create an error ``ExecutionResult`` and apply recovery."""
-        error_ctx = ctx or AgentContext.from_identity(identity, task=task)
+        error_ctx = (
+            execution_result.context
+            if execution_result is not None
+            else ctx or AgentContext.from_identity(identity, task=task)
+        )

Also applies to: 685-698, 725-745

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 229 - 242, When
post-processing fails after _run_loop_with_timeout(), the fallback path
currently rebuilds from the stale pre-loop ctx from run(), losing the true final
state in execution_result.context; update the error handling in the blocks that
call _post_execution_pipeline and _build_and_log_result (including the other
occurrences around the second and third locations) to detect a failed
post-processing and pass the final execution_result.context (or collapse into
execution_result) into _handle_fatal_error (or into whatever fallback builder
currently uses ctx) so the fallback rebuild uses execution_result.context
(conversation, turns, accumulated_cost, task state) rather than the pre-loop
ctx; ensure _handle_fatal_error signature or the fallback builder is adjusted to
accept and prefer execution_result.context when present.

Comment on lines +205 to +219
def test_file_open_oserror_raises_runtime_error(self, tmp_path: Path) -> None:
"""File open OSError is wrapped in RuntimeError."""
sink = SinkConfig(
sink_type=SinkType.FILE,
file_path="app.log",
rotation=RotationConfig(strategy=RotationStrategy.BUILTIN),
)
with (
patch(
"logging.handlers.RotatingFileHandler.__init__",
side_effect=OSError("disk full"),
),
pytest.raises(RuntimeError, match="Failed to open log file"),
):
_build_file_handler(sink, tmp_path)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Patching __init__ may not reliably simulate file-open failure.

The RotatingFileHandler.__init__ internally calls BaseRotatingHandler.__init__, which then calls logging.FileHandler.__init__ to open the file. Patching __init__ with side_effect=OSError prevents the object from being constructed at all, which happens to raise OSError before entering the implementation's try block.

This works by accident — the OSError is raised during handler construction, which the implementation catches. However, this approach is fragile. A more precise approach would be to patch the actual file-opening call.

♻️ Optional: More precise mock target
     def test_file_open_oserror_raises_runtime_error(self, tmp_path: Path) -> None:
         """File open OSError is wrapped in RuntimeError."""
         sink = SinkConfig(
             sink_type=SinkType.FILE,
             file_path="app.log",
             rotation=RotationConfig(strategy=RotationStrategy.BUILTIN),
         )
         with (
             patch(
-                "logging.handlers.RotatingFileHandler.__init__",
+                "logging.handlers.RotatingFileHandler",
                 side_effect=OSError("disk full"),
             ),
             pytest.raises(RuntimeError, match="Failed to open log file"),
         ):
             _build_file_handler(sink, tmp_path)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_file_open_oserror_raises_runtime_error(self, tmp_path: Path) -> None:
"""File open OSError is wrapped in RuntimeError."""
sink = SinkConfig(
sink_type=SinkType.FILE,
file_path="app.log",
rotation=RotationConfig(strategy=RotationStrategy.BUILTIN),
)
with (
patch(
"logging.handlers.RotatingFileHandler.__init__",
side_effect=OSError("disk full"),
),
pytest.raises(RuntimeError, match="Failed to open log file"),
):
_build_file_handler(sink, tmp_path)
def test_file_open_oserror_raises_runtime_error(self, tmp_path: Path) -> None:
"""File open OSError is wrapped in RuntimeError."""
sink = SinkConfig(
sink_type=SinkType.FILE,
file_path="app.log",
rotation=RotationConfig(strategy=RotationStrategy.BUILTIN),
)
with (
patch(
"logging.handlers.RotatingFileHandler",
side_effect=OSError("disk full"),
),
pytest.raises(RuntimeError, match="Failed to open log file"),
):
_build_file_handler(sink, tmp_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/observability/test_sinks.py` around lines 205 - 219, The test
test_file_open_oserror_raises_runtime_error currently patches
logging.handlers.RotatingFileHandler.__init__ which prevents the handler object
from being constructed and is brittle; change the mock to target the actual
file-open call (e.g., patch logging.FileHandler.__init__ or the built-in open
used by logging) so that _build_file_handler executes its normal constructor
path and raises OSError at the file-open step, ensuring the RuntimeError path in
_build_file_handler/SinkConfig handling is exercised reliably.

Comment on lines +37 to +44
def test_map_os_error(
exc: OSError,
expected_key: str,
expected_msg_fragment: str,
) -> None:
key, msg = _map_os_error(exc, "/workspace/missing.txt", "reading")
assert key == expected_key
assert msg == expected_msg_fragment
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add the required docstring to the public test function.

test_map_os_error is typed but still missing the docstring required by repo rules.

✏️ Proposed fix
 def test_map_os_error(
     exc: OSError,
     expected_key: str,
     expected_msg_fragment: str,
 ) -> None:
+    """Verifies `_map_os_error` returns expected mappings for common OS errors."""
     key, msg = _map_os_error(exc, "/workspace/missing.txt", "reading")
     assert key == expected_key
     assert msg == expected_msg_fragment
As per coding guidelines, "Docstrings must use Google style; required on all public classes and functions (enforced by ruff D rules)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/tools/file_system/test_base_fs_tool.py` around lines 37 - 44, The
public test function test_map_os_error is missing the required Google-style
docstring; add a concise docstring above test_map_os_error that describes the
test purpose (verifies _map_os_error maps an OSError to the expected key and
message fragment), and include Google-style Args for exc, expected_key, and
expected_msg_fragment and an Optional Returns/Raises line if applicable so it
satisfies the ruff D rule.

…, Greptile, Copilot

Critical:
- Fix OS-dependent path traversal check in loader.py (explicit / \ .. rejection)
- Fix symlink test FileExistsError in parallel xdist runs (exist_ok=True)

Major:
- Widen credential regex to catch token-only URLs (https://<token>@host)
- Remove duplicate TEMPLATE_PERSONALITY_PRESET_UNKNOWN log from presets.py
- Raise TemplateRenderError on unknown personality preset (was silent fallthrough)
- Validate list item types in _validate_list (reject non-dict entries)
- Fix hookify rules: async def support, file-level logger check, exclude as-clause
- Add pytestmark timeout(30) to test_agent_engine.py and test_loader.py
- Import get_logger from public package, not private _logger module

Medium:
- Upgrade tags to tuple[NotBlankStr, ...] per convention
- Fix [DIR] double space formatting in list_directory.py
- Soften DESIGN_SPEC.md crash recovery wording (eligible for reassignment)
- Add logging before raises in registry.py _build_driver/_resolve_factory
- Fix test_write_file to patch os.fsync (actually tests temp-file cleanup)
- Fix max_bytes docstring in read_file.py (reads characters, not bytes)

Minor:
- Remove duplicate seniority guard tests from test_role_catalog.py
- Add status assertion to test_non_active_allows_assigned_to
- Strengthen defective builtin test assertion (== () not isinstance)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit c57a6a9 into main Mar 7, 2026
8 of 9 checks passed
@Aureliolo Aureliolo deleted the chore/post-audit-cleanup branch March 7, 2026 10:57
Comment on lines +249 to +250
"path": user_path,
"total_entries": total,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[DIR] output format changed from double-space to single-space alignment

The original code used f"[DIR] {display}/" (two spaces) intentionally to align directory and file entries: [DIR] (5 chars) + two spaces = 7 chars total, matching [FILE] (6 chars) + one space = 7 chars total. This kept the entry name aligned in a column regardless of prefix type.

The new helper returns f"[DIR] {display}/" (one space), breaking that alignment. Since list_directory's output is typically user-facing and may be parsed by agent logic or tests expecting the old format, this is a subtle format change worth addressing.

Suggested change
"path": user_path,
"total_entries": total,
return f"[DIR] {display}/"
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/file_system/list_directory.py
Line: 249-250

Comment:
**`[DIR]` output format changed from double-space to single-space alignment**

The original code used `f"[DIR]  {display}/"` (two spaces) intentionally to align directory and file entries: `[DIR]` (5 chars) + two spaces = 7 chars total, matching `[FILE]` (6 chars) + one space = 7 chars total. This kept the entry name aligned in a column regardless of prefix type.

The new helper returns `f"[DIR] {display}/"` (one space), breaking that alignment. Since `list_directory`'s output is typically user-facing and may be parsed by agent logic or tests expecting the old format, this is a subtle format change worth addressing.

```suggestion
    return f"[DIR]  {display}/"
```

How can I resolve this? If you propose a fix, please make it concise.


if execution_result.termination_reason == TerminationReason.ERROR:
execution_result = await self._apply_recovery(
execution_result,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise exc from None suppresses the exception chain

Changing from raise exc from build_exc to raise exc from None clears __cause__ on the re-raised exception. While build_exc is logged on the preceding line so the information isn't completely lost, any upstream handler that inspects exc.__cause__ or reads a chained traceback will no longer see the build_exc context. Suppressing the chain deliberately with from None is typically reserved for cases where you want to present a cleaner, unrelated exception — here build_exc is directly related to why we're in this path.

If the goal is to re-raise the original exc without implying build_exc caused it, raise exc (without any from) would re-attach build_exc as __context__ while not implying causation, preserving the chain without the misleading from semantics.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 266

Comment:
**`raise exc from None` suppresses the exception chain**

Changing from `raise exc from build_exc` to `raise exc from None` clears `__cause__` on the re-raised exception. While `build_exc` is logged on the preceding line so the information isn't completely lost, any upstream handler that inspects `exc.__cause__` or reads a chained traceback will no longer see the `build_exc` context. Suppressing the chain deliberately with `from None` is typically reserved for cases where you want to present a cleaner, unrelated exception — here `build_exc` is directly related to why we're in this path.

If the goal is to re-raise the original `exc` without implying `build_exc` *caused* it, `raise exc` (without any `from`) would re-attach `build_exc` as `__context__` while not implying causation, preserving the chain without the misleading `from` semantics.

How can I resolve this? If you propose a fix, please make it concise.

Aureliolo added a commit that referenced this pull request Mar 7, 2026
CI fixes:
- Remove unused type: ignore on Linux (use dual suppression for
  cross-platform compat) in subprocess_sandbox.py
- Fix test assertion for raise-from-build_exc chaining in
  test_agent_engine_errors.py

Review feedback (Greptile):
- Rename _check_ref → _check_git_arg with updated docstring (was
  misleadingly named when used for author/date validation)
- Fix missing space in clone URL error message
- Add EXECUTION_SHUTDOWN_CLEANUP_FAILED event constant (was reusing
  EXECUTION_SHUTDOWN_CLEANUP for both start and failure)
- Capture partial stderr on timeout in direct subprocess path
  (mirrors sandbox behavior for better debuggability)
- Make Windows Git PATH prefixes configurable via
  SubprocessSandboxConfig.extra_safe_path_prefixes
- Add clarifying comment for total_entries semantics when truncated
- Fix ResourceWarning on Windows by explicitly closing subprocess
  transports after communicate()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

chore: post-audit cleanup — code quality, test coverage, and prevention hooks

2 participants