
fix: address post-merge review feedback from PRs #164-#167 #170

Merged
Aureliolo merged 3 commits into main from fix/post-merge-feedback-2 on Mar 8, 2026

@Aureliolo
Owner

Summary

  • 46 findings from external reviewers (Copilot, Greptile, CodeRabbit, Gemini) on PRs #164-#167, all addressed
  • 4 critical: Parse decisions/action_items from LLM synthesis in all 3 meeting protocols; validate winning_agent_id in find_losers()
  • 17 major: Token budget guards, duplicate participant rejection, frozen registry, hierarchy tiebreakers, dependency copying, wake-all-waiters on unsubscribe, and more
  • 15 minor: Duplicate log removal, assert→raise, traceback preservation, O(1) seniority lookups, NotBlankStr typing, routing validation ordering
  • 4 trivial: Centralized event constants, docstring/spec fixes
  • 6 test/doc gaps: New test_parsing.py (18 tests), expanded tests across 7 modules, timeout markers, spec name corrections
  • Pre-PR review fixes: List-item regex crossing line boundaries, parent_task_id validation ordering, circular exception cause, dead code removal
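
The "frozen registry" item can be illustrated with a minimal sketch. It assumes the registry is a plain mapping from protocol name to a handler; `freeze_registry` and the two handler functions are hypothetical names for illustration, not the project's API. Only `MappingProxyType` comes from the PR text.

```python
from types import MappingProxyType
from typing import Callable, Mapping

Handler = Callable[[str], str]


def run_round_robin(topic: str) -> str:
    # Hypothetical handler standing in for a real protocol object.
    return f"round_robin:{topic}"


def run_position_papers(topic: str) -> str:
    # Hypothetical handler standing in for a real protocol object.
    return f"position_papers:{topic}"


def freeze_registry(registry: Mapping[str, Handler]) -> Mapping[str, Handler]:
    # Copy first so later changes to the caller's dict cannot leak in,
    # then expose the copy through a read-only view.
    return MappingProxyType(dict(registry))


mutable = {"round_robin": run_round_robin}
frozen = freeze_registry(mutable)
mutable["position_papers"] = run_position_papers  # the frozen view is unaffected

try:
    frozen["position_papers"] = run_position_papers  # type: ignore[index]
    mutation_blocked = False
except TypeError:
    # mappingproxy objects do not support item assignment.
    mutation_blocked = True
```

Copying before wrapping matters: a `MappingProxyType` over the caller's own dict would still reflect later mutations of that dict.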

Test plan

  • All 3634 unit tests pass
  • Ruff lint clean
  • Mypy strict clean
  • Pre-commit hooks pass
  • CI pipeline (lint + type-check + test + coverage)

Closes #169

Critical (C1-C4):
- Parse decisions/action_items from LLM synthesis in all 3 meeting protocols
- Validate winning_agent_id exists in find_losers() before computing losers
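
A rough sketch of the `find_losers()` guard, under the assumption that positions carry an `agent_id` and that the function returns every non-winning position. The `Position` dataclass and the local `ConflictStrategyError` class are stand-ins defined here for illustration; the real types live in the project.

```python
from dataclasses import dataclass


class ConflictStrategyError(Exception):
    """Stand-in for the project's error type of the same name."""


@dataclass(frozen=True)
class Position:
    # Hypothetical minimal shape of a conflict position.
    agent_id: str
    stance: str


def find_losers(positions: list[Position], winning_agent_id: str) -> list[Position]:
    # Validate first: without this check an unknown winner would silently
    # turn every participant into a "loser".
    if not any(p.agent_id == winning_agent_id for p in positions):
        raise ConflictStrategyError(
            f"winning_agent_id {winning_agent_id!r} is not among the positions"
        )
    return [p for p in positions if p.agent_id != winning_agent_id]


positions = [Position("a", "use REST"), Position("b", "use gRPC"), Position("c", "use REST")]
losers = find_losers(positions, "b")

try:
    find_losers(positions, "z")
    unknown_winner_rejected = False
except ConflictStrategyError:
    unknown_winner_rejected = True
```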

Major (M1-M17):
- Guard summary budget reserve when leader_summarizes=False
- Add synthesis sub-reserve in structured phases discussion
- Reject duplicate participant_ids in meeting orchestrator
- Freeze protocol registry with MappingProxyType
- Warn when token tracker exceeds budget
- Add hierarchy tiebreaker to pick_highest_seniority()
- Wire hierarchy into debate/hybrid authority fallbacks
- Fast-path get_lowest_common_manager(a, a) → a
- Validate _SENIORITY_ORDER matches enum members at import
- Remove dead max_tokens_per_argument config field
- Verify task IDs match plan subtask IDs in DecompositionResult
- Return CANCELLED for mixed completed+cancelled terminal states
- Fix double-logging in rollup compute() for empty case
- Copy subtask dependencies from plan to created Tasks
- Reject duplicate subtask IDs in RoutingResult
- Wake all pending waiters on unsubscribe (not just one)
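
The last item (wake all waiters, not just one) can be sketched with a toy bus. This is an illustration under assumptions, not the code in `bus_memory.py`: `TinyBus`, the single shared queue, and the integer waiter counter are simplifications invented here. The idea it shows is one sentinel per pending `receive()` call.

```python
import asyncio

_SENTINEL = object()


class TinyBus:
    """Toy single-subscription bus; names and structure are hypothetical."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._waiters = 0
        self._lock = asyncio.Lock()

    async def receive(self):
        async with self._lock:
            self._waiters += 1
        try:
            item = await self._queue.get()
        finally:
            async with self._lock:
                self._waiters -= 1
        # A sentinel means "subscription gone": report it as None.
        return None if item is _SENTINEL else item

    async def unsubscribe(self) -> None:
        async with self._lock:
            pending = self._waiters
        # One sentinel per blocked receiver; a single sentinel would wake
        # only one of them and leave the rest waiting forever.
        for _ in range(pending):
            self._queue.put_nowait(_SENTINEL)


async def demo() -> list:
    bus = TinyBus()
    tasks = [asyncio.create_task(bus.receive()) for _ in range(3)]
    while bus._waiters < 3:  # let all receivers start waiting
        await asyncio.sleep(0)
    await bus.unsubscribe()
    return await asyncio.wait_for(asyncio.gather(*tasks), timeout=1)


results = asyncio.run(demo())
```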

Minor (m1-m15):
- Remove duplicate MEETING_CONFLICT_DETECTED log events
- Replace assert with explicit raises in meeting protocols
- Include presenter_id in formatted agenda prompt
- Validate token aggregates in MeetingMinutes
- Require non-empty error_message for FAILED/BUDGET_EXHAUSTED
- Move _MIN_POSITIONS to local constant in service.py
- Precompute seniority rank dict for O(1) lookups
- Remove dead asyncio.QueueFull catch on unbounded queue
- Fix racy state check in _log_receive_null (acquire lock)
- Type channel_name as NotBlankStr in messenger
- Document unsubscribe as None return path in receive()
- Preserve traceback context in parallel.py re-raise
- Validate parent_task.id matches plan.parent_task_id
- Add logging before raises in routing model validators

Trivial (t1-t4):
- Use centralized event constant in routing scorer
- Add task_structure/coordination_topology to Task docstring
- Fix DESIGN_SPEC.md model/function names to match code
- Fix StructuredPhasesConfig docstring

Tests (T1-T5):
- Assert MEETING_CONTRIBUTION enum value
- Add timeout markers to all meeting test modules
- Add 3+ participant test for authority/debate strategies
- Remove dead max_tokens_per_argument test references
- Update HybridResolver tests for new hierarchy parameter

Closes #169
- Fix list-item regex crossing line boundaries (\s* → [^\S\n]*)
- Move parent_task_id validation before empty-agents early return
- Fix circular exception cause in parallel.py re-raise
- Remove unused COMM_UNSUBSCRIBE_SENTINEL_FAILED constant
- Use NotBlankStr for error_message field (replaces manual check)
- Add logger + logging before raises in parsing/position_papers/structured_phases
- Fix import ordering in rollup.py
- Remove dead max_tokens_per_argument from DESIGN_SPEC.md examples
- Correct M3 status in README.md
- Improve docstrings across bus_memory, helpers, hybrid_strategy, orchestrator
- Add test_parsing.py (18 tests) + expand tests in 7 existing modules
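
The regex fix (`\s*` → `[^\S\n]*`) is easiest to see on a bullet with no text. The two patterns below are simplified stand-ins, not the expressions in `_parsing.py`: because `\s` also matches a newline, the loose pattern lets an empty bullet swallow the following line, while `[^\S\n]` matches whitespace other than a newline and keeps each match on one line.

```python
import re

text = "- first\n-\nnot a bullet\n"

# Loose: \s* after the bullet marker may consume the newline.
loose = re.compile(r"^[-*]\s*(.+)$", re.MULTILINE)

# Strict: [^\S\n]* is "whitespace except newline", so matching stays on the line.
strict = re.compile(r"^[-*][^\S\n]*(.+)$", re.MULTILINE)

loose_items = loose.findall(text)
strict_items = strict.findall(text)
```

With the loose pattern the empty `-` bullet captures `not a bullet` from the next line; the strict pattern returns only `first`.
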
Copilot AI review requested due to automatic review settings March 8, 2026 12:14
@github-actions
Contributor

github-actions bot commented Mar 8, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 8, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3725483c-784c-46b5-a522-bc80cc2665d6

📥 Commits

Reviewing files that changed from the base of the PR and between cf27048 and 1317bea.

📒 Files selected for processing (19)
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/communication/bus_memory.py
  • src/ai_company/communication/delegation/hierarchy.py
  • src/ai_company/communication/meeting/_parsing.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • src/ai_company/communication/meeting/models.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/meeting/structured_phases.py
  • src/ai_company/engine/decomposition/models.py
  • src/ai_company/engine/parallel.py
  • src/ai_company/observability/events/meeting.py
  • tests/unit/communication/delegation/test_hierarchy.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_prompts.py
  • tests/unit/engine/test_decomposition_models.py
  • tests/unit/engine/test_decomposition_rollup.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Meeting summaries now automatically extract decisions and action items from discussions.
    • Expanded conflict resolution strategies including debate, hybrid, authority, and human escalation approaches.
    • Tasks now support structure classification and coordination topology specification.
  • Bug Fixes

    • Message bus now properly handles multiple concurrent message receivers during unsubscribe.
    • Added validation for task routing and dependency consistency.
  • Refactor

    • Simplified conflict resolution logic and improved validation throughout communication systems.
    • Enhanced token tracking and observability in meetings.

Walkthrough

This PR addresses 40+ post-merge review feedback items from four recently-merged PRs by fixing critical validation gaps, implementing missing parsing logic for meeting decisions/action items, adding hierarchy-based tiebreaking to conflict resolution, improving waiter handling in message bus, and enhancing decomposition/routing validation with proper ID consistency checks.
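
As an illustration of hierarchy-based tiebreaking, the sketch below picks the highest-ranked candidate and, on a tie, prefers the agent closer to the root of the reporting chain. The tie rule (shallower depth wins), the `(agent_id, rank)` tuples, and the `manager_of` dict are assumptions made for this example; the project's `pick_highest_seniority()` takes a hierarchy object and may order ties differently.

```python
from typing import Optional


def depth(agent_id: str, manager_of: dict[str, str]) -> int:
    # Number of hops from the agent up to the root of the hierarchy.
    hops = 0
    seen = {agent_id}
    current = agent_id
    while current in manager_of:
        current = manager_of[current]
        if current in seen:
            raise ValueError("cycle in hierarchy")
        seen.add(current)
        hops += 1
    return hops


def pick_highest_seniority(
    candidates: list[tuple[str, int]],
    manager_of: Optional[dict[str, str]] = None,
) -> str:
    best_rank = max(rank for _, rank in candidates)
    tied = [agent for agent, rank in candidates if rank == best_rank]
    if len(tied) == 1 or manager_of is None:
        return tied[0]  # no tie, or no hierarchy: first candidate wins
    # Assumed rule: the agent nearest the root wins the tie.
    return min(tied, key=lambda agent: depth(agent, manager_of))


manager_of = {"lead": "cto", "dev": "lead"}
candidates = [("dev", 2), ("lead", 2), ("intern", 0)]
winner_with_hierarchy = pick_highest_seniority(candidates, manager_of)
winner_without = pick_highest_seniority(candidates)
```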

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Conflict Resolution: Hierarchy Tiebreaking**<br>`src/ai_company/communication/conflict_resolution/_helpers.py`, `debate_strategy.py`, `hybrid_strategy.py` | Added hierarchy-based tiebreaking to `pick_highest_seniority()` when seniority is equal; updated `DebateResolver._authority_fallback()` to instance method and `HybridResolver` to accept and use hierarchy; added validation in `find_losers()` to verify winning agent exists. |
| **Conflict Resolution: Config & Simplification**<br>`src/ai_company/communication/conflict_resolution/config.py` | Removed dead `max_tokens_per_argument` field from `DebateConfig` and `HybridConfig`. |
| **Meeting Parsing & Decision Extraction**<br>`src/ai_company/communication/meeting/_parsing.py`, `position_papers.py`, `round_robin.py`, `structured_phases.py` | Added new public parsing module with `parse_decisions()` and `parse_action_items()` functions; integrated parsing into three protocol implementations to populate previously-empty `MeetingMinutes.decisions` and `action_items` fields; updated synthesis prompts with exact section headers. |
| **Meeting Validation & Infrastructure**<br>`src/ai_company/communication/meeting/orchestrator.py`, `models.py`, `_token_tracker.py` | Added duplicate participant detection with `MappingProxyType` immutable registry; enhanced `MeetingMinutes` validation for token aggregates; added budget-exhaustion and validation-failure logging to `TokenTracker`; introduced `MEETING_INTERNAL_ERROR` event constant. |
| **Meeting Models & Prompts**<br>`src/ai_company/communication/meeting/_prompts.py`, `config.py` | Added presenter metadata to formatted agenda items; improved docstrings. |
| **Message Bus: Concurrent Waiter Handling**<br>`src/ai_company/communication/bus_memory.py` | Replaced single-waiter sentinel wake-up with per-subscription waiter counting via `_waiters` dict; made `_log_receive_null()` async to safely determine shutdown/unsubscribe state; removed dead `COMM_UNSUBSCRIBE_SENTINEL_FAILED` event. |
| **Messenger API Surface**<br>`src/ai_company/communication/messenger.py` | Updated `subscribe()`, `unsubscribe()`, and `receive()` signatures to use `NotBlankStr` for channel names; expanded docstring for `None` return conditions. |
| **Hierarchy Fast-Path**<br>`src/ai_company/communication/delegation/hierarchy.py` | Added fast-path in `get_lowest_common_manager()` for identical agents; added `_known_agents` tracking for validation. |
| **Core Models**<br>`src/ai_company/core/task.py`, `enums.py` | Added `task_structure` and `coordination_topology` fields to `Task`; added validation and O(1) rank lookup for `_SENIORITY_ORDER` in enum comparison. |
| **Decomposition Validation**<br>`src/ai_company/engine/decomposition/models.py`, `rollup.py`, `service.py` | Added ID-set matching validation in `DecompositionResult` beyond count; changed `derived_parent_status` to return `CANCELLED` for mixed completed+cancelled states; added early-return for empty subtasks in rollup; propagate dependencies from decomposition plan to created tasks. |
| **Routing Validation**<br>`src/ai_company/engine/routing/models.py`, `service.py`, `scorer.py` | Added duplicate ID detection within decisions/unroutable and overlaps between them; added parent task ID consistency check; replaced string literal event key with centralized constant `TASK_ROUTING_SCORER_INVALID_CONFIG`. |
| **Engine Parallel & Logging**<br>`src/ai_company/engine/parallel.py` | Reformatted multi-line error construction; elevated suppressed exception group logging from debug to warning. |
| **Observability Events**<br>`src/ai_company/observability/events/communication.py`, `task_routing.py`, `meeting.py` | Removed `COMM_UNSUBSCRIBE_SENTINEL_FAILED`; added `TASK_ROUTING_SCORER_INVALID_CONFIG` and `MEETING_INTERNAL_ERROR` constants. |
| **Documentation & Status**<br>`DESIGN_SPEC.md`, `README.md` | Updated project status to reflect M3/M4 in-progress; no API changes. |
| **Test Coverage: Communication**<br>`tests/unit/communication/test_bus_memory.py`, `test_enums.py`, `meeting/*.py` | Added test for multiple concurrent unsubscribe wake-ups; added `MEETING_CONTRIBUTION` enum assertion; added module-level 30-second timeout markers across meeting tests; added duplicate participant validation test; added comprehensive parsing tests for decisions/action items. |
| **Test Coverage: Conflict Resolution**<br>`tests/unit/communication/conflict_resolution/test_*.py` | Added tests for three-participant authority scenarios; updated tests to verify hierarchy tiebreaking in `pick_highest_seniority()`; removed `max_tokens_per_argument` assertions; added validation tests for winning agent existence. |
| **Test Coverage: Decomposition & Routing**<br>`tests/unit/engine/test_decomposition_*.py`, `test_routing_*.py` | Added tests for task ID mismatch rejection; added derived parent status `CANCELLED` assertion; added dependency propagation test; added parent task ID validation test; added duplicate ID detection tests. |
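
The routing validation row (duplicate IDs within the decisions and unroutable lists, plus overlaps between them) can be sketched with `collections.Counter`. The function name and the plain lists of ID strings are assumptions for illustration; the project performs this check inside a model validator on `RoutingResult`.

```python
from collections import Counter


def check_unique_subtask_ids(decision_ids: list[str], unroutable_ids: list[str]) -> None:
    """Raise ValueError if any subtask ID is duplicated or listed on both sides."""
    problems = []
    for label, ids in (("decisions", decision_ids), ("unroutable", unroutable_ids)):
        dupes = sorted(i for i, n in Counter(ids).items() if n > 1)
        if dupes:
            problems.append(f"duplicate subtask IDs in {label}: {dupes}")
    overlap = sorted(set(decision_ids) & set(unroutable_ids))
    if overlap:
        problems.append(f"subtask IDs both routed and unroutable: {overlap}")
    if problems:
        raise ValueError("; ".join(problems))


check_unique_subtask_ids(["s1", "s2"], ["s3"])  # valid: no exception

try:
    check_unique_subtask_ids(["s1", "s1", "s2"], ["s2"])
    error_text = ""
except ValueError as exc:
    error_text = str(exc)
```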

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • Copilot
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 59.13% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the main objective of the PR: addressing post-merge review feedback from earlier PRs #164-#167. |
| Description check | ✅ Passed | The description comprehensively explains the 46 findings addressed across severity levels with specific focus on 4 critical, 17 major, 15 minor, and 4 trivial issues plus test/doc gaps. |
| Linked Issues check | ✅ Passed | All code changes directly address requirements from issue #169: critical parsing/validation fixes [C1-C4], major budget/registry/dependency/validation fixes [M1-M17], minor typing/logging/docs improvements [m1-m15], trivial constants/docstrings [t1-t4], and test coverage gaps [T1-T6]. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope per issue #169 objectives and explicitly exclude out-of-scope items (human escalation queue, per-conflict overrides, protocol interface changes, TokenTracker mutability, etc.). |


@gemini-code-assist
Contributor

Summary of Changes

This pull request significantly refines the multi-agent system by incorporating extensive feedback from external reviews. The changes focus on improving the robustness and functionality of core communication mechanisms, particularly within meeting protocols and conflict resolution. Key updates include more reliable message bus operations, enhanced parsing of LLM outputs for structured decision-making, and stricter data integrity checks across various models. These improvements aim to make the system more stable, predictable, and easier to debug, while also clarifying design specifications and updating development status.

Highlights

  • External Review Feedback Addressed: Addressed 46 findings from external reviewers (Copilot, Greptile, CodeRabbit, Gemini) across PRs #164-#167, covering critical, major, minor, and test/documentation gaps.
  • Meeting Protocol Enhancements: Implemented shared LLM response parsing for decisions and action items in all three meeting protocols, improving structured output extraction.
  • Communication Bus Robustness: Enhanced the in-memory message bus to correctly wake all concurrent receive() calls when a subscription is cancelled, preventing potential deadlocks.
  • Conflict Resolution Logic Improvements: Refined conflict resolution strategies, including adding hierarchy-based tie-breaking for agents with equal seniority and validating winning agent IDs in dissent record generation.
  • Validation and Data Integrity: Introduced new validations for meeting participants (preventing duplicates), token aggregates in meeting minutes, and consistency between decomposition plans and created tasks.
  • Codebase Clarity and Maintainability: Removed max_tokens_per_argument from conflict resolution configs, updated documentation, and improved logging for various components.
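
To make the "shared LLM response parsing" highlight concrete, here is a minimal sketch of section-based extraction. It assumes the synthesis text uses `Decisions` and `Action Items` headers followed by bullet or numbered items, and it returns plain strings. The helper `parse_section_items` and both regexes are hypothetical; the real `parse_decisions()` and `parse_action_items()` in `_parsing.py` may accept other header forms and return structured objects.

```python
import re

# A header is either "# Title" (optional trailing colon) or "Title:".
_HEADER_RE = re.compile(r"^(?:#{1,6}\s*(?P<hashed>.+?):?|(?P<plain>[A-Za-z][A-Za-z ]*):)$")
# An item is "- text", "* text", "1. text" or "1) text".
_ITEM_RE = re.compile(r"^(?:[-*]|\d+[.)])\s+(?P<body>.+)$")


def parse_section_items(text: str, header: str) -> list[str]:
    items: list[str] = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        header_match = _HEADER_RE.match(line)
        if header_match:
            title = header_match.group("hashed") or header_match.group("plain")
            in_section = title.strip().lower() == header.lower()
            continue
        if in_section:
            item = _ITEM_RE.match(line)
            if item:
                items.append(item.group("body").strip())
    return items


def parse_decisions(text: str) -> list[str]:
    return parse_section_items(text, "Decisions")


def parse_action_items(text: str) -> list[str]:
    return parse_section_items(text, "Action Items")


synthesis = """\
## Decisions
- Adopt the new schema
- Ship behind a flag

## Action Items
1. Alice: write the migration
2. Bob: update the docs
"""

decisions = parse_decisions(synthesis)
action_items = parse_action_items(synthesis)
```

Parsing line by line sidesteps the multi-line regex pitfalls, at the cost of not supporting items that wrap across lines.
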
Changelog
  • DESIGN_SPEC.md
    • Removed max_tokens_per_argument from debate and hybrid conflict resolution configurations.
    • Updated the 'Current state' description for meeting protocols to reflect the addition of shared LLM response parsing.
    • Updated the description of _helpers.py in the conflict resolution section to include build_dissent_records.
  • README.md
    • Updated the status of M3 Single Agent from 'all done' to 'in progress'.
  • src/ai_company/communication/bus_memory.py
    • Removed the COMM_UNSUBSCRIBE_SENTINEL_FAILED event constant.
    • Added a _waiters dictionary to track pending receive() calls.
    • Modified the unsubscribe method to wake all concurrent receive() calls by putting a sentinel for each pending waiter.
    • Updated the docstring for the receive method to reflect the new behavior regarding unsubscribe.
    • Refactored _log_receive_null to be an asynchronous method and safely inspect bus state using a lock.
  • src/ai_company/communication/conflict_resolution/_helpers.py
    • Imported HierarchyResolver for use in conflict resolution.
    • Updated the docstring for find_losers to clarify its behavior for N-party conflicts.
    • Added validation to find_losers to ensure the winning_agent_id exists in the conflict positions, raising ConflictStrategyError if not found.
    • Modified pick_highest_seniority to accept an optional hierarchy for tie-breaking when seniority levels are equal.
    • Implemented _hierarchy_tiebreak to resolve seniority ties based on an agent's depth in the organizational hierarchy.
  • src/ai_company/communication/conflict_resolution/config.py
    • Removed the max_tokens_per_argument field from DebateConfig.
    • Removed the max_tokens_per_argument field from HybridConfig.
  • src/ai_company/communication/conflict_resolution/debate_strategy.py
    • Modified the _authority_fallback method to use the hierarchy for tie-breaking when seniority levels are equal.
  • src/ai_company/communication/conflict_resolution/hybrid_strategy.py
    • Imported HierarchyResolver for use in the hybrid strategy.
    • Added hierarchy as a required parameter to the HybridResolver constructor.
    • Modified the _authority_fallback method to use the hierarchy for tie-breaking when seniority levels are equal.
  • src/ai_company/communication/conflict_resolution/service.py
    • Moved the _MIN_POSITIONS constant from models.py to service.py.
  • src/ai_company/communication/delegation/hierarchy.py
    • Added a check in get_lowest_common_manager to return the agent itself if both input agents are the same.
  • src/ai_company/communication/meeting/_parsing.py
    • Added a new file containing shared helper functions for parsing decisions and action items from LLM-generated text, including regex patterns for headers and list items.
  • src/ai_company/communication/meeting/_prompts.py
    • Modified build_agenda_prompt to include the presenter_id in agenda item entries if available.
  • src/ai_company/communication/meeting/_token_tracker.py
    • Imported get_logger and MEETING_BUDGET_EXHAUSTED for logging.
    • Added a warning log when input_tokens or output_tokens are negative in the record method.
    • Added a warning log when the token budget is exceeded after recording token usage.
  • src/ai_company/communication/meeting/config.py
    • Clarified the docstring for max_discussion_tokens in StructuredPhasesConfig.
  • src/ai_company/communication/meeting/models.py
    • Added a _validate_token_aggregates model validator to MeetingMinutes to ensure that total_input_tokens and total_output_tokens match the sum of contributions.
    • Changed the type of error_message in MeetingRecord to NotBlankStr for stricter validation.
  • src/ai_company/communication/meeting/orchestrator.py
    • Wrapped the protocol_registry in a MappingProxyType and deep-copied it to ensure immutability.
    • Updated docstrings for MeetingParticipantError to explicitly mention duplicate participants.
    • Added validation to _validate_inputs to check for duplicate participant IDs, raising MeetingParticipantError if found.
  • src/ai_company/communication/meeting/position_papers.py
    • Imported parse_action_items and parse_decisions from _parsing.py.
    • Used the new parsing functions to extract decisions and action items from the synthesis text and include them in MeetingMinutes.
    • Replaced assert statements with if not ... raise RuntimeError for robustness in _collect_paper.
  • src/ai_company/communication/meeting/round_robin.py
    • Imported parse_action_items and parse_decisions from _parsing.py.
    • Conditionally reserved summary tokens based on whether the leader summarizes.
    • Used the new parsing functions to extract decisions and action items from the summary text and include them in MeetingMinutes.
  • src/ai_company/communication/meeting/structured_phases.py
    • Imported parse_action_items and parse_decisions from _parsing.py.
    • Updated the comment for _SYNTHESIS_RESERVE_FRACTION to clarify it applies to the remaining budget.
    • Used the new parsing functions to extract decisions and action items from the summary text and include them in MeetingMinutes.
    • Removed a redundant MEETING_CONFLICT_DETECTED debug log.
    • Adjusted the calculation of discussion_budget to reserve tokens for the synthesis phase.
    • Added discussion_used tracking to ensure discussion does not exceed its allocated budget.
    • Replaced assert statements with if not ... raise RuntimeError for robustness in _collect_input.
  • src/ai_company/communication/messenger.py
    • Imported NotBlankStr for type hinting.
    • Updated type hints for channel_name parameters in subscribe, unsubscribe, and receive methods to NotBlankStr.
    • Updated the docstring for the receive method to provide more detailed conditions for returning None.
  • src/ai_company/core/enums.py
    • Added validation logic to ensure _SENIORITY_ORDER is in sync with SeniorityLevel enum members and contains no duplicates.
    • Introduced _SENIORITY_RANK as a precomputed dictionary for O(1) seniority comparison, replacing list.index() calls.
  • src/ai_company/core/task.py
    • Added task_structure and coordination_topology fields to the Task model.
  • src/ai_company/engine/decomposition/models.py
    • Added validation to _validate_plan_task_consistency to ensure that the IDs of created tasks exactly match the IDs in the decomposition plan.
    • Modified derived_parent_status to return TaskStatus.CANCELLED when a mix of completed and cancelled subtasks are present, indicating partial abandonment.
  • src/ai_company/engine/decomposition/rollup.py
    • Reordered imports for better organization.
    • Added an explicit return of an empty SubtaskStatusRollup when no subtask statuses are provided.
  • src/ai_company/engine/decomposition/service.py
    • Ensured that dependencies from subtask definitions are propagated to the newly created Task objects during decomposition.
  • src/ai_company/engine/parallel.py
    • Adjusted the formatting of a ParallelExecutionError instantiation for consistency.
  • src/ai_company/engine/routing/models.py
    • Imported Counter, get_logger, and TASK_ROUTING_FAILED for enhanced validation and logging.
    • Added logging and validation to _validate_selected_not_in_alternatives to prevent selected candidates from appearing in alternatives.
    • Enhanced _validate_unique_subtask_ids to check for duplicate subtask IDs within decisions and unroutable lists, as well as overlaps between them, with corresponding logging.
    • Added logging to _validate_no_auto_defaults when CoordinationTopology.AUTO is used inappropriately.
  • src/ai_company/engine/routing/scorer.py
    • Imported TASK_ROUTING_SCORER_INVALID_CONFIG.
    • Updated the log event name for invalid min_score configuration.
  • src/ai_company/engine/routing/service.py
    • Added validation to the route method to ensure that the parent_task.id matches the plan.parent_task_id, raising a ValueError if they do not.
  • src/ai_company/observability/events/communication.py
    • Removed the COMM_UNSUBSCRIBE_SENTINEL_FAILED constant.
  • src/ai_company/observability/events/task_routing.py
    • Added TASK_ROUTING_SCORER_INVALID_CONFIG event constant.
  • tests/integration/communication/test_meeting_integration.py
    • Added a pytestmark to set a timeout for integration tests.
  • tests/unit/communication/conflict_resolution/test_authority_strategy.py
    • Added new test cases for AuthorityResolver with three participants, verifying highest seniority wins and correct dissent record generation.
  • tests/unit/communication/conflict_resolution/test_config.py
    • Removed tests related to max_tokens_per_argument in DebateConfig and HybridConfig as the field was removed.
  • tests/unit/communication/conflict_resolution/test_debate_strategy.py
    • Updated the description for the test_three_party_shared_manager_judge test case for clarity.
  • tests/unit/communication/conflict_resolution/test_helpers.py
    • Imported HierarchyResolver for testing.
    • Added new test cases for pick_highest_seniority to verify hierarchy-based tie-breaking.
    • Added a test case for find_losers to ensure it raises an error when the winning agent is not found in positions.
  • tests/unit/communication/conflict_resolution/test_hybrid_strategy.py
    • Imported HierarchyResolver for testing.
    • Updated HybridResolver instantiation in various test cases to include the hierarchy parameter.
  • tests/unit/communication/delegation/test_hierarchy.py
    • Added a test case for get_lowest_common_manager to verify it returns the agent itself when both arguments are the same.
  • tests/unit/communication/meeting/test_config.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/meeting/test_enums.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/meeting/test_errors.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/meeting/test_models.py
    • Added a pytestmark to set a timeout for unit tests.
    • Updated _make_minutes calls in tests to include total_input_tokens and total_output_tokens.
    • Added a test case to ensure blank error messages are rejected by MeetingRecord.
    • Added new test cases for MeetingMinutes to verify that mismatched token aggregates raise validation errors.
  • tests/unit/communication/meeting/test_orchestrator.py
    • Added a pytestmark to set a timeout for unit tests.
    • Added a test case to ensure duplicate participants in a meeting raise a MeetingParticipantError.
  • tests/unit/communication/meeting/test_parsing.py
    • Added a new test file with comprehensive unit tests for parse_decisions and parse_action_items functions.
  • tests/unit/communication/meeting/test_position_papers.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/meeting/test_prompts.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/meeting/test_protocol.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/meeting/test_round_robin.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/meeting/test_structured_phases.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/meeting/test_token_tracker.py
    • Added a pytestmark to set a timeout for unit tests.
  • tests/unit/communication/test_bus_memory.py
    • Added a test case to verify that unsubscribe correctly wakes multiple blocked receive() calls.
  • tests/unit/communication/test_enums.py
    • Added MEETING_CONTRIBUTION to the MessageType enum values test.
  • tests/unit/engine/test_decomposition_models.py
    • Added a test case to ensure that DecompositionResult rejects a mismatch between created task IDs and plan subtask IDs.
    • Updated the test_completed_plus_cancelled_mix to assert TaskStatus.CANCELLED instead of COMPLETED for mixed terminal states.
  • tests/unit/engine/test_decomposition_service.py
    • Added a test case to verify that subtask dependencies are correctly propagated to created Task objects.
  • tests/unit/engine/test_routing_models.py
    • Added test cases to ensure RoutingResult rejects duplicate subtask IDs within decisions and unroutable lists.
  • tests/unit/engine/test_routing_service.py
    • Added a test case to ensure a ValueError is raised when parent_task.id does not match plan.parent_task_id during routing.
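
The token-aggregate check described for `MeetingMinutes` can be sketched as follows. The changelog describes a model validator; this example swaps in a standard-library dataclass with `__post_init__` so it runs without third-party packages, and the `Contribution` fields are assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    # Hypothetical minimal shape of a per-agent contribution.
    agent_id: str
    input_tokens: int
    output_tokens: int


@dataclass(frozen=True)
class MeetingMinutes:
    contributions: tuple[Contribution, ...]
    total_input_tokens: int
    total_output_tokens: int

    def __post_init__(self) -> None:
        # The stored totals must equal the sums over contributions.
        expected_in = sum(c.input_tokens for c in self.contributions)
        expected_out = sum(c.output_tokens for c in self.contributions)
        if self.total_input_tokens != expected_in:
            raise ValueError(
                f"total_input_tokens={self.total_input_tokens}, expected {expected_in}"
            )
        if self.total_output_tokens != expected_out:
            raise ValueError(
                f"total_output_tokens={self.total_output_tokens}, expected {expected_out}"
            )


contributions = (Contribution("a", 100, 40), Contribution("b", 50, 10))
minutes = MeetingMinutes(contributions, total_input_tokens=150, total_output_tokens=50)

try:
    MeetingMinutes(contributions, total_input_tokens=150, total_output_tokens=99)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```
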
Activity
  • The pull request addresses 46 findings identified by external reviewers (Copilot, Greptile, CodeRabbit, Gemini) from previous PRs #164-#167.
  • Critical findings included parsing decisions/action items from LLM synthesis in meeting protocols and validating winning_agent_id in find_losers.
  • Major findings covered token budget guards, duplicate participant rejection, frozen registry, hierarchy tiebreakers, dependency copying, and waking all waiters on unsubscribe.
  • Minor findings involved duplicate log removal, assert to raise conversions, traceback preservation, O(1) seniority lookups, NotBlankStr typing, and routing validation ordering.
  • Trivial findings included centralized event constants and docstring/spec fixes.
  • Test and documentation gaps were addressed with a new test_parsing.py file, expanded tests across seven modules, timeout markers, and spec name corrections.
  • Pre-PR review fixes included list-item regex handling, parent_task_id validation ordering, circular exception cause, and dead code removal.


Copilot AI left a comment


Pull request overview

Addresses post-merge review feedback from PRs #164-#167 by tightening validation, improving meeting protocol outputs (decisions/action items parsing), and expanding test coverage to prevent regressions across communication + engine subsystems.

Changes:

  • Add meeting summary/synthesis parsing helpers and populate decisions / action_items in all meeting protocols.
  • Harden routing/decomposition/conflict-resolution invariants (ID validation, hierarchy tiebreakers, dependency propagation, multi-waiter unsubscribe wakeups).
  • Expand/adjust unit + integration tests (including consistent pytest.mark.timeout(30) in meeting tests).

Reviewed changes

Copilot reviewed 55 out of 55 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| `src/ai_company/communication/meeting/_parsing.py` | New parsing utilities for decisions/action items from LLM text |
| `src/ai_company/communication/meeting/{round_robin,position_papers,structured_phases}.py` | Populate minutes with parsed decisions/action items; adjust budget handling/invariants |
| `src/ai_company/communication/meeting/_token_tracker.py` | Add logging when token usage exceeds budget / invalid counts |
| `src/ai_company/communication/meeting/orchestrator.py` | Freeze protocol registry and reject duplicate participants |
| `src/ai_company/communication/bus_memory.py` | Wake all pending receivers on unsubscribe; make receive-null logging non-racy |
| `src/ai_company/engine/{routing,decomposition}/...` | Add/adjust validation + propagation (parent_task_id, dependencies, rollups) and event constant usage |
| `src/ai_company/core/enums.py` | Guard seniority ordering and make comparisons O(1) |
| `tests/**` | Add coverage for the above behaviors + timeout markers |
| `README.md`, `DESIGN_SPEC.md` | Spec/doc alignment with implemented behavior and naming |


Comment on lines +11 to +13
from ai_company.observability import get_logger

logger = get_logger(__name__)
Copilot AI Mar 8, 2026

get_logger() result is assigned to logger but never used in this module, which should trigger Ruff's unused-variable check. Either remove the observability import/logger, or actually use it (e.g., to log when a section header is found/missing).

Copilot uses AI. Check for mistakes.
-        self._protocol_registry = protocol_registry
+        self._protocol_registry: MappingProxyType[
+            MeetingProtocolType, MeetingProtocol
+        ] = MappingProxyType(copy.deepcopy(dict(protocol_registry)))
Copilot AI Mar 8, 2026

Using copy.deepcopy() on the protocol registry will also deep-copy protocol implementation instances. That’s risky/unnecessary (custom protocol objects may not be deepcopy-safe and it can duplicate internal state/resources). Prefer making the mapping immutable via MappingProxyType(dict(protocol_registry)) (shallow copy) and, if needed, validate the registry contents rather than deep-copying arbitrary objects.

Suggested change
-        ] = MappingProxyType(copy.deepcopy(dict(protocol_registry)))
+        ] = MappingProxyType(dict(protocol_registry))
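
The difference between the two copies can be sketched as follows. This is a minimal illustration, not the project's code: `FakeProtocol` and `freeze_registry` are hypothetical names standing in for a real protocol implementation and for the constructor logic.

```python
import copy
from types import MappingProxyType


class FakeProtocol:
    """Hypothetical stand-in for a protocol implementation that holds state."""

    def __init__(self, name: str) -> None:
        self.name = name


def freeze_registry(registry: dict) -> MappingProxyType:
    """Return a read-only view over a shallow copy of the registry."""
    return MappingProxyType(dict(registry))


original = {"round_robin": FakeProtocol("round_robin")}
frozen = freeze_registry(original)

# The shallow copy shares the protocol instances with the caller...
same_instance = frozen["round_robin"] is original["round_robin"]

# ...but later changes to the caller's dict do not leak into the frozen view.
original["extra"] = FakeProtocol("extra")
leaked = "extra" in frozen

# Writes through the proxy are rejected with TypeError.
try:
    frozen["round_robin"] = FakeProtocol("other")
    write_rejected = False
except TypeError:
    write_rejected = True

# For comparison, deepcopy duplicates the protocol object itself.
deep = copy.deepcopy(original)
deep_is_new_object = deep["round_robin"] is not original["round_robin"]
```

The shallow copy is enough to isolate the mapping from the caller; whether the instances themselves should also be isolated is a separate design decision.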

# points. The asymmetry with the lock-guarded increment
# is intentional — the decrement must happen after
# _await_with_shutdown completes.
self._waiters[key] = max(0, self._waiters.get(key, 0) - 1)
Copilot AI Mar 8, 2026

receive() decrements _waiters[key] in the finally block even if unsubscribe() has already pop()'d that key. This re-inserts the key with a 0 count, causing _waiters to accumulate stale entries over time (and can skew future sentinel counts if code changes). Update the decrement logic to avoid re-adding missing keys and/or delete the key when the count reaches 0 (ideally under the lock).

Suggested change
-            self._waiters[key] = max(0, self._waiters.get(key, 0) - 1)
+            current = self._waiters.get(key)
+            if current is None:
+                # Key was removed (for example, by unsubscribe); do not recreate it.
+                pass
+            elif current <= 1:
+                # Last waiter for this key; remove entry entirely.
+                self._waiters.pop(key, None)
+            else:
+                self._waiters[key] = current - 1
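
The same decrement logic can be exercised in isolation. The sketch below assumes the waiter counts live in a plain `dict` keyed by a string; `release_waiter` is a hypothetical helper name, and the real code would presumably run this under the bus lock.

```python
def release_waiter(waiters: dict, key: str) -> None:
    """Decrement the waiter count for key without ever leaving a stale entry."""
    current = waiters.get(key)
    if current is None:
        # Key was already removed (for example, by unsubscribe): do nothing.
        return
    if current <= 1:
        # Last waiter for this key: drop the entry instead of storing 0.
        waiters.pop(key, None)
    else:
        waiters[key] = current - 1


waiters = {"channel:agent-1": 2}
release_waiter(waiters, "channel:agent-1")
after_first = dict(waiters)   # one waiter left
release_waiter(waiters, "channel:agent-1")
after_second = dict(waiters)  # entry removed entirely
release_waiter(waiters, "channel:agent-1")
after_third = dict(waiters)   # missing key is not recreated
```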

Comment on lines +401 to +408
if not all(r is not None for r in result_inputs):
    msg = f"Expected {num_participants} inputs but some slots are None"
    logger.error(msg, meeting_id=meeting_id)
    raise RuntimeError(msg)
if not all(c is not None for c in result_contributions):
    msg = f"Expected {num_participants} contributions but some slots are None"
    logger.error(msg, meeting_id=meeting_id)
    raise RuntimeError(msg)
Copilot AI Mar 8, 2026

These invariant failures are logged with logger.error(msg, ...), which makes the event name dynamic and bypasses the centralized observability.events.meeting constants pattern used elsewhere in this module. Log with a stable meeting event constant (e.g., MEETING_FAILED/MEETING_VALIDATION_FAILED) and include msg in a structured field (like error=msg).
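
A minimal sketch of the suggested pattern, assuming a structured logger whose first argument is the event name. `RecordingLogger`, `check_slots_filled`, and the string value of `MEETING_VALIDATION_FAILED` are hypothetical and exist only to make the example self-contained.

```python
# Hypothetical value; the real constant lives in the project's events module.
MEETING_VALIDATION_FAILED = "meeting.validation.failed"


class RecordingLogger:
    """Tiny stand-in for a structured logger: stores (event, fields) pairs."""

    def __init__(self) -> None:
        self.records = []

    def error(self, event: str, **fields) -> None:
        self.records.append((event, fields))


def check_slots_filled(slots, label: str, meeting_id: str, logger) -> None:
    """Raise if any slot is None, logging a stable event name first."""
    missing = sum(1 for slot in slots if slot is None)
    if missing:
        msg = f"Expected {len(slots)} {label} but {missing} slots are None"
        # The event name stays constant; the variable text goes in a field.
        logger.error(MEETING_VALIDATION_FAILED, error=msg, meeting_id=meeting_id)
        raise RuntimeError(msg)


log = RecordingLogger()
try:
    check_slots_filled(["paper-a", None], "inputs", "meeting-1", log)
    raised = False
except RuntimeError:
    raised = True
```

With this shape, dashboards can filter on one event name while the human-readable message remains available in the `error` field.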

Comment on lines +282 to +289
if not all(r is not None for r in results):
    msg = f"Expected {n} position papers but some slots are None"
    logger.error(msg, meeting_id=meeting_id)
    raise RuntimeError(msg)
if not all(c is not None for c in contrib_results):
    msg = f"Expected {n} contributions but some slots are None"
    logger.error(msg, meeting_id=meeting_id)
    raise RuntimeError(msg)
Copilot AI Mar 8, 2026

These invariant failures are logged with logger.error(msg, ...), which makes the event name dynamic and bypasses the centralized observability.events.meeting constants pattern used elsewhere in this module. Log with a stable meeting event constant (e.g., MEETING_FAILED/MEETING_VALIDATION_FAILED) and include msg in a structured field (like error=msg).

Comment on lines 71 to 83
     if input_tokens < 0 or output_tokens < 0:
         msg = (
             f"Token counts must be non-negative, got "
-            f"input_tokens={input_tokens}, output_tokens={output_tokens}"
+            f"input_tokens={input_tokens}, "
+            f"output_tokens={output_tokens}"
         )
         logger.warning(
             MEETING_BUDGET_EXHAUSTED,
             error=msg,
             input_tokens=input_tokens,
             output_tokens=output_tokens,
         )
         raise ValueError(msg)
Copilot AI Mar 8, 2026

TokenTracker.record() logs MEETING_BUDGET_EXHAUSTED even when the problem is invalid input (negative token counts). That event name implies a normal budget exhaustion scenario and can confuse monitoring/alerts. Consider logging MEETING_VALIDATION_FAILED (or a dedicated token-tracking/invalid-usage event) for negative counts, while keeping MEETING_BUDGET_EXHAUSTED for actual over-budget conditions.
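
One way to keep the two conditions apart is sketched below. `SketchTokenTracker` is a hypothetical, simplified stand-in for `TokenTracker`; it collects event names in a list instead of calling a logger, and both constant values are assumed.

```python
from dataclasses import dataclass, field

# Assumed string values; only the constant names appear in the review.
MEETING_BUDGET_EXHAUSTED = "meeting.budget.exhausted"
MEETING_VALIDATION_FAILED = "meeting.validation.failed"


@dataclass
class SketchTokenTracker:
    """Simplified tracker that records which event each path would emit."""

    budget: int
    used: int = 0
    events: list = field(default_factory=list)

    def record(self, input_tokens: int, output_tokens: int) -> None:
        if input_tokens < 0 or output_tokens < 0:
            # Invalid caller input: a validation event, not budget exhaustion.
            self.events.append(MEETING_VALIDATION_FAILED)
            raise ValueError(
                "Token counts must be non-negative, got "
                f"input_tokens={input_tokens}, output_tokens={output_tokens}"
            )
        self.used += input_tokens + output_tokens
        if self.used > self.budget:
            # Genuine over-budget condition.
            self.events.append(MEETING_BUDGET_EXHAUSTED)


tracker = SketchTokenTracker(budget=100)
try:
    tracker.record(-1, 10)
except ValueError:
    pass
tracker.record(80, 30)
```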

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces automated extraction of decisions and action items from LLM-generated meeting summaries, while also addressing numerous findings and significantly improving the codebase through stricter data validation, robust concurrency handling in the message bus, and new features like hierarchy-based tie-breaking. However, a critical security vulnerability exists due to insufficient safeguards against Indirect Prompt Injection, where raw agent responses in prompts could allow a malicious agent to manipulate the LLM's output to create unauthorized tasks. Additionally, there is a high-severity concern that the new LLM response parser may not correctly handle multi-line list items, potentially leading to truncated data.

Comment on lines +121 to +157
def parse_action_items(
    summary_text: str,
) -> tuple[ActionItem, ...]:
    """Parse action items from an LLM summary/synthesis response.

    Looks for an "Action Items" section header, then extracts
    bulleted or numbered list items. Attempts to detect assignee
    information within each item.

    Args:
        summary_text: The full summary/synthesis text from the LLM.

    Returns:
        Tuple of ActionItem instances (may be empty).
    """
    section = _extract_section(summary_text, _ACTION_ITEMS_HEADER_RE)
    if not section:
        return ()

    items: list[ActionItem] = []
    for match in _LIST_ITEM_RE.finditer(section):
        raw_text = match.group(1).strip()
        if not raw_text:
            continue

        description, assignee_id = _parse_assignee(raw_text)
        if not description:
            continue

        items.append(
            ActionItem(
                description=description,
                assignee_id=assignee_id,
            )
        )

    return tuple(items)

security-high high

The parse_action_items function extracts assignee_id from free-form LLM text without any validation against the meeting's participant list. This allows an attacker to use prompt injection to assign tasks to arbitrary agents who may not even be part of the meeting, potentially bypassing intended workflow boundaries.

Comment on lines +153 to +155
synthesis_text = synthesis_contribution.content
decisions = parse_decisions(synthesis_text)
action_items = parse_action_items(synthesis_text)

security-high high

Action items are parsed directly from LLM synthesis output. Since the synthesis prompt (built in _build_synthesis_prompt) concatenates raw responses from other agents, it is vulnerable to indirect prompt injection. A malicious agent can inject instructions to cause the synthesizer to output unauthorized action items.

Recommendation: Validate that the assignee_id in each action item is either the meeting leader or one of the participants.

        synthesis_text = synthesis_contribution.content
        decisions = parse_decisions(synthesis_text)
        raw_action_items = parse_action_items(synthesis_text)
        # Validate assignees are participants or the leader
        allowed_assignees = set(participant_ids) | {leader_id}
        action_items = [
            item for item in raw_action_items 
            if item.assignee_id is None or item.assignee_id in allowed_assignees
        ]

Comment on lines +164 to +166
all_contributions = (*contributions, summary_contribution)
decisions = parse_decisions(summary)
action_items = parse_action_items(summary)

security-high high

The round-robin protocol is vulnerable to indirect prompt injection because the transcript (containing raw agent responses) is included in the summary prompt. This allows an agent to manipulate the final list of action items.

Recommendation: Validate that the assignee_id in each action item is either the meeting leader or one of the participants.

            all_contributions = (*contributions, summary_contribution)
            decisions = parse_decisions(summary)
            raw_action_items = parse_action_items(summary)
            # Validate assignees are participants or the leader
            allowed_assignees = set(participant_ids) | {leader_id}
            action_items = [
                item for item in raw_action_items 
                if item.assignee_id is None or item.assignee_id in allowed_assignees
            ]

Comment on lines +279 to +280
decisions = parse_decisions(summary)
action_items = parse_action_items(summary)

security-high high

The structured phases protocol is vulnerable to indirect prompt injection because participant inputs are concatenated into the synthesis prompt. This allows a participant to inject instructions that manipulate the resulting action items.

Recommendation: Validate that the assignee_id in each action item is either the meeting leader or one of the participants.

        decisions = parse_decisions(summary)
        raw_action_items = parse_action_items(summary)
        # Validate assignees are participants or the leader
        allowed_assignees = set(participant_ids) | {leader_id}
        action_items = [
            item for item in raw_action_items 
            if item.assignee_id is None or item.assignee_id in allowed_assignees
        ]

Comment on lines +30 to +33
_LIST_ITEM_RE = re.compile(
    r"^[^\S\n]*(?:\d+[\.\)][^\S\n]*|-[^\S\n]*|\*[^\S\n]*|\u2022[^\S\n]*)(.+)",
    re.MULTILINE,
)

high

The _LIST_ITEM_RE regex only captures single-line list items. If a decision or action item from the LLM spans multiple lines, only the first line will be captured. This could lead to truncated and incomplete items, which could be critical for decisions and action items.

For example, with this input:

# Decisions
1. This is a decision
   that spans multiple lines.
2. This is a single line decision.

The parser would extract "This is a decision" and "This is a single line decision.", losing the second line of the first decision.

Consider updating the regex to handle multi-line list items, for example by using a non-greedy match with the re.DOTALL flag that continues until the next list item marker or the end of the section.
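
As an illustration, the sketch below handles continuation lines with a line-by-line scan. This is a deliberate swap for the `re.DOTALL` regex suggested above, chosen because it is easier to reason about. `parse_list_items` and `_MARKER_RE` are hypothetical names, and the sketch assumes that a continuation line is indented and that a marker is followed by at least one space.

```python
import re

# A list marker ("1." / "1)" / "-" / "*" / "\u2022") followed by whitespace
# and the first line of the item text.
_MARKER_RE = re.compile(r"^[^\S\n]*(?:\d+[.)]|[-*\u2022])[^\S\n]+(.*)$")


def parse_list_items(section: str) -> list:
    """Extract list items, joining indented continuation lines with spaces."""
    items = []
    current = None  # parts of the item being collected, or None

    def flush() -> None:
        if current:
            text = " ".join(part for part in current if part)
            if text:
                items.append(text)

    for line in section.splitlines():
        marker = _MARKER_RE.match(line)
        if marker:
            flush()
            current = [marker.group(1).strip()]
        elif current is not None and line.strip() and line[:1].isspace():
            # Indented, non-marker line: continuation of the current item.
            current.append(line.strip())
        else:
            # Blank or unindented line ends the current item.
            flush()
            current = None
    flush()
    return items


example = (
    "1. This is a decision\n"
    "   that spans multiple lines.\n"
    "2. This is a single line decision.\n"
)
parsed = parse_list_items(example)
```

On the example input from this comment, both items are recovered in full rather than the first one being cut at the line break.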

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/ai_company/engine/decomposition/rollup.py (1)

44-59: ⚠️ Potential issue | 🟡 Minor

Keep DECOMPOSITION_ROLLUP_COMPUTED consistent on the empty-input branch.

This branch returns a valid zeroed rollup, but it logs the same event at WARNING and without derived_status, unlike the normal path. That makes empty inputs look like operational faults and gives the event two different payload shapes.

Proposed adjustment
         if total == 0:
-            logger.warning(
-                DECOMPOSITION_ROLLUP_COMPUTED,
-                parent_task_id=parent_task_id,
-                total=0,
-                reason="rollup computed with no subtask statuses",
-            )
-            return SubtaskStatusRollup(
+            rollup = SubtaskStatusRollup(
                 parent_task_id=parent_task_id,
                 total=0,
                 completed=0,
                 failed=0,
                 in_progress=0,
                 blocked=0,
                 cancelled=0,
             )
+            logger.debug(
+                DECOMPOSITION_ROLLUP_COMPUTED,
+                parent_task_id=parent_task_id,
+                total=0,
+                derived_status=rollup.derived_parent_status.value,
+                reason="rollup computed with no subtask statuses",
+            )
+            return rollup
As per coding guidelines "Log at WARNING or ERROR with context for all error paths before raising exceptions" and "Log at DEBUG for object creation, internal flow, and entry/exit of key functions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/decomposition/rollup.py` around lines 44 - 59, The
empty-input branch logs DECOMPOSITION_ROLLUP_COMPUTED at WARNING without the
derived_status and thus differs from the normal path and misreports an
operational fault; update the total == 0 branch to log the same event shape as
the normal path (include derived_status="zeroed" or the same field name used
elsewhere) and use the same log level as non-error creation (change to DEBUG if
normal path logs creation at DEBUG) before returning the zeroed
SubtaskStatusRollup(parent_task_id=..., total=0, completed=0, failed=0,
in_progress=0, blocked=0, cancelled=0) so the event payload and severity remain
consistent with SubtaskStatusRollup creation.
src/ai_company/communication/conflict_resolution/debate_strategy.py (1)

255-262: ⚠️ Potential issue | 🟡 Minor

Keep the fallback reasoning aligned with the actual tiebreaker.

Lines 255-262 can now choose a winner via hierarchy when seniority ties, but the returned reasoning still says the winner "has highest seniority". That makes the audit trail false for equal-level conflicts. Please distinguish pure seniority wins from hierarchy tiebreak wins in the JudgeDecision.reasoning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/conflict_resolution/debate_strategy.py` around
lines 255 - 262, The reasoning message claims "highest seniority" even when
pick_highest_seniority resolved an equal-seniority tie via hierarchy; update the
JudgeDecision.reasoning to reflect which tiebreaker was used. After calling
pick_highest_seniority(conflict, hierarchy=self._hierarchy), check whether the
win was resolved from multiple agents with equal seniority (e.g., detect if
conflict has multiple agents at best.agent_level or if pick_highest_seniority
can return/indicate a tie-break flag); if it was a hierarchy tiebreak, set
reasoning to something like "Debate fallback: hierarchy tiebreak among
equal-seniority agents — {best.agent_id} ({best.agent_level}) selected",
otherwise keep "authority-based judging — {best.agent_id} ({best.agent_level})
has highest seniority". Ensure this logic is implemented where JudgeDecision is
constructed so the audit trail accurately distinguishes pure seniority wins from
hierarchy tiebreak wins.
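
A rough sketch of the distinction, assuming seniority can be compared as an integer level per agent. `fallback_reasoning` and the `agent_levels` mapping are hypothetical; the real code works on conflict positions and may detect the tie differently.

```python
def fallback_reasoning(winner_id: str, agent_levels: dict) -> str:
    """Describe how the fallback winner was chosen.

    agent_levels maps agent id to a numeric seniority level (an assumption
    made for this sketch only).
    """
    winner_level = agent_levels[winner_id]
    top_level = max(agent_levels.values())
    tied_at_top = [a for a, lvl in agent_levels.items() if lvl == top_level]
    if winner_level == top_level and len(tied_at_top) > 1:
        # Several agents share the top level, so hierarchy decided the winner.
        return (
            "Debate fallback: hierarchy tiebreak among equal-seniority agents; "
            f"{winner_id} (level {winner_level}) selected"
        )
    return (
        "Debate fallback: authority-based judging; "
        f"{winner_id} (level {winner_level}) has highest seniority"
    )


tie_text = fallback_reasoning("agent-a", {"agent-a": 3, "agent-b": 3})
clear_text = fallback_reasoning("agent-a", {"agent-a": 3, "agent-b": 2})
```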
src/ai_company/communication/conflict_resolution/_helpers.py (1)

21-79: 🛠️ Refactor suggestion | 🟠 Major

Break find_losers() into validation and extraction helpers.

The new winner-integrity checks pushed this helper past the repo's 50-line ceiling, and it now mixes validation, logging, and the happy-path loser selection. Pulling winner validation into a small helper would keep the unhappy paths isolated and the core computation trivial.

As per coding guidelines, 'Keep functions under 50 lines and files under 800 lines'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/conflict_resolution/_helpers.py` around lines 21
- 79, Split validation from extraction: add a small helper (e.g.
ensure_winner_in_conflict or validate_winner_present(conflict, winner_id)) that
performs the winner existence check, logs the CONFLICT_STRATEGY_ERROR and raises
ConflictStrategyError with the same context when missing; then simplify
find_losers to call that helper and only perform the tuple comprehension (losers
= tuple(pos for pos in conflict.positions if pos.agent_id != winner_id)), keep
the "no losers" warning/raise in find_losers as the only remaining unhappy-path
logic; update imports/refs accordingly.
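
The proposed split might look roughly like this. The `Position`, `Conflict`, and `ConflictStrategyError` definitions are simplified stand-ins so the sketch runs on its own, and the logging calls described above are omitted.

```python
from dataclasses import dataclass


class ConflictStrategyError(Exception):
    """Stand-in for the project's exception type."""


@dataclass(frozen=True)
class Position:
    agent_id: str


@dataclass(frozen=True)
class Conflict:
    positions: tuple


def ensure_winner_in_conflict(conflict: Conflict, winner_id: str) -> None:
    """Validation only: the winner must be one of the conflict's parties."""
    if all(pos.agent_id != winner_id for pos in conflict.positions):
        raise ConflictStrategyError(
            f"winning_agent_id {winner_id!r} is not part of the conflict"
        )


def find_losers(conflict: Conflict, winner_id: str) -> tuple:
    """Happy path: everyone except the winner, after validation."""
    ensure_winner_in_conflict(conflict, winner_id)
    losers = tuple(pos for pos in conflict.positions if pos.agent_id != winner_id)
    if not losers:
        raise ConflictStrategyError("conflict has no losing positions")
    return losers


conflict = Conflict(positions=(Position("a"), Position("b"), Position("c")))
losers = find_losers(conflict, "b")
```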
src/ai_company/communication/meeting/orchestrator.py (1)

364-431: 🛠️ Refactor suggestion | 🟠 Major

Split _validate_inputs() into smaller validators.

This method now bundles token-budget validation, empty-participant checks, duplicate detection, and leader-membership checks into one 68-line branchy helper. Extracting the participant-specific checks and shared log-and-raise scaffolding will make future validation changes safer and bring it back under the repo limit.

As per coding guidelines, 'Keep functions under 50 lines and files under 800 lines'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/orchestrator.py` around lines 364 - 431,
The _validate_inputs method is too large and mixes token_budget checks with
several participant checks; split it into smaller focused validators: create a
_validate_token_budget(meeting_id, token_budget) that performs the positive
check and logs/raises ValueError, and create a
_validate_participants(meeting_id, leader_id, participant_ids) that contains the
empty-participants, duplicate detection (use Counter), and
leader-in-participants checks and raises MeetingParticipantError with the same
context payloads; factor the common logging-and-raise pattern into a helper
(e.g., _log_and_raise or _log_participant_error) used by both validators, then
have _validate_inputs call these two new helpers to preserve behavior and
messages (keep names _validate_inputs, _validate_token_budget,
_validate_participants, and the logging helper to locate changes).
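
The participant checks can be sketched with `collections.Counter` as suggested. `MeetingParticipantError` is redefined here as a stand-in, and the leader check is left out because the comment does not say which direction it enforces.

```python
from collections import Counter


class MeetingParticipantError(ValueError):
    """Stand-in for the project's exception type (real base class unknown)."""


def find_duplicate_ids(participant_ids) -> tuple:
    """Return each id that occurs more than once, in first-seen order."""
    counts = Counter(participant_ids)
    return tuple(pid for pid, count in counts.items() if count > 1)


def validate_participants(participant_ids) -> None:
    """Reject empty participant lists and duplicate ids."""
    if not participant_ids:
        raise MeetingParticipantError("participant_ids must not be empty")
    duplicates = find_duplicate_ids(participant_ids)
    if duplicates:
        raise MeetingParticipantError(
            f"duplicate participant_ids: {', '.join(duplicates)}"
        )


duplicates = find_duplicate_ids(("a", "b", "a", "c", "b"))
```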
src/ai_company/communication/conflict_resolution/hybrid_strategy.py (1)

228-240: ⚠️ Potential issue | 🟠 Major

Preserve authority-strategy validation in the fallback path.

pick_highest_seniority() only compares seniority plus raw ancestor counts. That means equal-seniority conflicts the authority strategy would reject now silently resolve here — e.g. an agent missing from the hierarchy looks like depth 0 because HierarchyResolver.get_ancestors() returns (), and peers with no common manager collapse to an order-dependent winner. Since this branch is the hybrid's authority fallback, it should reuse the same hierarchy validation/tiebreak semantics as AuthorityResolver before building the hybrid resolution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/conflict_resolution/hybrid_strategy.py` around
lines 228 - 240, The hybrid fallback currently calls
pick_highest_seniority(conflict, hierarchy=self._hierarchy) without reusing the
AuthorityResolver's validation/tiebreak semantics; update the hybrid fallback to
first run the same hierarchy validation used by AuthorityResolver (e.g., invoke
the AuthorityResolver validation method or replicate its checks: ensure both
agents exist in self._hierarchy, detect equal seniority ties and
missing-ancestor cases via HierarchyResolver.get_ancestors() semantics) and only
then call pick_highest_seniority to build the ConflictResolution; if the
AuthorityResolver would have rejected/abstained (tie or missing agent), the
hybrid must not silently pick a winner but follow AuthorityResolver’s outcome
path (reject/abstain or escalate) before creating the RESOLVED_BY_HYBRID
ConflictResolution.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Line 29: The sentence currently contradicts itself by saying "M4: Multi-Agent"
is in progress while also claiming "M3 Single Agent" is in progress; update the
README sentence to make milestone statuses consistent — e.g., mark "M3: Single
Agent" as complete if M4 is in progress, or mark "M4: Multi-Agent" as
planned/not started if M3 is still in progress, and rephrase the line that
mentions "M4: Multi-Agent" and "M3 Single Agent" so it unambiguously lists each
milestone with its correct status (using the exact labels "M4: Multi-Agent" and
"M3: Single Agent").

In `@src/ai_company/communication/bus_memory.py`:
- Around line 462-472: The decrement of self._waiters[key] after await in the
block with _await_with_shutdown can re-insert a key with value 0 even if
unsubscribe() removed it; to fix, change the post-await decrement to only modify
the dict if the key still exists and its current value is >0 (or decrement and
then remove the key when the resulting value is 0) so you never leave
zero-valued orphan entries in self._waiters — update the code around key,
self._waiters, and the finally block that runs after _await_with_shutdown() to
check existence and remove zero entries atomically.

In `@src/ai_company/communication/delegation/hierarchy.py`:
- Around line 231-232: get_lowest_common_manager currently returns agent_a when
agent_a == agent_b without verifying the agent exists; change this fast-path in
get_lowest_common_manager to first check membership in the hierarchy's
known-agent set (or build/maintain such a set during Hierarchy construction),
and only return agent_a if it is present; otherwise return None (or the existing
"no manager" sentinel). Locate get_lowest_common_manager and the class that
builds the hierarchy (e.g., Hierarchy.__init__ or similar) to add/verify the
known-agent collection and use it in the equality fast-path.

In `@src/ai_company/communication/meeting/_parsing.py`:
- Around line 24-27: _ANY_HEADER_RE currently treats any line that ends with a
colon as a header, which can prematurely split sections (matches things like
"Note:" inside bodies); update the _ANY_HEADER_RE usage so the colon-terminated
alternative only counts as a header when it is followed by a non-blank next line
(use a positive lookahead to assert the next line starts with a non-whitespace
character), keeping the existing markdown-hash header branch intact; modify the
regex assigned to _ANY_HEADER_RE accordingly and keep re.MULTILINE.
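
A sketch of the lookahead idea follows. The actual `_ANY_HEADER_RE` pattern is not shown in this review, so `_ANY_HEADER_SKETCH_RE` below is a hypothetical reconstruction with a markdown-hash branch and a colon-terminated branch.

```python
import re

_ANY_HEADER_SKETCH_RE = re.compile(
    # Branch 1: markdown header such as "## Decisions".
    # Branch 2: "Something:" only when the next line starts with a
    # non-whitespace character (positive lookahead).
    r"^[^\S\n]*(?:#{1,6}[^\S\n]+[^\n]+|[^\n:]+:[^\S\n]*(?=\n\S))$",
    re.MULTILINE,
)


def find_headers(text: str) -> list:
    """Return the header lines recognised by the sketch pattern."""
    return [m.group(0).strip() for m in _ANY_HEADER_SKETCH_RE.finditer(text)]


text = (
    "Decisions:\n"
    "- ship it\n"
    "\n"
    "Note:\n"
    "\n"
    "Action Items:\n"
    "- write docs\n"
)
headers = find_headers(text)
```

Note the limit of this approach: a `Note:` line followed directly by body text would still be treated as a header, so the lookahead narrows the false positives but does not remove them.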

In `@src/ai_company/communication/meeting/_token_tracker.py`:
- Around line 77-82: The negative-token validation path is currently logging
MEETING_BUDGET_EXHAUSTED which conflates caller validation errors with true
budget overruns; update the logging there (the logger.warning call in
_token_tracker.py that currently passes MEETING_BUDGET_EXHAUSTED) to emit a
distinct validation event (e.g., MEETING_INVALID_TOKEN_COUNT or
MEETING_TOKEN_VALIDATION_ERROR) and keep the same context fields (error=msg,
input_tokens, output_tokens) so dashboards/alerts can distinguish invalid input
from genuine exhaustion; add the new constant and swap its use in the validation
branch where negative counts are detected.

In `@src/ai_company/communication/meeting/models.py`:
- Around line 237-262: The validator _validate_token_aggregates currently
returns early when contributions is empty, allowing non-zero
total_input_tokens/total_output_tokens; change it so that when
self.contributions is empty you assert both totals are zero (raise ValueError if
total_input_tokens != 0 or total_output_tokens != 0) otherwise proceed to sum
contributions as implemented; update the error messages to reference the field
names (total_input_tokens/total_output_tokens) and expected zero when raising in
the empty-contributions case.
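
A minimal sketch of the stricter check, written as a plain function rather than a model validator. `Contribution` and its field names are assumptions for illustration. Because the sum over an empty sequence is 0, dropping the early return is enough to enforce zero totals when there are no contributions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """Hypothetical minimal contribution record."""

    input_tokens: int
    output_tokens: int


def validate_token_aggregates(
    contributions, total_input_tokens: int, total_output_tokens: int
) -> None:
    """Require the totals to equal the per-contribution sums, even when empty."""
    expected_input = sum(c.input_tokens for c in contributions)
    expected_output = sum(c.output_tokens for c in contributions)
    if total_input_tokens != expected_input:
        raise ValueError(
            f"total_input_tokens ({total_input_tokens}) must equal {expected_input}"
        )
    if total_output_tokens != expected_output:
        raise ValueError(
            f"total_output_tokens ({total_output_tokens}) must equal {expected_output}"
        )


validate_token_aggregates((), 0, 0)
validate_token_aggregates((Contribution(10, 5), Contribution(3, 2)), 13, 7)
```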

In `@src/ai_company/communication/meeting/position_papers.py`:
- Around line 282-289: Replace the free-form logger.error calls in the two
invariant checks so they use stable event constants from
ai_company.observability.events (e.g., POSITION_PAPERS_MISSING and
CONTRIBUTIONS_MISSING) and structured kwargs rather than formatted strings: for
the results check (variable names results, n, meeting_id) call
logger.error(POSITION_PAPERS_MISSING, detail=msg, meeting_id=meeting_id) (after
importing the constant) and similarly for contrib_results use
logger.error(CONTRIBUTIONS_MISSING, detail=msg, meeting_id=meeting_id); keep
raising RuntimeError(msg) but ensure logging uses the event constant and
structured fields instead of logger.error(msg, ...).
- Around line 153-156: The synthesis output can be missing explicit "Decisions"
and "Action Items" headers so parse_decisions and parse_action_items receive
empty results; update _build_synthesis_prompt() to require the model to emit
clearly labeled, parser-friendly sections named exactly "Decisions:" and "Action
Items:" (or another agreed exact header text) and include examples/format
constraints (e.g., bullet list under each header) so
synthesis_contribution.content (synthesis_text) always contains those headers
for parse_decisions and parse_action_items to consume.

In `@src/ai_company/communication/meeting/round_robin.py`:
- Around line 165-166: The code calls parse_decisions(summary) and
parse_action_items(summary) but the prompt produced by _build_summary_prompt()
does not require "Decisions:" or "Action Items:" headers, so leaders can return
lists that the parsers miss; update _build_summary_prompt() to explicitly
require distinct "Decisions:" and "Action Items:" section headers (with
examples) and then add a small guard where decisions = parse_decisions(summary)
/ action_items = parse_action_items(summary) are invoked to validate the summary
contains those headers (e.g., check for the literal "Decisions:" and "Action
Items:") and if missing, either request the model to reformat or log/raise a
clear parsing error so empty results are not silently accepted.
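
The guard described above could be as small as the following sketch. `REQUIRED_HEADERS`, `FORMAT_INSTRUCTIONS`, and `missing_headers` are hypothetical names, and the exact header text the parsers accept is assumed to be the literal strings named in this comment.

```python
REQUIRED_HEADERS = ("Decisions:", "Action Items:")

# Hypothetical prompt suffix asking the model for parser-friendly sections.
FORMAT_INSTRUCTIONS = (
    "End your summary with two sections, using these exact headers:\n"
    "Decisions:\n"
    "- <one decision per bullet>\n"
    "Action Items:\n"
    "- <one action item per bullet>\n"
)


def missing_headers(summary: str) -> tuple:
    """Return the required headers that do not appear in the summary."""
    return tuple(h for h in REQUIRED_HEADERS if h not in summary)


good = "Decisions:\n- ship it\nAction Items:\n- write docs\n"
bad = "We agreed to ship it and someone should write docs."
missing_in_good = missing_headers(good)
missing_in_bad = missing_headers(bad)
```

A caller can then decide whether a non-empty result should trigger a reformat request, a warning, or an error, rather than silently accepting empty parse results.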

In `@src/ai_company/engine/decomposition/models.py`:
- Around line 248-253: Update the public class docstring for SubtaskStatusRollup
to explicitly document the mixed terminal-state rule: when completed + cancelled
== total the rollup resolves to TaskStatus.CANCELLED (i.e., any mix of completed
and cancelled subtasks is considered CANCELLED), in addition to the existing
description that pure completed maps to COMPLETED, pure cancelled maps to
CANCELLED, and the remainder maps to IN_PROGRESS; reference the attributes
completed, cancelled, total and the TaskStatus.CANCELLED enum so callers can
rely on this contract.

In `@tests/unit/engine/test_decomposition_models.py`:
- Around line 230-249: The test test_task_id_mismatch_rejected currently only
asserts that a ValueError is raised for ID mismatches; update it to also assert
the exception message contains the specific missing and extra IDs so regressions
that drop diagnostics are caught. Modify the pytest.raises match to look for the
missing plan ID ("sub-2") and the extra created ID ("sub-99") (or add an
explicit str(e) assertion inside the context) when constructing
DecompositionResult for the given DecompositionPlan and created_tasks, ensuring
the validator's diagnostic strings mention both IDs.

---

Outside diff comments:
In `@src/ai_company/communication/conflict_resolution/_helpers.py`:
- Around line 21-79: Split validation from extraction: add a small helper (e.g.
ensure_winner_in_conflict or validate_winner_present(conflict, winner_id)) that
performs the winner existence check, logs the CONFLICT_STRATEGY_ERROR and raises
ConflictStrategyError with the same context when missing; then simplify
find_losers to call that helper and only perform the tuple comprehension (losers
= tuple(pos for pos in conflict.positions if pos.agent_id != winner_id)), keep
the "no losers" warning/raise in find_losers as the only remaining unhappy-path
logic; update imports/refs accordingly.

In `@src/ai_company/communication/conflict_resolution/debate_strategy.py`:
- Around line 255-262: The reasoning message claims "highest seniority" even
when pick_highest_seniority resolved an equal-seniority tie via hierarchy;
update the JudgeDecision.reasoning to reflect which tiebreaker was used. After
calling pick_highest_seniority(conflict, hierarchy=self._hierarchy), check
whether the win was resolved from multiple agents with equal seniority (e.g.,
detect if conflict has multiple agents at best.agent_level or if
pick_highest_seniority can return/indicate a tie-break flag); if it was a
hierarchy tiebreak, set reasoning to something like "Debate fallback: hierarchy
tiebreak among equal-seniority agents — {best.agent_id} ({best.agent_level})
selected", otherwise keep "authority-based judging — {best.agent_id}
({best.agent_level}) has highest seniority". Ensure this logic is implemented
where JudgeDecision is constructed so the audit trail accurately distinguishes
pure seniority wins from hierarchy tiebreak wins.

In `@src/ai_company/communication/conflict_resolution/hybrid_strategy.py`:
- Around line 228-240: The hybrid fallback currently calls
pick_highest_seniority(conflict, hierarchy=self._hierarchy) without reusing the
AuthorityResolver's validation/tiebreak semantics; update the hybrid fallback to
first run the same hierarchy validation used by AuthorityResolver (e.g., invoke
the AuthorityResolver validation method or replicate its checks: ensure both
agents exist in self._hierarchy, detect equal seniority ties and
missing-ancestor cases via HierarchyResolver.get_ancestors() semantics) and only
then call pick_highest_seniority to build the ConflictResolution; if the
AuthorityResolver would have rejected/abstained (tie or missing agent), the
hybrid must not silently pick a winner but follow AuthorityResolver’s outcome
path (reject/abstain or escalate) before creating the RESOLVED_BY_HYBRID
ConflictResolution.

In `@src/ai_company/communication/meeting/orchestrator.py`:
- Around line 364-431: The _validate_inputs method is too large and mixes
token_budget checks with several participant checks; split it into smaller
focused validators: create a _validate_token_budget(meeting_id, token_budget)
that performs the positive check and logs/raises ValueError, and create a
_validate_participants(meeting_id, leader_id, participant_ids) that contains the
empty-participants, duplicate detection (use Counter), and
leader-in-participants checks and raises MeetingParticipantError with the same
context payloads; factor the common logging-and-raise pattern into a helper
(e.g., _log_and_raise or _log_participant_error) used by both validators, then
have _validate_inputs call these two new helpers to preserve behavior and
messages (keep names _validate_inputs, _validate_token_budget,
_validate_participants, and the logging helper to locate changes).

In `@src/ai_company/engine/decomposition/rollup.py`:
- Around line 44-59: The empty-input branch logs DECOMPOSITION_ROLLUP_COMPUTED
at WARNING without the derived_status and thus differs from the normal path and
misreports an operational fault; update the total == 0 branch to log the same
event shape as the normal path (include derived_status="zeroed" or the same
field name used elsewhere) and use the same log level as non-error creation
(change to DEBUG if normal path logs creation at DEBUG) before returning the
zeroed SubtaskStatusRollup(parent_task_id=..., total=0, completed=0, failed=0,
in_progress=0, blocked=0, cancelled=0) so the event payload and severity remain
consistent with SubtaskStatusRollup creation.
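
As an illustration of "same event shape on both paths", the sketch below routes every branch through one emit helper. The `Rollup` class, the in-memory `LOG` list, the event string value, and the status labels are all assumptions made so the example runs without the project's logger.

```python
from dataclasses import dataclass

# Assumed value; the real constant lives in the observability events module.
DECOMPOSITION_ROLLUP_COMPUTED = "decomposition.rollup.computed"

# In-memory stand-in for structured logging: (level, event, fields).
LOG: list[tuple[str, str, dict[str, object]]] = []


@dataclass(frozen=True)
class Rollup:
    parent_task_id: str
    total: int
    completed: int


def _emit(rollup: Rollup, derived_status: str) -> None:
    """Record one event with the same level and keys for every branch."""
    fields: dict[str, object] = {
        "parent_task_id": rollup.parent_task_id,
        "total": rollup.total,
        "derived_status": derived_status,
    }
    LOG.append(("DEBUG", DECOMPOSITION_ROLLUP_COMPUTED, fields))


def compute_rollup(parent_task_id: str, statuses: list[str]) -> Rollup:
    """Build a rollup; the empty case logs exactly like the normal case."""
    total = len(statuses)
    if total == 0:
        rollup = Rollup(parent_task_id, total=0, completed=0)
        _emit(rollup, derived_status="zeroed")
        return rollup
    completed = statuses.count("completed")
    rollup = Rollup(parent_task_id, total=total, completed=completed)
    _emit(rollup, "completed" if completed == total else "in_progress")
    return rollup
```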

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7ba8cc0a-97c6-4a72-be1a-9f6a3c561cfd

📥 Commits

Reviewing files that changed from the base of the PR and between ee7caca and cf27048.

📒 Files selected for processing (55)
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/communication/bus_memory.py
  • src/ai_company/communication/conflict_resolution/_helpers.py
  • src/ai_company/communication/conflict_resolution/config.py
  • src/ai_company/communication/conflict_resolution/debate_strategy.py
  • src/ai_company/communication/conflict_resolution/hybrid_strategy.py
  • src/ai_company/communication/conflict_resolution/service.py
  • src/ai_company/communication/delegation/hierarchy.py
  • src/ai_company/communication/meeting/_parsing.py
  • src/ai_company/communication/meeting/_prompts.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/communication/meeting/models.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/meeting/structured_phases.py
  • src/ai_company/communication/messenger.py
  • src/ai_company/core/enums.py
  • src/ai_company/core/task.py
  • src/ai_company/engine/decomposition/models.py
  • src/ai_company/engine/decomposition/rollup.py
  • src/ai_company/engine/decomposition/service.py
  • src/ai_company/engine/parallel.py
  • src/ai_company/engine/routing/models.py
  • src/ai_company/engine/routing/scorer.py
  • src/ai_company/engine/routing/service.py
  • src/ai_company/observability/events/communication.py
  • src/ai_company/observability/events/task_routing.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/communication/conflict_resolution/test_authority_strategy.py
  • tests/unit/communication/conflict_resolution/test_config.py
  • tests/unit/communication/conflict_resolution/test_debate_strategy.py
  • tests/unit/communication/conflict_resolution/test_helpers.py
  • tests/unit/communication/conflict_resolution/test_hybrid_strategy.py
  • tests/unit/communication/delegation/test_hierarchy.py
  • tests/unit/communication/meeting/test_config.py
  • tests/unit/communication/meeting/test_enums.py
  • tests/unit/communication/meeting/test_errors.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_orchestrator.py
  • tests/unit/communication/meeting/test_parsing.py
  • tests/unit/communication/meeting/test_position_papers.py
  • tests/unit/communication/meeting/test_prompts.py
  • tests/unit/communication/meeting/test_protocol.py
  • tests/unit/communication/meeting/test_round_robin.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • tests/unit/communication/meeting/test_token_tracker.py
  • tests/unit/communication/test_bus_memory.py
  • tests/unit/communication/test_enums.py
  • tests/unit/engine/test_decomposition_models.py
  • tests/unit/engine/test_decomposition_service.py
  • tests/unit/engine/test_routing_models.py
  • tests/unit/engine/test_routing_service.py
💤 Files with no reviewable changes (2)
  • src/ai_company/communication/conflict_resolution/config.py
  • src/ai_company/observability/events/communication.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use 'from __future__ import annotations' in Python 3.14+ code
Use 'except A, B:' syntax without parentheses in exception handling for Python 3.14 (PEP 758)
Include type hints on all public functions; enforce with mypy strict mode
Use Google-style docstrings, required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones; use copy.deepcopy() for non-Pydantic internal collections and MappingProxyType for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models for runtime state
Use Pydantic v2 with BaseModel, model_validator, computed_field, and ConfigDict; use @computed_field for derived values instead of storing redundant fields
Use NotBlankStr from core.types for all identifier/name fields instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code
Enforce maximum line length of 88 characters
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files

Files:

  • tests/unit/communication/meeting/test_round_robin.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • src/ai_company/observability/events/task_routing.py
  • tests/unit/communication/meeting/test_protocol.py
  • src/ai_company/core/task.py
  • src/ai_company/communication/meeting/models.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_prompts.py
  • src/ai_company/communication/conflict_resolution/hybrid_strategy.py
  • tests/unit/communication/delegation/test_hierarchy.py
  • src/ai_company/communication/delegation/hierarchy.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/engine/routing/scorer.py
  • src/ai_company/engine/decomposition/service.py
  • tests/unit/communication/meeting/test_parsing.py
  • tests/unit/engine/test_decomposition_models.py
  • src/ai_company/engine/routing/models.py
  • tests/unit/communication/meeting/test_position_papers.py
  • src/ai_company/communication/messenger.py
  • tests/unit/communication/meeting/test_orchestrator.py
  • tests/unit/communication/meeting/test_token_tracker.py
  • tests/unit/communication/conflict_resolution/test_authority_strategy.py
  • src/ai_company/engine/decomposition/models.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/communication/conflict_resolution/service.py
  • src/ai_company/communication/meeting/_parsing.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/engine/test_decomposition_service.py
  • src/ai_company/communication/meeting/structured_phases.py
  • src/ai_company/communication/conflict_resolution/_helpers.py
  • tests/unit/communication/test_bus_memory.py
  • tests/unit/engine/test_routing_models.py
  • src/ai_company/core/enums.py
  • src/ai_company/communication/meeting/_prompts.py
  • tests/unit/communication/meeting/test_config.py
  • src/ai_company/engine/parallel.py
  • tests/unit/communication/meeting/test_errors.py
  • tests/unit/communication/test_enums.py
  • src/ai_company/communication/conflict_resolution/debate_strategy.py
  • src/ai_company/engine/decomposition/rollup.py
  • tests/unit/communication/conflict_resolution/test_config.py
  • src/ai_company/communication/meeting/position_papers.py
  • tests/unit/communication/conflict_resolution/test_helpers.py
  • tests/unit/communication/conflict_resolution/test_hybrid_strategy.py
  • tests/unit/communication/conflict_resolution/test_debate_strategy.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • tests/unit/communication/meeting/test_enums.py
  • tests/unit/engine/test_routing_service.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/bus_memory.py
  • src/ai_company/engine/routing/service.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark unit tests with @pytest.mark.unit
Mark integration tests with @pytest.mark.integration
Mark end-to-end tests with @pytest.mark.e2e
Mark slow tests with @pytest.mark.slow
Prefer @pytest.mark.parametrize for testing similar cases
Use 'test-provider', 'test-small-001', etc. in test code instead of real vendor names

Files:

  • tests/unit/communication/meeting/test_round_robin.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • tests/unit/communication/meeting/test_protocol.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_prompts.py
  • tests/unit/communication/delegation/test_hierarchy.py
  • tests/unit/communication/meeting/test_parsing.py
  • tests/unit/engine/test_decomposition_models.py
  • tests/unit/communication/meeting/test_position_papers.py
  • tests/unit/communication/meeting/test_orchestrator.py
  • tests/unit/communication/meeting/test_token_tracker.py
  • tests/unit/communication/conflict_resolution/test_authority_strategy.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/engine/test_decomposition_service.py
  • tests/unit/communication/test_bus_memory.py
  • tests/unit/engine/test_routing_models.py
  • tests/unit/communication/meeting/test_config.py
  • tests/unit/communication/meeting/test_errors.py
  • tests/unit/communication/test_enums.py
  • tests/unit/communication/conflict_resolution/test_config.py
  • tests/unit/communication/conflict_resolution/test_helpers.py
  • tests/unit/communication/conflict_resolution/test_hybrid_strategy.py
  • tests/unit/communication/conflict_resolution/test_debate_strategy.py
  • tests/unit/communication/meeting/test_enums.py
  • tests/unit/engine/test_routing_service.py
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names like 'example-provider', 'example-large-001', 'example-medium-001', 'example-small-001', 'large'/'medium'/'small'

Files:

  • tests/unit/communication/meeting/test_round_robin.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • src/ai_company/observability/events/task_routing.py
  • tests/unit/communication/meeting/test_protocol.py
  • src/ai_company/core/task.py
  • src/ai_company/communication/meeting/models.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_prompts.py
  • src/ai_company/communication/conflict_resolution/hybrid_strategy.py
  • tests/unit/communication/delegation/test_hierarchy.py
  • src/ai_company/communication/delegation/hierarchy.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/engine/routing/scorer.py
  • src/ai_company/engine/decomposition/service.py
  • tests/unit/communication/meeting/test_parsing.py
  • tests/unit/engine/test_decomposition_models.py
  • src/ai_company/engine/routing/models.py
  • tests/unit/communication/meeting/test_position_papers.py
  • src/ai_company/communication/messenger.py
  • tests/unit/communication/meeting/test_orchestrator.py
  • tests/unit/communication/meeting/test_token_tracker.py
  • tests/unit/communication/conflict_resolution/test_authority_strategy.py
  • src/ai_company/engine/decomposition/models.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/communication/conflict_resolution/service.py
  • src/ai_company/communication/meeting/_parsing.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/engine/test_decomposition_service.py
  • src/ai_company/communication/meeting/structured_phases.py
  • src/ai_company/communication/conflict_resolution/_helpers.py
  • tests/unit/communication/test_bus_memory.py
  • tests/unit/engine/test_routing_models.py
  • src/ai_company/core/enums.py
  • src/ai_company/communication/meeting/_prompts.py
  • tests/unit/communication/meeting/test_config.py
  • src/ai_company/engine/parallel.py
  • tests/unit/communication/meeting/test_errors.py
  • tests/unit/communication/test_enums.py
  • src/ai_company/communication/conflict_resolution/debate_strategy.py
  • src/ai_company/engine/decomposition/rollup.py
  • tests/unit/communication/conflict_resolution/test_config.py
  • src/ai_company/communication/meeting/position_papers.py
  • tests/unit/communication/conflict_resolution/test_helpers.py
  • tests/unit/communication/conflict_resolution/test_hybrid_strategy.py
  • tests/unit/communication/conflict_resolution/test_debate_strategy.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • tests/unit/communication/meeting/test_enums.py
  • tests/unit/engine/test_routing_service.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/bus_memory.py
  • src/ai_company/engine/routing/service.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Every module with business logic must import logger via 'from ai_company.observability import get_logger' and instantiate as 'logger = get_logger(__name__)'
Never use 'import logging' or 'logging.getLogger()' or 'print()' in application code
Always use 'logger' as the variable name for logging, not '_logger' or 'log'
Use event name constants from domain-specific modules under ai_company.observability.events for logging events
Always use structured logging with kwargs (logger.info(EVENT, key=value)), never formatted strings (logger.info('msg %s', val))
Log at WARNING or ERROR with context for all error paths before raising exceptions
Log at INFO for all state transitions
Log at DEBUG for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do not require logging
All provider calls go through BaseCompletionProvider which applies retry and rate limiting automatically

Files:

  • src/ai_company/observability/events/task_routing.py
  • src/ai_company/core/task.py
  • src/ai_company/communication/meeting/models.py
  • src/ai_company/communication/conflict_resolution/hybrid_strategy.py
  • src/ai_company/communication/delegation/hierarchy.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/engine/routing/scorer.py
  • src/ai_company/engine/decomposition/service.py
  • src/ai_company/engine/routing/models.py
  • src/ai_company/communication/messenger.py
  • src/ai_company/engine/decomposition/models.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/communication/conflict_resolution/service.py
  • src/ai_company/communication/meeting/_parsing.py
  • src/ai_company/communication/meeting/structured_phases.py
  • src/ai_company/communication/conflict_resolution/_helpers.py
  • src/ai_company/core/enums.py
  • src/ai_company/communication/meeting/_prompts.py
  • src/ai_company/engine/parallel.py
  • src/ai_company/communication/conflict_resolution/debate_strategy.py
  • src/ai_company/engine/decomposition/rollup.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/bus_memory.py
  • src/ai_company/engine/routing/service.py
DESIGN_SPEC.md

📄 CodeRabbit inference engine (CLAUDE.md)

Update DESIGN_SPEC.md to reflect approved deviations from the specification

Files:

  • DESIGN_SPEC.md
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T09:48:46.483Z
Learning: Fix all valid issues found by review agents including pre-existing issues adjacent to PR changes; never defer or skip as out of scope
📚 Learning: 2026-03-08T09:48:46.483Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T09:48:46.483Z
Learning: Enforce 30-second timeout per test

Applied to files:

  • tests/unit/communication/meeting/test_round_robin.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • tests/unit/communication/meeting/test_protocol.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_prompts.py
  • tests/unit/communication/meeting/test_position_papers.py
  • tests/unit/communication/meeting/test_token_tracker.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/communication/meeting/test_config.py
  • tests/unit/communication/meeting/test_errors.py
  • tests/unit/communication/meeting/test_enums.py
📚 Learning: 2026-03-08T09:48:46.483Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T09:48:46.483Z
Learning: Applies to tests/**/*.py : Mark slow tests with pytest.mark.slow

Applied to files:

  • tests/unit/communication/meeting/test_round_robin.py
  • tests/unit/communication/meeting/test_prompts.py
  • tests/unit/communication/meeting/test_position_papers.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/communication/meeting/test_config.py
📚 Learning: 2026-03-08T09:48:46.483Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T09:48:46.483Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events for logging events

Applied to files:

  • src/ai_company/engine/routing/scorer.py
📚 Learning: 2026-03-08T09:48:46.483Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T09:48:46.483Z
Learning: Applies to tests/**/*.py : Mark integration tests with pytest.mark.integration

Applied to files:

  • tests/integration/communication/test_meeting_integration.py
📚 Learning: 2026-03-08T09:48:46.483Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T09:48:46.483Z
Learning: Applies to DESIGN_SPEC.md : Update DESIGN_SPEC.md to reflect approved deviations from the specification

Applied to files:

  • DESIGN_SPEC.md
🧬 Code graph analysis (22)
tests/unit/communication/meeting/test_models.py (1)
src/ai_company/communication/meeting/models.py (4)
  • MeetingRecord (265-324)
  • MeetingContribution (99-128)
  • MeetingMinutes (153-262)
  • MeetingAgenda (77-96)
src/ai_company/communication/conflict_resolution/hybrid_strategy.py (2)
src/ai_company/communication/delegation/hierarchy.py (1)
  • HierarchyResolver (16-270)
src/ai_company/communication/conflict_resolution/_helpers.py (1)
  • pick_highest_seniority (136-163)
tests/unit/communication/delegation/test_hierarchy.py (1)
src/ai_company/communication/delegation/hierarchy.py (1)
  • get_lowest_common_manager (212-246)
src/ai_company/communication/meeting/orchestrator.py (3)
src/ai_company/communication/meeting/enums.py (1)
  • MeetingProtocolType (6-17)
src/ai_company/communication/meeting/protocol.py (1)
  • MeetingProtocol (61-100)
src/ai_company/communication/meeting/errors.py (1)
  • MeetingParticipantError (22-23)
tests/unit/communication/meeting/test_parsing.py (1)
src/ai_company/communication/meeting/_parsing.py (2)
  • parse_action_items (121-157)
  • parse_decisions (70-93)
tests/unit/engine/test_decomposition_models.py (1)
src/ai_company/engine/decomposition/models.py (3)
  • DecompositionPlan (66-122)
  • DecompositionResult (125-176)
  • derived_parent_status (228-255)
src/ai_company/engine/routing/models.py (3)
src/ai_company/core/agent.py (1)
  • AgentIdentity (246-304)
src/ai_company/core/enums.py (1)
  • CoordinationTopology (325-336)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/communication/messenger.py (3)
src/ai_company/communication/bus_memory.py (2)
  • subscribe (326-374)
  • unsubscribe (376-417)
src/ai_company/communication/bus_protocol.py (2)
  • subscribe (83-104)
  • unsubscribe (106-121)
src/ai_company/communication/subscription.py (1)
  • Subscription (9-22)
tests/unit/communication/meeting/test_orchestrator.py (3)
tests/unit/communication/meeting/conftest.py (3)
  • simple_agenda (83-98)
  • leader_id (102-104)
  • participant_ids (108-110)
src/ai_company/communication/meeting/models.py (1)
  • MeetingAgenda (77-96)
src/ai_company/communication/meeting/errors.py (1)
  • MeetingParticipantError (22-23)
src/ai_company/engine/decomposition/models.py (1)
src/ai_company/core/enums.py (1)
  • TaskStatus (165-191)
src/ai_company/communication/meeting/_parsing.py (2)
src/ai_company/communication/meeting/models.py (1)
  • ActionItem (131-150)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/communication/conflict_resolution/_helpers.py (2)
src/ai_company/communication/delegation/hierarchy.py (1)
  • get_ancestors (195-210)
src/ai_company/communication/errors.py (1)
  • ConflictStrategyError (107-108)
tests/unit/communication/test_bus_memory.py (1)
src/ai_company/communication/bus_memory.py (3)
  • subscribe (326-374)
  • receive (419-475)
  • unsubscribe (376-417)
src/ai_company/engine/parallel.py (1)
src/ai_company/engine/errors.py (1)
  • ParallelExecutionError (42-43)
tests/unit/communication/test_enums.py (1)
src/ai_company/communication/enums.py (1)
  • MessageType (6-20)
src/ai_company/communication/conflict_resolution/debate_strategy.py (4)
src/ai_company/communication/conflict_resolution/models.py (1)
  • Conflict (67-111)
src/ai_company/communication/conflict_resolution/protocol.py (1)
  • JudgeDecision (18-27)
src/ai_company/communication/conflict_resolution/_helpers.py (1)
  • pick_highest_seniority (136-163)
tests/unit/communication/conflict_resolution/conftest.py (1)
  • hierarchy (143-145)
src/ai_company/engine/decomposition/rollup.py (1)
src/ai_company/engine/decomposition/models.py (1)
  • SubtaskStatusRollup (179-255)
tests/unit/communication/conflict_resolution/test_config.py (1)
src/ai_company/communication/conflict_resolution/config.py (1)
  • DebateConfig (9-22)
src/ai_company/communication/meeting/position_papers.py (1)
src/ai_company/communication/meeting/_parsing.py (2)
  • parse_action_items (121-157)
  • parse_decisions (70-93)
src/ai_company/communication/meeting/_token_tracker.py (1)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
tests/unit/engine/test_routing_service.py (1)
src/ai_company/engine/routing/service.py (2)
  • TaskRoutingService (33-192)
  • route (51-114)
src/ai_company/communication/meeting/round_robin.py (2)
src/ai_company/communication/meeting/_parsing.py (2)
  • parse_action_items (121-157)
  • parse_decisions (70-93)
src/ai_company/communication/meeting/models.py (1)
  • ActionItem (131-150)
🔇 Additional comments (44)
src/ai_company/engine/parallel.py (1)

181-183: LGTM: explicit exception chaining is preserved.

This is a formatting-only change; ParallelExecutionError still keeps release_error as its cause and the control flow remains unchanged.

tests/integration/communication/test_meeting_integration.py (1)

37-38: LGTM!

The module-level timeout marker correctly enforces a 30-second limit per test, aligning with the repository's test timeout policy. Based on learnings: "Enforce 30-second timeout per test".

src/ai_company/observability/events/task_routing.py (1)

14-14: LGTM!

The new event constant follows the established naming convention and typing pattern used by other constants in this module. It properly centralizes the event key for invalid scorer configuration logging.

src/ai_company/engine/routing/models.py (4)

7-20: LGTM!

Logger setup and imports follow the coding guidelines: using get_logger(__name__), importing event constants from observability.events, and naming the logger variable logger.


72-88: LGTM!

The addition of structured warning logging before raising the exception follows the coding guidelines for error-path logging. The log includes relevant context (subtask_id, error) using kwargs.


112-157: LGTM!

The enhanced validation is well-structured:

  • Uses Counter for efficient O(n) duplicate detection
  • Provides deterministic error messages via sorted()
  • Logs warnings with relevant context before raising exceptions
  • Covers all three validation cases: duplicates in decisions, duplicates in unroutable, and overlap between them
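
For readers unfamiliar with the pattern, the three checks listed above can be sketched as follows. The function names and message wording are hypothetical and the warning log before each raise is omitted; only `Counter` and `sorted()` are taken from the reviewed code.

```python
from collections import Counter


def duplicated(ids: list[str]) -> list[str]:
    """Return IDs that occur more than once, sorted for stable messages."""
    return sorted(item for item, n in Counter(ids).items() if n > 1)


def validate_routing_ids(decision_ids: list[str], unroutable: list[str]) -> None:
    """Check the three cases: two duplicate checks and one overlap check."""
    if dupes := duplicated(decision_ids):
        raise ValueError(f"duplicate subtask_ids in decisions: {dupes}")
    if dupes := duplicated(unroutable):
        raise ValueError(f"duplicate subtask_ids in unroutable: {dupes}")
    if overlap := sorted(set(decision_ids) & set(unroutable)):
        raise ValueError(f"subtask_ids both routed and unroutable: {overlap}")
```

Sorting before formatting keeps the message identical across runs, which makes assertions on the error text reliable.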

191-206: LGTM!

The warning logging before raising follows the coding guidelines. The error message includes the field name, providing sufficient context.

src/ai_company/engine/routing/service.py (1)

75-87: LGTM!

This early consistency check addresses a critical fix from the PR objectives. The validation:

  • Detects mismatches between parent_task.id and plan.parent_task_id before routing proceeds
  • Logs a structured warning with all relevant context
  • Raises a clear ValueError with both IDs for debugging
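
A minimal sketch of such an early consistency check, assuming only that the task exposes `id` and the plan exposes `parent_task_id`. The `Task` and `Plan` classes here are hypothetical stand-ins, and the structured warning is left as a comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    id: str


@dataclass(frozen=True)
class Plan:
    parent_task_id: str


def check_parent_matches(parent_task: Task, plan: Plan) -> None:
    """Fail fast when the plan was built for a different parent task."""
    if plan.parent_task_id != parent_task.id:
        # The real service logs a structured warning before raising.
        msg = (
            f"parent_task.id {parent_task.id!r} does not match "
            f"plan.parent_task_id {plan.parent_task_id!r}"
        )
        raise ValueError(msg)
```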
src/ai_company/engine/routing/scorer.py (1)

12-14: LGTM!

Replacing the literal string with TASK_ROUTING_SCORER_INVALID_CONFIG improves consistency by using centralized event constants from the observability module. Based on learnings: "Use event name constants from domain-specific modules under ai_company.observability.events for logging events."

Also applies to: 61-66

tests/unit/engine/test_routing_models.py (2)

138-163: LGTM!

The test properly validates duplicate detection within decisions. It uses the correct marker, creates a realistic scenario with two decisions having the same subtask_id, and asserts the expected error message.


165-173: LGTM!

The test properly validates duplicate detection within the unroutable list. It's appropriately minimal since unroutable only contains string IDs and doesn't require the agent fixture.

tests/unit/engine/test_routing_service.py (1)

296-307: LGTM!

The test properly validates the parent task ID mismatch check. It creates a mismatched scenario using different IDs for the task and decomposition plan, then asserts the expected ValueError.

tests/unit/communication/test_bus_memory.py (1)

291-315: LGTM!

The test correctly validates that multiple concurrent receive() calls are all woken when unsubscribe() is invoked. The structure mirrors the existing single-receiver test, and the assertions properly verify that all three receivers returned None. This provides good coverage for the new per-(channel, subscriber) waiter tracking introduced in bus_memory.py.

src/ai_company/communication/messenger.py (2)

17-17: LGTM!

The NotBlankStr import and its usage in the public API methods (subscribe, unsubscribe, receive) aligns with the coding guidelines for identifier/name fields and matches the bus_protocol.py interface. This provides validation at the API boundary.


289-294: LGTM!

The updated docstring accurately documents all three conditions under which receive() returns None, matching the implementation in bus_memory.py. This improves API clarity for consumers.

src/ai_company/communication/bus_memory.py (4)

102-102: LGTM!

The _waiters dictionary is correctly typed and initialized to track the count of concurrent receive() calls per (channel, subscriber) pair.


402-412: LGTM!

The sentinel wake-up logic correctly:

  1. Pops the waiter count and uses max(1, pending) to ensure at least one sentinel is sent (handles edge cases)
  2. Relies on unbounded queues (maxsize=0) so put_nowait cannot raise QueueFull
  3. Cleans up the _waiters entry along with the queue

This properly implements the "wake all concurrent receivers" requirement.


477-509: LGTM!

The async _log_receive_null method correctly:

  1. Acquires the lock to get a consistent snapshot of bus state
  2. Checks conditions in the right order (shutdown → unsubscribed → timeout)
  3. Uses the timeout_seconds parameter name consistently

This eliminates the race condition where the logged reason could be incorrect.


438-443: LGTM!

The updated docstring accurately documents all three conditions for returning None and is consistent with the corresponding documentation in messenger.py.

src/ai_company/core/task.py (1)

73-76: Schema addition is consistent.

These fields are documented, typed, and defaulted in a way that keeps existing Task construction paths backward compatible.

Also applies to: 147-154

src/ai_company/engine/decomposition/service.py (1)

121-133: Dependency propagation now preserves the subtask DAG on created tasks.

This keeps created_tasks consistent with both the decomposition plan and the emitted dependency_edges.

tests/unit/engine/test_decomposition_service.py (1)

273-288: Good regression coverage for dependency propagation.

This would catch the exact gap the service fix is addressing by asserting the created Task.dependencies match the plan.

tests/unit/engine/test_decomposition_models.py (1)

387-401: Good semantic lock-in for mixed completed/cancelled rollups.

This makes the new terminal-state behavior explicit and helps prevent the aggregate status from drifting back to COMPLETED.

src/ai_company/engine/decomposition/models.py (1)

157-165: Add validator to ensure dependency_edges exactly match plan-declared dependencies.

Currently, the DecompositionResult validator checks only that edge endpoint IDs are known. dependency_edges can therefore omit declared plan dependencies or include undeclared ones, because the edge set is derived from plan.subtasks[*].dependencies in the service (see service.py lines 144–153) rather than enforced by the model. Add a validator to enforce that the edge set exactly mirrors what the plan declares, and add a regression test to prevent future inconsistencies.
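
A rough sketch of the exact-match check suggested here, using plain dataclasses instead of the Pydantic models. `Subtask`, `expected_edges`, and `check_edges_match_plan` are hypothetical names, and the edge direction `(dependency, dependent)` is an assumption about how the service emits edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    id: str
    dependencies: tuple[str, ...] = ()


def expected_edges(subtasks: tuple[Subtask, ...]) -> set[tuple[str, str]]:
    """Derive (dependency, dependent) pairs from the plan's subtasks."""
    return {(dep, st.id) for st in subtasks for dep in st.dependencies}


def check_edges_match_plan(
    subtasks: tuple[Subtask, ...],
    dependency_edges: tuple[tuple[str, str], ...],
) -> None:
    """Raise if the edges omit a declared dependency or add an undeclared one."""
    expected = expected_edges(subtasks)
    actual = set(dependency_edges)
    missing = sorted(expected - actual)
    extra = sorted(actual - expected)
    if missing or extra:
        msg = f"dependency_edges mismatch: missing={missing}, extra={extra}"
        raise ValueError(msg)
```

In the real model this logic would presumably sit in a `model_validator` on DecompositionResult, next to the existing endpoint check.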

tests/unit/communication/meeting/test_prompts.py (1)

8-9: LGTM!

Module-level timeout marker correctly enforces the 30-second timeout policy across all tests in this file. Based on learnings: "Enforce 30-second timeout per test".

src/ai_company/communication/meeting/config.py (1)

65-66: LGTM!

Docstring line wrap maintains readability within the 88-character line limit.

tests/unit/communication/test_enums.py (1)

31-31: LGTM!

The assertion correctly validates the new MEETING_CONTRIBUTION enum member. The member count assertion on line 20 (len(MessageType) == 9) is consistent with the enum definition showing 9 members in the relevant code snippet.

tests/unit/communication/meeting/test_config.py (1)

14-15: LGTM!

Module-level timeout marker correctly enforces the 30-second timeout policy. Based on learnings: "Enforce 30-second timeout per test".

tests/unit/communication/meeting/test_position_papers.py (1)

22-23: LGTM!

Module-level timeout marker correctly enforces the 30-second timeout policy for all async protocol tests. Based on learnings: "Enforce 30-second timeout per test".

tests/unit/communication/meeting/test_protocol.py (1)

17-18: LGTM!

Module-level timeout marker correctly enforces the 30-second timeout policy. Based on learnings: "Enforce 30-second timeout per test".

tests/unit/communication/meeting/test_token_tracker.py (1)

7-8: LGTM!

Module-level timeout marker correctly enforces the 30-second timeout policy. Based on learnings: "Enforce 30-second timeout per test".

tests/unit/communication/meeting/test_errors.py (1)

14-15: LGTM!

Module-level timeout marker correctly enforces the 30-second timeout policy. Based on learnings: "Enforce 30-second timeout per test".

src/ai_company/communication/meeting/_parsing.py (3)

70-93: LGTM!

The parse_decisions function correctly extracts decisions from LLM summary text. The logic properly handles missing sections, filters empty items, and returns an immutable tuple.


121-157: LGTM!

The parse_action_items function properly handles edge cases:

  • Empty sections return empty tuples
  • Empty raw text is skipped
  • Empty descriptions after assignee extraction are handled by falling back to the original text
  • Returns immutable tuples with ActionItem instances

The defensive checks at lines 143-144 and 147-148 ensure NotBlankStr validation won't fail on empty descriptions.
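
The edge cases listed above can be illustrated with a simplified sketch. The regexes, the `ActionItem` stand-in, and the `parse_items` name are assumptions for illustration and are not the contents of _parsing.py:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: a bullet or numbered prefix, and a trailing
# "(assigned to <name>)" marker at the end of the item.
_BULLET = re.compile(r"^\s*(?:[-*]|\d+\.)\s+(?P<text>.*)$")
_ASSIGNEE = re.compile(r"\(assigned\s+to\s+(?P<who>[^)]+)\)\s*$")


@dataclass(frozen=True)
class ActionItem:
    """Simplified stand-in for the project's ActionItem model."""

    description: str
    assignee: Optional[str] = None


def parse_items(section: str) -> tuple[ActionItem, ...]:
    """Turn list lines into ActionItem objects, skipping empty items."""
    items: list[ActionItem] = []
    for line in section.splitlines():
        bullet = _BULLET.match(line)
        if bullet is None:
            continue
        raw = bullet.group("text").strip()
        if not raw:
            continue  # empty raw text is skipped
        assignee = None
        description = raw
        found = _ASSIGNEE.search(raw)
        if found:
            assignee = found.group("who").strip() or None
            # If nothing is left once the assignee is removed, fall back
            # to the original text so the description is never blank.
            description = raw[: found.start()].strip() or raw
        items.append(ActionItem(description, assignee))
    return tuple(items)  # immutable result; empty tuple for empty input
```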


11-13: 🧹 Nitpick | 🔵 Trivial

Unused logger import.

The logger is imported and instantiated but never used in this module. The parsing functions are pure transformations that return empty tuples on missing sections rather than logging warnings. Consider either:

  1. Removing the unused logger if no logging is intended, or
  2. Adding DEBUG-level logging for when sections are not found (to aid debugging LLM response parsing issues).
⛔ Skipped due to learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T09:48:46.483Z
Learning: Applies to src/**/*.py : Every module with business logic must import logger via 'from ai_company.observability import get_logger' and instantiate as 'logger = get_logger(__name__)'
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T09:48:46.483Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events for logging events

src/ai_company/communication/meeting/structured_phases.py (5)

12-15: LGTM!

The import of parsing utilities from the internal _parsing module is correct and follows the established naming convention for internal modules.


279-281: LGTM!

The integration of parse_decisions and parse_action_items correctly extracts structured data from the synthesized summary. Empty tuples are gracefully returned if sections are not found, which is appropriate behavior for MeetingMinutes construction.


401-408: LGTM!

Excellent improvement: replacing assertions with explicit RuntimeError exceptions, with ERROR-level logging before raising. This follows coding guidelines for error paths and provides better diagnostics than bare assertions.


548-599: LGTM!

The budget handling improvements correctly implement the synthesis reserve pattern:

  1. Lines 548-551: Reserves 20% of remaining budget for synthesis before allocating to discussion
  2. Line 563: Introduces discussion_used counter for precise tracking within the discussion phase
  3. Line 566: Adds compound check to prevent exceeding discussion budget
  4. Lines 589-593: Dynamically calculates remaining budget per agent, ensuring no agent gets zero tokens while respecting the overall cap
  5. Line 599: Correctly tracks both input and output tokens

This ensures synthesis always has budget available, addressing the critical fix mentioned in PR objectives.
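
As a rough sketch of the reserve pattern described above (the constant and function names are hypothetical, and integer arithmetic is an assumption made to keep the example exact):

```python
# Hypothetical constant: the review states that 20% of the remaining
# budget is reserved for synthesis before discussion starts.
SYNTHESIS_RESERVE_PERCENT = 20


def split_budget(remaining: int) -> tuple[int, int]:
    """Return (discussion_budget, synthesis_reserve) for the remaining tokens."""
    reserve = remaining * SYNTHESIS_RESERVE_PERCENT // 100
    return remaining - reserve, reserve


def per_agent_allowance(
    discussion_budget: int,
    discussion_used: int,
    agents_left: int,
) -> int:
    """Tokens the next agent may use during the discussion phase.

    Returns 0 once the discussion budget is exhausted; otherwise at least 1,
    and never more than what is left of the discussion budget.
    """
    left = discussion_budget - discussion_used
    if left <= 0 or agents_left <= 0:
        return 0
    return max(1, left // agents_left)
```

Because the reserve is carved out before any discussion turn is allocated, the synthesis call keeps its share even if the discussion consumes its entire budget.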


53-54: LGTM!

Clear documentation of the synthesis reserve fraction constant.

DESIGN_SPEC.md (4)

653-653: LGTM!

Documentation correctly describes the new shared parsing utilities in _parsing.py, aligning with the implementation.


2447-2447: LGTM!

The file structure documentation correctly reflects the addition of _parsing.py with an accurate description of its purpose.


2649-2649: LGTM!

The conflict resolution convention documentation is updated to reflect the implementation details, including the build_dissent_records() method (plural), hierarchy proximity tiebreaker, and shared find_losers utility.


605-606: LGTM!

Minor documentation wording improvements for the Structured Debate + Judge strategy description.

Comment on lines +24 to +27
_ANY_HEADER_RE = re.compile(
r"^#+\s+\S|^\S.*:\s*$",
re.MULTILINE,
)

🧹 Nitpick | 🔵 Trivial

_ANY_HEADER_RE pattern may prematurely truncate sections.

The pattern ^\S.*:\s*$ will match any line starting with a non-whitespace character and ending with a colon. This could match lines like "Note:" or "Example:" within a section body, causing premature section truncation. Consider whether this is intentional or if you need a more restrictive pattern (e.g., requiring the colon-terminated line to be followed by content on subsequent lines).

Given LLM output is semi-structured, this behavior may be acceptable as a best-effort heuristic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/_parsing.py` around lines 24 - 27,
_ANY_HEADER_RE currently treats any line that ends with a colon as a header,
which can prematurely split sections (matches things like "Note:" inside
bodies); update the _ANY_HEADER_RE usage so the colon-terminated alternative
only counts as a header when it is followed by a non-blank next line (use a
positive lookahead to assert the next line starts with a non-whitespace
character), keeping the existing markdown-hash header branch intact; modify the
regex assigned to _ANY_HEADER_RE accordingly and keep re.MULTILINE.
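
To make the behaviour concrete, the sketch below runs the pattern quoted in this comment against a small sample and compares it with one possible reading of the lookahead suggestion. `LOOKAHEAD`, `header_lines`, and `SAMPLE` are illustrative names, and the variant is an assumption rather than the fix that was actually merged:

```python
import re

# The pattern as quoted in the review comment.
ORIGINAL = re.compile(r"^#+\s+\S|^\S.*:\s*$", re.MULTILINE)

# Assumed variant: a colon-terminated line counts as a header only when
# the next line starts with a non-whitespace character.
LOOKAHEAD = re.compile(r"^#+\s+\S|^\S.*:[ \t]*$(?=\n\S)", re.MULTILINE)


def header_lines(pattern: re.Pattern[str], text: str) -> list[str]:
    """Return the full text of every line the pattern treats as a header."""
    return [text[m.start():].split("\n", 1)[0] for m in pattern.finditer(text)]


SAMPLE = (
    "Decisions:\n"
    "1. Use Postgres\n"
    "Note:\n"
    "\n"
    "2. Ship Friday\n"
    "## Action Items\n"
    "- Write docs\n"
)

print(header_lines(ORIGINAL, SAMPLE))
print(header_lines(LOOKAHEAD, SAMPLE))
```

With this sample the original pattern reports "Note:" as a header while the variant does not. The variant still treats "Note:" as a header when body text follows on the very next line, so it narrows the heuristic without fully resolving the concern.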

Comment on lines +282 to +289
if not all(r is not None for r in results):
msg = f"Expected {n} position papers but some slots are None"
logger.error(msg, meeting_id=meeting_id)
raise RuntimeError(msg)
if not all(c is not None for c in contrib_results):
msg = f"Expected {n} contributions but some slots are None"
logger.error(msg, meeting_id=meeting_id)
raise RuntimeError(msg)

⚠️ Potential issue | 🟡 Minor

Use a stable event constant for these error logs.

logger.error(msg, ...) breaks the event-based logging contract in src/ code and makes these invariant failures harder to query consistently. Emit a fixed event name and put the human-readable detail in structured fields instead.

As per coding guidelines, "Use event name constants from domain-specific modules under ai_company.observability.events for logging events" and "Always use structured logging with kwargs (logger.info(EVENT, key=value)), never formatted strings (logger.info('msg %s', val))".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/position_papers.py` around lines 282 -
289, Replace the free-form logger.error calls in the two invariant checks so
they use stable event constants from ai_company.observability.events (e.g.,
POSITION_PAPERS_MISSING and CONTRIBUTIONS_MISSING) and structured kwargs rather
than formatted strings: for the results check (variable names results, n,
meeting_id) call logger.error(POSITION_PAPERS_MISSING, detail=msg,
meeting_id=meeting_id) (after importing the constant) and similarly for
contrib_results use logger.error(CONTRIBUTIONS_MISSING, detail=msg,
meeting_id=meeting_id); keep raising RuntimeError(msg) but ensure logging uses
the event constant and structured fields instead of logger.error(msg, ...).
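
A minimal sketch of the requested shape, using a tiny recording logger in place of the project's structured logger. `MEETING_INTERNAL_ERROR` is the constant named in the follow-up commit on this PR, but its value, the `RecordingLogger` class, and `require_complete` are assumptions for illustration:

```python
from typing import Any, Optional, Sequence

# Constant name taken from the follow-up commit; the value is assumed.
MEETING_INTERNAL_ERROR = "meeting.internal_error"


class RecordingLogger:
    """Stand-in for a structured logger that takes an event plus kwargs."""

    def __init__(self) -> None:
        self.records: list[tuple[str, dict[str, Any]]] = []

    def error(self, event: str, **fields: Any) -> None:
        self.records.append((event, fields))


logger = RecordingLogger()


def require_complete(
    results: Sequence[Optional[str]],
    meeting_id: str,
) -> tuple[str, ...]:
    """Return results as a tuple, or log a fixed event and raise."""
    missing = sum(1 for r in results if r is None)
    if missing:
        msg = f"Expected {len(results)} position papers but {missing} slots are None"
        # Fixed event name first; human-readable detail goes in a field.
        logger.error(MEETING_INTERNAL_ERROR, detail=msg, meeting_id=meeting_id)
        raise RuntimeError(msg)
    return tuple(r for r in results if r is not None)
```

Keeping the event name constant means log queries can filter on one stable key, while the variable message stays available in the `detail` field.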

Comment on lines +165 to +166
decisions = parse_decisions(summary)
action_items = parse_action_items(summary)

⚠️ Potential issue | 🟠 Major

Require explicit section headers before parsing the summary.

parse_decisions() and parse_action_items() only extract from Decisions / Action Items sections, but _build_summary_prompt() still asks for plain numbered/bulleted lists. A leader response that follows the current prompt exactly can leave both parsed fields empty.

Suggested prompt contract update
     parts.append(
-        "Please summarize this meeting. List the key decisions made "
-        "and any action items with assignees. Format decisions as a "
-        "numbered list and action items as a bulleted list."
+        "Please summarize this meeting using exactly these sections:\n"
+        "Decisions:\n"
+        "1. <decision>\n"
+        "2. <decision>\n\n"
+        "Action Items:\n"
+        "- <action item> (assigned to <agent_id>)\n"
+        "- <action item>"
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/round_robin.py` around lines 165 - 166,
The code calls parse_decisions(summary) and parse_action_items(summary) but the
prompt produced by _build_summary_prompt() does not require "Decisions:" or
"Action Items:" headers, so leaders can return lists that the parsers miss;
update _build_summary_prompt() to explicitly require distinct "Decisions:" and
"Action Items:" section headers (with examples) and then add a small guard where
decisions = parse_decisions(summary) / action_items =
parse_action_items(summary) are invoked to validate the summary contains those
headers (e.g., check for the literal "Decisions:" and "Action Items:") and if
missing, either request the model to reformat or log/raise a clear parsing error
so empty results are not silently accepted.
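
The guard could look roughly like the following, assuming the literal "Decisions:" and "Action Items:" headers from the suggested prompt. `missing_headers` is a hypothetical helper, not an existing function in round_robin.py:

```python
# Headers the suggested prompt contract asks the leader to emit.
_REQUIRED_HEADERS = ("Decisions:", "Action Items:")


def missing_headers(summary: str) -> tuple[str, ...]:
    """Return the required section headers that do not appear on their own line."""
    present = {line.strip().lower() for line in summary.splitlines()}
    return tuple(h for h in _REQUIRED_HEADERS if h.lower() not in present)
```

A caller might check `missing_headers(summary)` before invoking the parsers and log or raise when the result is non-empty, so that empty decisions or action items are never accepted silently.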

@greptile-apps

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR addresses 46 post-merge review findings from PRs #164-#167, delivering a broad set of correctness, safety, and observability improvements across the meeting protocol, conflict resolution, message bus, decomposition, and routing subsystems. The changes are generally well-structured and thoroughly tested (3,634 passing unit tests, 18 new parsing tests, updated integration tests).

Key highlights:

  • Meeting protocol LLM parsing (_parsing.py): New shared module extracts structured decisions and action items from LLM synthesis text. The assigned? regex inadvertently makes d optional, so "assign to X" in an item description will silently extract a false assignee — this should be assigned\s+to.
  • Message bus concurrency (bus_memory.py): _waiters counter correctly tracks concurrent receive() callers and wakes all of them on unsubscribe(). The lock asymmetry (increment inside lock, decrement in finally outside) is safe in asyncio's single-threaded scheduler.
  • Token budget enforcement: MeetingMinutes._validate_token_aggregates enforces that reported totals equal the sum of contributions, and StructuredPhasesProtocol now reserves 20% of the remaining budget before discussion to guarantee the synthesis phase can run.
  • Conflict resolution correctness: find_losers() validates that winning_agent_id is actually present in the conflict positions before proceeding; pick_highest_seniority() uses hierarchy depth as a tiebreaker; O(n) .index() lookups in compare_seniority replaced with a precomputed O(1) rank dict.
  • Decomposition correctness: DecompositionResult now cross-checks task ID sets (not just counts); mixed COMPLETED+CANCELLED rollup correctly returns CANCELLED; dependency fields are now copied to created Task objects; missing early return in StatusRollup.compute for empty input.
  • Routing and observability: Duplicate subtask ID checks added to RoutingResult; centralized event constants replace ad-hoc strings; routing service now validates parent_task.id == plan.parent_task_id.
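
Two of the conflict-resolution points above can be sketched briefly. The `Seniority` levels and both function signatures are hypothetical simplifications; the real helpers operate on the project's own conflict models:

```python
from enum import Enum


class Seniority(Enum):
    """Hypothetical seniority levels, declared from lowest to highest."""

    JUNIOR = "junior"
    MID = "mid"
    SENIOR = "senior"


# Precomputed once, so each comparison is a dict lookup instead of an
# O(n) list.index() scan.
_RANK = {level: rank for rank, level in enumerate(Seniority)}


def compare_seniority(a: Seniority, b: Seniority) -> int:
    """Positive if a outranks b, negative if b outranks a, zero if equal."""
    return _RANK[a] - _RANK[b]


def find_losers(agent_ids: tuple[str, ...], winning_agent_id: str) -> tuple[str, ...]:
    """Return every participant except the winner, validating the winner first."""
    if winning_agent_id not in agent_ids:
        msg = f"winning_agent_id {winning_agent_id!r} is not part of the conflict"
        raise ValueError(msg)
    return tuple(a for a in agent_ids if a != winning_agent_id)
```

Without the membership check, an unknown winner would silently turn every participant into a loser, which is the failure mode the validation is meant to prevent.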

Confidence Score: 4/5

  • Safe to merge after fixing the assigned? regex to prevent silent false-positive assignee extraction in action items.
  • This PR is safe to merge with one minor logic fix recommended in the new _parsing.py module. The vast majority of changes are correct, well-tested fixes for previously identified issues. The only logic-level concern is the assigned? regex in _parsing.py:39 — the optional d can produce false-positive assignee extraction for action items whose descriptions contain "assign to". This would not cause a crash or data loss but would silently parse wrong assignee IDs in edge-case LLM outputs. All other changes are sound.
  • src/ai_company/communication/meeting/_parsing.py (fix assigned? regex to use assigned\s+to)

Last reviewed commit: 1317bea

- Fix regex patterns in meeting _parsing.py to prevent false header
  matches on list items and capture multi-line list items
- Change MEETING_BUDGET_EXHAUSTED to MEETING_VALIDATION_FAILED for
  negative token validation in _token_tracker.py
- Add assignee validation against meeting participants in all 3
  protocols (prompt injection defense)
- Fix _waiters decrement in bus_memory.py to avoid orphan entries
  after unsubscribe
- Add empty-contributions-must-have-zero-totals validation in
  MeetingMinutes
- Use MEETING_INTERNAL_ERROR event constant instead of dynamic
  f-string event names
- Upgrade except* log level from debug to warning in parallel.py
- Add _known_agents existence check in hierarchy get_lowest_common_manager
- Import _MIN_POSITIONS from models instead of redefining in
  conflict resolution service
- Update DESIGN_SPEC.md and README.md for accuracy
- Add/update tests for all changes
@Aureliolo Aureliolo merged commit 3bf897a into main Mar 8, 2026
4 checks passed
@Aureliolo Aureliolo deleted the fix/post-merge-feedback-2 branch March 8, 2026 12:46
# Pattern for "assignee: <name>" or "(assigned to <name>)" at end of line
_ASSIGNEE_RE = re.compile(
r"(?:"
r"\(?assigned?\s+to:?\s*(.+?)\)?"

assigned? regex matches "assign to X" unintentionally

The d in assigned? is optional, so the pattern also matches assign to <name> (without the past-tense d). This means any action item whose description contains the phrase "assign to" (e.g., - We need to assign to the platform team) will have its assignee silently extracted as the platform team and its description truncated — even though the item was not explicitly assigned.

The canonical form from the prompt template is (assigned to <agent_id>), so making d optional is unnecessarily permissive. The fix is to use assigned\s+to instead:

Suggested change
-    r"\(?assigned?\s+to:?\s*(.+?)\)?"
+    r"\(?assigned\s+to:?\s*(.+?)\)?"

Aureliolo added a commit that referenced this pull request Mar 8, 2026
PR #174 — git_worktree.py:
- _validate_git_ref now accepts error_cls/event params so merge context
  raises WorkspaceMergeError and teardown raises WorkspaceCleanupError
- _run_git catches asyncio.CancelledError to kill subprocess before
  re-raising, preventing orphaned git processes

PR #172 — task assignment:
- TaskAssignmentConfig.strategy validated against known strategy names
- max_concurrent_tasks_per_agent now enforced in _score_and_filter_candidates
  via new AssignmentRequest.max_concurrent_tasks field
- TaskAssignmentStrategy protocol docstring documents error signaling contract

PR #171 — worktree skill:
- rebase uses --left-right --count with triple-dot to detect behind-main
- setup reuse path uses correct git worktree add (without -b)
- setup handles dirty working tree with stash/abort prompt
- status table shows both ahead and behind counts
- tree command provides circular dependency recovery guidance

PR #170 — meeting parsing:
- Fix assigned? regex to assigned (prevents false-positive assignee
  extraction from "assign to X" in action item descriptions)
Aureliolo added a commit that referenced this pull request Mar 8, 2026
…176)

## Summary

- Fix CI failures on main: 2 test assertion mismatches in cost-optimized
assignment tests + mypy `attr-defined` error in strategy registry test
- Address all Greptile post-merge review findings across PRs #170-#175
(14 fixes total)

### PR #175 — Test assertion fixes (CI blockers)
- `"no cost data"` → `"insufficient cost data"` to match implementation
wording
- `unknown-dev` → `known-dev` winner assertion (all-or-nothing fallback,
sort stability)
- `getattr()` for `_scorer` access on protocol type (Windows/Linux mypy
difference)

### PR #174 — Workspace isolation
- `_validate_git_ref` raises context-appropriate exception types
(`WorkspaceMergeError` in merge, `WorkspaceCleanupError` in teardown)
- `_run_git` catches `asyncio.CancelledError` to kill subprocess before
re-raising (prevents orphaned git processes)

### PR #172 — Task assignment
- `TaskAssignmentConfig.strategy` validated against 6 known strategy
names
- `max_concurrent_tasks_per_agent` enforced via new
`AssignmentRequest.max_concurrent_tasks` field in
`_score_and_filter_candidates`
- `TaskAssignmentStrategy` protocol docstring documents error signaling
contract (raises vs `selected=None`)

### PR #171 — Worktree skill
- `rebase` uses `--left-right --count` with triple-dot to detect
behind-main worktrees
- `setup` reuse path uses `git worktree add` without `-b` for existing
branches
- `setup` handles dirty working tree with stash/abort prompt
- `status` table shows both ahead and behind counts
- `tree` provides circular dependency recovery guidance

### PR #170 — Meeting parsing
- `assigned?` → `assigned` regex fix (prevents false-positive assignee
extraction from "assign to X")

## Test plan

- [x] All 3988 tests pass (10 new tests added)
- [x] mypy strict: 0 errors (463 source files)
- [x] ruff lint + format: all clean
- [x] Coverage: 96.53% (threshold: 80%)
- [x] Pre-commit hooks pass

## Review coverage

Quick mode — automated checks only (lint, type-check, tests, coverage).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](v0.0.0...v0.1.0)
(2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>