fix: address post-merge review feedback from PRs #164-#167 by Aureliolo · Pull Request #170 · Aureliolo/synthorg

Aureliolo · 2026-03-08T12:14:16Z

Summary

46 findings from external reviewers (Copilot, Greptile, CodeRabbit, Gemini) on PRs #164-#167, all addressed
4 critical: Parse decisions/action_items from LLM synthesis in all 3 meeting protocols; validate winning_agent_id in find_losers()
17 major: Token budget guards, duplicate participant rejection, frozen registry, hierarchy tiebreakers, dependency copying, wake-all-waiters on unsubscribe, and more
15 minor: Duplicate log removal, assert→raise, traceback preservation, O(1) seniority lookups, NotBlankStr typing, routing validation ordering
4 trivial: Centralized event constants, docstring/spec fixes
6 test/doc gaps: New test_parsing.py (18 tests), expanded tests across 7 modules, timeout markers, spec name corrections
Pre-PR review fixes: List-item regex crossing line boundaries, parent_task_id validation ordering, circular exception cause, dead code removal

Test plan

Closes #169

Critical (C1-C4):
- Parse decisions/action_items from LLM synthesis in all 3 meeting protocols
- Validate winning_agent_id exists in find_losers() before computing losers

Major (M1-M17):
- Guard summary budget reserve when leader_summarizes=False
- Add synthesis sub-reserve in structured phases discussion
- Reject duplicate participant_ids in meeting orchestrator
- Freeze protocol registry with MappingProxyType
- Warn when token tracker exceeds budget
- Add hierarchy tiebreaker to pick_highest_seniority()
- Wire hierarchy into debate/hybrid authority fallbacks
- Fast-path get_lowest_common_manager(a, a) → a
- Validate _SENIORITY_ORDER matches enum members at import
- Remove dead max_tokens_per_argument config field
- Verify task IDs match plan subtask IDs in DecompositionResult
- Return CANCELLED for mixed completed+cancelled terminal states
- Fix double-logging in rollup compute() for empty case
- Copy subtask dependencies from plan to created Tasks
- Reject duplicate subtask IDs in RoutingResult
- Wake all pending waiters on unsubscribe (not just one)

Minor (m1-m15):
- Remove duplicate MEETING_CONFLICT_DETECTED log events
- Replace assert with explicit raises in meeting protocols
- Include presenter_id in formatted agenda prompt
- Validate token aggregates in MeetingMinutes
- Require non-empty error_message for FAILED/BUDGET_EXHAUSTED
- Move _MIN_POSITIONS to local constant in service.py
- Precompute seniority rank dict for O(1) lookups
- Remove dead asyncio.QueueFull catch on unbounded queue
- Fix racy state check in _log_receive_null (acquire lock)
- Type channel_name as NotBlankStr in messenger
- Document unsubscribe as None return path in receive()
- Preserve traceback context in parallel.py re-raise
- Validate parent_task.id matches plan.parent_task_id
- Add logging before raises in routing model validators

Trivial (t1-t4):
- Use centralized event constant in routing scorer
- Add task_structure/coordination_topology to Task docstring
- Fix DESIGN_SPEC.md model/function names to match code
- Fix StructuredPhasesConfig docstring

Tests (T1-T5):
- Assert MEETING_CONTRIBUTION enum value
- Add timeout markers to all meeting test modules
- Add 3+ participant test for authority/debate strategies
- Remove dead max_tokens_per_argument test references
- Update HybridResolver tests for new hierarchy parameter

Closes #169

- Fix list-item regex crossing line boundaries (\s* → [^\S\n]*)
- Move parent_task_id validation before empty-agents early return
- Fix circular exception cause in parallel.py re-raise
- Remove unused COMM_UNSUBSCRIBE_SENTINEL_FAILED constant
- Use NotBlankStr for error_message field (replaces manual check)
- Add logger + logging before raises in parsing/position_papers/structured_phases
- Fix import ordering in rollup.py
- Remove dead max_tokens_per_argument from DESIGN_SPEC.md examples
- Correct M3 status in README.md
- Improve docstrings across bus_memory, helpers, hybrid_strategy, orchestrator
- Add test_parsing.py (18 tests) + expand tests in 7 existing modules
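The first fix in this list (`\s*` → `[^\S\n]*`) is easy to miss in review, so here is a minimal sketch of the failure mode. The two patterns below are hypothetical stand-ins, not the actual expressions in `_parsing.py`; they only illustrate that `\s` also matches `\n`, so an empty bullet can swallow the following line, while `[^\S\n]` matches whitespace other than a newline.

```python
import re

# Hypothetical list-item patterns for illustration only.
# GREEDY_ITEM uses \s*, which can consume a newline after the bullet marker.
GREEDY_ITEM = re.compile(r"^\s*[-*]\s*(.+)$", re.MULTILINE)
# SAME_LINE_ITEM uses [^\S\n]*: any whitespace except a newline.
SAME_LINE_ITEM = re.compile(r"^[^\S\n]*[-*][^\S\n]*(.+)$", re.MULTILINE)

# An empty bullet followed by an unrelated line, then a real bullet.
text = "-\nOwner: alice\n- Ship v2\n"

greedy = GREEDY_ITEM.findall(text)
strict = SAME_LINE_ITEM.findall(text)

print(greedy)  # the empty bullet captures the next line as its content
print(strict)  # only the real bullet is captured
```

With the `\s*` version, the text of the line after the empty bullet is reported as a list item; the newline-excluding class keeps each match on its own line.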

github-actions · 2026-03-08T12:14:25Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-03-08T12:14:32Z

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between cf27048 and 1317bea.

📒 Files selected for processing (19)

DESIGN_SPEC.md
README.md
src/ai_company/communication/bus_memory.py
src/ai_company/communication/delegation/hierarchy.py
src/ai_company/communication/meeting/_parsing.py
src/ai_company/communication/meeting/_token_tracker.py
src/ai_company/communication/meeting/models.py
src/ai_company/communication/meeting/orchestrator.py
src/ai_company/communication/meeting/position_papers.py
src/ai_company/communication/meeting/round_robin.py
src/ai_company/communication/meeting/structured_phases.py
src/ai_company/engine/decomposition/models.py
src/ai_company/engine/parallel.py
src/ai_company/observability/events/meeting.py
tests/unit/communication/delegation/test_hierarchy.py
tests/unit/communication/meeting/test_models.py
tests/unit/communication/meeting/test_prompts.py
tests/unit/engine/test_decomposition_models.py
tests/unit/engine/test_decomposition_rollup.py

📝 Walkthrough

Summary by CodeRabbit

New Features
- Meeting summaries now automatically extract decisions and action items from discussions.
- Expanded conflict resolution strategies including debate, hybrid, authority, and human escalation approaches.
- Tasks now support structure classification and coordination topology specification.
Bug Fixes
- Message bus now properly handles multiple concurrent message receivers during unsubscribe.
- Added validation for task routing and dependency consistency.
Refactor
- Simplified conflict resolution logic and improved validation throughout communication systems.
- Enhanced token tracking and observability in meetings.

Walkthrough

This PR addresses 40+ post-merge review feedback items from four recently-merged PRs by fixing critical validation gaps, implementing missing parsing logic for meeting decisions/action items, adding hierarchy-based tiebreaking to conflict resolution, improving waiter handling in message bus, and enhancing decomposition/routing validation with proper ID consistency checks.
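As a rough illustration of the hierarchy tiebreak and the winner validation described above, the following sketch uses simplified stand-in types. `Position`, the `depth_of` mapping, and the rule that the agent closer to the root wins a tie are all assumptions made for this example; the PR text only states that ties are resolved by hierarchy depth and that `find_losers()` raises `ConflictStrategyError` for an unknown winner.

```python
from dataclasses import dataclass


class ConflictStrategyError(Exception):
    """Stand-in for the project's error type (name taken from the review text)."""


@dataclass(frozen=True)
class Position:
    """Hypothetical, simplified conflict position."""
    agent_id: str
    seniority_rank: int  # assumption: larger value means more senior


def pick_highest_seniority(positions, depth_of=None):
    """Pick the most senior position; break ties by hierarchy depth if given."""
    top = max(p.seniority_rank for p in positions)
    tied = [p for p in positions if p.seniority_rank == top]
    if len(tied) == 1 or depth_of is None:
        return tied[0]
    # Assumption: the agent closer to the root (smaller depth) wins the tie.
    return min(tied, key=lambda p: depth_of[p.agent_id])


def find_losers(positions, winning_agent_id):
    """Return every position except the winner's, validating the winner first."""
    if all(p.agent_id != winning_agent_id for p in positions):
        raise ConflictStrategyError(f"unknown winning agent: {winning_agent_id!r}")
    return [p for p in positions if p.agent_id != winning_agent_id]


positions = [Position("a", 2), Position("b", 2), Position("c", 1)]
depth = {"a": 3, "b": 1, "c": 2}  # hypothetical depths in the org tree

winner = pick_highest_seniority(positions, depth)
losers = find_losers(positions, winner.agent_id)
```

Without a `depth_of` mapping the sketch falls back to the first tied position, which mirrors the pre-fix behaviour of an arbitrary winner among equals.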

Changes

Cohort / File(s)	Summary
Conflict Resolution — Hierarchy Tiebreaking `src/ai_company/communication/conflict_resolution/_helpers.py`, `debate_strategy.py`, `hybrid_strategy.py`	Added hierarchy-based tiebreaking to `pick_highest_seniority()` when seniority is equal; updated `DebateResolver._authority_fallback()` to instance method and `HybridResolver` to accept and use hierarchy; added validation in `find_losers()` to verify winning agent exists.
Conflict Resolution — Config & Simplification `src/ai_company/communication/conflict_resolution/config.py`	Removed dead `max_tokens_per_argument` field from `DebateConfig` and `HybridConfig`.
Meeting Parsing & Decision Extraction `src/ai_company/communication/meeting/_parsing.py`, `position_papers.py`, `round_robin.py`, `structured_phases.py`	Added new public parsing module with `parse_decisions()` and `parse_action_items()` functions; integrated parsing into three protocol implementations to populate previously-empty `MeetingMinutes.decisions` and `action_items` fields; updated synthesis prompts with exact section headers.
Meeting Validation & Infrastructure `src/ai_company/communication/meeting/orchestrator.py`, `models.py`, `_token_tracker.py`	Added duplicate participant detection with `MappingProxyType` immutable registry; enhanced `MeetingMinutes` validation for token aggregates; added budget-exhaustion and validation-failure logging to `TokenTracker`; introduced `MEETING_INTERNAL_ERROR` event constant.
Meeting Models & Prompts `src/ai_company/communication/meeting/_prompts.py`, `config.py`	Added presenter metadata to formatted agenda items; improved docstrings.
Message Bus — Concurrent Waiter Handling `src/ai_company/communication/bus_memory.py`	Replaced single-waiter sentinel wake-up with per-subscription waiter counting via `_waiters` dict; made `_log_receive_null()` async to safely determine shutdown/unsubscribe state; removed dead `COMM_UNSUBSCRIBE_SENTINEL_FAILED` event.
Messenger API Surface `src/ai_company/communication/messenger.py`	Updated `subscribe()`, `unsubscribe()`, and `receive()` signatures to use `NotBlankStr` for channel names; expanded docstring for `None` return conditions.
Hierarchy Fast-Path `src/ai_company/communication/delegation/hierarchy.py`	Added fast-path in `get_lowest_common_manager()` for identical agents; added `_known_agents` tracking for validation.
Core Models `src/ai_company/core/task.py`, `enums.py`	Added `task_structure` and `coordination_topology` fields to `Task`; added validation and O(1) rank lookup for `_SENIORITY_ORDER` in enum comparison.
Decomposition Validation `src/ai_company/engine/decomposition/models.py`, `rollup.py`, `service.py`	Added ID-set matching validation in `DecompositionResult` beyond count; changed `derived_parent_status` to return `CANCELLED` for mixed completed+cancelled states; added early-return for empty subtasks in rollup; propagate dependencies from decomposition plan to created tasks.
Routing Validation `src/ai_company/engine/routing/models.py`, `service.py`, `scorer.py`	Added duplicate ID detection within decisions/unroutable and overlaps between them; added parent task ID consistency check; replaced string literal event key with centralized constant `TASK_ROUTING_SCORER_INVALID_CONFIG`.
Engine Parallel & Logging `src/ai_company/engine/parallel.py`	Reformatted multi-line error construction; elevated suppressed exception group logging from debug to warning.
Observability Events `src/ai_company/observability/events/communication.py`, `task_routing.py`, `meeting.py`	Removed `COMM_UNSUBSCRIBE_SENTINEL_FAILED`; added `TASK_ROUTING_SCORER_INVALID_CONFIG` and `MEETING_INTERNAL_ERROR` constants.
Documentation & Status `DESIGN_SPEC.md`, `README.md`	Updated project status to reflect M3/M4 in-progress; no API changes.
Test Coverage — Communication `tests/unit/communication/test_bus_memory.py`, `test_enums.py`, `meeting/*.py`	Added test for multiple concurrent unsubscribe wake-ups; added `MEETING_CONTRIBUTION` enum assertion; added module-level 30-second timeout markers across meeting tests; added duplicate participant validation test; added comprehensive parsing tests for decisions/action items.
Test Coverage — Conflict Resolution `tests/unit/communication/conflict_resolution/test_*.py`	Added tests for three-participant authority scenarios; updated tests to verify hierarchy tiebreaking in `pick_highest_seniority()`; removed `max_tokens_per_argument` assertions; added validation tests for winning agent existence.
Test Coverage — Decomposition & Routing `tests/unit/engine/test_decomposition_*.py`, `test_routing_*.py`	Added tests for task ID mismatch rejection; added derived parent status CANCELLED assertion; added dependency propagation test; added parent task ID validation test; added duplicate ID detection tests.
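The routing validation row above (duplicate IDs within decisions/unroutable, plus overlaps between them) can be sketched with `collections.Counter`, which the changelog says the routing models import. The function name and return shape here are hypothetical; the real check lives in a model validator on `RoutingResult`.

```python
from collections import Counter


def find_id_problems(decision_ids, unroutable_ids):
    """Return (duplicates in decisions, duplicates in unroutable, overlap).

    Hypothetical helper: each element is a sorted list of offending subtask IDs.
    """
    dup_decisions = sorted(i for i, n in Counter(decision_ids).items() if n > 1)
    dup_unroutable = sorted(i for i, n in Counter(unroutable_ids).items() if n > 1)
    overlap = sorted(set(decision_ids) & set(unroutable_ids))
    return dup_decisions, dup_unroutable, overlap


problems = find_id_problems(["a", "b", "a"], ["c", "b"])
clean = find_id_problems(["a", "b"], ["c"])
```

A validator built on this would raise when any of the three lists is non-empty, and log before raising as the PR describes.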

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

feat: implement conflict resolution protocol (#122) #166: Directly precedes this PR; introduces the initial conflict-resolution subsystem that this PR refines with hierarchy tiebreaking and validation fixes.
feat: implement meeting protocol system (#123) #167: Introduces the meeting protocol subsystem that this PR completes by implementing decision/action item parsing and fixing validation gaps.
fix: incorporate post-merge feedback + pre-PR review fixes #164: Modifies message bus receive/unsubscribe semantics that this PR extends with proper multi-waiter coordination.

Suggested reviewers

Copilot

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 59.13% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main objective of the PR — addressing post-merge review feedback from earlier PRs #164-#167.
Description check	✅ Passed	The description comprehensively explains the 46 findings addressed across severity levels with specific focus on 4 critical, 17 major, 15 minor, and 4 trivial issues plus test/doc gaps.
Linked Issues check	✅ Passed	All code changes directly address requirements from issue `#169`: critical parsing/validation fixes [C1-C4], major budget/registry/dependency/validation fixes [M1-M17], minor typing/logging/docs improvements [m1-m15], trivial constants/docstrings [t1-t4], and test coverage gaps [T1-T6].
Out of Scope Changes check	✅ Passed	All changes are in-scope per issue `#169` objectives and explicitly exclude out-of-scope items (human escalation queue, per-conflict overrides, protocol interface changes, TokenTracker mutability, etc.).


gemini-code-assist · 2026-03-08T12:15:05Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the multi-agent system by incorporating extensive feedback from external reviews. The changes focus on improving the robustness and functionality of core communication mechanisms, particularly within meeting protocols and conflict resolution. Key updates include more reliable message bus operations, enhanced parsing of LLM outputs for structured decision-making, and stricter data integrity checks across various models. These improvements aim to make the system more stable, predictable, and easier to debug, while also clarifying design specifications and updating development status.

Highlights

External Review Feedback Addressed: Addressed 46 findings from external reviewers (Copilot, Greptile, CodeRabbit, Gemini) across PRs #164-#167, covering critical, major, minor, and test/documentation gaps.
Meeting Protocol Enhancements: Implemented shared LLM response parsing for decisions and action items in all three meeting protocols, improving structured output extraction.
Communication Bus Robustness: Enhanced the in-memory message bus to correctly wake all concurrent receive() calls when a subscription is cancelled, preventing potential deadlocks.
Conflict Resolution Logic Improvements: Refined conflict resolution strategies, including adding hierarchy-based tie-breaking for agents with equal seniority and validating winning agent IDs in dissent record generation.
Validation and Data Integrity: Introduced new validations for meeting participants (preventing duplicates), token aggregates in meeting minutes, and consistency between decomposition plans and created tasks.
Codebase Clarity and Maintainability: Removed max_tokens_per_argument from conflict resolution configs, updated documentation, and improved logging for various components.
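To make the "wake all concurrent receive() calls" highlight concrete, here is a minimal asyncio sketch. `TinyBus` is a hypothetical single-subscription stand-in, not the project's bus: it counts pending waiters and, on unsubscribe, enqueues one sentinel per waiter so none stays blocked. The real implementation reportedly tracks waiters per subscription in a `_waiters` dict and guards state with a lock, both omitted here for brevity.

```python
import asyncio

_SENTINEL = object()  # marks "subscription cancelled" on the queue


class TinyBus:
    """Hypothetical one-channel bus used only to illustrate the wake-up logic."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self._waiters = 0  # number of receive() calls currently blocked

    async def receive(self):
        self._waiters += 1
        try:
            item = await self._queue.get()
        finally:
            self._waiters -= 1
        # A sentinel means the subscription went away: report None.
        return None if item is _SENTINEL else item

    def unsubscribe(self):
        # One sentinel per pending waiter; a single sentinel would wake only one.
        for _ in range(self._waiters):
            self._queue.put_nowait(_SENTINEL)


async def demo():
    bus = TinyBus()
    tasks = [asyncio.create_task(bus.receive()) for _ in range(3)]
    while bus._waiters < 3:  # wait until all receivers are blocked
        await asyncio.sleep(0)
    bus.unsubscribe()
    return await asyncio.wait_for(asyncio.gather(*tasks), timeout=1)


results = asyncio.run(demo())
```

If `unsubscribe()` enqueued only one sentinel, two of the three tasks would remain blocked and the `wait_for` timeout would fire.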

Changelog

DESIGN_SPEC.md
- Removed max_tokens_per_argument from debate and hybrid conflict resolution configurations.
- Updated the 'Current state' description for meeting protocols to reflect the addition of shared LLM response parsing.
- Updated the description of _helpers.py in the conflict resolution section to include build_dissent_records.
README.md
- Updated the status of M3 Single Agent from 'all done' to 'in progress'.
src/ai_company/communication/bus_memory.py
- Removed the COMM_UNSUBSCRIBE_SENTINEL_FAILED event constant.
- Added a _waiters dictionary to track pending receive() calls.
- Modified the unsubscribe method to wake all concurrent receive() calls by putting a sentinel for each pending waiter.
- Updated the docstring for the receive method to reflect the new behavior regarding unsubscribe.
- Refactored _log_receive_null to be an asynchronous method and safely inspect bus state using a lock.
src/ai_company/communication/conflict_resolution/_helpers.py
- Imported HierarchyResolver for use in conflict resolution.
- Updated the docstring for find_losers to clarify its behavior for N-party conflicts.
- Added validation to find_losers to ensure the winning_agent_id exists in the conflict positions, raising ConflictStrategyError if not found.
- Modified pick_highest_seniority to accept an optional hierarchy for tie-breaking when seniority levels are equal.
- Implemented _hierarchy_tiebreak to resolve seniority ties based on an agent's depth in the organizational hierarchy.
src/ai_company/communication/conflict_resolution/config.py
- Removed the max_tokens_per_argument field from DebateConfig.
- Removed the max_tokens_per_argument field from HybridConfig.
src/ai_company/communication/conflict_resolution/debate_strategy.py
- Modified the _authority_fallback method to use the hierarchy for tie-breaking when seniority levels are equal.
src/ai_company/communication/conflict_resolution/hybrid_strategy.py
- Imported HierarchyResolver for use in the hybrid strategy.
- Added hierarchy as a required parameter to the HybridResolver constructor.
- Modified the _authority_fallback method to use the hierarchy for tie-breaking when seniority levels are equal.
src/ai_company/communication/conflict_resolution/service.py
- Moved the _MIN_POSITIONS constant from models.py to service.py.
src/ai_company/communication/delegation/hierarchy.py
- Added a check in get_lowest_common_manager to return the agent itself if both input agents are the same.
src/ai_company/communication/meeting/_parsing.py
- Added a new file containing shared helper functions for parsing decisions and action items from LLM-generated text, including regex patterns for headers and list items.
src/ai_company/communication/meeting/_prompts.py
- Modified build_agenda_prompt to include the presenter_id in agenda item entries if available.
src/ai_company/communication/meeting/_token_tracker.py
- Imported get_logger and MEETING_BUDGET_EXHAUSTED for logging.
- Added a warning log when input_tokens or output_tokens are negative in the record method.
- Added a warning log when the token budget is exceeded after recording token usage.
src/ai_company/communication/meeting/config.py
- Clarified the docstring for max_discussion_tokens in StructuredPhasesConfig.
src/ai_company/communication/meeting/models.py
- Added a _validate_token_aggregates model validator to MeetingMinutes to ensure that total_input_tokens and total_output_tokens match the sum of contributions.
- Changed the type of error_message in MeetingRecord to NotBlankStr for stricter validation.
src/ai_company/communication/meeting/orchestrator.py
- Wrapped the protocol_registry in a MappingProxyType and deep-copied it to ensure immutability.
- Updated docstrings for MeetingParticipantError to explicitly mention duplicate participants.
- Added validation to _validate_inputs to check for duplicate participant IDs, raising MeetingParticipantError if found.
src/ai_company/communication/meeting/position_papers.py
- Imported parse_action_items and parse_decisions from _parsing.py.
- Used the new parsing functions to extract decisions and action items from the synthesis text and include them in MeetingMinutes.
- Replaced assert statements with if not ... raise RuntimeError for robustness in _collect_paper.
src/ai_company/communication/meeting/round_robin.py
- Imported parse_action_items and parse_decisions from _parsing.py.
- Conditionally reserved summary tokens based on whether the leader summarizes.
- Used the new parsing functions to extract decisions and action items from the summary text and include them in MeetingMinutes.
src/ai_company/communication/meeting/structured_phases.py
- Imported parse_action_items and parse_decisions from _parsing.py.
- Updated the comment for _SYNTHESIS_RESERVE_FRACTION to clarify it applies to the remaining budget.
- Used the new parsing functions to extract decisions and action items from the summary text and include them in MeetingMinutes.
- Removed a redundant MEETING_CONFLICT_DETECTED debug log.
- Adjusted the calculation of discussion_budget to reserve tokens for the synthesis phase.
- Added discussion_used tracking to ensure discussion does not exceed its allocated budget.
- Replaced assert statements with if not ... raise RuntimeError for robustness in _collect_input.
src/ai_company/communication/messenger.py
- Imported NotBlankStr for type hinting.
- Updated type hints for channel_name parameters in subscribe, unsubscribe, and receive methods to NotBlankStr.
- Updated the docstring for the receive method to provide more detailed conditions for returning None.
src/ai_company/core/enums.py
- Added validation logic to ensure _SENIORITY_ORDER is in sync with SeniorityLevel enum members and contains no duplicates.
- Introduced _SENIORITY_RANK as a precomputed dictionary for O(1) seniority comparison, replacing list.index() calls.
src/ai_company/core/task.py
- Added task_structure and coordination_topology fields to the Task model.
src/ai_company/engine/decomposition/models.py
- Added validation to _validate_plan_task_consistency to ensure that the IDs of created tasks exactly match the IDs in the decomposition plan.
- Modified derived_parent_status to return TaskStatus.CANCELLED when a mix of completed and cancelled subtasks are present, indicating partial abandonment.
src/ai_company/engine/decomposition/rollup.py
- Reordered imports for better organization.
- Added an explicit return of an empty SubtaskStatusRollup when no subtask statuses are provided.
src/ai_company/engine/decomposition/service.py
- Ensured that dependencies from subtask definitions are propagated to the newly created Task objects during decomposition.
src/ai_company/engine/parallel.py
- Adjusted the formatting of a ParallelExecutionError instantiation for consistency.
src/ai_company/engine/routing/models.py
- Imported Counter, get_logger, and TASK_ROUTING_FAILED for enhanced validation and logging.
- Added logging and validation to _validate_selected_not_in_alternatives to prevent selected candidates from appearing in alternatives.
- Enhanced _validate_unique_subtask_ids to check for duplicate subtask IDs within decisions and unroutable lists, as well as overlaps between them, with corresponding logging.
- Added logging to _validate_no_auto_defaults when CoordinationTopology.AUTO is used inappropriately.
src/ai_company/engine/routing/scorer.py
- Imported TASK_ROUTING_SCORER_INVALID_CONFIG.
- Updated the log event name for invalid min_score configuration.
src/ai_company/engine/routing/service.py
- Added validation to the route method to ensure that the parent_task.id matches the plan.parent_task_id, raising a ValueError if they do not.
src/ai_company/observability/events/communication.py
- Removed the COMM_UNSUBSCRIBE_SENTINEL_FAILED constant.
src/ai_company/observability/events/task_routing.py
- Added TASK_ROUTING_SCORER_INVALID_CONFIG event constant.
tests/integration/communication/test_meeting_integration.py
- Added a pytestmark to set a timeout for integration tests.
tests/unit/communication/conflict_resolution/test_authority_strategy.py
- Added new test cases for AuthorityResolver with three participants, verifying highest seniority wins and correct dissent record generation.
tests/unit/communication/conflict_resolution/test_config.py
- Removed tests related to max_tokens_per_argument in DebateConfig and HybridConfig as the field was removed.
tests/unit/communication/conflict_resolution/test_debate_strategy.py
- Updated the description for the test_three_party_shared_manager_judge test case for clarity.
tests/unit/communication/conflict_resolution/test_helpers.py
- Imported HierarchyResolver for testing.
- Added new test cases for pick_highest_seniority to verify hierarchy-based tie-breaking.
- Added a test case for find_losers to ensure it raises an error when the winning agent is not found in positions.
tests/unit/communication/conflict_resolution/test_hybrid_strategy.py
- Imported HierarchyResolver for testing.
- Updated HybridResolver instantiation in various test cases to include the hierarchy parameter.
tests/unit/communication/delegation/test_hierarchy.py
- Added a test case for get_lowest_common_manager to verify it returns the agent itself when both arguments are the same.
tests/unit/communication/meeting/test_config.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/meeting/test_enums.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/meeting/test_errors.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/meeting/test_models.py
- Added a pytestmark to set a timeout for unit tests.
- Updated _make_minutes calls in tests to include total_input_tokens and total_output_tokens.
- Added a test case to ensure blank error messages are rejected by MeetingRecord.
- Added new test cases for MeetingMinutes to verify that mismatched token aggregates raise validation errors.
tests/unit/communication/meeting/test_orchestrator.py
- Added a pytestmark to set a timeout for unit tests.
- Added a test case to ensure duplicate participants in a meeting raise a MeetingParticipantError.
tests/unit/communication/meeting/test_parsing.py
- Added a new test file with comprehensive unit tests for parse_decisions and parse_action_items functions.
tests/unit/communication/meeting/test_position_papers.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/meeting/test_prompts.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/meeting/test_protocol.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/meeting/test_round_robin.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/meeting/test_structured_phases.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/meeting/test_token_tracker.py
- Added a pytestmark to set a timeout for unit tests.
tests/unit/communication/test_bus_memory.py
- Added a test case to verify that unsubscribe correctly wakes multiple blocked receive() calls.
tests/unit/communication/test_enums.py
- Added MEETING_CONTRIBUTION to the MessageType enum values test.
tests/unit/engine/test_decomposition_models.py
- Added a test case to ensure that DecompositionResult rejects a mismatch between created task IDs and plan subtask IDs.
- Updated the test_completed_plus_cancelled_mix to assert TaskStatus.CANCELLED instead of COMPLETED for mixed terminal states.
tests/unit/engine/test_decomposition_service.py
- Added a test case to verify that subtask dependencies are correctly propagated to created Task objects.
tests/unit/engine/test_routing_models.py
- Added test cases to ensure RoutingResult rejects duplicate subtask IDs within decisions and unroutable lists.
tests/unit/engine/test_routing_service.py
- Added a test case to ensure a ValueError is raised when parent_task.id does not match plan.parent_task_id during routing.

Activity

The pull request addresses 46 findings identified by external reviewers (Copilot, Greptile, CodeRabbit, Gemini) from previous PRs #164-#167.
Critical findings included parsing decisions/action items from LLM synthesis in meeting protocols and validating winning_agent_id in find_losers.
Major findings covered token budget guards, duplicate participant rejection, frozen registry, hierarchy tiebreakers, dependency copying, and waking all waiters on unsubscribe.
Minor findings involved duplicate log removal, assert to raise conversions, traceback preservation, O(1) seniority lookups, NotBlankStr typing, and routing validation ordering.
Trivial findings included centralized event constants and docstring/spec fixes.
Test and documentation gaps were addressed with a new test_parsing.py file, expanded tests across seven modules, timeout markers, and spec name corrections.
Pre-PR review fixes included list-item regex handling, parent_task_id validation ordering, circular exception cause, and dead code removal.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copilot

Pull request overview

Addresses post-merge review feedback from PRs #164–#167 by tightening validation, improving meeting protocol outputs (decisions/action items parsing), and expanding test coverage to prevent regressions across communication + engine subsystems.

Changes:

Add meeting summary/synthesis parsing helpers and populate decisions / action_items in all meeting protocols.
Harden routing/decomposition/conflict-resolution invariants (ID validation, hierarchy tiebreakers, dependency propagation, multi-waiter unsubscribe wakeups).
Expand/adjust unit + integration tests (including consistent pytest.mark.timeout(30) in meeting tests).

Reviewed changes

Copilot reviewed 55 out of 55 changed files in this pull request and generated 6 comments.

File	Description
`src/ai_company/communication/meeting/_parsing.py`	New parsing utilities for decisions/action items from LLM text
`src/ai_company/communication/meeting/{round_robin,position_papers,structured_phases}.py`	Populate minutes with parsed decisions/action items; adjust budget handling/invariants
`src/ai_company/communication/meeting/_token_tracker.py`	Add logging when token usage exceeds budget / invalid counts
`src/ai_company/communication/meeting/orchestrator.py`	Freeze protocol registry and reject duplicate participants
`src/ai_company/communication/bus_memory.py`	Wake all pending receivers on unsubscribe; make receive-null logging non-racy
`src/ai_company/engine/{routing,decomposition}/...`	Add/adjust validation + propagation (parent_task_id, dependencies, rollups) and event constant usage
`src/ai_company/core/enums.py`	Guard seniority ordering and make comparisons O(1)
`tests/**`	Add coverage for the above behaviors + timeout markers
`README.md`, `DESIGN_SPEC.md`	Spec/doc alignment with implemented behavior and naming

Copilot · 2026-03-08T12:17:59Z

src/ai_company/communication/meeting/_parsing.py

+from ai_company.observability import get_logger
+
+logger = get_logger(__name__)


get_logger() result is assigned to logger but never used in this module, which should trigger Ruff's unused-variable check. Either remove the observability import/logger, or actually use it (e.g., to log when a section header is found/missing).

Copilot · 2026-03-08T12:18:00Z

src/ai_company/communication/meeting/orchestrator.py

-        self._protocol_registry = protocol_registry
+        self._protocol_registry: MappingProxyType[
+            MeetingProtocolType, MeetingProtocol
+        ] = MappingProxyType(copy.deepcopy(dict(protocol_registry)))


Using copy.deepcopy() on the protocol registry will also deep-copy protocol implementation instances. That’s risky/unnecessary (custom protocol objects may not be deepcopy-safe and it can duplicate internal state/resources). Prefer making the mapping immutable via MappingProxyType(dict(protocol_registry)) (shallow copy) and, if needed, validate the registry contents rather than deep-copying arbitrary objects.

Suggested change

-        ] = MappingProxyType(copy.deepcopy(dict(protocol_registry)))
+        ] = MappingProxyType(dict(protocol_registry))
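A minimal sketch of the shallow-copy approach suggested above, assuming a plain dict registry. `EchoProtocol` and `freeze_registry` are hypothetical names used only for illustration and are not part of the PR.

```python
from types import MappingProxyType


class EchoProtocol:
    """Hypothetical stand-in for a protocol implementation."""

    def __init__(self, name: str) -> None:
        self.name = name


def freeze_registry(registry):
    """Return a read-only view over a shallow copy of ``registry``."""
    # dict(...) copies only the mapping; protocol instances are shared,
    # so nothing has to be deepcopy-safe.
    return MappingProxyType(dict(registry))


source = {"round_robin": EchoProtocol("round_robin")}
frozen = freeze_registry(source)

# Later mutation of the caller's dict does not leak into the frozen view.
source["debate"] = EchoProtocol("debate")

try:
    frozen["debate"] = EchoProtocol("debate")
    mutation_blocked = False
except TypeError:
    # mappingproxy objects do not support item assignment.
    mutation_blocked = True
```

The shallow copy isolates the mapping from later changes to the caller's dict while leaving the protocol objects themselves untouched.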

Copilot · 2026-03-08T12:18:00Z

src/ai_company/communication/bus_memory.py

+            # points.  The asymmetry with the lock-guarded increment
+            # is intentional — the decrement must happen after
+            # _await_with_shutdown completes.
+            self._waiters[key] = max(0, self._waiters.get(key, 0) - 1)


receive() decrements _waiters[key] in the finally block even if unsubscribe() has already pop()'d that key. This re-inserts the key with a 0 count, causing _waiters to accumulate stale entries over time (and can skew future sentinel counts if code changes). Update the decrement logic to avoid re-adding missing keys and/or delete the key when the count reaches 0 (ideally under the lock).

Suggested change

-            self._waiters[key] = max(0, self._waiters.get(key, 0) - 1)
+            current = self._waiters.get(key)
+            if current is None:
+                # Key was removed (for example, by unsubscribe); do not recreate it.
+                pass
+            elif current <= 1:
+                # Last waiter for this key; remove entry entirely.
+                self._waiters.pop(key, None)
+            else:
+                self._waiters[key] = current - 1
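The same decrement logic can be sketched as a standalone helper to make its three cases easy to test. `release_waiter` is a hypothetical name; in the real bus this would presumably run under the existing lock, which is omitted here.

```python
def release_waiter(waiters: dict, key: str) -> None:
    """Decrement the waiter count for ``key`` without leaving stale entries."""
    current = waiters.get(key)
    if current is None:
        # Key already removed (for example, by unsubscribe): do not recreate it.
        return
    if current <= 1:
        # Last waiter: drop the entry instead of storing a zero.
        waiters.pop(key, None)
    else:
        waiters[key] = current - 1


waiters = {"chan:agent-1": 2}
release_waiter(waiters, "chan:agent-1")   # one waiter left
release_waiter(waiters, "chan:agent-1")   # entry removed
release_waiter(waiters, "chan:agent-1")   # no-op, key stays absent
```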

Copilot · 2026-03-08T12:18:00Z

src/ai_company/communication/meeting/structured_phases.py

+        if not all(r is not None for r in result_inputs):
+            msg = f"Expected {num_participants} inputs but some slots are None"
+            logger.error(msg, meeting_id=meeting_id)
+            raise RuntimeError(msg)
+        if not all(c is not None for c in result_contributions):
+            msg = f"Expected {num_participants} contributions but some slots are None"
+            logger.error(msg, meeting_id=meeting_id)
+            raise RuntimeError(msg)


These invariant failures are logged with logger.error(msg, ...), which makes the event name dynamic and bypasses the centralized observability.events.meeting constants pattern used elsewhere in this module. Log with a stable meeting event constant (e.g., MEETING_FAILED/MEETING_VALIDATION_FAILED) and include msg in a structured field (like error=msg).

Copilot · 2026-03-08T12:18:00Z

src/ai_company/communication/meeting/position_papers.py

+        if not all(r is not None for r in results):
+            msg = f"Expected {n} position papers but some slots are None"
+            logger.error(msg, meeting_id=meeting_id)
+            raise RuntimeError(msg)
+        if not all(c is not None for c in contrib_results):
+            msg = f"Expected {n} contributions but some slots are None"
+            logger.error(msg, meeting_id=meeting_id)
+            raise RuntimeError(msg)


These invariant failures are logged with logger.error(msg, ...), which makes the event name dynamic and bypasses the centralized observability.events.meeting constants pattern used elsewhere in this module. Log with a stable meeting event constant (e.g., MEETING_FAILED/MEETING_VALIDATION_FAILED) and include msg in a structured field (like error=msg).

Copilot · 2026-03-08T12:18:00Z

src/ai_company/communication/meeting/_token_tracker.py

        if input_tokens < 0 or output_tokens < 0:
            msg = (
                f"Token counts must be non-negative, got "
-                f"input_tokens={input_tokens}, output_tokens={output_tokens}"
+                f"input_tokens={input_tokens}, "
+                f"output_tokens={output_tokens}"
+            )
+            logger.warning(
+                MEETING_BUDGET_EXHAUSTED,
+                error=msg,
+                input_tokens=input_tokens,
+                output_tokens=output_tokens,
            )
            raise ValueError(msg)


TokenTracker.record() logs MEETING_BUDGET_EXHAUSTED even when the problem is invalid input (negative token counts). That event name implies a normal budget exhaustion scenario and can confuse monitoring/alerts. Consider logging MEETING_VALIDATION_FAILED (or a dedicated token-tracking/invalid-usage event) for negative counts, while keeping MEETING_BUDGET_EXHAUSTED for actual over-budget conditions.
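A rough sketch of the separation described above, assuming a logger that accepts an event name plus keyword fields. The event string values, the `log_event` callable, and this simplified `TokenTracker` are assumptions for illustration, not the project's actual code.

```python
from typing import Any, Callable

# Hypothetical event names; the real constants live in the project's
# observability events module.
MEETING_BUDGET_EXHAUSTED = "meeting.budget.exhausted"
MEETING_VALIDATION_FAILED = "meeting.validation.failed"


class TokenTracker:
    """Simplified tracker that logs distinct events per failure kind."""

    def __init__(self, budget: int, log_event: Callable[..., None]) -> None:
        self.budget = budget
        self.used = 0
        self._log_event = log_event

    def record(self, input_tokens: int, output_tokens: int) -> None:
        if input_tokens < 0 or output_tokens < 0:
            msg = (
                "Token counts must be non-negative, got "
                f"input_tokens={input_tokens}, output_tokens={output_tokens}"
            )
            # Caller error: report as a validation failure, not exhaustion.
            self._log_event(
                MEETING_VALIDATION_FAILED,
                error=msg,
                input_tokens=input_tokens,
                output_tokens=output_tokens,
            )
            raise ValueError(msg)
        self.used += input_tokens + output_tokens
        if self.used > self.budget:
            # Genuine over-budget condition.
            self._log_event(
                MEETING_BUDGET_EXHAUSTED, used=self.used, budget=self.budget
            )


events: list = []


def capture(event: str, **fields: Any) -> None:
    """Collect events in memory so the behaviour can be inspected."""
    events.append((event, fields))
```

With the two events kept apart, an alert on `MEETING_BUDGET_EXHAUSTED` would fire only for real overruns.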

gemini-code-assist

Code Review

This pull request introduces automated extraction of decisions and action items from LLM-generated meeting summaries, while also addressing numerous findings and significantly improving the codebase through stricter data validation, robust concurrency handling in the message bus, and new features like hierarchy-based tie-breaking. However, a critical security vulnerability exists due to insufficient safeguards against Indirect Prompt Injection, where raw agent responses in prompts could allow a malicious agent to manipulate the LLM's output to create unauthorized tasks. Additionally, there is a high-severity concern that the new LLM response parser may not correctly handle multi-line list items, potentially leading to truncated data.

gemini-code-assist · 2026-03-08T12:19:33Z

src/ai_company/communication/meeting/_parsing.py

+def parse_action_items(
+    summary_text: str,
+) -> tuple[ActionItem, ...]:
+    """Parse action items from an LLM summary/synthesis response.
+
+    Looks for an "Action Items" section header, then extracts
+    bulleted or numbered list items. Attempts to detect assignee
+    information within each item.
+
+    Args:
+        summary_text: The full summary/synthesis text from the LLM.
+
+    Returns:
+        Tuple of ActionItem instances (may be empty).
+    """
+    section = _extract_section(summary_text, _ACTION_ITEMS_HEADER_RE)
+    if not section:
+        return ()
+
+    items: list[ActionItem] = []
+    for match in _LIST_ITEM_RE.finditer(section):
+        raw_text = match.group(1).strip()
+        if not raw_text:
+            continue
+
+        description, assignee_id = _parse_assignee(raw_text)
+        if not description:
+            continue
+
+        items.append(
+            ActionItem(
+                description=description,
+                assignee_id=assignee_id,
+            )
+        )
+
+    return tuple(items)


The parse_action_items function extracts assignee_id from free-form LLM text without any validation against the meeting's participant list. This allows an attacker to use prompt injection to assign tasks to arbitrary agents who may not even be part of the meeting, potentially bypassing intended workflow boundaries.

gemini-code-assist · 2026-03-08T12:19:33Z

src/ai_company/communication/meeting/position_papers.py

+        synthesis_text = synthesis_contribution.content
+        decisions = parse_decisions(synthesis_text)
+        action_items = parse_action_items(synthesis_text)


Action items are parsed directly from LLM synthesis output. Since the synthesis prompt (built in _build_synthesis_prompt) concatenates raw responses from other agents, it is vulnerable to indirect prompt injection. A malicious agent can inject instructions to cause the synthesizer to output unauthorized action items.

Recommendation: Validate that the assignee_id in each action item is either the meeting leader or one of the participants.

        synthesis_text = synthesis_contribution.content
        decisions = parse_decisions(synthesis_text)
        raw_action_items = parse_action_items(synthesis_text)
        # Validate assignees are participants or the leader
        allowed_assignees = set(participant_ids) | {leader_id}
        action_items = [
            item
            for item in raw_action_items
            if item.assignee_id is None or item.assignee_id in allowed_assignees
        ]

gemini-code-assist · 2026-03-08T12:19:34Z

src/ai_company/communication/meeting/round_robin.py

            all_contributions = (*contributions, summary_contribution)
+            decisions = parse_decisions(summary)
+            action_items = parse_action_items(summary)


The round-robin protocol is vulnerable to indirect prompt injection because the transcript (containing raw agent responses) is included in the summary prompt. This allows an agent to manipulate the final list of action items.

Recommendation: Validate that the assignee_id in each action item is either the meeting leader or one of the participants.

            all_contributions = (*contributions, summary_contribution)
            decisions = parse_decisions(summary)
            raw_action_items = parse_action_items(summary)
            # Validate assignees are participants or the leader
            allowed_assignees = set(participant_ids) | {leader_id}
            action_items = [
                item
                for item in raw_action_items
                if item.assignee_id is None or item.assignee_id in allowed_assignees
            ]

gemini-code-assist · 2026-03-08T12:19:34Z

src/ai_company/communication/meeting/structured_phases.py

+        decisions = parse_decisions(summary)
+        action_items = parse_action_items(summary)


The structured phases protocol is vulnerable to indirect prompt injection because participant inputs are concatenated into the synthesis prompt. This allows a participant to inject instructions that manipulate the resulting action items.

Recommendation: Validate that the assignee_id in each action item is either the meeting leader or one of the participants.

        decisions = parse_decisions(summary)
        raw_action_items = parse_action_items(summary)
        # Validate assignees are participants or the leader
        allowed_assignees = set(participant_ids) | {leader_id}
        action_items = [
            item
            for item in raw_action_items
            if item.assignee_id is None or item.assignee_id in allowed_assignees
        ]
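Since the same recommendation is made for all three protocols, the filter could live in one shared helper. The sketch below assumes a simplified `ActionItem` with only `description` and `assignee_id`; `filter_action_items` is a hypothetical name, and it returns a tuple to match the return type of `parse_action_items`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ActionItem:
    """Simplified stand-in for the project's ActionItem model."""

    description: str
    assignee_id: Optional[str] = None


def filter_action_items(
    items: Iterable[ActionItem],
    participant_ids: Iterable[str],
    leader_id: str,
) -> Tuple[ActionItem, ...]:
    """Keep items that are unassigned or assigned to a meeting member."""
    allowed = set(participant_ids) | {leader_id}
    return tuple(
        item
        for item in items
        if item.assignee_id is None or item.assignee_id in allowed
    )


parsed = (
    ActionItem("Ship the fix", "alice"),
    ActionItem("Rotate credentials", "mallory"),  # not in the meeting
    ActionItem("Schedule follow-up"),             # unassigned
)
kept = filter_action_items(parsed, ["alice", "bob"], leader_id="carol")
```

Dropping rejected items silently is one option; logging them would make injection attempts visible, at the cost of an extra event.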

gemini-code-assist · 2026-03-08T12:19:34Z

src/ai_company/communication/meeting/_parsing.py

+_LIST_ITEM_RE = re.compile(
+    r"^[^\S\n]*(?:\d+[\.\)][^\S\n]*|-[^\S\n]*|\*[^\S\n]*|\u2022[^\S\n]*)(.+)",
+    re.MULTILINE,
+)


The _LIST_ITEM_RE regex only captures single-line list items. If a decision or action item from the LLM spans multiple lines, only the first line will be captured. This could lead to truncated and incomplete items, which could be critical for decisions and action items.

For example, with this input:

# Decisions
1. This is a decision
   that spans multiple lines.
2. This is a single line decision.

The parser would extract "This is a decision" and "This is a single line decision.", losing the second line of the first decision.

Consider updating the regex to handle multi-line list items, for example by using a non-greedy match with the re.DOTALL flag that continues until the next list item marker or the end of the section.
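As an alternative to the DOTALL regex suggested above, a line-based pass can fold continuation lines into the preceding item. This is a sketch rather than the project's parser: `extract_list_items` is a hypothetical helper, and treating a blank line as the end of an item is an assumption.

```python
import re

# Same markers as _LIST_ITEM_RE: "1.", "1)", "-", "*", or a bullet.
_MARKER_RE = re.compile(r"^[^\S\n]*(?:\d+[.)]|[-*\u2022])[^\S\n]*(.*)$")


def extract_list_items(section: str) -> list:
    """Return list items, joining wrapped lines with single spaces."""
    items = []
    current = None  # parts of the item being built, or None outside a list
    for line in section.splitlines():
        marker = _MARKER_RE.match(line)
        if marker:
            # A new marker closes the previous item and starts a new one.
            if current:
                items.append(" ".join(current))
            text = marker.group(1).strip()
            current = [text] if text else []
        elif line.strip() and current is not None:
            # Non-blank line without a marker: continuation of the item.
            current.append(line.strip())
        else:
            # Blank line (assumed to end the item) or text before any marker.
            if current:
                items.append(" ".join(current))
            current = None
    if current:
        items.append(" ".join(current))
    return items


section = (
    "1. This is a decision\n"
    "   that spans multiple lines.\n"
    "2. This is a single line decision."
)
decisions = extract_list_items(section)
```

A continuation line that happens to start with a marker-like token (for example `3.5 seconds`) would still be read as a new item, so this shares that limitation with the original pattern.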

coderabbitai

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)

src/ai_company/engine/decomposition/rollup.py (1)
44-59: ⚠️ Potential issue | 🟡 Minor

Keep DECOMPOSITION_ROLLUP_COMPUTED consistent on the empty-input branch.

This branch returns a valid zeroed rollup, but it logs the same event at WARNING and without derived_status, unlike the normal path. That makes empty inputs look like operational faults and gives the event two different payload shapes.
Proposed adjustment
         if total == 0:
-            logger.warning(
-                DECOMPOSITION_ROLLUP_COMPUTED,
-                parent_task_id=parent_task_id,
-                total=0,
-                reason="rollup computed with no subtask statuses",
-            )
-            return SubtaskStatusRollup(
+            rollup = SubtaskStatusRollup(
                 parent_task_id=parent_task_id,
                 total=0,
                 completed=0,
                 failed=0,
                 in_progress=0,
                 blocked=0,
                 cancelled=0,
             )
+            logger.debug(
+                DECOMPOSITION_ROLLUP_COMPUTED,
+                parent_task_id=parent_task_id,
+                total=0,
+                derived_status=rollup.derived_parent_status.value,
+                reason="rollup computed with no subtask statuses",
+            )
+            return rollup
As per coding guidelines "Log at WARNING or ERROR with context for all error paths before raising exceptions" and "Log at DEBUG for object creation, internal flow, and entry/exit of key functions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/decomposition/rollup.py` around lines 44 - 59, The
empty-input branch logs DECOMPOSITION_ROLLUP_COMPUTED at WARNING without the
derived_status and thus differs from the normal path and misreports an
operational fault; update the total == 0 branch to log the same event shape as
the normal path (include derived_status="zeroed" or the same field name used
elsewhere) and use the same log level as non-error creation (change to DEBUG if
normal path logs creation at DEBUG) before returning the zeroed
SubtaskStatusRollup(parent_task_id=..., total=0, completed=0, failed=0,
in_progress=0, blocked=0, cancelled=0) so the event payload and severity remain
consistent with SubtaskStatusRollup creation.
src/ai_company/communication/conflict_resolution/debate_strategy.py (1)
255-262: ⚠️ Potential issue | 🟡 Minor

Keep the fallback reasoning aligned with the actual tiebreaker.

Lines 255-262 can now choose a winner via hierarchy when seniority ties, but the returned reasoning still says the winner "has highest seniority". That makes the audit trail false for equal-level conflicts. Please distinguish pure seniority wins from hierarchy tiebreak wins in the JudgeDecision.reasoning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/conflict_resolution/debate_strategy.py` around
lines 255 - 262, The reasoning message claims "highest seniority" even when
pick_highest_seniority resolved an equal-seniority tie via hierarchy; update the
JudgeDecision.reasoning to reflect which tiebreaker was used. After calling
pick_highest_seniority(conflict, hierarchy=self._hierarchy), check whether the
win was resolved from multiple agents with equal seniority (e.g., detect if
conflict has multiple agents at best.agent_level or if pick_highest_seniority
can return/indicate a tie-break flag); if it was a hierarchy tiebreak, set
reasoning to something like "Debate fallback: hierarchy tiebreak among
equal-seniority agents — {best.agent_id} ({best.agent_level}) selected",
otherwise keep "authority-based judging — {best.agent_id} ({best.agent_level})
has highest seniority". Ensure this logic is implemented where JudgeDecision is
constructed so the audit trail accurately distinguishes pure seniority wins from
hierarchy tiebreak wins.
src/ai_company/communication/conflict_resolution/_helpers.py (1)
21-79: 🛠️ Refactor suggestion | 🟠 Major

Break find_losers() into validation and extraction helpers.

The new winner-integrity checks pushed this helper past the repo's 50-line ceiling, and it now mixes validation, logging, and the happy-path loser selection. Pulling winner validation into a small helper would keep the unhappy paths isolated and the core computation trivial.

As per coding guidelines, 'Keep functions under 50 lines and files under 800 lines'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/conflict_resolution/_helpers.py` around lines 21
- 79, Split validation from extraction: add a small helper (e.g.
ensure_winner_in_conflict or validate_winner_present(conflict, winner_id)) that
performs the winner existence check, logs the CONFLICT_STRATEGY_ERROR and raises
ConflictStrategyError with the same context when missing; then simplify
find_losers to call that helper and only perform the tuple comprehension (losers
= tuple(pos for pos in conflict.positions if pos.agent_id != winner_id)), keep
the "no losers" warning/raise in find_losers as the only remaining unhappy-path
logic; update imports/refs accordingly.
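The split proposed above might look roughly like this. `Position`, `Conflict`, and `ConflictStrategyError` are simplified stand-ins for the project's types, and the logging calls are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Tuple


class ConflictStrategyError(Exception):
    """Simplified stand-in for the project's error type."""


@dataclass(frozen=True)
class Position:
    agent_id: str


@dataclass(frozen=True)
class Conflict:
    positions: Tuple[Position, ...]


def ensure_winner_in_conflict(conflict: Conflict, winner_id: str) -> None:
    """Raise if ``winner_id`` holds no position in the conflict."""
    if not any(pos.agent_id == winner_id for pos in conflict.positions):
        msg = f"winning_agent_id {winner_id!r} is not part of the conflict"
        raise ConflictStrategyError(msg)


def find_losers(conflict: Conflict, winner_id: str) -> Tuple[Position, ...]:
    """Return every position that does not belong to the winner."""
    ensure_winner_in_conflict(conflict, winner_id)
    losers = tuple(
        pos for pos in conflict.positions if pos.agent_id != winner_id
    )
    if not losers:
        raise ConflictStrategyError("conflict has no losing positions")
    return losers


conflict = Conflict(positions=(Position("agent-a"), Position("agent-b")))
losers = find_losers(conflict, "agent-a")
```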
src/ai_company/communication/meeting/orchestrator.py (1)
364-431: 🛠️ Refactor suggestion | 🟠 Major

Split _validate_inputs() into smaller validators.

This method now bundles token-budget validation, empty-participant checks, duplicate detection, and leader-membership checks into one 68-line branchy helper. Extracting the participant-specific checks and shared log-and-raise scaffolding will make future validation changes safer and bring it back under the repo limit.

As per coding guidelines, 'Keep functions under 50 lines and files under 800 lines'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/orchestrator.py` around lines 364 - 431,
The _validate_inputs method is too large and mixes token_budget checks with
several participant checks; split it into smaller focused validators: create a
_validate_token_budget(meeting_id, token_budget) that performs the positive
check and logs/raises ValueError, and create a
_validate_participants(meeting_id, leader_id, participant_ids) that contains the
empty-participants, duplicate detection (use Counter), and
leader-in-participants checks and raises MeetingParticipantError with the same
context payloads; factor the common logging-and-raise pattern into a helper
(e.g., _log_and_raise or _log_participant_error) used by both validators, then
have _validate_inputs call these two new helpers to preserve behavior and
messages (keep names _validate_inputs, _validate_token_budget,
_validate_participants, and the logging helper to locate changes).
src/ai_company/communication/conflict_resolution/hybrid_strategy.py (1)
228-240: ⚠️ Potential issue | 🟠 Major

Preserve authority-strategy validation in the fallback path.

pick_highest_seniority() only compares seniority plus raw ancestor counts. That means equal-seniority conflicts the authority strategy would reject now silently resolve here — e.g. an agent missing from the hierarchy looks like depth 0 because HierarchyResolver.get_ancestors() returns (), and peers with no common manager collapse to an order-dependent winner. Since this branch is the hybrid's authority fallback, it should reuse the same hierarchy validation/tiebreak semantics as AuthorityResolver before building the hybrid resolution.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/conflict_resolution/hybrid_strategy.py` around
lines 228 - 240, The hybrid fallback currently calls
pick_highest_seniority(conflict, hierarchy=self._hierarchy) without reusing the
AuthorityResolver's validation/tiebreak semantics; update the hybrid fallback to
first run the same hierarchy validation used by AuthorityResolver (e.g., invoke
the AuthorityResolver validation method or replicate its checks: ensure both
agents exist in self._hierarchy, detect equal seniority ties and
missing-ancestor cases via HierarchyResolver.get_ancestors() semantics) and only
then call pick_highest_seniority to build the ConflictResolution; if the
AuthorityResolver would have rejected/abstained (tie or missing agent), the
hybrid must not silently pick a winner but follow AuthorityResolver’s outcome
path (reject/abstain or escalate) before creating the RESOLVED_BY_HYBRID
ConflictResolution.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Line 29: The sentence currently contradicts itself by saying "M4: Multi-Agent"
is in progress while also claiming "M3 Single Agent" is in progress; update the
README sentence to make milestone statuses consistent — e.g., mark "M3: Single
Agent" as complete if M4 is in progress, or mark "M4: Multi-Agent" as
planned/not started if M3 is still in progress, and rephrase the line that
mentions "M4: Multi-Agent" and "M3 Single Agent" so it unambiguously lists each
milestone with its correct status (using the exact labels "M4: Multi-Agent" and
"M3: Single Agent").

In `@src/ai_company/communication/bus_memory.py`:
- Around line 462-472: The decrement of self._waiters[key] after await in the
block with _await_with_shutdown can re-insert a key with value 0 even if
unsubscribe() removed it; to fix, change the post-await decrement to only modify
the dict if the key still exists and its current value is >0 (or decrement and
then remove the key when the resulting value is 0) so you never leave
zero-valued orphan entries in self._waiters — update the code around key,
self._waiters, and the finally block that runs after _await_with_shutdown() to
check existence and remove zero entries atomically.

In `@src/ai_company/communication/delegation/hierarchy.py`:
- Around line 231-232: get_lowest_common_manager currently returns agent_a when
agent_a == agent_b without verifying the agent exists; change this fast-path in
get_lowest_common_manager to first check membership in the hierarchy's
known-agent set (or build/maintain such a set during Hierarchy construction),
and only return agent_a if it is present; otherwise return None (or the existing
"no manager" sentinel). Locate get_lowest_common_manager and the class that
builds the hierarchy (e.g., Hierarchy.__init__ or similar) to add/verify the
known-agent collection and use it in the equality fast-path.

In `@src/ai_company/communication/meeting/_parsing.py`:
- Around line 24-27: _ANY_HEADER_RE currently treats any line that ends with a
colon as a header, which can prematurely split sections (matches things like
"Note:" inside bodies); update the _ANY_HEADER_RE usage so the colon-terminated
alternative only counts as a header when it is followed by a non-blank next line
(use a positive lookahead to assert the next line starts with a non-whitespace
character), keeping the existing markdown-hash header branch intact; modify the
regex assigned to _ANY_HEADER_RE accordingly and keep re.MULTILINE.

In `@src/ai_company/communication/meeting/_token_tracker.py`:
- Around line 77-82: The negative-token validation path is currently logging
MEETING_BUDGET_EXHAUSTED which conflates caller validation errors with true
budget overruns; update the logging there (the logger.warning call in
_token_tracker.py that currently passes MEETING_BUDGET_EXHAUSTED) to emit a
distinct validation event (e.g., MEETING_INVALID_TOKEN_COUNT or
MEETING_TOKEN_VALIDATION_ERROR) and keep the same context fields (error=msg,
input_tokens, output_tokens) so dashboards/alerts can distinguish invalid input
from genuine exhaustion; add the new constant and swap its use in the validation
branch where negative counts are detected.

In `@src/ai_company/communication/meeting/models.py`:
- Around line 237-262: The validator _validate_token_aggregates currently
returns early when contributions is empty, allowing non-zero
total_input_tokens/total_output_tokens; change it so that when
self.contributions is empty you assert both totals are zero (raise ValueError if
total_input_tokens != 0 or total_output_tokens != 0) otherwise proceed to sum
contributions as implemented; update the error messages to reference the field
names (total_input_tokens/total_output_tokens) and expected zero when raising in
the empty-contributions case.

In `@src/ai_company/communication/meeting/position_papers.py`:
- Around line 282-289: Replace the free-form logger.error calls in the two
invariant checks so they use stable event constants from
ai_company.observability.events (e.g., POSITION_PAPERS_MISSING and
CONTRIBUTIONS_MISSING) and structured kwargs rather than formatted strings: for
the results check (variable names results, n, meeting_id) call
logger.error(POSITION_PAPERS_MISSING, detail=msg, meeting_id=meeting_id) (after
importing the constant) and similarly for contrib_results use
logger.error(CONTRIBUTIONS_MISSING, detail=msg, meeting_id=meeting_id); keep
raising RuntimeError(msg) but ensure logging uses the event constant and
structured fields instead of logger.error(msg, ...).
- Around line 153-156: The synthesis output can be missing explicit "Decisions"
and "Action Items" headers so parse_decisions and parse_action_items receive
empty results; update _build_synthesis_prompt() to require the model to emit
clearly labeled, parser-friendly sections named exactly "Decisions:" and "Action
Items:" (or another agreed exact header text) and include examples/format
constraints (e.g., bullet list under each header) so
synthesis_contribution.content (synthesis_text) always contains those headers
for parse_decisions and parse_action_items to consume.

In `@src/ai_company/communication/meeting/round_robin.py`:
- Around line 165-166: The code calls parse_decisions(summary) and
parse_action_items(summary) but the prompt produced by _build_summary_prompt()
does not require "Decisions:" or "Action Items:" headers, so leaders can return
lists that the parsers miss; update _build_summary_prompt() to explicitly
require distinct "Decisions:" and "Action Items:" section headers (with
examples) and then add a small guard where decisions = parse_decisions(summary)
/ action_items = parse_action_items(summary) are invoked to validate the summary
contains those headers (e.g., check for the literal "Decisions:" and "Action
Items:") and if missing, either request the model to reformat or log/raise a
clear parsing error so empty results are not silently accepted.

In `@src/ai_company/engine/decomposition/models.py`:
- Around line 248-253: Update the public class docstring for SubtaskStatusRollup
to explicitly document the mixed terminal-state rule: when completed + cancelled
== total the rollup resolves to TaskStatus.CANCELLED (i.e., any mix of completed
and cancelled subtasks is considered CANCELLED), in addition to the existing
description that pure completed maps to COMPLETED, pure cancelled maps to
CANCELLED, and the remainder maps to IN_PROGRESS; reference the attributes
completed, cancelled, total and the TaskStatus.CANCELLED enum so callers can
rely on this contract.

In `@tests/unit/engine/test_decomposition_models.py`:
- Around line 230-249: The test test_task_id_mismatch_rejected currently only
asserts that a ValueError is raised for ID mismatches; update it to also assert
the exception message contains the specific missing and extra IDs so regressions
that drop diagnostics are caught. Modify the pytest.raises match to look for the
missing plan ID ("sub-2") and the extra created ID ("sub-99") (or add an
explicit str(e) assertion inside the context) when constructing
DecompositionResult for the given DecompositionPlan and created_tasks, ensuring
the validator's diagnostic strings mention both IDs.
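For the `_validate_token_aggregates` comment above, one way to close the empty-contributions gap is to drop the early return and always compare against the sums, since summing an empty sequence yields zero. The sketch uses plain dataclasses instead of the project's model classes, and the field names on `Contribution` are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Contribution:
    """Simplified stand-in with assumed field names."""

    input_tokens: int
    output_tokens: int


@dataclass(frozen=True)
class MeetingMinutes:
    contributions: Tuple[Contribution, ...]
    total_input_tokens: int
    total_output_tokens: int

    def __post_init__(self) -> None:
        # sum() over no contributions is 0, so empty minutes must have
        # zero totals without needing a special case.
        expected_in = sum(c.input_tokens for c in self.contributions)
        expected_out = sum(c.output_tokens for c in self.contributions)
        if self.total_input_tokens != expected_in:
            raise ValueError(
                f"total_input_tokens={self.total_input_tokens} "
                f"does not match expected {expected_in}"
            )
        if self.total_output_tokens != expected_out:
            raise ValueError(
                f"total_output_tokens={self.total_output_tokens} "
                f"does not match expected {expected_out}"
            )


empty_ok = MeetingMinutes(
    contributions=(), total_input_tokens=0, total_output_tokens=0
)
```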

---

Outside diff comments:
In `@src/ai_company/communication/conflict_resolution/_helpers.py`:
- Around line 21-79: Split validation from extraction: add a small helper (e.g.
ensure_winner_in_conflict or validate_winner_present(conflict, winner_id)) that
performs the winner existence check, logs the CONFLICT_STRATEGY_ERROR and raises
ConflictStrategyError with the same context when missing; then simplify
find_losers to call that helper and only perform the tuple comprehension (losers
= tuple(pos for pos in conflict.positions if pos.agent_id != winner_id)), keep
the "no losers" warning/raise in find_losers as the only remaining unhappy-path
logic; update imports/refs accordingly.

In `@src/ai_company/communication/conflict_resolution/debate_strategy.py`:
- Around line 255-262: The reasoning message claims "highest seniority" even
when pick_highest_seniority resolved an equal-seniority tie via hierarchy;
update the JudgeDecision.reasoning to reflect which tiebreaker was used. After
calling pick_highest_seniority(conflict, hierarchy=self._hierarchy), check
whether the win was resolved from multiple agents with equal seniority (e.g.,
detect if conflict has multiple agents at best.agent_level or if
pick_highest_seniority can return/indicate a tie-break flag); if it was a
hierarchy tiebreak, set reasoning to something like "Debate fallback: hierarchy
tiebreak among equal-seniority agents — {best.agent_id} ({best.agent_level})
selected", otherwise keep "authority-based judging — {best.agent_id}
({best.agent_level}) has highest seniority". Ensure this logic is implemented
where JudgeDecision is constructed so the audit trail accurately distinguishes
pure seniority wins from hierarchy tiebreak wins.

In `@src/ai_company/communication/conflict_resolution/hybrid_strategy.py`:
- Around line 228-240: The hybrid fallback currently calls
pick_highest_seniority(conflict, hierarchy=self._hierarchy) without reusing the
AuthorityResolver's validation/tiebreak semantics; update the hybrid fallback to
first run the same hierarchy validation used by AuthorityResolver (e.g., invoke
the AuthorityResolver validation method or replicate its checks: ensure both
agents exist in self._hierarchy, detect equal seniority ties and
missing-ancestor cases via HierarchyResolver.get_ancestors() semantics) and only
then call pick_highest_seniority to build the ConflictResolution; if the
AuthorityResolver would have rejected/abstained (tie or missing agent), the
hybrid must not silently pick a winner but follow AuthorityResolver’s outcome
path (reject/abstain or escalate) before creating the RESOLVED_BY_HYBRID
ConflictResolution.

In `@src/ai_company/communication/meeting/orchestrator.py`:
- Around line 364-431: The _validate_inputs method is too large and mixes
token_budget checks with several participant checks; split it into smaller
focused validators: create a _validate_token_budget(meeting_id, token_budget)
that performs the positive check and logs/raises ValueError, and create a
_validate_participants(meeting_id, leader_id, participant_ids) that contains the
empty-participants, duplicate detection (use Counter), and
leader-in-participants checks and raises MeetingParticipantError with the same
context payloads; factor the common logging-and-raise pattern into a helper
(e.g., _log_and_raise or _log_participant_error) used by both validators, then
have _validate_inputs call these two new helpers to preserve behavior and
messages (keep names _validate_inputs, _validate_token_budget,
_validate_participants, and the logging helper to locate changes).

In `@src/ai_company/engine/decomposition/rollup.py`:
- Around line 44-59: The empty-input branch logs DECOMPOSITION_ROLLUP_COMPUTED
at WARNING without derived_status, so it differs from the normal path and
misreports an ordinary empty result as an operational fault. Update the
total == 0 branch to log the same event shape as the normal path (include
derived_status="zeroed", using the same field name as elsewhere) and use the
same log level as non-error creation (change to DEBUG if the normal path logs
creation at DEBUG) before returning the zeroed
SubtaskStatusRollup(parent_task_id=..., total=0, completed=0, failed=0,
in_progress=0, blocked=0, cancelled=0), so the event payload and severity remain
consistent with SubtaskStatusRollup creation.
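
One way to keep both branches consistent is to route them through a single logging helper, as in this sketch. It uses the standard logging module rather than the project's logger, the event string value is made up, the status strings are assumed, and the non-empty derived_status is a placeholder instead of the real derivation.

```python
import logging
from collections import Counter
from dataclasses import dataclass
from typing import Sequence

logger = logging.getLogger("rollup_sketch")

# Hypothetical value; the real constant lives in the project's events module.
DECOMPOSITION_ROLLUP_COMPUTED = "decomposition.rollup.computed"


@dataclass(frozen=True)
class SubtaskStatusRollup:
    parent_task_id: str
    total: int
    completed: int
    failed: int
    in_progress: int
    blocked: int
    cancelled: int


def _log_computed(rollup: SubtaskStatusRollup, derived_status: str) -> None:
    # Single emission point: same event, same level, same fields on every path.
    logger.debug(
        DECOMPOSITION_ROLLUP_COMPUTED,
        extra={
            "parent_task_id": rollup.parent_task_id,
            "total": rollup.total,
            "derived_status": derived_status,
        },
    )


def compute_rollup(parent_task_id: str, statuses: Sequence[str]) -> SubtaskStatusRollup:
    counts = Counter(statuses)  # missing keys count as 0
    rollup = SubtaskStatusRollup(
        parent_task_id=parent_task_id,
        total=len(statuses),
        completed=counts["completed"],
        failed=counts["failed"],
        in_progress=counts["in_progress"],
        blocked=counts["blocked"],
        cancelled=counts["cancelled"],
    )
    # Placeholder derivation: only the empty case is modelled here.
    derived_status = "zeroed" if rollup.total == 0 else "computed"
    _log_computed(rollup, derived_status)
    return rollup
```

Because the empty case falls out of the same constructor call, there is no separate branch whose payload or severity can drift from the normal path.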

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7ba8cc0a-97c6-4a72-be1a-9f6a3c561cfd

📥 Commits

Reviewing files that changed from the base of the PR and between ee7caca and cf27048.

📒 Files selected for processing (55)

DESIGN_SPEC.md
README.md
src/ai_company/communication/bus_memory.py
src/ai_company/communication/conflict_resolution/_helpers.py
src/ai_company/communication/conflict_resolution/config.py
src/ai_company/communication/conflict_resolution/debate_strategy.py
src/ai_company/communication/conflict_resolution/hybrid_strategy.py
src/ai_company/communication/conflict_resolution/service.py
src/ai_company/communication/delegation/hierarchy.py
src/ai_company/communication/meeting/_parsing.py
src/ai_company/communication/meeting/_prompts.py
src/ai_company/communication/meeting/_token_tracker.py
src/ai_company/communication/meeting/config.py
src/ai_company/communication/meeting/models.py
src/ai_company/communication/meeting/orchestrator.py
src/ai_company/communication/meeting/position_papers.py
src/ai_company/communication/meeting/round_robin.py
src/ai_company/communication/meeting/structured_phases.py
src/ai_company/communication/messenger.py
src/ai_company/core/enums.py
src/ai_company/core/task.py
src/ai_company/engine/decomposition/models.py
src/ai_company/engine/decomposition/rollup.py
src/ai_company/engine/decomposition/service.py
src/ai_company/engine/parallel.py
src/ai_company/engine/routing/models.py
src/ai_company/engine/routing/scorer.py
src/ai_company/engine/routing/service.py
src/ai_company/observability/events/communication.py
src/ai_company/observability/events/task_routing.py
tests/integration/communication/test_meeting_integration.py
tests/unit/communication/conflict_resolution/test_authority_strategy.py
tests/unit/communication/conflict_resolution/test_config.py
tests/unit/communication/conflict_resolution/test_debate_strategy.py
tests/unit/communication/conflict_resolution/test_helpers.py
tests/unit/communication/conflict_resolution/test_hybrid_strategy.py
tests/unit/communication/delegation/test_hierarchy.py
tests/unit/communication/meeting/test_config.py
tests/unit/communication/meeting/test_enums.py
tests/unit/communication/meeting/test_errors.py
tests/unit/communication/meeting/test_models.py
tests/unit/communication/meeting/test_orchestrator.py
tests/unit/communication/meeting/test_parsing.py
tests/unit/communication/meeting/test_position_papers.py
tests/unit/communication/meeting/test_prompts.py
tests/unit/communication/meeting/test_protocol.py
tests/unit/communication/meeting/test_round_robin.py
tests/unit/communication/meeting/test_structured_phases.py
tests/unit/communication/meeting/test_token_tracker.py
tests/unit/communication/test_bus_memory.py
tests/unit/communication/test_enums.py
tests/unit/engine/test_decomposition_models.py
tests/unit/engine/test_decomposition_service.py
tests/unit/engine/test_routing_models.py
tests/unit/engine/test_routing_service.py

💤 Files with no reviewable changes (2)

src/ai_company/communication/conflict_resolution/config.py
src/ai_company/observability/events/communication.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Agent
GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (5)

**/*.py