Skip to content

feat: implement meeting protocol system (#123)#167

Merged
Aureliolo merged 4 commits intomainfrom
feat/meeting-protocol
Mar 8, 2026
Merged

feat: implement meeting protocol system (#123)#167
Aureliolo merged 4 commits intomainfrom
feat/meeting-protocol

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • Implement 3 meeting protocols (DESIGN_SPEC §5.7): round-robin, position papers, structured phases — all behind the MeetingProtocol protocol interface
  • MeetingOrchestrator lifecycle manager: validates inputs, selects protocol, executes meeting, creates tasks from action items, records audit trail
  • Domain models: MeetingAgenda, MeetingMinutes, MeetingContribution, ActionItem, MeetingRecord (all frozen Pydantic v2)
  • Shared utilities: TokenTracker (mutable budget tracking), build_agenda_prompt (DRY prompt builder)
  • 20 structured event constants in observability/events/meeting.py
  • Error hierarchy: MeetingErrorMeetingBudgetExhaustedError, MeetingProtocolNotFoundError, MeetingParticipantError, MeetingAgentError
  • Config: MeetingProtocolConfig with per-protocol settings, auto_create_tasks at top level
  • Parallel execution via asyncio.TaskGroup in position papers and structured phases protocols
  • Budget enforcement: pre-divided budgets for parallel agents, MeetingBudgetExhaustedError on exhaustion
  • Documentation: updated DESIGN_SPEC.md §5.4, §5.7, §15.3 and CLAUDE.md package structure

Closes #123

Pre-PR Review

Pre-reviewed by 9 agents, 38 findings addressed:

  • code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency

Key fixes from review:

  • Extracted shared _TokenTracker and _prompts modules (eliminated duplication across 3 protocols)
  • Fixed non-deterministic parallel ordering in position_papers/structured_phases
  • Added budget pre-division for parallel agents (race condition prevention)
  • Split large run() methods into focused helpers (<50 lines each)
  • Added validation logging, ExceptionGroup handling, task creator resilience
  • Fixed TaskCreator type alias (strPriority)
  • Added participant uniqueness + leader exclusion validators on MeetingMinutes
  • Strengthened test coverage: budget exhaustion, task creation, protocol interface

Test plan

  • All 3394 tests pass (pytest tests/ -n auto)
  • 96.22% coverage (80% minimum)
  • ruff lint clean
  • mypy strict clean (379 files)
  • Pre-commit hooks pass
  • CI pipeline green

🤖 Generated with Claude Code

Aureliolo and others added 2 commits March 8, 2026 09:26
Add pluggable meeting protocol system with three strategies:
- Round-Robin: sequential turns with full transcript context
- Position Papers: parallel position papers with synthesis
- Structured Phases: input gathering, conflict detection, discussion, synthesis

Includes MeetingOrchestrator lifecycle manager, frozen Pydantic models,
token budget enforcement, event constants, and 138 meeting-specific tests.
Pre-reviewed by 9 agents, 38 findings addressed:
- Extract shared _TokenTracker and _prompts modules (DRY)
- Move auto_create_tasks to top-level MeetingProtocolConfig
- Fix non-deterministic parallel ordering in position_papers/structured_phases
- Add budget pre-division for parallel agents
- Split large run() methods into focused helpers
- Add validation logging, ExceptionGroup handling, task creator resilience
- Fix TaskCreator type alias (str → Priority)
- Add participant uniqueness + leader exclusion validators on MeetingMinutes
- Use NotBlankStr for decisions tuple
- Strengthen test coverage: budget exhaustion, task creation, protocol interface
- Add meeting event constant tests
- Update DESIGN_SPEC.md §5.4, §5.7, §15.3 and CLAUDE.md package structure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 8, 2026 09:03
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 8, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 8, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1115c337-1bdd-45a3-a94c-2fd09e1d0823

📥 Commits

Reviewing files that changed from the base of the PR and between 3101925 and 2aa2a1f.

📒 Files selected for processing (9)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • src/ai_company/communication/__init__.py
  • src/ai_company/communication/config.py
  • src/ai_company/communication/enums.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/structured_phases.py
  • tests/unit/observability/test_events.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Full meeting system with three configurable protocols, end-to-end orchestration, minutes, and audit records
    • Per-meeting token budgeting and enforcement with visible token accounting
    • Optional automatic task creation from meeting action items (top-level config)
    • New message type for meeting contributions
  • Observability

    • New set of meeting event names for telemetry (lifecycle, phases, agent calls, tokens, tasks, conflicts)
  • Documentation

    • Project structure and logging guidance updated to include meeting protocols
  • Tests

    • Comprehensive unit and integration tests covering all meeting flows, budgets, and error paths

Walkthrough

Adds a new meeting subsystem: three pluggable meeting protocols (RoundRobin, PositionPapers, StructuredPhases), a MeetingOrchestrator with token-budget enforcement and task-creation hooks, domain models/errors/enums, token/prompt utilities, observability event constants, public re-exports, and extensive unit/integration tests. No existing public signatures were removed.

Changes

Cohort / File(s) Summary
Design & docs
CLAUDE.md, DESIGN_SPEC.md
Documented meeting protocol subsystem, public API surface, telemetry events, and global auto_create_tasks config.
Public exports
src/ai_company/communication/__init__.py, src/ai_company/communication/meeting/__init__.py
Re-exports meeting subsystem types (configs, enums, models, protocols, orchestrator, task creator, detectors); expands package public surface.
Configuration
src/ai_company/communication/config.py, src/ai_company/communication/meeting/config.py
Adds protocol_config to MeetingTypeConfig and new immutable Pydantic configs: RoundRobinConfig, PositionPapersConfig, StructuredPhasesConfig, MeetingProtocolConfig (includes auto_create_tasks).
Core enums & message type
src/ai_company/communication/enums.py, src/ai_company/communication/meeting/enums.py
Adds MEETING_CONTRIBUTION to MessageType; introduces MeetingProtocolType, MeetingPhase, MeetingStatus enums.
Protocol interfaces & types
src/ai_company/communication/meeting/protocol.py
Defines AgentCaller/TaskCreator type aliases and runtime-checkable MeetingProtocol and ConflictDetector Protocols.
Models & errors
src/ai_company/communication/meeting/models.py, src/ai_company/communication/meeting/errors.py
Adds domain models (AgentResponse, MeetingAgenda*, MeetingContribution, ActionItem, MeetingMinutes, MeetingRecord) with validators and meeting-specific error hierarchy.
Orchestrator
src/ai_company/communication/meeting/orchestrator.py
Implements MeetingOrchestrator to validate input, resolve protocol, run meetings, persist in-memory MeetingRecords, handle budget/failures, emit events, and optionally create tasks.
Protocol implementations
src/ai_company/communication/meeting/round_robin.py, .../position_papers.py, .../structured_phases.py
Implements three meeting protocols with token budgeting, deterministic ordering, conflict-detection hook, phase orchestration, observability events, and MeetingMinutes generation.
Utilities
src/ai_company/communication/meeting/_prompts.py, src/ai_company/communication/meeting/_token_tracker.py
Adds build_agenda_prompt and TokenTracker (budget, record, used/remaining/is_exhausted) with validation and concurrency notes.
Observability
src/ai_company/observability/events/meeting.py
Adds MEETING_* event constants for lifecycle, phases, agent interactions, contributions, conflicts, synthesis, tasks, validation, and token recording.
Tests — unit & integration
tests/unit/communication/meeting/..., tests/integration/communication/test_meeting_integration.py
Adds comprehensive unit and integration tests covering configs, enums, models, errors, protocols, orchestrator, token tracker, prompts, conflict behavior, budget exhaustion, and audit records.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Orchestrator as MeetingOrchestrator
    participant Protocol as MeetingProtocol
    participant Tracker as TokenTracker
    participant Caller as AgentCaller
    participant Agent as Agent

    Client->>Orchestrator: run_meeting(config, agenda, leader, participants, budget)
    Orchestrator->>Orchestrator: _validate_inputs()
    Orchestrator->>Orchestrator: _resolve_protocol()
    Orchestrator->>Protocol: run(meeting_id, agenda, leader, participants, agent_caller, budget)
    Protocol->>Tracker: TokenTracker(budget)
    Protocol->>Protocol: build_agenda_prompt()

    loop contributions (parallel or sequential)
        Protocol->>Caller: call(agent_id, prompt, max_tokens)
        Caller->>Agent: invoke
        Agent-->>Caller: AgentResponse (input/output tokens)
        Caller-->>Protocol: AgentResponse
        Protocol->>Tracker: record(input_tokens, output_tokens)
        Protocol->>Protocol: append MeetingContribution
    end

    alt budget available for synthesis/summary
        Protocol->>Caller: call(leader/synthesizer, synthesis_prompt, max_tokens)
        Caller-->>Protocol: synthesis AgentResponse
        Protocol->>Tracker: record(...)
    else budget exhausted
        Protocol-->>Orchestrator: raise MeetingBudgetExhaustedError
    end

    Protocol-->>Orchestrator: MeetingMinutes
    Orchestrator->>Orchestrator: _record_success() / _make_failure_record()
    alt auto_create_tasks enabled
        Orchestrator->>Orchestrator: _create_tasks(action_items)
    end
    Orchestrator-->>Client: MeetingRecord
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat: implement meeting protocol system (#123)' directly summarizes the main objective: implementing a meeting protocol system. It is concise, clear, and specific to the primary change.
Description check ✅ Passed The PR description thoroughly explains the implementation, covering all three protocols, the MeetingOrchestrator, domain models, utilities, error hierarchy, config changes, and documentation updates. It clearly relates to the changeset.
Linked Issues check ✅ Passed The PR fully implements the meeting protocol system per issue #123: three concrete protocols (round-robin, position papers, structured phases) with MeetingProtocol interface, MeetingOrchestrator for lifecycle management, token budget enforcement, action items → tasks conversion, domain models, error hierarchy, event constants, and comprehensive test coverage (>80%).
Out of Scope Changes check ✅ Passed All changes align with the meeting protocol system implementation. Documentation updates to CLAUDE.md and DESIGN_SPEC.md are directly related. The addition of MEETING_CONTRIBUTION to MessageType enum is a necessary extension to support the new meeting system. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/meeting-protocol
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/meeting-protocol

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive meeting protocol system, enabling structured multi-agent conversations within the AI company framework. It provides three distinct protocols—round-robin, position papers, and structured phases—managed by a central orchestrator. The system ensures robust lifecycle management, budget enforcement, and detailed audit trails, significantly enhancing the platform's ability to facilitate complex agent interactions and decision-making.

Highlights

  • Meeting Protocol Implementation: Implemented three distinct meeting protocols: round-robin, position papers, and structured phases, all adhering to a unified MeetingProtocol interface.
  • Meeting Orchestrator: Introduced a MeetingOrchestrator to manage the full meeting lifecycle, including input validation, protocol selection, execution, task creation from action items, and audit trail recording.
  • New Domain Models: Defined several new frozen Pydantic v2 domain models, such as MeetingAgenda, MeetingMinutes, MeetingContribution, ActionItem, and MeetingRecord, to represent meeting-related data.
  • Shared Utilities and Error Handling: Added shared utilities like TokenTracker for mutable budget tracking and build_agenda_prompt for DRY prompt construction. A new MeetingError hierarchy was also established for robust error management.
  • Parallel Execution and Budget Enforcement: Enabled parallel execution for position papers and structured phases protocols using asyncio.TaskGroup, coupled with pre-divided budgets for agents and MeetingBudgetExhaustedError for strict token budget enforcement.
  • Observability and Configuration: Integrated 20 new structured event constants for meeting activities into observability/events/meeting.py and introduced MeetingProtocolConfig for flexible, per-protocol settings, including a top-level auto_create_tasks option.
  • Pre-PR Review Enhancements: Incorporated 38 findings from a pre-PR review by 9 agents, leading to key fixes such as extracting shared modules, resolving non-deterministic parallel ordering, implementing budget pre-division, refactoring large methods, enhancing validation logging, and strengthening test coverage.
Changelog
  • CLAUDE.md
    • Updated the description for the communication/ directory to include 'meeting protocol'.
    • Revised logging guidelines to reference new MEETING_STARTED event.
    • Expanded the package structure documentation to include the new communication/meeting/ and observability/events/meeting.py modules.
  • DESIGN_SPEC.md
    • Updated the 'Current state (M4 in-progress)' section to reflect the implementation of meeting protocols.
    • Marked the 'Meetings' section as 'M4 complete' for all three protocols.
    • Adjusted the structured_phases configuration example to move auto_create_tasks to a top-level setting.
    • Added detailed package structure for the new communication/meeting/ and observability/events/meeting.py modules.
  • src/ai_company/communication/init.py
    • Imported numerous new classes and enums from the ai_company.communication.meeting package.
    • Added these new imports to the __all__ export list for public access.
  • src/ai_company/communication/config.py
    • Imported MeetingProtocolConfig.
    • Added a protocol_config field to MeetingTypeConfig.
  • src/ai_company/communication/enums.py
    • Added MEETING_CONTRIBUTION to the MessageType enum.
  • src/ai_company/communication/meeting/init.py
    • Created the meeting package, defining its public interfaces and documentation.
  • src/ai_company/communication/meeting/_prompts.py
    • Added shared prompt builders for meeting protocols, including build_agenda_prompt.
  • src/ai_company/communication/meeting/_token_tracker.py
    • Implemented a TokenTracker class for mutable token budget tracking within meeting executions.
  • src/ai_company/communication/meeting/config.py
    • Defined configuration models for RoundRobinConfig, PositionPapersConfig, StructuredPhasesConfig, and the top-level MeetingProtocolConfig.
  • src/ai_company/communication/meeting/enums.py
    • Defined new enumerations for MeetingProtocolType, MeetingPhase, and MeetingStatus.
  • src/ai_company/communication/meeting/errors.py
    • Established a hierarchy of meeting-specific exceptions, all inheriting from CommunicationError.
  • src/ai_company/communication/meeting/models.py
    • Defined Pydantic models for AgentResponse, MeetingAgendaItem, MeetingAgenda, MeetingContribution, ActionItem, MeetingMinutes, and MeetingRecord.
  • src/ai_company/communication/meeting/orchestrator.py
    • Implemented the MeetingOrchestrator class to manage the full meeting lifecycle, including protocol selection, execution, and audit trail recording.
  • src/ai_company/communication/meeting/position_papers.py
    • Implemented the PositionPapersProtocol for parallel input gathering and synthesis in meetings.
  • src/ai_company/communication/meeting/protocol.py
    • Defined the MeetingProtocol interface and AgentCaller, TaskCreator type aliases for meeting protocol implementations.
  • src/ai_company/communication/meeting/round_robin.py
    • Implemented the RoundRobinProtocol for sequential turns with full transcript context in meetings.
  • src/ai_company/communication/meeting/structured_phases.py
    • Implemented the StructuredPhasesProtocol for phased input gathering, conditional discussion, and synthesis in meetings.
  • src/ai_company/observability/events/meeting.py
    • Defined 20 structured event constants for meeting lifecycle and phase tracking.
  • tests/integration/communication/test_meeting_integration.py
    • Added integration tests for the meeting protocol system, covering full lifecycle, protocol switching, budget enforcement, and error recovery.
  • tests/unit/communication/meeting/conftest.py
    • Added Pytest fixtures and factories to support meeting protocol unit tests.
  • tests/unit/communication/meeting/test_config.py
    • Added unit tests for meeting protocol configuration models.
  • tests/unit/communication/meeting/test_enums.py
    • Added unit tests for meeting protocol enumerations.
  • tests/unit/communication/meeting/test_errors.py
    • Added unit tests for meeting protocol error hierarchy.
  • tests/unit/communication/meeting/test_models.py
    • Added unit tests for meeting protocol domain models.
  • tests/unit/communication/meeting/test_orchestrator.py
    • Added unit tests for the meeting orchestrator.
  • tests/unit/communication/meeting/test_position_papers.py
    • Added unit tests for the position papers protocol.
  • tests/unit/communication/meeting/test_protocol.py
    • Added unit tests for the meeting protocol interface.
  • tests/unit/communication/meeting/test_round_robin.py
    • Added unit tests for the round-robin protocol.
  • tests/unit/communication/meeting/test_structured_phases.py
    • Added unit tests for the structured phases protocol.
  • tests/unit/communication/test_enums.py
    • Updated MessageType enum test to reflect the addition of MEETING_CONTRIBUTION.
  • tests/unit/observability/test_events.py
    • Updated event discovery test to include the new meeting module.
    • Added a new test to verify the meeting event constants.
Activity
  • The pull request author, Aureliolo, implemented the meeting protocol system.
  • A comprehensive pre-PR review was conducted by 9 agents, resulting in 38 findings being addressed.
  • Key fixes from the pre-PR review included extracting shared _TokenTracker and _prompts modules to eliminate duplication, fixing non-deterministic parallel ordering in position papers and structured phases, adding budget pre-division for parallel agents to prevent race conditions, splitting large run() methods into focused helpers, adding validation logging and ExceptionGroup handling, improving task creator resilience, fixing TaskCreator type alias, adding participant uniqueness and leader exclusion validators on MeetingMinutes, and strengthening test coverage for budget exhaustion, task creation, and protocol interfaces.
  • All 3394 tests passed successfully, maintaining a 96.22% coverage (above the 80% minimum).
  • Ruff lint and mypy strict checks passed cleanly across all 379 files.
  • All pre-commit hooks passed, ensuring code quality and consistency.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive and well-designed meeting protocol system. It includes three distinct protocols (round-robin, position papers, and structured phases), a robust orchestrator for managing the meeting lifecycle, and a full suite of supporting models, configurations, error types, and tests. The implementation demonstrates strong adherence to best practices, especially in its use of dependency injection, asynchronous patterns with asyncio.TaskGroup, and resilient error handling. The code is well-structured and thoroughly tested. I have one suggestion to improve the robustness of the conflict detection mechanism in the structured phases protocol by using a more deterministic method than string parsing.

Note: Security Review did not run due to the size of the PR.

contributions.append(conflict_contribution)
turn_number += 1

conflicts_detected = "CONFLICTS: YES" in conflict_response.content.upper()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Parsing the LLM's free-text response to detect conflicts is a bit brittle. A minor change in the model's output format (e.g., extra whitespace, different phrasing) could break this logic. For more robust conflict detection, consider one of these approaches:

  1. Structured JSON Output: Instruct the LLM to respond with a JSON object, which can be parsed reliably.
    {
      "conflicts_detected": true,
      "analysis": "Disagreement on API style."
    }
  2. Tool Calling: Define a record_conflict_decision tool that the LLM must call. The tool would take a boolean conflicts_detected argument, ensuring a deterministic signal.

This change would improve the robustness of the conflict detection mechanism.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the meeting protocol subsystem described in DESIGN_SPEC §5.7, adding pluggable meeting execution strategies plus orchestration, configuration, observability events, and comprehensive unit/integration tests.

Changes:

  • Added ai_company.communication.meeting package with 3 protocol implementations, shared prompt/token utilities, and a MeetingOrchestrator.
  • Introduced meeting domain models/config/enums/errors and wired meeting settings into MeetingTypeConfig plus communication package re-exports.
  • Added meeting observability event constants and expanded unit/integration test coverage for protocols/orchestrator.

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/unit/observability/test_events.py Registers and validates meeting event constants in the observability event suite.
tests/unit/communication/test_enums.py Updates message-type enum member count for new meeting message type.
tests/unit/communication/meeting/conftest.py Adds meeting test fixtures and a mock AgentCaller.
tests/unit/communication/meeting/test_config.py Unit tests for meeting protocol config models.
tests/unit/communication/meeting/test_enums.py Unit tests for meeting protocol enums (type/phase/status).
tests/unit/communication/meeting/test_errors.py Unit tests for meeting error hierarchy + context behavior.
tests/unit/communication/meeting/test_models.py Unit tests for meeting domain Pydantic models/validators.
tests/unit/communication/meeting/test_orchestrator.py Unit tests for orchestrator validation, execution, audit trail, and error handling.
tests/unit/communication/meeting/test_position_papers.py Unit tests for parallel position-papers protocol behavior.
tests/unit/communication/meeting/test_protocol.py Unit tests verifying the runtime-checkable MeetingProtocol interface and type aliases.
tests/unit/communication/meeting/test_round_robin.py Unit tests for round-robin protocol behavior and budget handling.
tests/unit/communication/meeting/test_structured_phases.py Unit tests for structured-phases protocol behavior and conflict gating.
tests/unit/communication/meeting/init.py Marks meeting unit test package.
tests/integration/communication/test_meeting_integration.py End-to-end integration tests for protocol switching, budgets, and communication re-exports.
src/ai_company/observability/events/meeting.py Adds meeting lifecycle/phase/task/token event constants.
src/ai_company/communication/meeting/_prompts.py Shared agenda prompt builder for meeting protocols.
src/ai_company/communication/meeting/_token_tracker.py Shared mutable token budget tracker used by protocol implementations.
src/ai_company/communication/meeting/config.py Adds per-protocol config models + top-level MeetingProtocolConfig.
src/ai_company/communication/meeting/enums.py Adds meeting protocol/phase/status enums.
src/ai_company/communication/meeting/errors.py Adds meeting-specific error hierarchy rooted at CommunicationError.
src/ai_company/communication/meeting/models.py Adds frozen Pydantic domain models for agendas, minutes, records, action items, etc.
src/ai_company/communication/meeting/orchestrator.py Adds orchestrator to validate inputs, pick protocol, execute meeting, create tasks, and record audit trail.
src/ai_company/communication/meeting/position_papers.py Implements async position papers protocol using asyncio.TaskGroup.
src/ai_company/communication/meeting/protocol.py Defines MeetingProtocol runtime-checkable protocol and AgentCaller/TaskCreator type aliases.
src/ai_company/communication/meeting/round_robin.py Implements round-robin protocol with transcript context, summary reserve, and budget handling.
src/ai_company/communication/meeting/structured_phases.py Implements structured-phases protocol with parallel input gathering and conditional discussion.
src/ai_company/communication/meeting/init.py Re-exports meeting subsystem public API.
src/ai_company/communication/enums.py Adds MessageType.MEETING_CONTRIBUTION.
src/ai_company/communication/config.py Adds protocol_config: MeetingProtocolConfig to MeetingTypeConfig.
src/ai_company/communication/init.py Re-exports meeting types from top-level communication package.
DESIGN_SPEC.md Updates spec status and package tree to include meeting subsystem and config shape.
CLAUDE.md Updates package structure and logging event-constant guidance to include meetings.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +10
"""Integration tests for the meeting protocol system.

Tests full meeting lifecycle with protocol switching, budget
enforcement, and orchestrator coordination.
"""

import pytest

from ai_company.communication.meeting.config import (
MeetingProtocolConfig,
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing integration tests typically include pytestmark = pytest.mark.timeout(30) (e.g. tests/integration/communication/test_delegation_integration.py:39) to avoid hangs. This new integration test module doesn’t set a timeout marker, so consider adding one for consistency and to keep CI behavior uniform.

Copilot uses AI. Check for mistakes.
Comment on lines 19 to 23
def test_member_count(self) -> None:
assert len(MessageType) == 8
assert len(MessageType) == 9

def test_values(self) -> None:
assert MessageType.TASK_UPDATE.value == "task_update"
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(MessageType) was updated to 9, but test_values() still only asserts the original 8 enum values. Add an assertion for MessageType.MEETING_CONTRIBUTION.value == "meeting_contribution" so this test actually validates the newly added member.

Copilot uses AI. Check for mistakes.
Comment on lines +249 to +255
if status == MeetingStatus.BUDGET_EXHAUSTED:
logger.warning(
MEETING_FAILED,
meeting_id=meeting_id,
status=status,
error=error_msg,
)
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the meeting ends due to budget exhaustion, _make_failure_record() logs the lifecycle event as MEETING_FAILED. Since MEETING_BUDGET_EXHAUSTED exists as a distinct lifecycle event, log that event for MeetingStatus.BUDGET_EXHAUSTED (and reserve MEETING_FAILED for true failures) so observability can distinguish the two outcomes reliably.

Copilot uses AI. Check for mistakes.
Comment on lines +350 to +356
if not participant_ids:
logger.warning(
MEETING_VALIDATION_FAILED,
meeting_id=meeting_id,
error="at least one participant is required",
)
msg = "At least one participant is required"
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_validate_inputs() doesn’t reject duplicate entries in participant_ids. If duplicates are passed, protocols will perform duplicate agent calls and MeetingMinutes validation will later fail (unique participant_ids), turning an input problem into a runtime FAILED record. Validate uniqueness here (and raise MeetingParticipantError) to keep failures deterministic and consistent with the method’s “validation errors are raised directly” contract.

Copilot uses AI. Check for mistakes.
skip_discussion_if_no_conflicts: Skip discussion when no
conflicts are detected.
max_discussion_tokens: Token budget for the discussion round.
auto_create_tasks: Whether to auto-create tasks from action items.
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructuredPhasesConfig’s docstring lists auto_create_tasks, but the model doesn’t define that field (it lives on MeetingProtocolConfig). Update the docstring to avoid implying a per-protocol flag that doesn’t exist.

Suggested change
auto_create_tasks: Whether to auto-create tasks from action items.

Copilot uses AI. Check for mistakes.
DESIGN_SPEC.md Outdated
Meetings (§5.1 Pattern 3) follow configurable protocols that determine how agents interact during structured multi-agent conversations. Different meeting types naturally suit different protocols. All protocols implement a `MeetingProtocol` protocol, making the system extensible — new protocols can be registered and selected per meeting type. Cost bounds are enforced by `duration_tokens` in meeting config (§5.4).

> **MVP: Not in M3.** Meetings are an M4 feature. Round-Robin (Protocol 1) is the initial default.
> **Current state (M4 complete):** All 3 meeting protocols are implemented in `communication/meeting/`: `RoundRobinProtocol`, `PositionPapersProtocol`, and `StructuredPhasesProtocol`. The `MeetingOrchestrator` runs meetings end-to-end with token budget enforcement via `TokenBudgetTracker`. All protocols implement the `MeetingProtocol` protocol interface.
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section says budget enforcement is done via TokenBudgetTracker, but the implementation added in communication/meeting/_token_tracker.py defines TokenTracker. Update the name here to match the actual type to keep the spec accurate/searchable.

Suggested change
> **Current state (M4 complete):** All 3 meeting protocols are implemented in `communication/meeting/`: `RoundRobinProtocol`, `PositionPapersProtocol`, and `StructuredPhasesProtocol`. The `MeetingOrchestrator` runs meetings end-to-end with token budget enforcement via `TokenBudgetTracker`. All protocols implement the `MeetingProtocol` protocol interface.
> **Current state (M4 complete):** All 3 meeting protocols are implemented in `communication/meeting/`: `RoundRobinProtocol`, `PositionPapersProtocol`, and `StructuredPhasesProtocol`. The `MeetingOrchestrator` runs meetings end-to-end with token budget enforcement via `TokenTracker`. All protocols implement the `MeetingProtocol` protocol interface.

Copilot uses AI. Check for mistakes.
DESIGN_SPEC.md Outdated
│ │ ├── meeting/ # Meeting protocol subsystem
│ │ │ ├── __init__.py # Package exports
│ │ │ ├── _prompts.py # LLM prompt templates for meeting phases
│ │ │ ├── _token_tracker.py # TokenBudgetTracker for duration_tokens enforcement
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package tree comment for _token_tracker.py calls it TokenBudgetTracker, but the module defines TokenTracker. Please align the spec’s naming with the code so readers can find the right symbol quickly.

Suggested change
│ │ │ ├── _token_tracker.py # TokenBudgetTracker for duration_tokens enforcement
│ │ │ ├── _token_tracker.py # TokenTracker for duration_tokens enforcement

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +5
"""Tests for round-robin meeting protocol."""

import pytest

from ai_company.communication.meeting.config import RoundRobinConfig
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most existing tests/unit/communication/** modules set pytestmark = pytest.mark.timeout(30) (e.g. tests/unit/communication/test_enums.py:14). This new meeting test module doesn’t, so it won’t be covered by the same hang-protection. Consider adding the same module-level timeout marker here (and across the new meeting test files) for consistency.

Copilot uses AI. Check for mistakes.
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR implements the full §5.7 meeting protocol system: three strategies (RoundRobinProtocol, PositionPapersProtocol, StructuredPhasesProtocol) behind a MeetingProtocol interface, a MeetingOrchestrator lifecycle manager, frozen Pydantic v2 domain models, a shared TokenTracker, 20 observability event constants, and a structured error hierarchy. The pre-review addressed the most critical findings (synthesis budget reserves, ExceptionGroup handling, event-constant corrections, validation logging). Three new issues remain:

  • decisions and action_items are never populated by any protocol — all three protocols place the synthesizer's free-text output in summary only. MeetingMinutes.decisions and MeetingMinutes.action_items always default to empty tuples, making the auto_create_tasks feature silently non-functional regardless of configuration. A parsing step is needed (or the feature should be clearly documented as caller-responsibility).
  • Discussion phase can still exhaust the synthesis reserve in structured_phases.py — the 20% _SYNTHESIS_RESERVE_FRACTION applied before input gathering is shared across the conflict-check agent, the discussion round, and synthesis. When discussion consumes the remaining reserve, _run_synthesis receives an exhausted tracker and raises MeetingBudgetExhaustedError. A nested sub-reserve for synthesis inside _run_discussion_round (mirroring the input-gathering fix) is needed.
  • MEETING_CONFLICT_DETECTED event emitted twice per conflict check — once at INFO (without raw_response) and once at DEBUG (with raw_response) using the same event constant string. This produces duplicate entries in log aggregators and inflates event-count dashboards.

Confidence Score: 3/5

  • Safe to merge for infrastructure, but the auto_create_tasks feature is silently broken and the synthesis budget exhaustion in structured phases remains a real runtime failure path.
  • The core protocol machinery, token tracking, error hierarchy, and observability plumbing are well-implemented and thoroughly tested. However, two logic issues affect runtime correctness: MeetingMinutes.action_items is never populated by any protocol (making auto_create_tasks inoperative and leaving decisions as a dead field), and the discussion phase in StructuredPhasesProtocol can still consume the synthesis reserve and trigger MeetingBudgetExhaustedError under realistic token budgets — the pre-review fix only protects synthesis from input-gathering exhaustion, not from discussion exhaustion. The duplicate event logging is a minor observability concern.
  • src/ai_company/communication/meeting/structured_phases.py (synthesis budget exhaustion + duplicate event logging) and src/ai_company/communication/meeting/models.py / all three protocol files (unpopulated action_items / decisions).

Important Files Changed

Filename Overview
src/ai_company/communication/meeting/orchestrator.py Lifecycle manager for meetings; correctly differentiates BUDGET_EXHAUSTED vs FAILED records, handles ExceptionGroup from parallel TaskGroup, validates inputs with proper logging. The _create_tasks method is silently inoperative because no protocol ever populates MeetingMinutes.action_items.
src/ai_company/communication/meeting/structured_phases.py Three-phase protocol with parallel input gathering and conditional discussion. Two issues: (1) duplicate MEETING_CONFLICT_DETECTED event at both INFO and DEBUG per conflict check, and (2) discussion phase can fully consume the 20% synthesis reserve, leaving _run_synthesis with zero remaining tokens and raising MeetingBudgetExhaustedError despite the reserve fix from a prior review.
src/ai_company/communication/meeting/models.py Well-structured frozen Pydantic v2 models with solid validators (timing, participant uniqueness, leader exclusion, status consistency). MeetingMinutes.decisions and MeetingMinutes.action_items are declared but never populated by any protocol implementation, making them dead fields.
src/ai_company/communication/meeting/position_papers.py Parallel position-paper collection with deterministic slot-indexed ordering and 20% synthesis reserve. Clean separation of concerns with _collect_position_papers and _run_synthesis. No new issues; synthesis reserve is correctly applied before dividing the parallel budget.
src/ai_company/communication/meeting/round_robin.py Sequential turn-taking with 20% summary reserve, per-turn token-available calculation, max_turns_per_agent and max_total_turns caps, and explicit budget-exhaustion warning. Logic is sound and no new issues found.
src/ai_company/communication/meeting/_token_tracker.py Simple, correct token budget tracker with clear concurrency note in the module docstring. Validates non-negative inputs and positive budget. No issues.
src/ai_company/communication/meeting/protocol.py Clean protocol interface definitions with runtime_checkable protocols for MeetingProtocol and ConflictDetector. TaskCreator type alias correctly uses Priority enum. No issues.
src/ai_company/observability/events/meeting.py Twenty well-named event constants organized into logical groups. All constants are typed as Final[str]. No issues.
tests/integration/communication/test_meeting_integration.py Good coverage of lifecycle, protocol switching, budget exhaustion, and task creation paths. No test verifies that action_items are ever non-empty or that auto_create_tasks actually creates tasks, which is consistent with the implementation gap noted in models.py.
src/ai_company/communication/meeting/errors.py Clean four-level error hierarchy under CommunicationError. All exceptions are distinct and well-named. No issues.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Orchestrator as MeetingOrchestrator
    participant Registry as ProtocolRegistry
    participant Protocol as MeetingProtocol
    participant Tracker as TokenTracker
    participant Agent as AgentCaller

    Caller->>Orchestrator: run_meeting(meeting_type_name, protocol_config, agenda, leader_id, participant_ids, token_budget)
    Orchestrator->>Orchestrator: _validate_inputs()
    Orchestrator->>Registry: _resolve_protocol(protocol_type)
    Registry-->>Orchestrator: protocol instance
    Orchestrator->>Orchestrator: log MEETING_STARTED
    Orchestrator->>Protocol: _execute_protocol() → protocol.run()

    alt RoundRobinProtocol
        Protocol->>Tracker: TokenTracker(budget=token_budget)
        Note over Protocol: Reserve 20% for summary
        loop max_turns_per_agent × participants
            Protocol->>Agent: agent_caller(participant_id, turn_prompt, tokens_available)
            Agent-->>Protocol: AgentResponse
            Protocol->>Tracker: record(input_tokens, output_tokens)
        end
        Protocol->>Agent: agent_caller(leader_id, summary_prompt, remaining)
        Agent-->>Protocol: summary AgentResponse
    else PositionPapersProtocol
        Protocol->>Tracker: TokenTracker(budget=token_budget)
        Note over Protocol: Reserve 20% for synthesis
        par parallel position papers
            Protocol->>Agent: agent_caller(participant_1, position_prompt, slice)
            Protocol->>Agent: agent_caller(participant_2, position_prompt, slice)
            Protocol->>Agent: agent_caller(participant_N, position_prompt, slice)
        end
        Protocol->>Agent: agent_caller(synthesizer_id, synthesis_prompt, remaining)
        Agent-->>Protocol: synthesis AgentResponse
    else StructuredPhasesProtocol
        Protocol->>Tracker: TokenTracker(budget=token_budget)
        Note over Protocol: Phase 1: Agenda broadcast (no LLM)
        Note over Protocol: Reserve 20% for later phases
        par Phase 2: parallel input gathering
            Protocol->>Agent: agent_caller(participant_1, input_prompt, slice)
            Protocol->>Agent: agent_caller(participant_N, input_prompt, slice)
        end
        Protocol->>Agent: agent_caller(leader_id, conflict_check_prompt, remaining)
        Agent-->>Protocol: conflict AgentResponse
        alt conflicts detected
            loop participants
                Protocol->>Agent: agent_caller(pid, discussion_prompt, slice)
            end
        end
        Protocol->>Agent: agent_caller(leader_id, synthesis_prompt, remaining)
        Agent-->>Protocol: synthesis AgentResponse
    end

    Protocol-->>Orchestrator: MeetingMinutes
    Orchestrator->>Orchestrator: _create_tasks(action_items)
    Orchestrator->>Orchestrator: _record_success() → log MEETING_COMPLETED
    Orchestrator-->>Caller: MeetingRecord(status=COMPLETED)
Loading

Comments Outside Diff (3)

  1. src/ai_company/communication/meeting/models.py, line 839-867 (link)

    decisions and action_items fields are never populated — auto_create_tasks is silently inoperative

    All three protocol implementations (RoundRobinProtocol, PositionPapersProtocol, StructuredPhasesProtocol) construct MeetingMinutes without ever passing decisions or action_items, so both fields always default to empty tuples ().

    This means MeetingOrchestrator._create_tasks() always short-circuits on the not minutes.action_items guard:

    # orchestrator.py _create_tasks — this early-return always fires
    if (
        self._task_creator is None
        or not protocol_config.auto_create_tasks
        or not minutes.action_items          # ← always True; protocols never populate this
    ):
        return

    The synthesis prompts in all three protocols do ask agents to produce "action items with assignees", but the free-text summary field captures the raw LLM output — no parsing step extracts structured ActionItem objects from it.

    Concrete evidence:

    • round_robin.py MeetingMinutes(... summary=summary ...) — no action_items kwarg
    • position_papers.py MeetingMinutes(... summary=synthesis_contribution.content ...) — no action_items kwarg
    • structured_phases.py MeetingMinutes(... summary=summary ...) — no action_items kwarg

    Setting auto_create_tasks=True in MeetingProtocolConfig (the default) will never actually create any tasks. A parsing layer that extracts ActionItem objects from the synthesis response is needed, or the docstring for auto_create_tasks should be updated to reflect that the feature requires caller-side population of action_items.

  2. src/ai_company/communication/meeting/structured_phases.py, line 2785-2795 (link)

    Same MEETING_CONFLICT_DETECTED event constant emitted twice per conflict check

    The same event constant is logged once at INFO and once at DEBUG for every conflict-check response:

    logger.info(
        MEETING_CONFLICT_DETECTED,
        meeting_id=meeting_id,
        conflicts_found=conflicts_detected,   # subset of fields
    )
    logger.debug(
        MEETING_CONFLICT_DETECTED,           # same constant
        meeting_id=meeting_id,
        conflicts_found=conflicts_detected,
        raw_response=conflict_response.content,
    )

    In any log aggregator that groups or counts by event name, this produces two meeting.conflict.detected entries for every meeting — one without raw_response and one with. Dashboards counting conflict-detection events will double-count, and filtering for meeting.conflict.detected at DEBUG level will return both entries.

    The INFO entry is already sufficient for operational visibility. The raw_response context is useful for debugging but the note-worthy bit is at DEBUG, so the DEBUG line is redundant for every case where DEBUG is enabled (it duplicates the same conflicts_found field). Consider dropping the DEBUG call, or — if the raw response is desired at DEBUG — use a different, more specific event string (e.g., MEETING_AGENT_RESPONDED already exists and is fired in _collect_input):

  3. src/ai_company/communication/meeting/structured_phases.py, line 2854-2861 (link)

    Discussion phase can exhaust the synthesis reserve, still causing MeetingBudgetExhaustedError

    The _SYNTHESIS_RESERVE_FRACTION is subtracted from tracker.remaining before input gathering, protecting synthesis from input-gathering exhaustion. However, once _run_discussion is entered, the reserved portion of the budget is shared between the conflict-check agent call and the discussion round — with no nested sub-reserve carved out for synthesis.

    In practice: after input gathering consumes ~80% of the total budget, the conflict-check and discussion phases together consume the remaining ~20%. Since discussion_budget = min(self._config.max_discussion_tokens, tracker.remaining) returns tracker.remaining whenever the remaining tokens are less than max_discussion_tokens, all participants in _run_discussion_round can collectively consume the entire remaining balance, leaving nothing for _run_synthesis, which then raises MeetingBudgetExhaustedError.

    The same reserve pattern from _run_input_gathering (line 2639) should be applied inside _run_discussion_round before dividing the discussion budget across participants, so synthesis retains a guaranteed token allowance regardless of how much discussion consumes.

Last reviewed commit: 2aa2a1f

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 27

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Line 653: Update the spec text to use the current symbol names: replace
references to TokenBudgetTracker with TokenTracker and AgendaItem with
MeetingAgendaItem wherever they appear (including the section currently
mentioning all 3 meeting protocols and the noted repeat at lines ~2421-2425);
ensure any inline code formatting or API references match the implemented
symbols RoundRobinProtocol, PositionPapersProtocol, StructuredPhasesProtocol,
MeetingOrchestrator, and TokenTracker so readers are directed to existing APIs.

In `@src/ai_company/communication/config.py`:
- Around line 113-116: Update the Google-style docstring for the public class
MeetingTypeConfig to include the new public attribute `protocol_config` (type
MeetingProtocolConfig) and a short description such as "Meeting protocol
configuration" so generated docs and IDE hints reflect the new field; ensure the
docstring follows the existing Google-style sections (Args/Attributes) and
mentions `duration_tokens` and the new `protocol_config` entry consistently with
other attributes.

In `@src/ai_company/communication/meeting/_prompts.py`:
- Around line 23-27: The agenda item formatter in _prompts.py currently drops
MeetingAgendaItem.presenter_id when building entries from agenda.items; update
the loop that builds entry (for i, item in enumerate(agenda.items, 1)) to
include the optional presenter assignment when present (e.g., append a presenter
indicator using item.presenter_id or resolved presenter name) so presenter
ownership is preserved in the shared agenda prompt and any downstream protocol
that consumes MeetingAgendaItem can route presenter-specific flows correctly.
- Around line 3-9: The public annotation for MeetingAgenda in
build_agenda_prompt is only available under TYPE_CHECKING which causes runtime
resolution errors; replace the TYPE_CHECKING-only import with a real import of
MeetingAgenda from ai_company.communication.meeting.models at module level
(remove the conditional block or add a normal import alongside it) so that
build_agenda_prompt can be resolved at runtime by typing.get_type_hints() and
other consumers.

In `@src/ai_company/communication/meeting/_token_tracker.py`:
- Around line 6-38: TokenTracker currently mutates shared state via record(),
violating the repo convention of copy-on-write runtime state; replace in-place
mutation by making TokenTracker immutable and having record return a new
instance with updated input_tokens/output_tokens (or convert to a frozen
Pydantic model and use model_copy(update=...)). Update the TokenTracker
dataclass to be frozen (or use Pydantic BaseModel frozen) and change the record
method signature to return TokenTracker with the new totals; keep the used,
remaining, and is_exhausted properties as-is but computed from the new instance
to preserve behavior. Ensure all call sites that previously relied on in-place
updates now reassign the returned TokenTracker from record(...) to propagate
state.
- Around line 25-38: The record method currently accepts any integers and can
silently under/over-count tokens; update record(input_tokens, output_tokens) to
validate both deltas are non-negative ints, compute projected_used = self.used +
input_tokens + output_tokens and raise a ValueError (or custom exception) before
mutating if projected_used > self.budget or if either delta < 0; keep remaining
and is_exhausted as-is but rely on this check to prevent negative/overspend
states and perform the validation atomically so
self.input_tokens/self.output_tokens/self.used are only updated after the checks
pass.

In `@src/ai_company/communication/meeting/config.py`:
- Around line 61-66: Remove the stale attribute mention from
StructuredPhasesConfig's docstring: delete the "auto_create_tasks" entry under
Attributes so the docstring only documents skip_discussion_if_no_conflicts and
max_discussion_tokens; reference that the auto_create_tasks setting now lives on
MeetingProtocolConfig.auto_create_tasks so do not reintroduce it into
StructuredPhasesConfig's public docs.

In `@src/ai_company/communication/meeting/models.py`:
- Around line 265-289: The validator _validate_status_consistency should reject
blank/whitespace error messages for failed states: when self.status is
MeetingStatus.FAILED or MeetingStatus.BUDGET_EXHAUSTED, check that
self.error_message is not None and that self.error_message.strip() is non-empty
(or enforce min_length=1 on the error_message Field) and raise the existing
ValueError if the check fails so empty strings/whitespace are not accepted.
- Around line 184-218: The MeetingMinutes model currently allows independently
set total_input_tokens and total_output_tokens that can diverge from
per-contribution counts in contributions; update MeetingMinutes to enforce
consistency by deriving or validating these totals from MeetingContribution
entries: either convert total_input_tokens and total_output_tokens into computed
fields (like total_tokens is computed) or add a model validator that sums
MeetingContribution.total_input_tokens and .total_output_tokens and
raises/normalizes when they differ from the stored
total_input_tokens/total_output_tokens; refer to MeetingContribution,
MeetingMinutes, contributions, total_input_tokens, total_output_tokens, and
total_tokens when making the change.

In `@src/ai_company/communication/meeting/orchestrator.py`:
- Around line 332-378: Extend _validate_inputs to also reject blank/whitespace
IDs and duplicate participant IDs before execution: after the existing
token_budget check, validate leader_id.strip() is non-empty and that every
participant_id.strip() is non-empty (treat trimmed values as the canonical IDs),
logging MEETING_VALIDATION_FAILED and raising MeetingParticipantError with
context={"meeting_id": meeting_id, "leader_id": leader_id} (or include offending
participant id) when any are blank; then check for duplicates among the trimmed
participant_ids (e.g., by comparing length of set(trimmed_participant_ids) to
list length), log MEETING_VALIDATION_FAILED with error="duplicate
participant_ids" and raise MeetingParticipantError with context={"meeting_id":
meeting_id, "duplicates": list_of_duplicates}; keep the existing
leader-in-participants check using the trimmed canonical IDs so leader_id
present in participant_ids still fails as before.
- Around line 85-88: The constructor currently assigns the caller-owned
protocol_registry directly to self._protocol_registry which allows external
mutation; in __init__ make a defensive deep copy of the passed protocol_registry
(use copy.deepcopy) and wrap that copy in types.MappingProxyType before
assigning to self._protocol_registry so the orchestrator holds an immutable
snapshot for its lifetime; update any imports (copy and MappingProxyType) if
needed and keep the attribute name self._protocol_registry and its type
annotation list[MeetingRecord] unchanged.
- Around line 346-348: Before raising the ValueError in the token_budget <= 0
branch, emit a warning using the existing logger with context and the
MEETING_VALIDATION_FAILED event: log a message that includes the token_budget
value and MEETING_VALIDATION_FAILED (use logger.warning or self.logger.warning
consistent with surrounding code) and any other relevant identifiers, then raise
ValueError(msg); ensure MEETING_VALIDATION_FAILED is referenced/imported the
same way other validation errors are logged in this module so observability
captures this public error path.

In `@src/ai_company/communication/meeting/position_papers.py`:
- Around line 156-167: The MeetingMinutes returned for POSITION_PAPERS only sets
summary, leaving decisions and action_items empty so
MeetingOrchestrator._create_tasks() cannot generate tasks; update the return in
position_papers flow to populate MeetingMinutes.decisions and .action_items from
the synthesis contribution (e.g., use synthesis_contribution.decisions and
synthesis_contribution.action_items or parse structured fields from
synthesis_contribution.content if needed), converting them to the expected tuple
types, and keep summary as synthesis_contribution.content so tasks are created
correctly.
- Around line 203-215: The current pre-division uses tokens_per_agent = max(1,
tracker.remaining // max(1, n)) which can over-allocate when tracker.remaining <
number of parallel workers; change the logic before spawning _collect_paper
tasks to compute an effective fan-out: set n_effective = min(n,
tracker.remaining) (or if tracker.remaining == 0 raise/return early), then
compute tokens_per_agent = tracker.remaining // max(1, n_effective); if
tokens_per_agent == 0 either reduce n_effective to tracker.remaining or fail
fast; update all places that use n (e.g., the TaskGroup spawn and participant
loop that starts _collect_paper) to use n_effective so no worker receives a zero
or implicit extra token.

In `@src/ai_company/communication/meeting/round_robin.py`:
- Around line 125-126: The code always subtracts a summary reserve
(summary_reserve = int(token_budget * _SUMMARY_RESERVE_FRACTION)) from
discussion_budget even when summaries are disabled; change the logic in the
block that computes summary_reserve/discussion_budget so that summary_reserve is
set to 0 when leader_summarizes is False (and only compute int(token_budget *
_SUMMARY_RESERVE_FRACTION) when leader_summarizes is True), then set
discussion_budget = token_budget - summary_reserve; update any code paths that
use summary_reserve to assume zero when summaries are disabled (references:
summary_reserve, _SUMMARY_RESERVE_FRACTION, discussion_budget,
leader_summarizes).

In `@src/ai_company/communication/meeting/structured_phases.py`:
- Around line 388-392: The conflict detection call currently passes
tracker.remaining directly to agent_caller (conflict_response = await
agent_caller(leader_id, conflict_prompt, tracker.remaining)), which can exhaust
the whole meeting budget; change it to cap the tokens by
StructuredPhasesConfig.max_discussion_tokens (e.g., compute allowed =
min(tracker.remaining, StructuredPhasesConfig.max_discussion_tokens) and pass
allowed) and ensure the value is non-negative so phase 3 discussion/synthesis
retain their token budget; update any related variable names in the same scope
and keep the agent_caller invocation using leader_id and conflict_prompt.
- Around line 278-279: The current calculation forces tokens_per_agent = 1 even
when tracker.remaining < num_participants, causing total allocation to
oversubscribe the remaining budget; change the logic around
participant_ids/tokens_per_agent so you either (A) cap fan-out by computing
max_agents = min(len(participant_ids), tracker.remaining) and use
tokens_per_agent = tracker.remaining // max(1, max_agents) (and reduce/choose a
subset of participant_ids to max_agents), or (B) fail fast when
tracker.remaining < len(participant_ids) by raising or returning an error;
update the code that sets tokens_per_agent and any subsequent use of
participant_ids accordingly (refer to participant_ids, tokens_per_agent,
tracker.remaining).
- Around line 238-250: The MeetingMinutes being returned from the
structured-phases flow omits decisions and action_items even though the
synthesis collects them; update the MeetingMinutes constructor call in
structured_phases.py to include decisions and action_items (e.g.,
decisions=tuple(decisions) and action_items=tuple(action_items)), ensuring the
local variables (decisions, action_items) are defined or defaulted to empty
lists/tuples before the return so downstream task creation sees the populated
fields; keep protocol_type as MeetingProtocolType.STRUCTURED_PHASES and preserve
existing fields like summary.

In `@tests/integration/communication/test_meeting_integration.py`:
- Around line 121-145: The test test_round_robin_full_lifecycle currently only
asserts len(record.minutes.contributions) > 0 which doesn't verify the
RoundRobinConfig was applied; update the assertion to check that the
contributions match the expected deterministic number given
RoundRobinConfig(max_turns_per_agent=1) (i.e., two participant turns plus one
meeting summary => assert len(record.minutes.contributions) == 3) and optionally
assert the participant contributions contain the expected participant_ids
("agent-a", "agent-b") in the expected order to ensure max_turns_per_agent was
honored; locate these checks around the existing MeetingProtocolConfig /
RoundRobinConfig usage and the record.minutes.contributions assertions.

In `@tests/unit/communication/meeting/conftest.py`:
- Around line 58-79: The fake caller _caller currently ignores the max_tokens
argument so budget tests can't observe or enforce token limits; modify _caller
to record the requested max_tokens (e.g., push it into a shared list or dict
keyed by agent_id/call index) and/or clamp the returned
input_tokens/output_tokens to not exceed max_tokens before constructing the
AgentResponse; update references inside _caller (max_tokens, input_tokens,
output_tokens, call_counts, _responses, default_content, AgentResponse) so tests
can inspect the captured requested budgets or rely on the clamped token counts.

In `@tests/unit/communication/meeting/test_orchestrator.py`:
- Around line 374-402: The tests test_task_creator_not_called_when_disabled and
its paired test never exercise task creation because the mocked protocol/minutes
return no ActionItem; update the tests to use a stub protocol or mocked minutes
that include at least one ActionItem so the orchestrator path through
_create_tasks() is executed (for example, provide a custom protocol/minutes
object returned by the orchestrator run that contains an ActionItem), then
verify behavior when MeetingProtocolConfig(auto_create_tasks=False) and when
task_creator raises—this will exercise the gate and exception-handling paths;
reference the test functions test_task_creator_not_called_when_disabled, the
MeetingProtocolConfig(auto_create_tasks=False) usage, the task_creator stub
passed into _make_orchestrator, and the orchestrator.run_meeting call to locate
where to inject the stubbed minutes or protocol.
- Around line 292-310: The test test_budget_exhaustion_produces_record currently
accepts MeetingStatus.FAILED which permits regressions; change the assertion to
require the dedicated budget-exhaustion status only by asserting record.status
== MeetingStatus.BUDGET_EXHAUSTED (keep the existing check that
record.error_message is not None) so the test verifies
orchestrator.run_meeting(..., token_budget=1) produces the explicit
BUDGET_EXHAUSTED outcome.

In `@tests/unit/communication/meeting/test_position_papers.py`:
- Around line 60-118: The tests test_contributions_structure and
test_parallel_execution_all_participants currently only assert counts and set
membership which won't catch nondeterministic ordering regressions; update them
to assert a deterministic order by inspecting minutes.contributions (filtering
by MeetingPhase.POSITION_PAPER) and then check the sequence of each
contribution's turn_number or agent_id matches the expected participant_ids
order (or a sorted/explicit expected sequence), e.g., build the ordered list
from [c for c in minutes.contributions if c.phase ==
MeetingPhase.POSITION_PAPER] and assert their turn_number or agent_id sequence
equals the expected ordering to lock in the deterministic contract.

In `@tests/unit/communication/meeting/test_round_robin.py`:
- Around line 220-247: The test currently treats a call that would push total
tokens past token_budget as valid; update the test
(test_budget_exhaustion_stops_turns and the similar block at 249-280) so the
RoundRobinProtocol enforces the budget pre-turn: invoke protocol.run with
make_mock_agent_caller and token_budget=50 but assert that the protocol either
raises MeetingBudgetExhaustedError before executing the turn that would cross
the budget or verify the mock caller was not invoked for that crossing turn;
specifically reference RoundRobinProtocol.run, make_mock_agent_caller,
MeetingBudgetExhaustedError and the token_budget argument when changing the
expectation so the test fails if an over-budget turn is allowed to complete.

In `@tests/unit/communication/meeting/test_structured_phases.py`:
- Around line 105-112: The test mocks use a hardcoded key "leader-agent" in the
responses mapping which breaks if the fixture leader_id changes; update each
responses dict (e.g., the one passed to make_mock_agent_caller in
test_structured_phases.py) to use the injected leader_id variable as the key
instead of the literal string, and apply the same change to the other response
dictionaries in this file (the blocks at the other ranges noted) so mocks match
the fixture-provided leader_id.

In `@tests/unit/communication/test_enums.py`:
- Around line 19-20: Update the test to assert the new enum member directly
instead of relying only on the enum length: in
tests/unit/communication/test_enums.py modify the test_values() (or add to
test_member_count) to include an explicit assertion that
MessageType.MEETING_CONTRIBUTION exists (e.g., assert
MessageType.MEETING_CONTRIBUTION in MessageType or assert hasattr(MessageType,
"MEETING_CONTRIBUTION")) so the test fails if that public enum value is removed
or renamed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 38d2d420-bcd8-46b8-a5e3-718f69959a38

📥 Commits

Reviewing files that changed from the base of the PR and between c02832a and af5d72a.

📒 Files selected for processing (32)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • src/ai_company/communication/__init__.py
  • src/ai_company/communication/config.py
  • src/ai_company/communication/enums.py
  • src/ai_company/communication/meeting/__init__.py
  • src/ai_company/communication/meeting/_prompts.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/communication/meeting/enums.py
  • src/ai_company/communication/meeting/errors.py
  • src/ai_company/communication/meeting/models.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/protocol.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/meeting/structured_phases.py
  • src/ai_company/observability/events/meeting.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/communication/meeting/__init__.py
  • tests/unit/communication/meeting/conftest.py
  • tests/unit/communication/meeting/test_config.py
  • tests/unit/communication/meeting/test_enums.py
  • tests/unit/communication/meeting/test_errors.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_orchestrator.py
  • tests/unit/communication/meeting/test_position_papers.py
  • tests/unit/communication/meeting/test_protocol.py
  • tests/unit/communication/meeting/test_round_robin.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • tests/unit/communication/test_enums.py
  • tests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations in Python 3.14+ code — Python 3.14 has native PEP 649 lazy annotations
Use except A, B: (no parentheses) for exception syntax in Python 3.14 — PEP 758 except syntax enforced by ruff
All public functions and classes must have type hints; mypy strict mode is enforced
All public classes and functions must have Google-style docstrings; ruff D rules enforce this
Create new objects instead of mutating existing ones; use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement of non-Pydantic internal collections (registries, BaseTool)
For non-Pydantic internal collections and dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves; never mix static config fields with mutable runtime fields
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values instead of storing redundant fields
Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow errors
Validate at system boundaries (user input, external APIs, config files)
Line length must be 88 characters (enforced by ruff)

Files:

  • tests/unit/communication/test_enums.py
  • src/ai_company/communication/meeting/_prompts.py
  • src/ai_company/communication/meeting/enums.py
  • tests/unit/communication/meeting/test_position_papers.py
  • tests/unit/communication/meeting/test_round_robin.py
  • src/ai_company/communication/enums.py
  • tests/unit/communication/meeting/test_protocol.py
  • src/ai_company/communication/config.py
  • src/ai_company/communication/meeting/round_robin.py
  • tests/unit/communication/meeting/conftest.py
  • tests/unit/communication/meeting/test_errors.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • tests/unit/communication/meeting/test_config.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/protocol.py
  • src/ai_company/communication/meeting/__init__.py
  • src/ai_company/communication/meeting/errors.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/observability/events/meeting.py
  • tests/unit/communication/meeting/test_enums.py
  • src/ai_company/communication/meeting/orchestrator.py
  • tests/unit/communication/meeting/test_models.py
  • src/ai_company/communication/meeting/models.py
  • src/ai_company/communication/__init__.py
  • tests/unit/communication/meeting/test_orchestrator.py
  • src/ai_company/communication/meeting/structured_phases.py
  • tests/unit/observability/test_events.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow as appropriate
Prefer @pytest.mark.parametrize for testing similar cases
Use vendor-agnostic test names: test-provider, test-small-001, etc. instead of real vendor names

Files:

  • tests/unit/communication/test_enums.py
  • tests/unit/communication/meeting/test_position_papers.py
  • tests/unit/communication/meeting/test_round_robin.py
  • tests/unit/communication/meeting/test_protocol.py
  • tests/unit/communication/meeting/conftest.py
  • tests/unit/communication/meeting/test_errors.py
  • tests/integration/communication/test_meeting_integration.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • tests/unit/communication/meeting/test_config.py
  • tests/unit/communication/meeting/test_enums.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_orchestrator.py
  • tests/unit/observability/test_events.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic must import from ai_company.observability import get_logger and create logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code
Always use variable name logger (not _logger, not log) for the logger instance
Always use constants from domain-specific modules under ai_company.observability.events for event names (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget); import directly
Always use structured logging with kwargs: logger.info(EVENT, key=value) — never use formatted strings like logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging is for object creation, internal flow, and entry/exit of key functions

Files:

  • src/ai_company/communication/meeting/_prompts.py
  • src/ai_company/communication/meeting/enums.py
  • src/ai_company/communication/enums.py
  • src/ai_company/communication/config.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/protocol.py
  • src/ai_company/communication/meeting/__init__.py
  • src/ai_company/communication/meeting/errors.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/observability/events/meeting.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/communication/meeting/models.py
  • src/ai_company/communication/__init__.py
  • src/ai_company/communication/meeting/structured_phases.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases

Files:

  • src/ai_company/communication/meeting/_prompts.py
  • src/ai_company/communication/meeting/enums.py
  • src/ai_company/communication/enums.py
  • src/ai_company/communication/config.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/protocol.py
  • src/ai_company/communication/meeting/__init__.py
  • src/ai_company/communication/meeting/errors.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/observability/events/meeting.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/communication/meeting/models.py
  • src/ai_company/communication/__init__.py
  • src/ai_company/communication/meeting/structured_phases.py
🧠 Learnings (9)
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic must import `from ai_company.observability import get_logger` and create `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
  • src/ai_company/observability/events/meeting.py
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to src/ai_company/**/*.py : Always use constants from domain-specific modules under ai_company.observability.events for event names (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget); import directly

Applied to files:

  • CLAUDE.md
  • src/ai_company/observability/events/meeting.py
  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to src/ai_company/**/*.py : Always use structured logging with kwargs: `logger.info(EVENT, key=value)` — never use formatted strings like `logger.info("msg %s", val)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to src/ai_company/**/*.py : Never use `import logging` / `logging.getLogger()` / `print()` in application code

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to src/ai_company/**/*.py : Always use variable name `logger` (not `_logger`, not `log`) for the logger instance

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO level

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to src/ai_company/**/*.py : DEBUG logging is for object creation, internal flow, and entry/exit of key functions

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-07T19:34:16.191Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-07T19:34:16.191Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves; never mix static config fields with mutable runtime fields

Applied to files:

  • src/ai_company/communication/meeting/config.py
🧬 Code graph analysis (18)
tests/unit/communication/test_enums.py (1)
src/ai_company/communication/enums.py (1)
  • MessageType (6-20)
src/ai_company/communication/meeting/_prompts.py (3)
src/ai_company/communication/meeting/models.py (1)
  • MeetingAgenda (77-96)
tests/integration/communication/test_meeting_integration.py (1)
  • agenda (98-114)
src/ai_company/tools/base.py (1)
  • description (115-117)
tests/unit/communication/meeting/test_round_robin.py (2)
src/ai_company/communication/meeting/errors.py (1)
  • MeetingBudgetExhaustedError (14-15)
src/ai_company/communication/meeting/round_robin.py (2)
  • get_protocol_type (94-96)
  • run (98-196)
tests/unit/communication/meeting/test_protocol.py (3)
src/ai_company/communication/meeting/enums.py (1)
  • MeetingProtocolType (6-17)
src/ai_company/communication/meeting/models.py (2)
  • MeetingAgenda (77-96)
  • MeetingMinutes (153-235)
tests/unit/communication/meeting/conftest.py (3)
  • meeting_id (114-116)
  • leader_id (102-104)
  • participant_ids (108-110)
src/ai_company/communication/config.py (1)
src/ai_company/communication/meeting/config.py (1)
  • MeetingProtocolConfig (81-117)
src/ai_company/communication/meeting/round_robin.py (6)
src/ai_company/communication/meeting/_prompts.py (1)
  • build_agenda_prompt (9-28)
src/ai_company/communication/meeting/_token_tracker.py (5)
  • TokenTracker (7-38)
  • is_exhausted (31-33)
  • used (21-23)
  • record (35-38)
  • remaining (26-28)
src/ai_company/communication/meeting/config.py (1)
  • RoundRobinConfig (9-33)
src/ai_company/communication/meeting/enums.py (2)
  • MeetingPhase (20-46)
  • MeetingProtocolType (6-17)
src/ai_company/communication/meeting/errors.py (1)
  • MeetingBudgetExhaustedError (14-15)
src/ai_company/communication/meeting/models.py (4)
  • MeetingAgenda (77-96)
  • MeetingContribution (99-128)
  • MeetingMinutes (153-235)
  • total_tokens (216-218)
tests/unit/communication/meeting/conftest.py (2)
src/ai_company/communication/meeting/config.py (4)
  • MeetingProtocolConfig (81-117)
  • PositionPapersConfig (36-55)
  • RoundRobinConfig (9-33)
  • StructuredPhasesConfig (58-78)
src/ai_company/communication/meeting/models.py (3)
  • AgentResponse (23-52)
  • MeetingAgenda (77-96)
  • MeetingAgendaItem (55-74)
tests/unit/communication/meeting/test_errors.py (2)
src/ai_company/communication/errors.py (1)
  • CommunicationError (12-44)
src/ai_company/communication/meeting/errors.py (5)
  • MeetingAgentError (26-27)
  • MeetingBudgetExhaustedError (14-15)
  • MeetingError (10-11)
  • MeetingParticipantError (22-23)
  • MeetingProtocolNotFoundError (18-19)
tests/integration/communication/test_meeting_integration.py (12)
src/ai_company/communication/meeting/config.py (4)
  • MeetingProtocolConfig (81-117)
  • PositionPapersConfig (36-55)
  • RoundRobinConfig (9-33)
  • StructuredPhasesConfig (58-78)
src/ai_company/communication/meeting/enums.py (2)
  • MeetingProtocolType (6-17)
  • MeetingStatus (49-72)
src/ai_company/communication/meeting/models.py (3)
  • AgentResponse (23-52)
  • MeetingAgenda (77-96)
  • MeetingAgendaItem (55-74)
src/ai_company/communication/meeting/orchestrator.py (3)
  • MeetingOrchestrator (58-405)
  • run_meeting (90-168)
  • get_records (170-176)
src/ai_company/communication/meeting/position_papers.py (1)
  • PositionPapersProtocol (71-363)
src/ai_company/communication/meeting/protocol.py (1)
  • MeetingProtocol (39-78)
src/ai_company/communication/meeting/round_robin.py (1)
  • RoundRobinProtocol (79-415)
src/ai_company/communication/meeting/structured_phases.py (1)
  • StructuredPhasesProtocol (120-620)
src/ai_company/tools/base.py (1)
  • description (115-117)
src/ai_company/communication/config.py (1)
  • MeetingTypeConfig (79-133)
src/ai_company/communication/meeting/errors.py (1)
  • MeetingError (10-11)
src/ai_company/communication/enums.py (1)
  • MessageType (6-20)
tests/unit/communication/meeting/test_config.py (2)
src/ai_company/communication/meeting/config.py (4)
  • MeetingProtocolConfig (81-117)
  • PositionPapersConfig (36-55)
  • RoundRobinConfig (9-33)
  • StructuredPhasesConfig (58-78)
src/ai_company/communication/meeting/enums.py (1)
  • MeetingProtocolType (6-17)
src/ai_company/communication/meeting/position_papers.py (7)
src/ai_company/communication/meeting/_prompts.py (1)
  • build_agenda_prompt (9-28)
src/ai_company/communication/meeting/_token_tracker.py (5)
  • TokenTracker (7-38)
  • used (21-23)
  • remaining (26-28)
  • record (35-38)
  • is_exhausted (31-33)
src/ai_company/communication/meeting/config.py (1)
  • PositionPapersConfig (36-55)
src/ai_company/communication/meeting/enums.py (2)
  • MeetingPhase (20-46)
  • MeetingProtocolType (6-17)
src/ai_company/communication/meeting/errors.py (1)
  • MeetingBudgetExhaustedError (14-15)
src/ai_company/communication/meeting/models.py (4)
  • MeetingAgenda (77-96)
  • MeetingContribution (99-128)
  • MeetingMinutes (153-235)
  • total_tokens (216-218)
src/ai_company/communication/meeting/protocol.py (1)
  • run (47-70)
src/ai_company/communication/meeting/protocol.py (3)
src/ai_company/communication/meeting/enums.py (1)
  • MeetingProtocolType (6-17)
src/ai_company/communication/meeting/models.py (3)
  • AgentResponse (23-52)
  • MeetingAgenda (77-96)
  • MeetingMinutes (153-235)
src/ai_company/core/enums.py (1)
  • Priority (162-168)
src/ai_company/communication/meeting/errors.py (1)
src/ai_company/communication/errors.py (1)
  • CommunicationError (12-44)
src/ai_company/communication/meeting/config.py (1)
src/ai_company/communication/meeting/enums.py (1)
  • MeetingProtocolType (6-17)
tests/unit/communication/meeting/test_enums.py (2)
src/ai_company/communication/meeting/enums.py (3)
  • MeetingPhase (20-46)
  • MeetingProtocolType (6-17)
  • MeetingStatus (49-72)
tests/unit/communication/test_enums.py (6)
  • test_member_count (19-20)
  • test_member_count (38-39)
  • test_member_count (56-57)
  • test_member_count (67-68)
  • test_member_count (78-79)
  • test_member_count (99-100)
tests/unit/communication/meeting/test_models.py (1)
src/ai_company/communication/meeting/models.py (8)
  • ActionItem (131-150)
  • AgentResponse (23-52)
  • MeetingAgenda (77-96)
  • MeetingAgendaItem (55-74)
  • MeetingContribution (99-128)
  • MeetingMinutes (153-235)
  • MeetingRecord (238-290)
  • total_tokens (216-218)
src/ai_company/communication/meeting/models.py (3)
src/ai_company/communication/meeting/enums.py (3)
  • MeetingPhase (20-46)
  • MeetingProtocolType (6-17)
  • MeetingStatus (49-72)
src/ai_company/core/enums.py (1)
  • Priority (162-168)
src/ai_company/core/types.py (1)
  • validate_unique_strings (25-37)
src/ai_company/communication/meeting/structured_phases.py (7)
src/ai_company/communication/meeting/_prompts.py (1)
  • build_agenda_prompt (9-28)
src/ai_company/communication/meeting/_token_tracker.py (5)
  • TokenTracker (7-38)
  • is_exhausted (31-33)
  • used (21-23)
  • remaining (26-28)
  • record (35-38)
src/ai_company/communication/meeting/config.py (1)
  • StructuredPhasesConfig (58-78)
src/ai_company/communication/meeting/enums.py (2)
  • MeetingPhase (20-46)
  • MeetingProtocolType (6-17)
src/ai_company/communication/meeting/errors.py (1)
  • MeetingBudgetExhaustedError (14-15)
src/ai_company/communication/meeting/models.py (4)
  • MeetingAgenda (77-96)
  • MeetingContribution (99-128)
  • MeetingMinutes (153-235)
  • total_tokens (216-218)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
🔇 Additional comments (1)
src/ai_company/observability/events/meeting.py (1)

5-40: Good centralization of meeting event names.

Keeping the meeting telemetry surface in its own events module will make the new protocols and orchestrator logs consistent and easy to query.

Based on learnings, "Always use constants from domain-specific modules under ai_company.observability.events for event names (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget)".

DESIGN_SPEC.md Outdated
Meetings (§5.1 Pattern 3) follow configurable protocols that determine how agents interact during structured multi-agent conversations. Different meeting types naturally suit different protocols. All protocols implement a `MeetingProtocol` protocol, making the system extensible — new protocols can be registered and selected per meeting type. Cost bounds are enforced by `duration_tokens` in meeting config (§5.4).

> **MVP: Not in M3.** Meetings are an M4 feature. Round-Robin (Protocol 1) is the initial default.
> **Current state (M4 complete):** All 3 meeting protocols are implemented in `communication/meeting/`: `RoundRobinProtocol`, `PositionPapersProtocol`, and `StructuredPhasesProtocol`. The `MeetingOrchestrator` runs meetings end-to-end with token budget enforcement via `TokenBudgetTracker`. All protocols implement the `MeetingProtocol` protocol interface.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update the spec to the current meeting symbol names.

These sections still refer to TokenBudgetTracker and AgendaItem, but the implemented symbols are TokenTracker and MeetingAgendaItem. Leaving the old names here sends readers to non-existent APIs.

Also applies to: 2421-2425

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` at line 653, Update the spec text to use the current symbol
names: replace references to TokenBudgetTracker with TokenTracker and AgendaItem
with MeetingAgendaItem wherever they appear (including the section currently
mentioning all 3 meeting protocols and the noted repeat at lines ~2421-2425);
ensure any inline code formatting or API references match the implemented
symbols RoundRobinProtocol, PositionPapersProtocol, StructuredPhasesProtocol,
MeetingOrchestrator, and TokenTracker so readers are directed to existing APIs.

Comment on lines +113 to +116
protocol_config: MeetingProtocolConfig = Field(
default_factory=MeetingProtocolConfig,
description="Meeting protocol configuration",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update MeetingTypeConfig docs for the new public field.

protocol_config is now part of the public model, but the MeetingTypeConfig docstring still stops at duration_tokens. Please document the new attribute there so generated docs and IDE help stay accurate. As per coding guidelines, "All public classes and functions must have Google-style docstrings; ruff D rules enforce this".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/config.py` around lines 113 - 116, Update the
Google-style docstring for the public class MeetingTypeConfig to include the new
public attribute `protocol_config` (type MeetingProtocolConfig) and a short
description such as "Meeting protocol configuration" so generated docs and IDE
hints reflect the new field; ensure the docstring follows the existing
Google-style sections (Args/Attributes) and mentions `duration_tokens` and the
new `protocol_config` entry consistently with other attributes.

Comment on lines +3 to +9
from typing import TYPE_CHECKING

if TYPE_CHECKING:
from ai_company.communication.meeting.models import MeetingAgenda


def build_agenda_prompt(agenda: MeetingAgenda) -> str:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In Python 3.14 with PEP 649 lazy annotations, what happens when a function annotation references a type imported only inside if TYPE_CHECKING:andtyping.get_type_hints() evaluates that annotation?

💡 Result:

With PEP 649 lazy annotations (Python 3.14+), the annotation expression isn’t evaluated at function definition time, so referencing a name that’s only imported under if TYPE_CHECKING: is “harmless” until you actually introspect/evaluate the annotations. However, typing.get_type_hints() does try to resolve/evaluate annotations, and if a referenced forward name can’t be resolved (because the TYPE_CHECKING import never ran at runtime), it raises an exception such as NameError at the moment get_type_hints() evaluates that annotation. [1]

This matches the general PEP 649/749 model: evaluation is deferred, but when the deferred evaluation runs, missing names can still produce runtime errors like NameError. [2]

Implication: if you want get_type_hints() to succeed, you must ensure the referenced type is available in the evaluation namespace (e.g., do a real runtime import, or pass an appropriate globalns/localns), or use annotationlib.get_annotations(..., format=Format.FORWARDREF or Format.STRING) when you need to inspect annotations that may contain undefined symbols without failing. [1]

Sources:
[1] Python 3.14 typing docs for get_type_hints() / TYPE_CHECKING behavior and the NameError warning for unresolvable forward references (https://docs.python.org/bn-in/3.14/library/typing.html)
[2] PEP 749 (implementation notes/examples showing deferred evaluation can still raise NameError when evaluated) (https://peps.python.org/pep-0749/)


🏁 Script executed:

# Check if there's a circular import between models.py and _prompts.py
cd src/ai_company/communication/meeting

echo "=== Checking models.py imports ==="
head -20 models.py | grep -E "^(import|from)" || echo "No imports in first 20 lines"

echo -e "\n=== Checking _prompts.py imports ==="
head -20 _prompts.py | grep -E "^(import|from)" || echo "No imports in first 20 lines"

echo -e "\n=== Checking if models.py imports from _prompts.py ==="
grep -n "_prompts" models.py || echo "No imports from _prompts in models.py"

Repository: Aureliolo/ai-company

Length of output: 477


🏁 Script executed:

# Read full _prompts.py to see the complete function implementation
cat -n src/ai_company/communication/meeting/_prompts.py

Repository: Aureliolo/ai-company

Length of output: 1097


🏁 Script executed:

# Check MeetingAgenda and MeetingAgendaItem definitions in models.py
cat -n src/ai_company/communication/meeting/models.py | head -100

Repository: Aureliolo/ai-company

Length of output: 3382


🏁 Script executed:

# Find usages of build_agenda_prompt
rg "build_agenda_prompt" --type py -B 2 -A 2

Repository: Aureliolo/ai-company

Length of output: 3462


🏁 Script executed:

# Check for test files
find src/ai_company/communication/meeting -name "*test*" -o -name "test_*"

Repository: Aureliolo/ai-company

Length of output: 46


🏁 Script executed:

# Check if there's documentation about what presenter_id should do
grep -rn "presenter_id" src/ai_company/communication/meeting/ --type py

Repository: Aureliolo/ai-company

Length of output: 181


🏁 Script executed:

# Search for presenter_id usage in meeting module
rg "presenter_id" src/ai_company/communication/meeting/

Repository: Aureliolo/ai-company

Length of output: 266


🏁 Script executed:

# Check if there are any test files for the prompts
find . -path "*/test*" -name "*prompt*" -o -path "*/test*" -name "*meeting*" | grep -E "(test|spec)" | head -20

Repository: Aureliolo/ai-company

Length of output: 239


🏁 Script executed:

# Check the docstring and usage context more broadly
rg "presenter" src/ai_company/communication/meeting/ -B 2 -A 2

Repository: Aureliolo/ai-company

Length of output: 888


🏁 Script executed:

# Check unit tests for meeting module to see what's expected
ls -la tests/unit/communication/meeting/

Repository: Aureliolo/ai-company

Length of output: 918


🏁 Script executed:

# Look for tests specifically for _prompts.py
cat tests/unit/communication/meeting/*prompt* 2>/dev/null || echo "No prompt tests found, checking all tests"
ls tests/unit/communication/meeting/

Repository: Aureliolo/ai-company

Length of output: 296


🏁 Script executed:

# Check integration test for meeting to see how presenter_id is used
cat tests/integration/communication/test_meeting_integration.py | head -100

Repository: Aureliolo/ai-company

Length of output: 2998


Use a real import for MeetingAgenda here.

build_agenda_prompt() exposes MeetingAgenda in a public annotation, but the name only exists under TYPE_CHECKING. With Python 3.14 lazy annotation evaluation, runtime consumers such as typing.get_type_hints() will raise NameError when trying to resolve the annotation at runtime, since the TYPE_CHECKING block never executes. No circular dependency exists between these modules, so a real import is safe.

Safer import pattern
-from typing import TYPE_CHECKING
-
-if TYPE_CHECKING:
-    from ai_company.communication.meeting.models import MeetingAgenda
+from ai_company.communication.meeting.models import MeetingAgenda
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from ai_company.communication.meeting.models import MeetingAgenda
def build_agenda_prompt(agenda: MeetingAgenda) -> str:
from ai_company.communication.meeting.models import MeetingAgenda
def build_agenda_prompt(agenda: MeetingAgenda) -> str:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/_prompts.py` around lines 3 - 9, The
public annotation for MeetingAgenda in build_agenda_prompt is only available
under TYPE_CHECKING which causes runtime resolution errors; replace the
TYPE_CHECKING-only import with a real import of MeetingAgenda from
ai_company.communication.meeting.models at module level (remove the conditional
block or add a normal import alongside it) so that build_agenda_prompt can be
resolved at runtime by typing.get_type_hints() and other consumers.

Comment on lines +23 to +27
for i, item in enumerate(agenda.items, 1):
entry = f" {i}. {item.title}"
if item.description:
entry += f" — {item.description}"
parts.append(entry)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't drop presenter_id from the shared agenda prompt.

MeetingAgendaItem includes an optional presenter assignment in src/ai_company/communication/meeting/models.py, but this formatter strips it. Any protocol that uses this helper loses agenda ownership and can prompt the wrong meeting flow for presenter-specific items.

📝 Suggested formatter fix
         for i, item in enumerate(agenda.items, 1):
             entry = f"  {i}. {item.title}"
+            if item.presenter_id:
+                entry += f" (presenter: {item.presenter_id})"
             if item.description:
                 entry += f" — {item.description}"
             parts.append(entry)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/_prompts.py` around lines 23 - 27, The
agenda item formatter in _prompts.py currently drops
MeetingAgendaItem.presenter_id when building entries from agenda.items; update
the loop that builds entry (for i, item in enumerate(agenda.items, 1)) to
include the optional presenter assignment when present (e.g., append a presenter
indicator using item.presenter_id or resolved presenter name) so presenter
ownership is preserved in the shared agenda prompt and any downstream protocol
that consumes MeetingAgendaItem can route presenter-specific flows correctly.

Comment on lines +6 to +38
@dataclasses.dataclass
class TokenTracker:
"""Mutable token budget tracker scoped to a single meeting execution.

Attributes:
budget: Total token budget for the meeting.
input_tokens: Total prompt tokens consumed so far.
output_tokens: Total response tokens generated so far.
"""

budget: int
input_tokens: int = 0
output_tokens: int = 0

@property
def used(self) -> int:
"""Total tokens consumed so far."""
return self.input_tokens + self.output_tokens

@property
def remaining(self) -> int:
"""Tokens remaining in the budget."""
return max(0, self.budget - self.used)

@property
def is_exhausted(self) -> bool:
"""Whether the budget is fully consumed."""
return self.remaining == 0

def record(self, input_tokens: int, output_tokens: int) -> None:
"""Record token usage from an agent call."""
self.input_tokens += input_tokens
self.output_tokens += output_tokens
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Prefer copy-on-write state over mutating TokenTracker in place.

This helper introduces shared mutable state via a dataclass and record(), which goes against the repo’s runtime-state convention and makes retries or parallel branches harder to reason about. Returning an updated tracker, or switching to a frozen Pydantic model plus model_copy(update=...), would keep budget accounting consistent with the rest of the codebase.

As per coding guidelines, "Create new objects instead of mutating existing ones" and "Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves; never mix static config fields with mutable runtime fields".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/_token_tracker.py` around lines 6 - 38,
TokenTracker currently mutates shared state via record(), violating the repo
convention of copy-on-write runtime state; replace in-place mutation by making
TokenTracker immutable and having record return a new instance with updated
input_tokens/output_tokens (or convert to a frozen Pydantic model and use
model_copy(update=...)). Update the TokenTracker dataclass to be frozen (or use
Pydantic BaseModel frozen) and change the record method signature to return
TokenTracker with the new totals; keep the used, remaining, and is_exhausted
properties as-is but computed from the new instance to preserve behavior. Ensure
all call sites that previously relied on in-place updates now reassign the
returned TokenTracker from record(...) to propagate state.

Comment on lines +60 to +118
async def test_contributions_structure(
self,
simple_agenda: MeetingAgenda,
leader_id: str,
participant_ids: tuple[str, ...],
meeting_id: str,
) -> None:
caller = make_mock_agent_caller()
protocol = PositionPapersProtocol(config=PositionPapersConfig())

minutes = await protocol.run(
meeting_id=meeting_id,
agenda=simple_agenda,
leader_id=leader_id,
participant_ids=participant_ids,
agent_caller=caller,
token_budget=10000,
)

# 3 position papers + 1 synthesis = 4 contributions
assert len(minutes.contributions) == 4

# Position papers
papers = [
c for c in minutes.contributions if c.phase == MeetingPhase.POSITION_PAPER
]
assert len(papers) == 3

# Synthesis
synthesis = [
c for c in minutes.contributions if c.phase == MeetingPhase.SYNTHESIS
]
assert len(synthesis) == 1

async def test_parallel_execution_all_participants(
self,
simple_agenda: MeetingAgenda,
leader_id: str,
participant_ids: tuple[str, ...],
meeting_id: str,
) -> None:
caller = make_mock_agent_caller()
protocol = PositionPapersProtocol(config=PositionPapersConfig())

minutes = await protocol.run(
meeting_id=meeting_id,
agenda=simple_agenda,
leader_id=leader_id,
participant_ids=participant_ids,
agent_caller=caller,
token_budget=10000,
)

paper_agents = {
c.agent_id
for c in minutes.contributions
if c.phase == MeetingPhase.POSITION_PAPER
}
assert paper_agents == set(participant_ids)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Pin the deterministic ordering contract in the parallel-path tests.

These assertions only check counts and set membership. If minutes.contributions regresses back to nondeterministic ordering, this file will still stay green. Please add an ordered assertion on turn_number or the expected participant sequence to lock in the regression fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/meeting/test_position_papers.py` around lines 60 -
118, The tests test_contributions_structure and
test_parallel_execution_all_participants currently only assert counts and set
membership which won't catch nondeterministic ordering regressions; update them
to assert a deterministic order by inspecting minutes.contributions (filtering
by MeetingPhase.POSITION_PAPER) and then check the sequence of each
contribution's turn_number or agent_id matches the expected participant_ids
order (or a sorted/explicit expected sequence), e.g., build the ordered list
from [c for c in minutes.contributions if c.phase ==
MeetingPhase.POSITION_PAPER] and assert their turn_number or agent_id sequence
equals the expected ordering to lock in the deterministic contract.

Comment on lines +220 to +247
async def test_budget_exhaustion_stops_turns(
self,
simple_agenda: MeetingAgenda,
leader_id: str,
meeting_id: str,
) -> None:
# Each call uses 30 tokens, budget is 50 (20% reserve = 40 discussion).
# agent-a: 30 used (< 40), agent-b: 60 used (>= 40, stops after).
# Budget check is pre-turn, so the call that crosses is completed.
# With leader_summarizes=True (default), budget exhaustion raises
# MeetingBudgetExhaustedError when summary cannot be generated.
caller = make_mock_agent_caller(input_tokens=10, output_tokens=20)
config = RoundRobinConfig(
max_turns_per_agent=5,
max_total_turns=100,
)
protocol = RoundRobinProtocol(config=config)
participants = ("agent-a", "agent-b", "agent-c")

with pytest.raises(MeetingBudgetExhaustedError, match="budget exhausted"):
await protocol.run(
meeting_id=meeting_id,
agenda=simple_agenda,
leader_id=leader_id,
participant_ids=participants,
agent_caller=caller,
token_budget=50,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These tests lock in over-budget execution as valid behavior.

With token_budget=50 and a 30-token mock response, both cases expect the second turn to complete even though the meeting has already exceeded its declared budget. That contradicts the duration-token bound in the PR objective and will keep budget-overrun bugs green.

Also applies to: 249-280

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/meeting/test_round_robin.py` around lines 220 - 247,
The test currently treats a call that would push total tokens past token_budget
as valid; update the test (test_budget_exhaustion_stops_turns and the similar
block at 249-280) so the RoundRobinProtocol enforces the budget pre-turn: invoke
protocol.run with make_mock_agent_caller and token_budget=50 but assert that the
protocol either raises MeetingBudgetExhaustedError before executing the turn
that would cross the budget or verify the mock caller was not invoked for that
crossing turn; specifically reference RoundRobinProtocol.run,
make_mock_agent_caller, MeetingBudgetExhaustedError and the token_budget
argument when changing the expectation so the test fails if an over-budget turn
is allowed to complete.

Comment on lines +105 to +112
# Leader response says no conflicts
responses = {
"leader-agent": [
"CONFLICTS: NO\nAll participants agree.",
"Final synthesis with decisions.",
],
}
caller = make_mock_agent_caller(responses=responses)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use the injected leader_id instead of hardcoding "leader-agent".

These mocked response maps are keyed to a literal leader ID even though the test already receives leader_id. If that fixture changes or becomes parameterized, the custom responses stop matching and these tests fail for the wrong reason.

🧪 Proposed fix
-        responses = {
-            "leader-agent": [
+        responses = {
+            leader_id: [
                 "CONFLICTS: NO\nAll participants agree.",
                 "Final synthesis with decisions.",
             ],
         }

Apply the same replacement to the other response dictionaries in this file.

Also applies to: 143-149, 178-184, 215-221, 248-255, 330-333

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/meeting/test_structured_phases.py` around lines 105
- 112, The test mocks use a hardcoded key "leader-agent" in the responses
mapping which breaks if the fixture leader_id changes; update each responses
dict (e.g., the one passed to make_mock_agent_caller in
test_structured_phases.py) to use the injected leader_id variable as the key
instead of the literal string, and apply the same change to the other response
dictionaries in this file (the blocks at the other ranges noted) so mocks match
the fixture-provided leader_id.

Comment on lines 19 to +20
def test_member_count(self) -> None:
assert len(MessageType) == 8
assert len(MessageType) == 9
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert the new enum member directly.

Line 20 updates the count, but test_values() still never checks MessageType.MEETING_CONTRIBUTION. This can pass even if the new public value regresses.

🔎 Suggested test update
     def test_values(self) -> None:
         assert MessageType.TASK_UPDATE.value == "task_update"
         assert MessageType.QUESTION.value == "question"
         assert MessageType.ANNOUNCEMENT.value == "announcement"
         assert MessageType.REVIEW_REQUEST.value == "review_request"
         assert MessageType.APPROVAL.value == "approval"
         assert MessageType.DELEGATION.value == "delegation"
         assert MessageType.STATUS_REPORT.value == "status_report"
         assert MessageType.ESCALATION.value == "escalation"
+        assert MessageType.MEETING_CONTRIBUTION.value == "meeting_contribution"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/test_enums.py` around lines 19 - 20, Update the test
to assert the new enum member directly instead of relying only on the enum
length: in tests/unit/communication/test_enums.py modify the test_values() (or
add to test_member_count) to include an explicit assertion that
MessageType.MEETING_CONTRIBUTION exists (e.g., assert
MessageType.MEETING_CONTRIBUTION in MessageType or assert hasattr(MessageType,
"MEETING_CONTRIBUTION")) so the test fails if that public enum value is removed
or renamed.

- Add ConflictDetector protocol + KeywordConflictDetector (pluggable strategy)
- Refactor structured_phases.py for immutability (sub-methods return contributions)
- Add inverse status constraints on MeetingRecord
- Add TokenTracker validation (positive budget, non-negative tokens)
- Fix MeetingMinutes participant validators (duplicate IDs, leader-in-participants)
- Add _format_exception for nested ExceptionGroup handling
- Fix orchestrator budget exhaustion logging and task creation error handling
- Update DESIGN_SPEC.md class names (TokenTracker, MeetingProtocolConfig)
- Add test_token_tracker.py, test_prompts.py, and expand all test files
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (17)
tests/unit/communication/meeting/test_position_papers.py (1)

63-121: ⚠️ Potential issue | 🟡 Minor

Assert the deterministic POSITION_PAPER order.

These tests still pass if minutes.contributions regresses back to nondeterministic ordering, because they only check counts and set membership. Add an ordered assertion over the paper contributions (agent_id and/or turn_number) to pin the regression fix.

🧪 Suggested assertions
         papers = [
             c for c in minutes.contributions if c.phase == MeetingPhase.POSITION_PAPER
         ]
         assert len(papers) == 3
+        assert [c.agent_id for c in papers] == list(participant_ids)
+        assert [c.turn_number for c in papers] == list(range(len(participant_ids)))
...
         assert paper_agents == set(participant_ids)
+        assert [
+            c.agent_id
+            for c in minutes.contributions
+            if c.phase == MeetingPhase.POSITION_PAPER
+        ] == list(participant_ids)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/meeting/test_position_papers.py` around lines 63 -
121, The tests currently only assert counts and set membership, which allows
nondeterministic ordering to pass; update the two tests
test_contributions_structure and test_parallel_execution_all_participants to
assert a deterministic ordering for POSITION_PAPER contributions by extracting
the list of contributions where c.phase == MeetingPhase.POSITION_PAPER from
minutes.contributions and then assert their order (e.g., by comparing the
sequence of c.agent_id or c.turn_number to the expected order derived from
participant_ids or expected turn numbers). Ensure you reference the
minutes.contributions list, MeetingPhase.POSITION_PAPER, and the contribution
fields agent_id and/or turn_number when implementing the ordered assertion so
the regression to nondeterministic ordering is pinned.
src/ai_company/communication/meeting/position_papers.py (2)

156-168: ⚠️ Potential issue | 🟠 Major

Return structured decisions/action items, not only summary.

MeetingMinutes exposes decisions and action_items, and auto_create_tasks depends on them. This return path only copies synthesis_contribution.content, so successful position-papers meetings still emit empty structured outputs and won't create tasks. Given MeetingContribution is free-text only, this likely needs a parser/helper here or a richer synthesis result type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/position_papers.py` around lines 156 -
168, The return currently only sets MeetingMinutes.summary from
synthesis_contribution.content, leaving MeetingMinutes.decisions and
.action_items empty; update the position-papers finalization in the function
that builds MeetingMinutes (the block returning MeetingMinutes with
protocol_type=MeetingProtocolType.POSITION_PAPERS) to parse/derive structured
decisions and action items from the synthesis_contribution (or from combined
contributions) and populate MeetingMinutes.decisions and
MeetingMinutes.action_items accordingly so auto_create_tasks can consume them;
you can add a small helper (e.g., parse_decisions_and_actions_from_text) to
extract items from synthesis_contribution.content or change the synthesis step
to return a richer object and map its fields into MeetingMinutes.decisions and
.action_items.

203-205: ⚠️ Potential issue | 🟠 Major

Don't overcommit the paper-phase budget.

max(1, tracker.remaining // n) guarantees every worker at least one token. When tracker.remaining < n, the slices already sum to more than the meeting budget before any agent call starts.

💸 Suggested guard
-        tokens_per_agent = max(1, tracker.remaining // max(1, n))
+        if n > 0 and tracker.remaining < n:
+            msg = "Token budget exhausted before collecting position papers"
+            raise MeetingBudgetExhaustedError(
+                msg,
+                context={
+                    "meeting_id": meeting_id,
+                    "budget": tracker.budget,
+                    "used": tracker.used,
+                    "participant_count": n,
+                },
+            )
+        tokens_per_agent = tracker.remaining // max(1, n)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/position_papers.py` around lines 203 -
205, The current pre-division line tokens_per_agent = max(1, tracker.remaining
// max(1, n)) can overcommit when tracker.remaining < n; instead compute a base
and remainder and distribute without exceeding tracker.remaining: set base =
tracker.remaining // n and rem = tracker.remaining % n, then assign each agent
tokens = base + (1 if agent_index < rem else 0) (this yields some agents 0 when
tracker.remaining < n and guarantees the total sum <= tracker.remaining). Update
the logic around tokens_per_agent and any loop that consumes it (referencing
tokens_per_agent, tracker.remaining and n) to use this distribution scheme so no
more tokens are allocated than available.
DESIGN_SPEC.md (1)

2418-2425: ⚠️ Potential issue | 🟡 Minor

Use the current model name in the project tree.

AgendaItem is stale here; the implemented symbol is MeetingAgendaItem. Leaving the old name in §15.3 sends readers to a non-existent API.

📝 Suggested fix
-│       │   │   ├── models.py      # MeetingRecord, AgendaItem, ActionItem, etc.
+│       │   │   ├── models.py      # MeetingRecord, MeetingAgendaItem, ActionItem, etc.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` around lines 2418 - 2425, Update the project tree and §15.3
references to use the actual implemented symbol name: replace the stale
"AgendaItem" entry with "MeetingAgendaItem" (and update any other mentions of
AgendaItem to MeetingAgendaItem) so readers are directed to the existing API;
ensure the package list line under the meeting subsystem and the §15.3 text use
MeetingAgendaItem consistently.
src/ai_company/communication/meeting/_token_tracker.py (2)

16-18: 🛠️ Refactor suggestion | 🟠 Major

TokenTracker still breaks the repo's copy-on-write state model.

A shared mutable dataclass here reintroduces in-place runtime state at the center of budget accounting. Prefer an immutable tracker that returns a new instance from record() (or a frozen Pydantic model with model_copy(update=...)). As per coding guidelines, "Create new objects instead of mutating existing ones" and "Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves; never mix static config fields with mutable runtime fields".

Also applies to: 67-68

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/_token_tracker.py` around lines 16 - 18,
TokenTracker is a mutable dataclass that violates the repo's copy-on-write
model; change it to an immutable/frozen model and make its state-update methods
return a new instance rather than mutating self. Specifically, convert
TokenTracker to a frozen dataclass or a frozen Pydantic model and update the
record() (and any other state-changing helpers referenced around TokenTracker)
to perform a copy/update and return the new TokenTracker instance (e.g., via
dataclasses.replace or model_copy(update=...)) so callers always receive a new
object instead of mutating shared state.

41-68: ⚠️ Potential issue | 🟠 Major

Reject budget overruns before updating the totals.

record() validates negative deltas, but it still accepts a call that pushes used past budget. If an agent returns more tokens than its assigned slice or max_tokens, the protocols silently overspend and remaining hides it by clamping to zero. Compute the projected total first and raise MeetingBudgetExhaustedError before mutating state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/_token_tracker.py` around lines 41 - 68,
The record() method currently updates input_tokens/output_tokens before checking
budget, allowing silent overruns; modify MeetingTokenTracker.record to compute
projected_used = (self.used or self.input_tokens + self.output_tokens) +
input_tokens + output_tokens (or simply self.input_tokens + self.output_tokens +
input_tokens + output_tokens) and if projected_used > self.budget raise
MeetingBudgetExhaustedError before mutating state, otherwise proceed to
increment self.input_tokens and self.output_tokens; keep the existing
non-negative validation and use the existing property names remaining,
is_exhausted, budget, used, input_tokens, output_tokens and the
MeetingBudgetExhaustedError exception.
src/ai_company/communication/meeting/structured_phases.py (3)

441-445: ⚠️ Potential issue | 🟠 Major

Cap the conflict-check call with max_discussion_tokens.

This leader call currently spends from tracker.remaining, so the pre-discussion conflict check can consume the entire remaining meeting budget before the actual discussion round or synthesis runs. It needs to draw from the same StructuredPhasesConfig.max_discussion_tokens allowance as the discussion phase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/structured_phases.py` around lines 441 -
445, The pre-discussion conflict check currently calls agent_caller(leader_id,
conflict_prompt, tracker.remaining) and can consume the whole meeting budget;
change it to cap the token allowance with
StructuredPhasesConfig.max_discussion_tokens (e.g., compute allowed_tokens =
min(tracker.remaining, StructuredPhasesConfig.max_discussion_tokens)) and pass
allowed_tokens to agent_caller so conflict_response uses the same
max_discussion_tokens limit as the discussion phase.

282-295: ⚠️ Potential issue | 🟠 Major

Persist structured decisions and action items from synthesis.

The synthesis step asks the leader for decisions and action items, but MeetingMinutes is returned with only summary. action_items therefore stays empty for structured-phases meetings, so downstream task creation never runs on this protocol. Parse the synthesis output (or switch to a structured response) and populate decisions/action_items before returning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/structured_phases.py` around lines 282 -
295, The MeetingMinutes returned by finalize in structured_phases omits
decisions and action_items, so downstream task creation never runs; update the
finalize flow (inside structured_phases where MeetingMinutes(...) is
constructed) to parse the synthesis output produced by the leader synthesis step
(or switch that step to request a structured JSON response) and populate the
MeetingMinutes.decisions and MeetingMinutes.action_items fields before
returning; locate the synthesis result variable used earlier in the phase (the
leader synthesis response) and extract/validate decisions and action items,
converting them into the same types used by MeetingMinutes and include them in
the MeetingMinutes(...) constructor along with summary.

322-323: ⚠️ Potential issue | 🟠 Major

Don't oversubscribe the parallel input budget.

max(1, tracker.remaining // num_participants) gives every participant a token even when the remaining budget is smaller than the participant count, so the TaskGroup can launch more aggregate budget than the meeting has left. Fail fast or cap fan-out instead of forcing a minimum slice of 1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/structured_phases.py` around lines 322 -
323, The current calculation in structured_phases.py that sets tokens_per_agent
= max(1, tracker.remaining // max(1, num_participants)) can oversubscribe the
TaskGroup when tracker.remaining < num_participants; change the logic to either
fail fast or cap fan-out—e.g., compute allowed_agents = min(num_participants,
tracker.remaining) and then set tokens_per_agent = max(1, tracker.remaining //
allowed_agents) (or alternatively raise an error when tracker.remaining <
num_participants); update the code paths that spawn the TaskGroup (referencing
num_participants, tokens_per_agent, tracker.remaining and the TaskGroup spawn
logic) to use allowed_agents so aggregate token usage never exceeds
tracker.remaining.
src/ai_company/communication/meeting/round_robin.py (2)

125-126: ⚠️ Potential issue | 🟠 Major

Don't reserve summary budget when summaries are disabled.

With leader_summarizes=False, this still withholds 20% of the caller's budget from discussion. The protocol can stop early even though summary generation is disabled.

Suggested change
-        summary_reserve = int(token_budget * _SUMMARY_RESERVE_FRACTION)
+        summary_reserve = (
+            int(token_budget * _SUMMARY_RESERVE_FRACTION)
+            if self._config.leader_summarizes
+            else 0
+        )
         discussion_budget = token_budget - summary_reserve
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/round_robin.py` around lines 125 - 126,
The code always reserves summary_reserve = int(token_budget *
_SUMMARY_RESERVE_FRACTION) and subtracts it from discussion_budget even when
leader_summarizes is False; change the logic in
src/ai_company/communication/meeting/round_robin.py so that summary_reserve is
only computed and withheld when leader_summarizes is True (otherwise set
summary_reserve = 0 and discussion_budget = token_budget), updating the
variables summary_reserve and discussion_budget accordingly where they are used
so the protocol doesn't prematurely stop discussion when summaries are disabled.

185-197: ⚠️ Potential issue | 🟠 Major

Return structured decisions and action items from the summary phase.

The leader is prompted to list decisions and action items, but the returned MeetingMinutes only keeps the free-form summary. That leaves action_items empty on the successful round-robin path, so auto_create_tasks cannot work for round-robin meetings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/round_robin.py` around lines 185 - 197,
The MeetingMinutes returned at the end of the round-robin flow only sets summary
and omits the structured decisions/action_items captured during the summary
phase; update the return in round_robin.py (the MeetingMinutes construction) to
include the parsed decisions and action_items variables produced during the
summary step (e.g., decisions and action_items or
parsed_decisions/parsed_action_items) so MeetingMinutes.decisions and
MeetingMinutes.action_items are populated for the successful round-robin path
and allow auto_create_tasks to work.
tests/unit/communication/meeting/test_structured_phases.py (1)

107-112: ⚠️ Potential issue | 🟡 Minor

Use the injected leader_id in these mock response maps.

These tests already receive leader_id, but the response dictionaries are still keyed by the literal "leader-agent". If that fixture changes or becomes parameterized, the mocks stop matching and the failures no longer reflect protocol behavior.

Also applies to: 145-149, 180-184, 217-221, 249-251, 331-333, 385-390, 431-435

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/meeting/test_structured_phases.py` around lines 107
- 112, Mocks in these tests hard-code the key "leader-agent" in the responses
dict instead of using the injected leader_id fixture, which will break if
leader_id changes; update each responses map (the dict named responses in the
test) to use the leader_id variable as the key (e.g., replace "leader-agent":
[...] with leader_id: [...]) for all occurrences so the mock matches the
injected fixture value and any parameterizations of leader_id.
tests/unit/communication/meeting/test_orchestrator.py (1)

455-483: ⚠️ Potential issue | 🟠 Major

These two tests never enter the task-creation path.

Both tests use the default protocol/mocks, which currently produce no ActionItems. That means the auto_create_tasks=False gate and the failing task_creator branch never execute, so these tests can pass even if _create_tasks() is broken. Reuse the stub-protocol pattern from test_task_creator_called_with_correct_args() and return minutes with at least one action item.

Also applies to: 485-511

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/meeting/test_orchestrator.py` around lines 455 -
483, The tests test_task_creator_not_called_when_disabled (and the duplicate at
485-511) never exercise the task-creation path because the default stub protocol
returns no ActionItem minutes; update these tests to reuse the stub-protocol
used in test_task_creator_called_with_correct_args() so the protocol returns
Minutes containing at least one ActionItem, then assert auto_create_tasks=False
prevents _create_tasks() from calling the provided _creator. Specifically,
adjust the protocol stub used when constructing orchestrator via
_make_orchestrator to produce Minutes with an ActionItem (matching the stub
pattern in test_task_creator_called_with_correct_args()), keep
MeetingProtocolConfig(auto_create_tasks=False), pass the same _creator, and call
orchestrator.run_meeting so the task-creation branch is actually reached and can
be validated.
src/ai_company/communication/meeting/orchestrator.py (2)

86-96: ⚠️ Potential issue | 🟠 Major

Freeze the protocol registry to prevent external mutation.

The constructor stores the caller-owned protocol_registry directly. External mutations after construction can change protocol resolution behavior unexpectedly.

♻️ Proposed fix with defensive copy
+from copy import deepcopy
+from types import MappingProxyType
 ...
     def __init__(
         self,
         *,
         protocol_registry: Mapping[MeetingProtocolType, MeetingProtocol],
         agent_caller: AgentCaller,
         task_creator: TaskCreator | None = None,
     ) -> None:
-        self._protocol_registry = protocol_registry
+        self._protocol_registry: Mapping[MeetingProtocolType, MeetingProtocol] = (
+            MappingProxyType(deepcopy(dict(protocol_registry)))
+        )
         self._agent_caller = agent_caller
         self._task_creator = task_creator
         self._records: list[MeetingRecord] = []

As per coding guidelines, "Create new objects instead of mutating existing ones; use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement of non-Pydantic internal collections (registries, BaseTool)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/orchestrator.py` around lines 86 - 96,
The constructor currently stores the caller-owned protocol_registry directly
(parameter protocol_registry -> attribute self._protocol_registry), allowing
external mutation; fix by creating a defensive copy and exposing it as
read-only: deepcopy the incoming protocol_registry (use copy.deepcopy) and then
wrap the result with types.MappingProxyType before assigning to
self._protocol_registry so the internal registry cannot be mutated after
construction (update imports as needed to include copy and MappingProxyType).

358-409: ⚠️ Potential issue | 🟠 Major

Validate blank and duplicate IDs at the system boundary.

This validation only checks for empty participant_ids and leader inclusion. Blank/whitespace IDs and duplicate participants will pass through and fail later in MeetingMinutes validation after tokens have been spent. Validate these constraints here to fail fast.

🛡️ Proposed fix to add boundary validation
     def _validate_inputs(
         self,
         meeting_id: str,
         leader_id: str,
         participant_ids: tuple[str, ...],
         token_budget: int,
     ) -> None:
         """Validate meeting inputs.

         Raises:
             MeetingParticipantError: If participants are empty or leader
                 is in participants.
             ValueError: If token_budget is not positive.
         """
         if token_budget <= 0:
             logger.warning(
                 MEETING_VALIDATION_FAILED,
                 meeting_id=meeting_id,
                 error=f"token_budget must be positive, got {token_budget}",
             )
             msg = f"token_budget must be positive, got {token_budget}"
             raise ValueError(msg)

+        # Validate leader_id is not blank
+        if not leader_id or not leader_id.strip():
+            logger.warning(
+                MEETING_VALIDATION_FAILED,
+                meeting_id=meeting_id,
+                error="leader_id must not be blank",
+            )
+            msg = "leader_id must not be blank"
+            raise MeetingParticipantError(
+                msg,
+                context={"meeting_id": meeting_id},
+            )
+
         if not participant_ids:
             logger.warning(
                 MEETING_VALIDATION_FAILED,
                 meeting_id=meeting_id,
                 error="at least one participant is required",
             )
             msg = "At least one participant is required"
             raise MeetingParticipantError(
                 msg,
                 context={"meeting_id": meeting_id},
             )
+
+        # Validate no blank participant IDs
+        blank_ids = [p for p in participant_ids if not p or not p.strip()]
+        if blank_ids:
+            logger.warning(
+                MEETING_VALIDATION_FAILED,
+                meeting_id=meeting_id,
+                error="participant_ids contain blank values",
+            )
+            msg = "participant_ids must not contain blank values"
+            raise MeetingParticipantError(
+                msg,
+                context={"meeting_id": meeting_id},
+            )
+
+        # Validate no duplicate participant IDs
+        if len(participant_ids) != len(set(participant_ids)):
+            from collections import Counter
+            dupes = [v for v, c in Counter(participant_ids).items() if c > 1]
+            logger.warning(
+                MEETING_VALIDATION_FAILED,
+                meeting_id=meeting_id,
+                error="duplicate participant_ids",
+                duplicates=dupes,
+            )
+            msg = f"Duplicate participant_ids: {dupes}"
+            raise MeetingParticipantError(
+                msg,
+                context={"meeting_id": meeting_id, "duplicates": dupes},
+            )
+
         if leader_id in participant_ids:
             ...

As per coding guidelines, "Validate at system boundaries (user input, external APIs, config files)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/orchestrator.py` around lines 358 - 409,
The _validate_inputs method currently only checks for empty participant_ids,
leader inclusion, and token_budget; extend it to also reject blank/whitespace
participant IDs and duplicate participant IDs at the system boundary: iterate
over participant_ids in _validate_inputs, trim each ID and if any are empty/only
whitespace log MEETING_VALIDATION_FAILED and raise MeetingParticipantError with
context {"meeting_id": meeting_id} and a clear message about blank IDs, and if
len(set(participant_ids)) != len(participant_ids) log and raise
MeetingParticipantError indicating duplicate participant IDs (include meeting_id
and optionally leader_id in context), keeping existing token_budget and leader
checks intact and using the same logger and error patterns used elsewhere in
_validate_inputs.
src/ai_company/communication/meeting/models.py (2)

201-218: 🛠️ Refactor suggestion | 🟠 Major

Derive token totals from contributions to prevent inconsistency.

total_input_tokens and total_output_tokens are independent stored fields that can diverge from the sum of per-contribution token counts. Since total_tokens is already a @computed_field, consider making these derived as well to enforce consistency and avoid redundant state.

♻️ Proposed refactor to use computed fields
-    total_input_tokens: int = Field(
-        default=0,
-        ge=0,
-        description="Total prompt tokens",
-    )
-    total_output_tokens: int = Field(
-        default=0,
-        ge=0,
-        description="Total response tokens",
-    )
+    `@computed_field`  # type: ignore[prop-decorator]
+    `@property`
+    def total_input_tokens(self) -> int:
+        """Total prompt tokens consumed."""
+        return sum(c.input_tokens for c in self.contributions)
+
+    `@computed_field`  # type: ignore[prop-decorator]
+    `@property`
+    def total_output_tokens(self) -> int:
+        """Total response tokens generated."""
+        return sum(c.output_tokens for c in self.contributions)

As per coding guidelines, "Use @computed_field for derived values instead of storing redundant fields".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/models.py` around lines 201 - 218, The
stored fields total_input_tokens and total_output_tokens can diverge from
per-contribution sums; change them to computed fields that sum over the
meeting's contributions (e.g., sum(c.input_tokens) for total_input_tokens and
sum(c.output_tokens) for total_output_tokens) instead of keeping default stored
ints, and ensure total_tokens remains (or is updated to be) the computed sum of
those computed fields to remove redundant state; update the Field
definitions/annotations for total_input_tokens and total_output_tokens to use
`@computed_field` and reference the contributions relationship when calculating
the totals (keeping the existing total_tokens property as-is or adjust to return
the sum of the newly computed fields).

265-268: ⚠️ Potential issue | 🟡 Minor

Enforce non-blank error_message for failed statuses.

The validator at lines 284-296 requires error_message for FAILED and BUDGET_EXHAUSTED statuses, but using str | None still allows empty or whitespace-only strings to pass. Use NotBlankStr | None to ensure meaningful error descriptions.

🛠️ Proposed fix
-    error_message: str | None = Field(
+    error_message: NotBlankStr | None = Field(
         default=None,
         description="Error description on failure",
     )

Based on learnings, "Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/models.py` around lines 265 - 268, The
error_message field uses type str | None which allows empty/whitespace strings;
change its annotation to NotBlankStr | None (import NotBlankStr from core.types)
so validators for statuses FAILED and BUDGET_EXHAUSTED enforce non-blank
messages, i.e., replace "error_message: str | None = Field(...)" with
"error_message: NotBlankStr | None = Field(...)" and add the necessary import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/communication/meeting/position_papers.py`:
- Around line 212-229: The current code sets max_tokens using budget_slice but
ignores prompt token cost so agent_caller(...) can generate within max_tokens
while the prompt pushes tracker.used past token_budget; fix by estimating the
prompt token count before calling agent_caller (e.g., via the project's
tokenizer or a conservative heuristic) and compute output_allowance = max(0,
min(self._config.max_tokens_per_position, budget_slice -
estimated_prompt_tokens)); pass output_allowance as max_tokens to agent_caller,
record both estimated_prompt_tokens + response.output_tokens to tracker (instead
of waiting for response.input_tokens), and apply the same change to the other
occurrence around the block handling lines ~338-347 to ensure the
duration_tokens/token_budget bound is respected.

In `@src/ai_company/communication/meeting/protocol.py`:
- Around line 61-100: The MeetingProtocol interface currently only exposes run()
and get_protocol_type(), which prevents callers from composing standard
lifecycle phases; update the MeetingProtocol class to declare the standard
lifecycle methods start_meeting(...), run_round(...), summarize(...), and
extract_action_items(...) (with appropriate async signatures and the same domain
types used by run: meeting_id: str, agenda: MeetingAgenda, leader_id: str,
participant_ids: tuple[str, ...], agent_caller: AgentCaller, token_budget: int,
returning MeetingMinutes or specific partial results as appropriate) so
implementations must implement these phases, and keep run() as a convenience
orchestrator that can call those methods; reference MeetingProtocol, run,
get_protocol_type, start_meeting, run_round, summarize, and extract_action_items
when making the change.

In `@src/ai_company/communication/meeting/structured_phases.py`:
- Around line 467-472: The logger call using MEETING_CONFLICT_DETECTED currently
writes the full meeting analysis via raw_response=conflict_response.content;
replace this with non-sensitive metadata only (e.g., a boolean or redacted
summary) so sensitive participant text is not stored. Update the logger.info
call that includes MEETING_CONFLICT_DETECTED, meeting_id, conflicts_found
(conflicts_detected) to remove or redact raw_response and instead pass a
redact-safe field like conflict_summary (e.g., "redacted" or a short
summary/flag) or a hashed/length-only marker derived from
conflict_response.content; ensure references are to logger.info,
MEETING_CONFLICT_DETECTED, meeting_id, conflicts_found, and conflict_response so
reviewers can find and verify the change.

In `@tests/unit/communication/meeting/test_token_tracker.py`:
- Around line 36-47: Update the two tests to expect over-budget recording to
fail instead of being allowed: wrap the calls to TokenTracker.record in
pytest.raises(ValueError) (or the specific exception your TokenTracker raises)
so test_record_exceeds_budget and test_remaining_never_negative assert that
calling TokenTracker.record would raise when the requested tokens exceed the
budget; after the exception assert that tracker.used remains at its previous
value (e.g., 30 for test_record_exceeds_budget or 0 for the other),
tracker.remaining is 0, and tracker.is_exhausted is True; this changes
expectations for the TokenTracker.record method and uses the TokenTracker,
record, used, remaining, and is_exhausted symbols to locate the code to update.

---

Duplicate comments:
In `@DESIGN_SPEC.md`:
- Around line 2418-2425: Update the project tree and §15.3 references to use the
actual implemented symbol name: replace the stale "AgendaItem" entry with
"MeetingAgendaItem" (and update any other mentions of AgendaItem to
MeetingAgendaItem) so readers are directed to the existing API; ensure the
package list line under the meeting subsystem and the §15.3 text use
MeetingAgendaItem consistently.

In `@src/ai_company/communication/meeting/_token_tracker.py`:
- Around line 16-18: TokenTracker is a mutable dataclass that violates the
repo's copy-on-write model; change it to an immutable/frozen model and make its
state-update methods return a new instance rather than mutating self.
Specifically, convert TokenTracker to a frozen dataclass or a frozen Pydantic
model and update the record() (and any other state-changing helpers referenced
around TokenTracker) to perform a copy/update and return the new TokenTracker
instance (e.g., via dataclasses.replace or model_copy(update=...)) so callers
always receive a new object instead of mutating shared state.
- Around line 41-68: The record() method currently updates
input_tokens/output_tokens before checking budget, allowing silent overruns;
modify MeetingTokenTracker.record to compute projected_used = (self.used or
self.input_tokens + self.output_tokens) + input_tokens + output_tokens (or
simply self.input_tokens + self.output_tokens + input_tokens + output_tokens)
and if projected_used > self.budget raise MeetingBudgetExhaustedError before
mutating state, otherwise proceed to increment self.input_tokens and
self.output_tokens; keep the existing non-negative validation and use the
existing property names remaining, is_exhausted, budget, used, input_tokens,
output_tokens and the MeetingBudgetExhaustedError exception.

In `@src/ai_company/communication/meeting/models.py`:
- Around line 201-218: The stored fields total_input_tokens and
total_output_tokens can diverge from per-contribution sums; change them to
computed fields that sum over the meeting's contributions (e.g.,
sum(c.input_tokens) for total_input_tokens and sum(c.output_tokens) for
total_output_tokens) instead of keeping default stored ints, and ensure
total_tokens remains (or is updated to be) the computed sum of those computed
fields to remove redundant state; update the Field definitions/annotations for
total_input_tokens and total_output_tokens to use `@computed_field` and reference
the contributions relationship when calculating the totals (keeping the existing
total_tokens property as-is or adjust to return the sum of the newly computed
fields).
- Around line 265-268: The error_message field uses type str | None which allows
empty/whitespace strings; change its annotation to NotBlankStr | None (import
NotBlankStr from core.types) so validators for statuses FAILED and
BUDGET_EXHAUSTED enforce non-blank messages, i.e., replace "error_message: str |
None = Field(...)" with "error_message: NotBlankStr | None = Field(...)" and add
the necessary import.

In `@src/ai_company/communication/meeting/orchestrator.py`:
- Around line 86-96: The constructor currently stores the caller-owned
protocol_registry directly (parameter protocol_registry -> attribute
self._protocol_registry), allowing external mutation; fix by creating a
defensive copy and exposing it as read-only: deepcopy the incoming
protocol_registry (use copy.deepcopy) and then wrap the result with
types.MappingProxyType before assigning to self._protocol_registry so the
internal registry cannot be mutated after construction (update imports as needed
to include copy and MappingProxyType).
- Around line 358-409: The _validate_inputs method currently only checks for
empty participant_ids, leader inclusion, and token_budget; extend it to also
reject blank/whitespace participant IDs and duplicate participant IDs at the
system boundary: iterate over participant_ids in _validate_inputs, trim each ID
and if any are empty/only whitespace log MEETING_VALIDATION_FAILED and raise
MeetingParticipantError with context {"meeting_id": meeting_id} and a clear
message about blank IDs, and if len(set(participant_ids)) !=
len(participant_ids) log and raise MeetingParticipantError indicating duplicate
participant IDs (include meeting_id and optionally leader_id in context),
keeping existing token_budget and leader checks intact and using the same logger
and error patterns used elsewhere in _validate_inputs.

In `@src/ai_company/communication/meeting/position_papers.py`:
- Around line 156-168: The return currently only sets MeetingMinutes.summary
from synthesis_contribution.content, leaving MeetingMinutes.decisions and
.action_items empty; update the position-papers finalization in the function
that builds MeetingMinutes (the block returning MeetingMinutes with
protocol_type=MeetingProtocolType.POSITION_PAPERS) to parse/derive structured
decisions and action items from the synthesis_contribution (or from combined
contributions) and populate MeetingMinutes.decisions and
MeetingMinutes.action_items accordingly so auto_create_tasks can consume them;
you can add a small helper (e.g., parse_decisions_and_actions_from_text) to
extract items from synthesis_contribution.content or change the synthesis step
to return a richer object and map its fields into MeetingMinutes.decisions and
.action_items.
- Around line 203-205: The current pre-division line tokens_per_agent = max(1,
tracker.remaining // max(1, n)) can overcommit when tracker.remaining < n;
instead compute a base and remainder and distribute without exceeding
tracker.remaining: set base = tracker.remaining // n and rem = tracker.remaining
% n, then assign each agent tokens = base + (1 if agent_index < rem else 0)
(this yields some agents 0 when tracker.remaining < n and guarantees the total
sum <= tracker.remaining). Update the logic around tokens_per_agent and any loop
that consumes it (referencing tokens_per_agent, tracker.remaining and n) to use
this distribution scheme so no more tokens are allocated than available.

In `@src/ai_company/communication/meeting/round_robin.py`:
- Around line 125-126: The code always reserves summary_reserve =
int(token_budget * _SUMMARY_RESERVE_FRACTION) and subtracts it from
discussion_budget even when leader_summarizes is False; change the logic in
src/ai_company/communication/meeting/round_robin.py so that summary_reserve is
only computed and withheld when leader_summarizes is True (otherwise set
summary_reserve = 0 and discussion_budget = token_budget), updating the
variables summary_reserve and discussion_budget accordingly where they are used
so the protocol doesn't prematurely stop discussion when summaries are disabled.
- Around line 185-197: The MeetingMinutes returned at the end of the round-robin
flow only sets summary and omits the structured decisions/action_items captured
during the summary phase; update the return in round_robin.py (the
MeetingMinutes construction) to include the parsed decisions and action_items
variables produced during the summary step (e.g., decisions and action_items or
parsed_decisions/parsed_action_items) so MeetingMinutes.decisions and
MeetingMinutes.action_items are populated for the successful round-robin path
and allow auto_create_tasks to work.

In `@src/ai_company/communication/meeting/structured_phases.py`:
- Around line 441-445: The pre-discussion conflict check currently calls
agent_caller(leader_id, conflict_prompt, tracker.remaining) and can consume the
whole meeting budget; change it to cap the token allowance with
StructuredPhasesConfig.max_discussion_tokens (e.g., compute allowed_tokens =
min(tracker.remaining, StructuredPhasesConfig.max_discussion_tokens)) and pass
allowed_tokens to agent_caller so conflict_response uses the same
max_discussion_tokens limit as the discussion phase.
- Around line 282-295: The MeetingMinutes returned by finalize in
structured_phases omits decisions and action_items, so downstream task creation
never runs; update the finalize flow (inside structured_phases where
MeetingMinutes(...) is constructed) to parse the synthesis output produced by
the leader synthesis step (or switch that step to request a structured JSON
response) and populate the MeetingMinutes.decisions and
MeetingMinutes.action_items fields before returning; locate the synthesis result
variable used earlier in the phase (the leader synthesis response) and
extract/validate decisions and action items, converting them into the same types
used by MeetingMinutes and include them in the MeetingMinutes(...) constructor
along with summary.
- Around line 322-323: The current calculation in structured_phases.py that sets
tokens_per_agent = max(1, tracker.remaining // max(1, num_participants)) can
oversubscribe the TaskGroup when tracker.remaining < num_participants; change
the logic to either fail fast or cap fan-out—e.g., compute allowed_agents =
min(num_participants, tracker.remaining) and then set tokens_per_agent = max(1,
tracker.remaining // allowed_agents) (or alternatively raise an error when
tracker.remaining < num_participants); update the code paths that spawn the
TaskGroup (referencing num_participants, tokens_per_agent, tracker.remaining and
the TaskGroup spawn logic) to use allowed_agents so aggregate token usage never
exceeds tracker.remaining.

In `@tests/unit/communication/meeting/test_orchestrator.py`:
- Around line 455-483: The tests test_task_creator_not_called_when_disabled (and
the duplicate at 485-511) never exercise the task-creation path because the
default stub protocol returns no ActionItem minutes; update these tests to reuse
the stub-protocol used in test_task_creator_called_with_correct_args() so the
protocol returns Minutes containing at least one ActionItem, then assert
auto_create_tasks=False prevents _create_tasks() from calling the provided
_creator. Specifically, adjust the protocol stub used when constructing
orchestrator via _make_orchestrator to produce Minutes with an ActionItem
(matching the stub pattern in test_task_creator_called_with_correct_args()),
keep MeetingProtocolConfig(auto_create_tasks=False), pass the same _creator, and
call orchestrator.run_meeting so the task-creation branch is actually reached
and can be validated.

In `@tests/unit/communication/meeting/test_position_papers.py`:
- Around line 63-121: The tests currently only assert counts and set membership,
which allows nondeterministic ordering to pass; update the two tests
test_contributions_structure and test_parallel_execution_all_participants to
assert a deterministic ordering for POSITION_PAPER contributions by extracting
the list of contributions where c.phase == MeetingPhase.POSITION_PAPER from
minutes.contributions and then assert their order (e.g., by comparing the
sequence of c.agent_id or c.turn_number to the expected order derived from
participant_ids or expected turn numbers). Ensure you reference the
minutes.contributions list, MeetingPhase.POSITION_PAPER, and the contribution
fields agent_id and/or turn_number when implementing the ordered assertion so
the regression to nondeterministic ordering is pinned.

In `@tests/unit/communication/meeting/test_structured_phases.py`:
- Around line 107-112: Mocks in these tests hard-code the key "leader-agent" in
the responses dict instead of using the injected leader_id fixture, which will
break if leader_id changes; update each responses map (the dict named responses
in the test) to use the leader_id variable as the key (e.g., replace
"leader-agent": [...] with leader_id: [...]) for all occurrences so the mock
matches the injected fixture value and any parameterizations of leader_id.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7f1dc386-0c0c-49ef-a80e-d278593c8af8

📥 Commits

Reviewing files that changed from the base of the PR and between af5d72a and 3101925.

📒 Files selected for processing (19)
  • DESIGN_SPEC.md
  • src/ai_company/communication/__init__.py
  • src/ai_company/communication/meeting/__init__.py
  • src/ai_company/communication/meeting/_token_tracker.py
  • src/ai_company/communication/meeting/config.py
  • src/ai_company/communication/meeting/enums.py
  • src/ai_company/communication/meeting/models.py
  • src/ai_company/communication/meeting/orchestrator.py
  • src/ai_company/communication/meeting/position_papers.py
  • src/ai_company/communication/meeting/protocol.py
  • src/ai_company/communication/meeting/round_robin.py
  • src/ai_company/communication/meeting/structured_phases.py
  • tests/unit/communication/meeting/test_models.py
  • tests/unit/communication/meeting/test_orchestrator.py
  • tests/unit/communication/meeting/test_position_papers.py
  • tests/unit/communication/meeting/test_prompts.py
  • tests/unit/communication/meeting/test_protocol.py
  • tests/unit/communication/meeting/test_structured_phases.py
  • tests/unit/communication/meeting/test_token_tracker.py

Comment on lines +212 to +229
max_tokens = min(
self._config.max_tokens_per_position,
budget_slice,
)

logger.debug(
MEETING_AGENT_CALLED,
meeting_id=meeting_id,
agent_id=participant_id,
phase=MeetingPhase.POSITION_PAPER,
)

response = await agent_caller(
participant_id,
prompt,
max_tokens,
)
tracker.record(response.input_tokens, response.output_tokens)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

max_tokens here does not cap total token spend.

agent_caller(..., max_tokens) only limits generated tokens. Prompt tokens are discovered afterward via response.input_tokens and recorded unconditionally, so a large prompt can push tracker.used past token_budget even when every slice looked safe. That breaks the hard duration_tokens bound from §5.4; reserve prompt cost up front or derive the output allowance from budget_slice - estimated_prompt_tokens before dispatch.

Also applies to: 338-347

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/position_papers.py` around lines 212 -
229, The current code sets max_tokens using budget_slice but ignores prompt
token cost so agent_caller(...) can generate within max_tokens while the prompt
pushes tracker.used past token_budget; fix by estimating the prompt token count
before calling agent_caller (e.g., via the project's tokenizer or a conservative
heuristic) and compute output_allowance = max(0,
min(self._config.max_tokens_per_position, budget_slice -
estimated_prompt_tokens)); pass output_allowance as max_tokens to agent_caller,
record both estimated_prompt_tokens + response.output_tokens to tracker (instead
of waiting for response.input_tokens), and apply the same change to the other
occurrence around the block handling lines ~338-347 to ensure the
duration_tokens/token_budget bound is respected.

Comment on lines +61 to +100
class MeetingProtocol(Protocol):
"""Strategy interface for meeting protocol implementations.

Each implementation defines a different structure for how agents
interact during a meeting (round-robin turns, parallel position
papers, structured phases with discussion).
"""

async def run( # noqa: PLR0913
self,
*,
meeting_id: str,
agenda: MeetingAgenda,
leader_id: str,
participant_ids: tuple[str, ...],
agent_caller: AgentCaller,
token_budget: int,
) -> MeetingMinutes:
"""Execute the meeting protocol and produce minutes.

Args:
meeting_id: Unique identifier for this meeting.
agenda: The meeting agenda.
leader_id: ID of the agent leading the meeting.
participant_ids: IDs of participating agents.
agent_caller: Callback to invoke agents.
token_budget: Maximum tokens for the entire meeting.

Returns:
Complete meeting minutes.
"""
...

def get_protocol_type(self) -> MeetingProtocolType:
"""Return the protocol type this implementation handles.

Returns:
The meeting protocol type enum value.
"""
...
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The MeetingProtocol surface no longer matches the linked objective.

#123 called for standard lifecycle operations (start_meeting, run_round, summarize, extract_action_items). With only run()/get_protocol_type(), each protocol becomes a black box and callers cannot compose or enforce those common phases consistently. Either expose the standard operations here or update the requirement/spec to match the abstraction you're actually shipping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/protocol.py` around lines 61 - 100, The
MeetingProtocol interface currently only exposes run() and get_protocol_type(),
which prevents callers from composing standard lifecycle phases; update the
MeetingProtocol class to declare the standard lifecycle methods
start_meeting(...), run_round(...), summarize(...), and
extract_action_items(...) (with appropriate async signatures and the same domain
types used by run: meeting_id: str, agenda: MeetingAgenda, leader_id: str,
participant_ids: tuple[str, ...], agent_caller: AgentCaller, token_budget: int,
returning MeetingMinutes or specific partial results as appropriate) so
implementations must implement these phases, and keep run() as a convenience
orchestrator that can call those methods; reference MeetingProtocol, run,
get_protocol_type, start_meeting, run_round, summarize, and extract_action_items
when making the change.

Comment on lines +467 to +472
logger.info(
MEETING_CONFLICT_DETECTED,
meeting_id=meeting_id,
conflicts_found=conflicts_detected,
raw_response=conflict_response.content,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't log raw meeting analysis at INFO level.

raw_response=conflict_response.content writes the leader's full conflict analysis into application logs. That can include participant arguments or other sensitive meeting content, while observability only needs a boolean or a redacted summary. Log metadata instead of the full text.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/communication/meeting/structured_phases.py` around lines 467 -
472, The logger call using MEETING_CONFLICT_DETECTED currently writes the full
meeting analysis via raw_response=conflict_response.content; replace this with
non-sensitive metadata only (e.g., a boolean or redacted summary) so sensitive
participant text is not stored. Update the logger.info call that includes
MEETING_CONFLICT_DETECTED, meeting_id, conflicts_found (conflicts_detected) to
remove or redact raw_response and instead pass a redact-safe field like
conflict_summary (e.g., "redacted" or a short summary/flag) or a
hashed/length-only marker derived from conflict_response.content; ensure
references are to logger.info, MEETING_CONFLICT_DETECTED, meeting_id,
conflicts_found, and conflict_response so reviewers can find and verify the
change.

Comment on lines +36 to +47
def test_record_exceeds_budget(self) -> None:
tracker = TokenTracker(budget=50)
tracker.record(30, 30)
assert tracker.used == 60
assert tracker.remaining == 0
assert tracker.is_exhausted is True

def test_remaining_never_negative(self) -> None:
tracker = TokenTracker(budget=10)
tracker.record(100, 100)
assert tracker.remaining == 0
assert tracker.is_exhausted is True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These tests lock in over-budget recording as valid behavior.

TokenTracker is supposed to underpin duration_tokens enforcement, but these expectations bless used > budget and only clamp remaining to zero after overspending. That makes exhaustion detectable only after the cap is already violated; once the tracker/enforcers are tightened, these cases should flip to expecting a failure instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/meeting/test_token_tracker.py` around lines 36 - 47,
Update the two tests to expect over-budget recording to fail instead of being
allowed: wrap the calls to TokenTracker.record in pytest.raises(ValueError) (or
the specific exception your TokenTracker raises) so test_record_exceeds_budget
and test_remaining_never_negative assert that calling TokenTracker.record would
raise when the requested tokens exceed the budget; after the exception assert
that tracker.used remains at its previous value (e.g., 30 for
test_record_exceeds_budget or 0 for the other), tracker.remaining is 0, and
tracker.is_exhausted is True; this changes expectations for the
TokenTracker.record method and uses the TokenTracker, record, used, remaining,
and is_exhausted symbols to locate the code to update.

)

- Merge origin/main (conflict resolution) into feat/meeting-protocol
- Move raw_response log from INFO to DEBUG (sensitive data concern)
- Use MEETING_BUDGET_EXHAUSTED event for budget-exhaustion log entries
- Add synthesis budget reserve (20%) to position_papers + structured_phases
Copilot AI review requested due to automatic review settings March 8, 2026 09:47
@Aureliolo Aureliolo merged commit ee7caca into main Mar 8, 2026
9 of 10 checks passed
@Aureliolo Aureliolo deleted the feat/meeting-protocol branch March 8, 2026 09:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 33 out of 34 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +230 to +233
budget_group = exc.subgroup(MeetingBudgetExhaustedError)
if budget_group is not None and len(budget_group.exceptions) == len(
exc.exceptions
):
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExceptionGroup handling can misclassify mixed nested ExceptionGroups as fully budget-exhausted. subgroup(MeetingBudgetExhaustedError) preserves group structure, so the top-level len(...exceptions) comparison may still match even when some nested exceptions are non-budget errors, incorrectly returning BUDGET_EXHAUSTED. Use exc.split(MeetingBudgetExhaustedError) (or equivalent) and only mark BUDGET_EXHAUSTED when the remainder group is None.

Suggested change
budget_group = exc.subgroup(MeetingBudgetExhaustedError)
if budget_group is not None and len(budget_group.exceptions) == len(
exc.exceptions
):
matched, remainder = exc.split(MeetingBudgetExhaustedError)
if matched is not None and remainder is None:

Copilot uses AI. Check for mistakes.
Comment on lines +382 to +394
if not participant_ids:
logger.warning(
MEETING_VALIDATION_FAILED,
meeting_id=meeting_id,
error="at least one participant is required",
)
msg = "At least one participant is required"
raise MeetingParticipantError(
msg,
context={"meeting_id": meeting_id},
)
if leader_id in participant_ids:
logger.warning(
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_meeting() documents that input validation errors are raised directly, but _validate_inputs() does not reject duplicate participant_ids. Duplicates currently get detected later by MeetingMinutes validation, which then gets caught as a generic failure and turned into a FAILED record instead of a MeetingParticipantError. Add a uniqueness check here (e.g., via validate_unique_strings(...)) so invalid participant lists are handled consistently as validation errors.

Copilot uses AI. Check for mistakes.
Comment on lines +272 to +277
assert all(r is not None for r in results), ( # noqa: S101
f"Expected {n} position papers but some slots are None"
)
assert all(c is not None for c in contrib_results), ( # noqa: S101
f"Expected {n} contributions but some slots are None"
)
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These invariants are enforced with assert. Because Python can run with assertions disabled (-O), this could allow None entries to slip through and cause confusing downstream failures. Prefer an explicit runtime check and raise a normal exception (with context) if any slot is unexpectedly unfilled.

Suggested change
assert all(r is not None for r in results), ( # noqa: S101
f"Expected {n} position papers but some slots are None"
)
assert all(c is not None for c in contrib_results), ( # noqa: S101
f"Expected {n} contributions but some slots are None"
)
missing_paper_indices = [i for i, r in enumerate(results) if r is None]
if missing_paper_indices:
raise RuntimeError(
f"Expected {n} position papers but some slots are None "
f"at indices {missing_paper_indices}"
)
missing_contrib_indices = [i for i, c in enumerate(contrib_results) if c is None]
if missing_contrib_indices:
raise RuntimeError(
f"Expected {n} contributions but some slots are None "
f"at indices {missing_contrib_indices}"
)

Copilot uses AI. Check for mistakes.
Comment on lines +392 to +397
assert all(r is not None for r in result_inputs), ( # noqa: S101
f"Expected {num_participants} inputs but some slots are None"
)
assert all(c is not None for c in result_contributions), ( # noqa: S101
f"Expected {num_participants} contributions but some slots are None"
)
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These invariants are enforced with assert. If assertions are disabled (python -O), result_inputs / result_contributions could still contain None and the function would proceed, leading to harder-to-debug type errors later. Prefer explicit runtime checks and raise a normal exception if any slot is unexpectedly unfilled.

Suggested change
assert all(r is not None for r in result_inputs), ( # noqa: S101
f"Expected {num_participants} inputs but some slots are None"
)
assert all(c is not None for c in result_contributions), ( # noqa: S101
f"Expected {num_participants} contributions but some slots are None"
)
missing_input_indices = [i for i, r in enumerate(result_inputs) if r is None]
if missing_input_indices:
raise RuntimeError(
f"Expected {num_participants} inputs but found None for indices "
f"{missing_input_indices}"
)
missing_contribution_indices = [
i for i, c in enumerate(result_contributions) if c is None
]
if missing_contribution_indices:
raise RuntimeError(
"Expected "
f"{num_participants} contributions but found None for indices "
f"{missing_contribution_indices}"
)

Copilot uses AI. Check for mistakes.
Aureliolo added a commit that referenced this pull request Mar 8, 2026
Critical (C1-C4):
- Parse decisions/action_items from LLM synthesis in all 3 meeting protocols
- Validate winning_agent_id exists in find_losers() before computing losers

Major (M1-M17):
- Guard summary budget reserve when leader_summarizes=False
- Add synthesis sub-reserve in structured phases discussion
- Reject duplicate participant_ids in meeting orchestrator
- Freeze protocol registry with MappingProxyType
- Warn when token tracker exceeds budget
- Add hierarchy tiebreaker to pick_highest_seniority()
- Wire hierarchy into debate/hybrid authority fallbacks
- Fast-path get_lowest_common_manager(a, a) → a
- Validate _SENIORITY_ORDER matches enum members at import
- Remove dead max_tokens_per_argument config field
- Verify task IDs match plan subtask IDs in DecompositionResult
- Return CANCELLED for mixed completed+cancelled terminal states
- Fix double-logging in rollup compute() for empty case
- Copy subtask dependencies from plan to created Tasks
- Reject duplicate subtask IDs in RoutingResult
- Wake all pending waiters on unsubscribe (not just one)

Minor (m1-m15):
- Remove duplicate MEETING_CONFLICT_DETECTED log events
- Replace assert with explicit raises in meeting protocols
- Include presenter_id in formatted agenda prompt
- Validate token aggregates in MeetingMinutes
- Require non-empty error_message for FAILED/BUDGET_EXHAUSTED
- Move _MIN_POSITIONS to local constant in service.py
- Precompute seniority rank dict for O(1) lookups
- Remove dead asyncio.QueueFull catch on unbounded queue
- Fix racy state check in _log_receive_null (acquire lock)
- Type channel_name as NotBlankStr in messenger
- Document unsubscribe as None return path in receive()
- Preserve traceback context in parallel.py re-raise
- Validate parent_task.id matches plan.parent_task_id
- Add logging before raises in routing model validators

Trivial (t1-t4):
- Use centralized event constant in routing scorer
- Add task_structure/coordination_topology to Task docstring
- Fix DESIGN_SPEC.md model/function names to match code
- Fix StructuredPhasesConfig docstring

Tests (T1-T5):
- Assert MEETING_CONTRIBUTION enum value
- Add timeout markers to all meeting test modules
- Add 3+ participant test for authority/debate strategies
- Remove dead max_tokens_per_argument test references
- Update HybridResolver tests for new hierarchy parameter

Closes #169
Aureliolo added a commit that referenced this pull request Mar 8, 2026
## Summary

- **46 findings** from external reviewers (Copilot, Greptile,
CodeRabbit, Gemini) on PRs #164-#167, all addressed
- **4 critical**: Parse decisions/action_items from LLM synthesis in all
3 meeting protocols; validate `winning_agent_id` in `find_losers()`
- **17 major**: Token budget guards, duplicate participant rejection,
frozen registry, hierarchy tiebreakers, dependency copying,
wake-all-waiters on unsubscribe, and more
- **15 minor**: Duplicate log removal, assert→raise, traceback
preservation, O(1) seniority lookups, NotBlankStr typing, routing
validation ordering
- **4 trivial**: Centralized event constants, docstring/spec fixes
- **6 test/doc gaps**: New `test_parsing.py` (18 tests), expanded tests
across 7 modules, timeout markers, spec name corrections
- **Pre-PR review fixes**: List-item regex crossing line boundaries,
parent_task_id validation ordering, circular exception cause, dead code
removal

## Test plan

- [x] All 3634 unit tests pass
- [x] Ruff lint clean
- [x] Mypy strict clean
- [x] Pre-commit hooks pass
- [ ] CI pipeline (lint + type-check + test + coverage)

Closes #169
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement meeting protocol system with pluggable MeetingProtocol strategies (DESIGN_SPEC §5.7)

2 participants