docs: expand design spec with pluggable strategy protocols by Aureliolo · Pull Request #121 · Aureliolo/synthorg

Aureliolo · 2026-03-06T10:09:24Z

Summary

Major expansion of DESIGN_SPEC.md based on external review feedback (3 independent reviews evaluated). Adds 7 new/rewritten sections covering previously underspecified areas. All new subsystems follow a consistent pattern: pluggable strategies behind protocol interfaces for maximum extensibility.

New sections added

§5.6 Conflict Resolution Protocol — 4 strategies (authority+dissent log, structured debate+judge, human escalation, hybrid with review agent) behind ConflictResolver protocol
§5.7 Meeting Protocol — 3 protocols (round-robin transcript, async position papers+synthesizer, structured phases) behind MeetingProtocol protocol
§6.5 Agent Execution Loop — 3 architectures (ReAct, Plan-and-Execute, Hybrid Plan+ReAct) behind ExecutionLoop protocol with auto-select by task complexity
§7.4 Shared Organizational Memory — 3 backends (hybrid prompt+retrieval for MVP, GraphRAG, temporal knowledge graph) behind OrgMemoryBackend protocol
§11.3 Progressive Trust (rewritten) — 4 strategies (disabled, weighted score, per-category tracks, milestone gates with trust decay) behind TrustStrategy protocol
§12.4 Approval Timeout Policy — 4 policies (wait forever, deny on timeout, tiered by risk, escalation chain) behind TimeoutPolicy protocol, with task park/resume via AgentContext snapshots
§10.4 Auto-downgrade boundary — Clarified as task-assignment only, never mid-execution

Other updates

§17.1 Open Questions: 5 questions marked resolved (Q2, Q5, Q7, Q11-Q13), 4 new questions added (Q11-Q14)
§18.1 Backlog: Conflict resolution protocol moved from backlog to core (§5.6)
Memory layer references: All Mem0 references updated to "candidate" status — memory layer library is TBD with candidates: Mem0, Zep, Letta, Cognee, custom
CLAUDE.md: memory/ package description updated to reflect TBD status

Design principles

Everything is a pluggable strategy behind a protocol interface
All strategies configurable per company, department, or context
MVP defaults identified for each subsystem
Extensible by design — new strategies addable without modifying existing ones

Test plan

Verify DESIGN_SPEC.md renders correctly on GitHub (tables, YAML blocks, diagrams)
Verify all internal cross-references (§5.6, §5.7, §6.5, §7.4, §12.4) link correctly
Verify no Mem0 reference implies it's a decided technology
Verify CLAUDE.md memory package description is consistent

No code changes — docs only.

Add 7 new/rewritten sections to DESIGN_SPEC.md based on external review feedback. All new subsystems are designed as pluggable strategies behind protocol interfaces for maximum extensibility: - §5.6 Conflict Resolution: 4 strategies (authority+dissent, debate+judge, human escalation, hybrid with review agent) behind ConflictResolver protocol - §5.7 Meeting Protocol: 3 protocols (round-robin, position papers, structured phases) behind MeetingProtocol protocol - §6.5 Agent Execution Loop: 3 architectures (ReAct, Plan-and-Execute, Hybrid) behind ExecutionLoop protocol with auto-select by complexity - §7.4 Shared Organizational Memory: 3 backends (hybrid prompt+retrieval, GraphRAG, temporal KG) behind OrgMemoryBackend protocol - §11.3 Progressive Trust: rewritten with 4 strategies (disabled, weighted, per-category, milestone gates) behind TrustStrategy protocol - §12.4 Approval Timeout: 4 policies (wait forever, deny, tiered, escalation chain) behind TimeoutPolicy protocol with task park/resume - §10.4 Auto-downgrade: clarified as task-boundary only, never mid-execution Also updates: - §17.1 Open Questions: 5 resolved, 4 new questions added - §18.1 Backlog: conflict resolution moved from backlog to core - All Mem0 references changed to "candidate" (memory layer TBD) - CLAUDE.md: memory/ package description updated Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T10:09:36Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-03-06T10:09:51Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e2268180-bca4-4723-aa50-2bd2e498fe4d

📥 Commits

Reviewing files that changed from the base of the PR and between 68f368a and c5952d8.

📒 Files selected for processing (2)

DESIGN_SPEC.md
README.md

📝 Walkthrough

Summary by CodeRabbit

Documentation
- Renamed memory component to "Memory layer TBD" across docs and README.
- Expanded design spec with conflict resolution strategies, meeting protocols, multiple agent execution loop architectures, shared organizational memory/backends, approval timeout policies, and progressive trust enhancements.

Walkthrough

Updated wording of the memory component in CLAUDE.md/README.md from "Mem0 adapter" to "memory layer TBD." Substantially expanded DESIGN_SPEC.md with new sections: Conflict Resolution Protocol, Meeting Protocols, Agent Execution Loop options, Shared Organizational Memory, Approval Timeout Policy, and Progressive Trust, plus YAML examples and cross-reference updates.

Changes

Cohort / File(s)	Summary
Memory wording updates `CLAUDE.md`, `README.md`	Replaced "Mem0 adapter" / "Persistent Memory" wording with "memory layer TBD" and listed candidate memory-layer options; wording/status updates only.
Design specification additions `DESIGN_SPEC.md`	Added new sections: Conflict Resolution Protocol (ConflictResolver + strategies), Meeting Protocols, Agent Execution Loop (ReAct, Plan-and-Execute, Hybrid, auto-selection), Shared Organizational Memory (OrgMemoryBackend variants), Approval Timeout Policy, Progressive Trust; included YAML examples, diagrams, and cross-reference updates.

Sequence Diagram(s)

sequenceDiagram
participant AgentA as Agent
participant AgentB as Agent (peer)
participant Resolver as ConflictResolver
participant Log as DissentLog
participant Authority as Authority/Judge/Human

AgentA->>AgentB: Propose action / decision
AgentB-->>AgentA: Dissent / counter-proposal
AgentA->>Resolver: Submit proposals + context
Resolver->>Log: Record dissent entries (YAML metadata)
Resolver->>Authority: Escalate if configured (or run structured debate)
Authority-->>Resolver: Decision (accept/override)
Resolver-->>AgentA: Resolved decision
Resolver-->>AgentB: Resolved decision

sequenceDiagram
participant Agent as Agent (loop)
participant Planner as Planner
participant LM as LanguageModel
participant Executor as Executor
participant Env as Environment
participant Memory as MemoryLayerTBD

Agent->>Planner: Generate plan (when Plan-and-Execute)
Planner->>LM: Draft steps / subgoals
LM->>Memory: Retrieve relevant context
Planner->>Agent: Plan ready
Agent->>Executor: Execute step
Executor->>LM: Request action/content
Executor->>Env: Apply action / observe
Env-->>Executor: Observation/result
Executor->>Agent: Step result
Agent->>Planner: Re-plan or continue (hybrid / ReAct)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

PR #121: Appears to add the same DESIGN_SPEC.md sections and the CLAUDE.md memory wording change — strong overlap in documentation edits.
PR #2: Overlaps on DESIGN_SPEC.md edits and memory-layer terminology; likely touches the same design additions.
PR #114: Modifies CLAUDE.md and DESIGN_SPEC.md memory-related wording consistent with this PR's changes.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: a documentation expansion that adds pluggable strategy protocols to the design spec.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, detailing new sections, design principles, and specific updates across the documentation.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch docs/design-spec-expansion

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist · 2026-03-06T10:09:53Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's design specification by integrating a robust framework for pluggable strategies across several critical agent subsystems. The changes introduce detailed protocols for conflict resolution, multi-agent meetings, agent execution loops, and organizational memory, alongside refined policies for progressive trust and approval timeouts. This expansion aims to provide maximum extensibility and configurability, ensuring the system can adapt to diverse operational needs and future developments.

Highlights

Expanded Design Specification: Significantly expanded DESIGN_SPEC.md with 7 new or rewritten sections, incorporating feedback from 3 independent external reviews.
Pluggable Strategy Protocols: Introduced a consistent design pattern across new subsystems, emphasizing pluggable strategies behind protocol interfaces for maximum extensibility and configurability.
Conflict Resolution Protocol: Defined four strategies (Authority + Dissent Log, Structured Debate + Judge, Human Escalation, Hybrid) for resolving disagreements between agents.
Meeting Protocol: Outlined three configurable protocols (Round-Robin Transcript, Async Position Papers + Synthesizer, Structured Phases) for multi-agent conversations.
Agent Execution Loop: Detailed three architectures (ReAct, Plan-and-Execute, Hybrid Plan + ReAct) for agent task processing, with an optional auto-selection based on task complexity.
Shared Organizational Memory: Described three backends (Hybrid Prompt + Retrieval, GraphRAG, Temporal Knowledge Graph) for managing company-wide institutional knowledge.
Progressive Trust (Rewritten): Updated the progressive trust section with four distinct strategies (Disabled, Weighted Score, Per-Category Tracks, Milestone Gates) for agents to earn higher tool access.
Approval Timeout Policy: Introduced four policies (Wait Forever, Deny on Timeout, Tiered Timeout, Escalation Chain) for handling human approval requests, including mechanisms for task parking and resumption.
Auto-downgrade Boundary Clarification: Clarified that model downgrades due to budget apply only at task assignment time, preventing mid-execution model switches.
Memory Layer Status Update: Updated all references to 'Mem0' to reflect a 'TBD' status for the memory layer, with multiple candidates currently under evaluation.
Open Questions & Backlog Updates: Marked 5 open questions as resolved, added 4 new ones, and moved the Conflict Resolution Protocol from the backlog to a core section.

Changelog

CLAUDE.md
- Updated the description of the memory/ package to indicate that the memory layer is 'TBD' instead of specifically 'Mem0 adapter'.
DESIGN_SPEC.md
- Added a new section '5.6 Conflict Resolution Protocol' detailing four strategies for resolving agent disagreements.
- Added a new section '5.7 Meeting Protocol' outlining three configurable protocols for multi-agent meetings.
- Added a new section '6.5 Agent Execution Loop' describing three architectures for agent task processing.
- Added a new section '7.4 Shared Organizational Memory' presenting three backends for organizational knowledge.
- Rewrote section '11.3 Progressive Trust' to include four distinct trust strategies.
- Added a new section '12.4 Approval Timeout Policy' defining four policies for handling human approval timeouts, including task parking and resumption.
- Clarified the 'Auto-downgrade boundary' in section '10.4' to apply only at task assignment.
- Updated references to 'Mem0' throughout the document to reflect a 'TBD' memory layer status with multiple candidates.
- Updated section '17.1 Open Questions' by marking 5 questions as resolved and adding 4 new ones.
- Updated section '18.1 Backlog' to reflect the Conflict Resolution Protocol moving to core.

Activity

The DESIGN_SPEC.md was significantly expanded based on feedback gathered from three independent external reviews.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request provides a major and well-structured expansion of the DESIGN_SPEC.md document. It introduces several new sections detailing pluggable strategies for key subsystems like conflict resolution, meeting protocols, and agent execution loops, all following a consistent and extensible pattern. The updates to generalize the memory layer by removing specific Mem0 references are applied consistently throughout the documentation. My review includes a couple of minor suggestions to further improve the clarity and credibility of the design document. Overall, this is a high-quality update that significantly matures the project's design specification.

_{Note: Security Review has been skipped due to the limited scope of the PR.}

gemini-code-assist · 2026-03-06T10:11:51Z

DESIGN_SPEC.md

+
+#### Strategy 2: Structured Debate + Judge
+
+Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.


For clarity and consistency, consider updating the description of the 'judge' to include all options mentioned in the YAML configuration comment. The text currently mentions 'their shared manager, or a configurable arbitrator agent', but the comment for the judge key also lists 'ceo' as a valid option.

Suggested change

Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.

Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, the CEO, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.

gemini-code-assist · 2026-03-06T10:11:51Z

DESIGN_SPEC.md

+    entity_extraction: "auto"           # auto-extract entities from ADRs and policies
+```
+
+- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships


The claim of a '3.4x accuracy improvement' is very specific. To enhance the credibility of the design specification, it would be beneficial to either add a citation for this metric or rephrase it to be more qualitative if a source isn't readily available. For example: 'Offers significant accuracy improvements over vector-only retrieval...'

Suggested change

- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships

- Offers significant accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 770-772: The documented allowed values for execution_loop are
missing "auto" while the spec later uses execution_loop: "auto"; update the
documentation for the execution_loop setting (the YAML line showing
execution_loop: "react") to include "auto" in the value set and adjust its
descriptive text accordingly so the config contract lists "react, plan_execute,
hybrid, auto" and explains what "auto" does; target the execution_loop
declaration and its description to ensure consistency with the later use of
execution_loop: "auto".
- Around line 1703-1707: The Agent Memory row omits "Cognee" compared to later
sections; update the table entry (the cell labeled "Agent Memory") to list the
standardized candidates "Mem0, Zep, Letta, Cognee, custom" and retain the "+
SQLite" note so it matches the later §15.2 references and other occurrences of
the candidate list.
- Around line 954-956: The blockquote containing the OrgMemoryBackend
description and the "Write access control" note has an extra blank line breaking
the quoted callout (triggers MD028); remove the empty line so the two sections
remain part of the same blockquote and ensure both lines begin with '>'
(references: OrgMemoryBackend, query, write, list_policies), then re-run
markdownlint to confirm MD028 is resolved.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ce35d684-7a4e-471f-ada6-fa490095d3c2

📥 Commits

Reviewing files that changed from the base of the PR and between ef89b90 and 68f368a.

📒 Files selected for processing (2)

CLAUDE.md
DESIGN_SPEC.md

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Agent
GitHub Check: Greptile Review

🧰 Additional context used

🧠 Learnings (6)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Update DESIGN_SPEC.md to reflect approved deviations from the original specification

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Always read DESIGN_SPEC.md before implementing any feature or planning any issue

📚 Learning: 2026-02-26T17:43:50.902Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-06T10:01:15.539Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Update DESIGN_SPEC.md to reflect approved deviations from the original specification

Applied to files:

DESIGN_SPEC.md

📚 Learning: 2026-01-24T16:33:29.354Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Ollama Integration - all AI agents use Ollama for local LLM serving with default endpoint `http://localhost:11434`

Applied to files:

DESIGN_SPEC.md

📚 Learning: 2026-01-24T09:54:45.426Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Configure appropriate temperature settings based on agent role: Writer (0.9), Editor (0.6), Continuity (0.3), Architect (0.85), Interviewer (0.7)

Applied to files:

DESIGN_SPEC.md

📚 Learning: 2026-01-26T08:59:32.818Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/agents/*.py : Agent temperature settings: Writer (0.9), Editor (0.6), Continuity (0.3), Architect (0.85), Interviewer (0.7)

Applied to files:

DESIGN_SPEC.md

🪛 LanguageTool

DESIGN_SPEC.md

[grammar] ~776-~776: Please add a punctuation mark at the end of paragraph.
Context: ...s, quick fixes, single-file changes, M3 MVP #### Loop 2: Plan-and-Execute A two-p...

(PUNCTUATION_PARAGRAPH_END)

[typographical] ~780-~780: In American English, use a period after an abbreviation.
Context: ...fferent models can be used for planning vs execution (e.g., Opus for planning, Hai...

(MISSING_PERIOD_AFTER_ABBREVIATION)

[grammar] ~806-~806: Please add a punctuation mark at the end of paragraph.
Context: ...pic-level work, tasks spanning multiple files #### Loop 3: Hybrid Plan + ReAct Steps...

(PUNCTUATION_PARAGRAPH_END)

[style] ~810-~810: Since ownership is already implied, this phrasing may be redundant.
Context: ...p is executed as a mini-ReAct loop with its own turn limit. After each step, the agent ...

(PRP_OWN)

[grammar] ~845-~845: Please add a punctuation mark at the end of paragraph.
Context: ...ring, tasks requiring both planning and adaptivity > Auto-selection (optional): When ...

(PUNCTUATION_PARAGRAPH_END)

[style] ~903-~903: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...s, e.g., "no commits to main," "all PRs need 2 approvals") are injected into every a...

(EN_REPEATEDWORDS_NEED)

[grammar] ~922-~922: Please add a punctuation mark at the end of paragraph.
Context: ...may miss relational connections between policies #### Backend 2: GraphRAG Knowledge Gra...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~937-~937: Please add a punctuation mark at the end of paragraph.
Context: ...Entity extraction can be noisy. Heavier setup #### Backend 3: Temporal Knowledge Gra...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~952-~952: Please add a punctuation mark at the end of paragraph.
Context: ...kill for small companies or local-first use > Extensibility: All backends impl...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~1283-~1283: Please add a punctuation mark at the end of paragraph.
Context: ...ile edits shouldn't auto-get deployment access #### Strategy: Per-Category Trust Trac...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~1312-~1312: Please add a punctuation mark at the end of paragraph.
Context: ...rust state is a matrix per agent, not a scalar #### Strategy: Milestone Gates (ATF-In...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~1343-~1343: Please add a punctuation mark at the end of paragraph.
Context: ...ay may need tuning to avoid frustrating users --- ## 12. Security & Approval System...

(PUNCTUATION_PARAGRAPH_END)

[style] ~1426-~1426: ‘in the meantime’ might be wordy. Consider a shorter alternative.
Context: ...iting approval and works on other tasks in the meantime. ```yaml approval_timeout: policy: "...

(EN_WORDINESS_PREMIUM_IN_THE_MEANTIME)

[grammar] ~1434-~1434: Please add a punctuation mark at the end of paragraph.
Context: ...if human is unavailable. Queue can grow unbounded #### Policy 2: Deny on Timeout All un...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~1447-~1447: Please add a punctuation mark at the end of paragraph.
Context: ...egitimate work if human is consistently slow #### Policy 3: Tiered Timeout Differe...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~1472-~1472: Please add a punctuation mark at the end of paragraph.
Context: ...s risk. Tuning tier boundaries requires experience #### Policy 4: Escalation Chain On ti...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~1492-~1492: Please add a punctuation mark at the end of paragraph.
Context: ...chain. More humans involved. Complex to implement > Task Suspension and Resumption: ...

(PUNCTUATION_PARAGRAPH_END)

[style] ~1494-~1494: Consider using the typographical ellipsis character here instead.
Context: ...spension. This works naturally with the model_copy(update=...) immutability pattern — the snapshot i...

(ELLIPSIS)

(MISSING_PERIOD_AFTER_ABBREVIATION)

(MISSING_PERIOD_AFTER_ABBREVIATION)

(MISSING_PERIOD_AFTER_ABBREVIATION)

[typographical] ~1961-~1961: In American English, use a period after an abbreviation.
Context: ...n | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the minimum ...

(MISSING_PERIOD_AFTER_ABBREVIATION)

🪛 markdownlint-cli2 (0.21.0)

DESIGN_SPEC.md

[warning] 955-955: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (1)

CLAUDE.md (1)

51-51: Good terminology update.

This wording matches the current state better than a concrete Mem0 reference and stays aligned with the spec’s “candidate memory layer” framing.

DESIGN_SPEC.md

greptile-apps · 2026-03-06T10:14:11Z

Greptile Summary

This PR significantly expands DESIGN_SPEC.md with 7 new/rewritten sections, introducing a consistent architectural pattern: pluggable strategies behind protocol interfaces. The additions cover conflict resolution (§5.6), meeting protocols (§5.7), agent execution loops (§6.5), shared organizational memory (§7.4), progressive trust (§11.3), approval timeout policies (§12.4), and auto-downgrade boundary clarification (§10.4). Supporting files README.md and CLAUDE.md are updated to reflect the TBD memory layer decision.

Key strengths:

The pluggable strategy pattern is applied consistently across all new subsystems, with MVP defaults identified and clear extensibility notes.
All design decisions align with the stated principles; the spec is well-structured and internally coherent.
All Mem0-specific references have been appropriately generalized to "TBD" with candidates listed.

Minor improvements identified:

Three new protocols (ConflictResolver, MeetingProtocol, ExecutionLoop, TrustStrategy, TimeoutPolicy) lack explicit method signatures, creating inconsistency with OrgMemoryBackend which does define them. Adding provisional signatures would make the spec more actionable for implementers.
The auto_loop_rules config example is incomplete — it shows threshold keys but omits the loop type assignments that would clarify the mapping for implementers.
The README feature bullet wording (memory layer TBD) exposes internal uncertainty to users. Reframing as (via pluggable memory layer) would communicate the design decision more confidently.

Confidence Score: 4/5

Docs-only PR with no code changes — safe to merge with minor documentation clarity improvements suggested.
This is a documentation-only PR with no runtime impact. All changes are confined to design and supporting documentation files (DESIGN_SPEC.md, README.md, CLAUDE.md). The new design sections are well-structured, internally consistent, and follow the established pluggable-strategy pattern. Three minor style-level issues were identified: incomplete config example (line 847), missing protocol method signatures (5 protocols), and public-facing wording that could be more confident (README line 18). These are all non-blocking documentation gaps that improve clarity for implementers and users without affecting the validity or safety of the design.
DESIGN_SPEC.md requires minor improvements to method signatures and config example clarity; README.md feature bullet phrasing could be stronger.

_{Last reviewed commit: c5952d8}

DESIGN_SPEC.md

Copilot

Pull request overview

Expands the design documentation to formalize several previously underspecified subsystems using a consistent “pluggable strategy behind protocol interface” pattern (conflict resolution, meetings, execution loops, shared org memory, trust, approval timeouts), and updates memory-layer references to reflect a TBD vendor decision.

Changes:

Add new protocol/strategy sections for conflict resolution, meeting coordination, and agent execution loop architectures.
Introduce a shared organizational memory backend abstraction and rewrite progressive trust/approval-timeout policy sections.
Update memory-layer references (Mem0 → “candidates/TBD”) and align CLAUDE.md package description accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
DESIGN_SPEC.md	Major spec expansion with new protocol-based subsystems; updates memory-layer positioning to TBD/candidates.
CLAUDE.md	Updates `memory/` package description to reflect memory layer is TBD.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-06T10:15:12Z

DESIGN_SPEC.md

+```yaml
+meeting_protocol: "round_robin"
+round_robin:
+  max_turns_per_agent: 2
+  max_total_turns: 16
+  leader_summarizes: true


The Meeting Protocol YAML examples use top-level keys (meeting_protocol, round_robin, etc.), but the existing meeting configuration in §5.4 nests meetings under communication.meetings.types. To avoid readers configuring this incorrectly, show these protocol settings in the same structure (e.g., per communication.meetings.types[].protocol / protocol-specific options) or explicitly state where meeting_protocol lives in the config schema.

Copilot · 2026-03-06T10:15:12Z

DESIGN_SPEC.md

+- **Best for**: Complex tasks, multi-file refactoring, tasks requiring both planning and adaptivity
+
+> **Auto-selection (optional):** When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules`.
+


auto_loop_rules is referenced here as a configuration mechanism, but it isn’t defined anywhere else in the spec. Either add a short definition (expected shape + where it lives in config) or remove the reference to avoid introducing an undocumented config surface.

Suggested change

**`auto_loop_rules` configuration**

- Optional top-level config key, defined alongside `execution_loop`.

- Controls how `estimated_complexity` values are mapped to loop implementations when `execution_loop: "auto"`.

- Recommended shape:

```yaml

# Example agent/config YAML

execution_loop: "auto"

auto_loop_rules:

thresholds:

simple_max_tokens: 500 # up to this → "simple"

medium_max_tokens: 3000 # up to this → "medium"; above → "complex"

mapping:

simple: "react" # use ReAct loop

medium: "plan_and_execute" # use Plan-and-Execute loop

complex: "hybrid" # use Hybrid loop

Copilot · 2026-03-06T10:15:12Z

DESIGN_SPEC.md

+    entity_extraction: "auto"           # auto-extract entities from ADRs and policies
+```
+
+- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships


The “3.4x accuracy improvement” claim is very specific but has no citation or qualifier, which makes it read like a measured guarantee. Consider adding a source link/footnote or rephrasing to a qualitative statement (e.g., “can improve accuracy vs vector-only retrieval in some benchmarks”) to avoid misleading readers.

Suggested change

- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships

- Can improve accuracy over vector-only retrieval in some scenarios; multi-hop reasoning captures policy relationships

Copilot · 2026-03-06T10:15:13Z

DESIGN_SPEC.md

  auto_downgrade:
    enabled: true
    threshold: 85              # percent of budget used
+    boundary: "task_assignment" # task_assignment only — NEVER mid-execution


This adds budget.auto_downgrade.boundary, but the current config model AutoDowngradeConfig in src/ai_company/budget/config.py only defines enabled, threshold, and downgrade_map. Either (a) remove boundary from the YAML example and keep the boundary behavior as a documented invariant, or (b) update the config schema/implementation to accept and validate this new field.

Suggested change

boundary: "task_assignment" # task_assignment only — NEVER mid-execution

Copilot · 2026-03-06T10:15:13Z

DESIGN_SPEC.md

+
+When an action requires human approval (per autonomy level in §12.2), the agent must wait. The framework provides configurable timeout policies that determine what happens when a human doesn't respond. All policies implement a `TimeoutPolicy` protocol. The policy is configurable per autonomy level and per action risk tier.
+
+During any wait — regardless of policy — the agent **parks** the blocked task (saving its full `AgentContext` snapshot: conversation, progress, accumulated cost, turn count) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.


This section uses “AgentContext snapshot” to mean the full persisted execution state (including conversation), but in the current codebase AgentContextSnapshot is a compact reporting/logging snapshot and does not include the conversation contents. To avoid confusion/incorrect implementation, consider explicitly distinguishing between a persisted/serialized AgentContext (full state) vs AgentContextSnapshot (telemetry), or rename the persisted artifact in the spec.

Suggested change

During any wait — regardless of policy — the agent **parks** the blocked task (saving its full `AgentContext` snapshot: conversation, progress, accumulated cost, turn count) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.

During any wait — regardless of policy — the agent **parks** the blocked task (saving its full serialized `AgentContext` state: conversation, progress, accumulated cost, turn count — i.e., the complete persisted context, not just the compact `AgentContextSnapshot` used for telemetry) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.

…, Copilot, and Greptile - README.md: update 2 Mem0 references to TBD (memory layer undecided) - DESIGN_SPEC.md §15.2: add missing Cognee to memory candidate list - DESIGN_SPEC.md §15.4: clarify OrgMemoryBackend vs agent memory types - DESIGN_SPEC.md §11.3: make weighted trust human approval gate a structured field (consistent with other strategies) - DESIGN_SPEC.md §5.6: add CEO to debate judge options - DESIGN_SPEC.md §6.5: add "auto" to execution_loop YAML values - DESIGN_SPEC.md §7.4: fix MD028 blank line inside blockquote - DESIGN_SPEC.md §7.4: qualify 3.4x accuracy claim - DESIGN_SPEC.md §6.5: define auto_loop_rules inline - DESIGN_SPEC.md §12.4: clarify AgentContext vs AgentContextSnapshot Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-03-06T10:28:45Z

DESIGN_SPEC.md

+- Most complex to implement. Plan granularity needs tuning per task type
+- **Best for**: Complex tasks, multi-file refactoring, tasks requiring both planning and adaptivity
+
+> **Auto-selection (optional):** When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules` — a mapping of complexity thresholds to loop implementations (e.g., `{simple_max_tokens: 500, medium_max_tokens: 3000}` with corresponding loop assignments).


The auto-selection note mentions auto_loop_rules as "a mapping of complexity thresholds to loop implementations" and includes an example with threshold keys (simple_max_tokens: 500, medium_max_tokens: 3000) but omits the loop type assignments that determine which loop handles each threshold range. This leaves the config structure ambiguous for implementers.

Consider completing the example inline to show the full structure:

execution_loop: "auto" auto_loop_rules: simple_max_tokens: 500 # context <= 500 tokens → react medium_max_tokens: 3000 # context <= 3000 tokens → plan_execute # context > 3000 tokens → hybrid simple_loop: "react" medium_loop: "plan_execute" complex_loop: "hybrid"

This clarifies the mapping between thresholds and loop implementations.

Prompt To Fix With AI

This is a comment left during a code review. Path: DESIGN_SPEC.md Line: 847 Comment: The auto-selection note mentions `auto_loop_rules` as "a mapping of complexity thresholds to loop implementations" and includes an example with threshold keys (`simple_max_tokens: 500, medium_max_tokens: 3000`) but omits the loop type assignments that determine which loop handles each threshold range. This leaves the config structure ambiguous for implementers. Consider completing the example inline to show the full structure: ```yaml execution_loop: "auto" auto_loop_rules: simple_max_tokens: 500 # context <= 500 tokens → react medium_max_tokens: 3000 # context <= 3000 tokens → plan_execute # context > 3000 tokens → hybrid simple_loop: "react" medium_loop: "plan_execute" complex_loop: "hybrid" ``` This clarifies the mapping between thresholds and loop implementations. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-06T10:28:46Z

DESIGN_SPEC.md


+### 5.6 Conflict Resolution Protocol
+
+When two or more agents disagree on an approach (architecture, implementation, priority, etc.), the framework provides multiple configurable resolution strategies behind a `ConflictResolver` protocol. New strategies can be added without modifying existing ones. The strategy is configurable per company, per department, or per conflict type.


Five of the six new protocols introduced in this PR lack explicit method signatures. In contrast, OrgMemoryBackend (§7.4, line 954) defines its interface clearly:

query(context) → list[OrgFact], write(fact, author), list_policies()

The following protocols are mentioned by name but have no method signatures defined:

ConflictResolver (§5.6, line 517)

MeetingProtocol (§5.7, line 581)

ExecutionLoop (§6.5, line 749)

TrustStrategy (§11.3, line 1250)

TimeoutPolicy (§12.4, line 1421)

Since a core design principle of this PR is "pluggable strategies behind protocol interfaces," adding signature blocks to each protocol (even if provisional) would make the spec actionable for implementers and more consistent with the OrgMemoryBackend example.

Prompt To Fix With AI

This is a comment left during a code review. Path: DESIGN_SPEC.md Line: 517 Comment: Five of the six new protocols introduced in this PR lack explicit method signatures. In contrast, `OrgMemoryBackend` (§7.4, line 954) defines its interface clearly: > `query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()` The following protocols are mentioned by name but have no method signatures defined: - `ConflictResolver` (§5.6, line 517) - `MeetingProtocol` (§5.7, line 581) - `ExecutionLoop` (§6.5, line 749) - `TrustStrategy` (§11.3, line 1250) - `TimeoutPolicy` (§12.4, line 1421) Since a core design principle of this PR is "pluggable strategies behind protocol interfaces," adding signature blocks to each protocol (even if provisional) would make the spec actionable for implementers and more consistent with the `OrgMemoryBackend` example. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-06T10:28:48Z

README.md

 - **Smart Cost Management** - Per-agent budget tracking, auto model routing, CFO agent optimization
 - **Configurable Autonomy** - From fully autonomous to human-approves-everything, with a Security Ops agent in between
- **Persistent Memory** - Agents remember past decisions, code, relationships (via Mem0)
+- **Persistent Memory** - Agents remember past decisions, code, relationships (memory layer TBD)


The change from (via Mem0) to (memory layer TBD) accurately reflects the current state but exposes internal uncertainty to users and contributors reading the README. Since the underlying design commitment (a pluggable memory layer) is firm, consider phrasing that conveys the design decision without exposing the unresolved vendor choice:

Suggested change

- **Persistent Memory** - Agents remember past decisions, code, relationships (memory layer TBD)

- **Persistent Memory** - Agents remember past decisions, code, relationships (via pluggable memory layer)

This keeps the README confident and user-facing while remaining accurate about the architecture.

Prompt To Fix With AI

This is a comment left during a code review. Path: README.md Line: 18 Comment: The change from `(via Mem0)` to `(memory layer TBD)` accurately reflects the current state but exposes internal uncertainty to users and contributors reading the README. Since the underlying design commitment (a pluggable memory layer) is firm, consider phrasing that conveys the design decision without exposing the unresolved vendor choice: ```suggestion - **Persistent Memory** - Agents remember past decisions, code, relationships (via pluggable memory layer) ``` This keeps the README confident and user-facing while remaining accurate about the architecture. How can I resolve this? If you propose a fix, please make it concise.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 6, 2026 10:09

Copilot started reviewing on behalf of Aureliolo March 6, 2026 10:09 View session

gemini-code-assist bot reviewed Mar 6, 2026

View reviewed changes

coderabbitai bot reviewed Mar 6, 2026

View reviewed changes

DESIGN_SPEC.md Show resolved Hide resolved

DESIGN_SPEC.md Show resolved Hide resolved

DESIGN_SPEC.md Show resolved Hide resolved

greptile-apps bot reviewed Mar 6, 2026

View reviewed changes

DESIGN_SPEC.md Outdated Show resolved Hide resolved

DESIGN_SPEC.md Outdated Show resolved Hide resolved

Copilot AI reviewed Mar 6, 2026

View reviewed changes

Aureliolo merged commit 6832db6 into main Mar 6, 2026
8 of 9 checks passed

Aureliolo deleted the docs/design-spec-expansion branch March 6, 2026 10:22

greptile-apps bot reviewed Mar 6, 2026

View reviewed changes

coderabbitai bot mentioned this pull request Mar 10, 2026

feat: implement inter-company communication #250

Open

Aureliolo mentioned this pull request Mar 10, 2026

chore(main): release ai-company 0.1.1 #282

Merged

Aureliolo mentioned this pull request Mar 10, 2026

chore(main): release 0.1.0 #283

Merged

This was referenced Mar 15, 2026

chore(main): release 0.2.4 #431

Merged

chore(main): release 0.2.0 #442

Closed

chore(main): release 0.2.5 #447

Merged

chore(main): release 0.2.0 #460

Closed

chore(main): release 0.2.0 #471

Closed


		#### Strategy 2: Structured Debate + Judge

		Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.

	- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
	- Offers significant accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships

		- Best for: Complex tasks, multi-file refactoring, tasks requiring both planning and adaptivity

		> Auto-selection (optional): When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules`.

+**`auto_loop_rules` configuration**
+- Optional top-level config key, defined alongside `execution_loop`.
+- Controls how `estimated_complexity` values are mapped to loop implementations when `execution_loop: "auto"`.
+- Recommended shape:
+  ```yaml
+  # Example agent/config YAML
+  execution_loop: "auto"
+  auto_loop_rules:
+    thresholds:
+      simple_max_tokens: 500        # up to this → "simple"
+      medium_max_tokens: 3000       # up to this → "medium"; above → "complex"
+    mapping:
+      simple: "react"               # use ReAct loop
+      medium: "plan_and_execute"    # use Plan-and-Execute loop
+      complex: "hybrid"             # use Hybrid loop

	- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
	- Can improve accuracy over vector-only retrieval in some scenarios; multi-hop reasoning captures policy relationships


		When an action requires human approval (per autonomy level in §12.2), the agent must wait. The framework provides configurable timeout policies that determine what happens when a human doesn't respond. All policies implement a `TimeoutPolicy` protocol. The policy is configurable per autonomy level and per action risk tier.

		During any wait — regardless of policy — the agent parks the blocked task (saving its full `AgentContext` snapshot: conversation, progress, accumulated cost, turn count) and picks up other available tasks from its queue. When approval eventually arrives, the agent resumes the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.


		### 5.6 Conflict Resolution Protocol

		When two or more agents disagree on an approach (architecture, implementation, priority, etc.), the framework provides multiple configurable resolution strategies behind a `ConflictResolver` protocol. New strategies can be added without modifying existing ones. The strategy is configurable per company, per department, or per conflict type.

	- Persistent Memory - Agents remember past decisions, code, relationships (memory layer TBD)
	- Persistent Memory - Agents remember past decisions, code, relationships (via pluggable memory layer)

Conversation

Aureliolo commented Mar 6, 2026

Summary

New sections added

Other updates

Design principles

Test plan

Uh oh!

github-actions bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

coderabbitai bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Uh oh!

gemini-code-assist bot commented Mar 6, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 6, 2026

Choose a reason for hiding this comment

github-actions bot commented Mar 6, 2026 •

edited

Loading

coderabbitai bot commented Mar 6, 2026 •

edited

Loading

greptile-apps bot commented Mar 6, 2026 •

edited

Loading