docs: expand design spec with pluggable strategy protocols #121
Add 7 new/rewritten sections to DESIGN_SPEC.md based on external review feedback. All new subsystems are designed as pluggable strategies behind protocol interfaces for maximum extensibility:
- §5.6 Conflict Resolution: 4 strategies (authority+dissent, debate+judge, human escalation, hybrid with review agent) behind ConflictResolver protocol
- §5.7 Meeting Protocol: 3 protocols (round-robin, position papers, structured phases) behind MeetingProtocol protocol
- §6.5 Agent Execution Loop: 3 architectures (ReAct, Plan-and-Execute, Hybrid) behind ExecutionLoop protocol with auto-select by complexity
- §7.4 Shared Organizational Memory: 3 backends (hybrid prompt+retrieval, GraphRAG, temporal KG) behind OrgMemoryBackend protocol
- §11.3 Progressive Trust: rewritten with 4 strategies (disabled, weighted, per-category, milestone gates) behind TrustStrategy protocol
- §12.4 Approval Timeout: 4 policies (wait forever, deny, tiered, escalation chain) behind TimeoutPolicy protocol with task park/resume
- §10.4 Auto-downgrade: clarified as task-boundary only, never mid-execution

Also updates:
- §17.1 Open Questions: 5 resolved, 4 new questions added
- §18.1 Backlog: conflict resolution moved from backlog to core
- All Mem0 references changed to "candidate" (memory layer TBD)
- CLAUDE.md: memory/ package description updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Scanned Files: None
Caution: Review failed. The pull request is closed.
📝 Walkthrough (summary by CodeRabbit)
Updated wording of the memory component in CLAUDE.md/README.md from "Mem0 adapter" to "memory layer TBD." Substantially expanded DESIGN_SPEC.md with new sections: Conflict Resolution Protocol, Meeting Protocols, Agent Execution Loop options, Shared Organizational Memory, Approval Timeout Policy, and Progressive Trust, plus YAML examples and cross-reference updates.
Sequence Diagram(s)
sequenceDiagram
participant AgentA as Agent
participant AgentB as Agent (peer)
participant Resolver as ConflictResolver
participant Log as DissentLog
participant Authority as Authority/Judge/Human
AgentA->>AgentB: Propose action / decision
AgentB-->>AgentA: Dissent / counter-proposal
AgentA->>Resolver: Submit proposals + context
Resolver->>Log: Record dissent entries (YAML metadata)
Resolver->>Authority: Escalate if configured (or run structured debate)
Authority-->>Resolver: Decision (accept/override)
Resolver-->>AgentA: Resolved decision
Resolver-->>AgentB: Resolved decision
sequenceDiagram
participant Agent as Agent (loop)
participant Planner as Planner
participant LM as LanguageModel
participant Executor as Executor
participant Env as Environment
participant Memory as MemoryLayerTBD
Agent->>Planner: Generate plan (when Plan-and-Execute)
Planner->>LM: Draft steps / subgoals
LM->>Memory: Retrieve relevant context
Planner->>Agent: Plan ready
Agent->>Executor: Execute step
Executor->>LM: Request action/content
Executor->>Env: Apply action / observe
Env-->>Executor: Observation/result
Executor->>Agent: Step result
Agent->>Planner: Re-plan or continue (hybrid / ReAct)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the project's design specification by integrating a robust framework for pluggable strategies across several critical agent subsystems. The changes introduce detailed protocols for conflict resolution, multi-agent meetings, agent execution loops, and organizational memory, alongside refined policies for progressive trust and approval timeouts. This expansion aims to provide maximum extensibility and configurability, ensuring the system can adapt to diverse operational needs and future developments.
Code Review
This pull request provides a major and well-structured expansion of the DESIGN_SPEC.md document. It introduces several new sections detailing pluggable strategies for key subsystems like conflict resolution, meeting protocols, and agent execution loops, all following a consistent and extensible pattern. The updates to generalize the memory layer by removing specific Mem0 references are applied consistently throughout the documentation. My review includes a couple of minor suggestions to further improve the clarity and credibility of the design document. Overall, this is a high-quality update that significantly matures the project's design specification.
Note: Security Review has been skipped due to the limited scope of the PR.
DESIGN_SPEC.md (outdated)
#### Strategy 2: Structured Debate + Judge

Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.
For clarity and consistency, consider updating the description of the 'judge' to include all options mentioned in the YAML configuration comment. The text currently mentions 'their shared manager, or a configurable arbitrator agent', but the comment for the judge key also lists 'ceo' as a valid option.
Suggested change:
Current: Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.
Suggested: Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, the CEO, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.
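To make the debate-and-judge flow above concrete, here is a minimal Python sketch. The spec names the `ConflictResolver` protocol and the `max_tokens_per_argument` cap, but everything else here (`Argument`, `DissentRecord`, the `resolve` method, whitespace-based token counting, and the toy judge) is a hypothetical stand-in, not the project's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Argument:
    """One agent's position in a conflict (hypothetical shape)."""
    agent: str
    position: str
    text: str


@dataclass(frozen=True)
class DissentRecord:
    """What gets logged: the decision, the judge's reasoning, and all arguments."""
    winner: str
    reasoning: str
    arguments: tuple[Argument, ...]


class ConflictResolver(Protocol):
    """Assumed protocol surface; the spec does not define the method signature."""
    def resolve(self, arguments: tuple[Argument, ...]) -> DissentRecord: ...


# A judge is any callable returning (winning agent, reasoning).
Judge = Callable[[tuple[Argument, ...]], tuple[str, str]]


def cap_argument(text: str, max_tokens: int) -> str:
    """Cap an argument, using whitespace-separated words as a stand-in for tokens."""
    return " ".join(text.split()[:max_tokens])


class DebateJudgeResolver:
    """One round per agent, capped arguments, then a judge decides."""

    def __init__(self, judge: Judge, max_tokens_per_argument: int) -> None:
        self._judge = judge
        self._max_tokens = max_tokens_per_argument

    def resolve(self, arguments: tuple[Argument, ...]) -> DissentRecord:
        capped = tuple(
            Argument(a.agent, a.position, cap_argument(a.text, self._max_tokens))
            for a in arguments
        )
        winner, reasoning = self._judge(capped)
        return DissentRecord(winner=winner, reasoning=reasoning, arguments=capped)


def most_detailed_judge(arguments: tuple[Argument, ...]) -> tuple[str, str]:
    """Toy judge for illustration only: picks the longest capped argument."""
    best = max(arguments, key=lambda a: len(a.text.split()))
    return best.agent, f"{best.agent} gave the most detailed argument"


resolver = DebateJudgeResolver(most_detailed_judge, max_tokens_per_argument=5)
record = resolver.resolve((
    Argument("alice", "use Postgres", "we need transactions and joins across many tables"),
    Argument("bob", "use SQLite", "simple and local"),
))
```

In a real deployment the judge would presumably be an LLM-backed manager, CEO, or arbitrator agent; the callable keeps that choice swappable without changing the resolver.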
DESIGN_SPEC.md (outdated)
```yaml
entity_extraction: "auto" # auto-extract entities from ADRs and policies
```

- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
The claim of a '3.4x accuracy improvement' is very specific. To enhance the credibility of the design specification, it would be beneficial to either add a citation for this metric or rephrase it to be more qualitative if a source isn't readily available. For example: 'Offers significant accuracy improvements over vector-only retrieval...'
Suggested change:
Current: - 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
Suggested: - Offers significant accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 770-772: The documented allowed values for execution_loop are
missing "auto" while the spec later uses execution_loop: "auto"; update the
documentation for the execution_loop setting (the YAML line showing
execution_loop: "react") to include "auto" in the value set and adjust its
descriptive text accordingly so the config contract lists "react, plan_execute,
hybrid, auto" and explains what "auto" does; target the execution_loop
declaration and its description to ensure consistency with the later use of
execution_loop: "auto".
- Around line 1703-1707: The Agent Memory row omits "Cognee" compared to later
sections; update the table entry (the cell labeled "Agent Memory") to list the
standardized candidates "Mem0, Zep, Letta, Cognee, custom" and retain the "+
SQLite" note so it matches the later §15.2 references and other occurrences of
the candidate list.
- Around line 954-956: The blockquote containing the OrgMemoryBackend
description and the "Write access control" note has an extra blank line breaking
the quoted callout (triggers MD028); remove the empty line so the two sections
remain part of the same blockquote and ensure both lines begin with '>'
(references: OrgMemoryBackend, query, write, list_policies), then re-run
markdownlint to confirm MD028 is resolved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ce35d684-7a4e-471f-ada6-fa490095d3c2
📒 Files selected for processing (2)
CLAUDE.md, DESIGN_SPEC.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Update DESIGN_SPEC.md to reflect approved deviations from the original specification
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Always read DESIGN_SPEC.md before implementing any feature or planning any issue
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-06T10:01:15.539Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Update DESIGN_SPEC.md to reflect approved deviations from the original specification
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Ollama Integration - all AI agents use Ollama for local LLM serving with default endpoint `http://localhost:11434`
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Configure appropriate temperature settings based on agent role: Writer (0.9), Editor (0.6), Continuity (0.3), Architect (0.85), Interviewer (0.7)
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/agents/*.py : Agent temperature settings: Writer (0.9), Editor (0.6), Continuity (0.3), Architect (0.85), Interviewer (0.7)
Applied to files:
DESIGN_SPEC.md
🪛 LanguageTool
DESIGN_SPEC.md
[grammar] ~776-~776: Please add a punctuation mark at the end of paragraph.
Context: ...s, quick fixes, single-file changes, M3 MVP #### Loop 2: Plan-and-Execute A two-p...
(PUNCTUATION_PARAGRAPH_END)
[typographical] ~780-~780: In American English, use a period after an abbreviation.
Context: ...fferent models can be used for planning vs execution (e.g., Opus for planning, Hai...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[grammar] ~806-~806: Please add a punctuation mark at the end of paragraph.
Context: ...pic-level work, tasks spanning multiple files #### Loop 3: Hybrid Plan + ReAct Steps...
(PUNCTUATION_PARAGRAPH_END)
[style] ~810-~810: Since ownership is already implied, this phrasing may be redundant.
Context: ...p is executed as a mini-ReAct loop with its own turn limit. After each step, the agent ...
(PRP_OWN)
[grammar] ~845-~845: Please add a punctuation mark at the end of paragraph.
Context: ...ring, tasks requiring both planning and adaptivity > Auto-selection (optional): When ...
(PUNCTUATION_PARAGRAPH_END)
[style] ~903-~903: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...s, e.g., "no commits to main," "all PRs need 2 approvals") are injected into every a...
(EN_REPEATEDWORDS_NEED)
[grammar] ~922-~922: Please add a punctuation mark at the end of paragraph.
Context: ...may miss relational connections between policies #### Backend 2: GraphRAG Knowledge Gra...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~937-~937: Please add a punctuation mark at the end of paragraph.
Context: ...Entity extraction can be noisy. Heavier setup #### Backend 3: Temporal Knowledge Gra...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~952-~952: Please add a punctuation mark at the end of paragraph.
Context: ...kill for small companies or local-first use > Extensibility: All backends impl...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~1283-~1283: Please add a punctuation mark at the end of paragraph.
Context: ...ile edits shouldn't auto-get deployment access #### Strategy: Per-Category Trust Trac...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~1312-~1312: Please add a punctuation mark at the end of paragraph.
Context: ...rust state is a matrix per agent, not a scalar #### Strategy: Milestone Gates (ATF-In...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~1343-~1343: Please add a punctuation mark at the end of paragraph.
Context: ...ay may need tuning to avoid frustrating users --- ## 12. Security & Approval System...
(PUNCTUATION_PARAGRAPH_END)
[style] ~1426-~1426: ‘in the meantime’ might be wordy. Consider a shorter alternative.
Context: ...iting approval and works on other tasks in the meantime. ```yaml approval_timeout: policy: "...
(EN_WORDINESS_PREMIUM_IN_THE_MEANTIME)
[grammar] ~1434-~1434: Please add a punctuation mark at the end of paragraph.
Context: ...if human is unavailable. Queue can grow unbounded #### Policy 2: Deny on Timeout All un...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~1447-~1447: Please add a punctuation mark at the end of paragraph.
Context: ...egitimate work if human is consistently slow #### Policy 3: Tiered Timeout Differe...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~1472-~1472: Please add a punctuation mark at the end of paragraph.
Context: ...s risk. Tuning tier boundaries requires experience #### Policy 4: Escalation Chain On ti...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~1492-~1492: Please add a punctuation mark at the end of paragraph.
Context: ...chain. More humans involved. Complex to implement > Task Suspension and Resumption: ...
(PUNCTUATION_PARAGRAPH_END)
[style] ~1494-~1494: Consider using the typographical ellipsis character here instead.
Context: ...spension. This works naturally with the model_copy(update=...) immutability pattern — the snapshot i...
(ELLIPSIS)
[typographical] ~1960-~1960: In American English, use a period after an abbreviation.
Context: ...cture? | Medium | Open | asyncio queues vs Redis vs embedded broker | | 9 | How to...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1960-~1960: In American English, use a period after an abbreviation.
Context: ...Medium | Open | asyncio queues vs Redis vs embedded broker | | 9 | How to handle c...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1961-~1961: In American English, use a period after an abbreviation.
Context: ...gh | Open | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the ...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1961-~1961: In American English, use a period after an abbreviation.
Context: ...n | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the minimum ...
(MISSING_PERIOD_AFTER_ABBREVIATION)
🪛 markdownlint-cli2 (0.21.0)
DESIGN_SPEC.md
[warning] 955-955: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🔇 Additional comments (1)
CLAUDE.md (1)
51-51: Good terminology update. This wording matches the current state better than a concrete Mem0 reference and stays aligned with the spec’s “candidate memory layer” framing.
Greptile Summary
This PR significantly expands DESIGN_SPEC.md.
Confidence Score: 4/5
Last reviewed commit: c5952d8
Pull request overview
Expands the design documentation to formalize several previously underspecified subsystems using a consistent “pluggable strategy behind protocol interface” pattern (conflict resolution, meetings, execution loops, shared org memory, trust, approval timeouts), and updates memory-layer references to reflect a TBD vendor decision.
Changes:
- Add new protocol/strategy sections for conflict resolution, meeting coordination, and agent execution loop architectures.
- Introduce a shared organizational memory backend abstraction and rewrite progressive trust/approval-timeout policy sections.
- Update memory-layer references (Mem0 → “candidates/TBD”) and align CLAUDE.md package description accordingly.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| DESIGN_SPEC.md | Major spec expansion with new protocol-based subsystems; updates memory-layer positioning to TBD/candidates. |
| CLAUDE.md | Updates memory/ package description to reflect memory layer is TBD. |
```yaml
meeting_protocol: "round_robin"
round_robin:
  max_turns_per_agent: 2
  max_total_turns: 16
  leader_summarizes: true
```
The Meeting Protocol YAML examples use top-level keys (meeting_protocol, round_robin, etc.), but the existing meeting configuration in §5.4 nests meetings under communication.meetings.types. To avoid readers configuring this incorrectly, show these protocol settings in the same structure (e.g., per communication.meetings.types[].protocol / protocol-specific options) or explicitly state where meeting_protocol lives in the config schema.
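For readers unsure how the two round-robin limits interact, a small Python sketch may help. It assumes that `max_turns_per_agent` bounds the number of full passes over the participants and that `max_total_turns` is a hard cap that can cut the last pass short; the spec excerpt does not state this precedence, and the function name is hypothetical.

```python
from typing import Iterator, Sequence


def round_robin_turns(
    agents: Sequence[str],
    max_turns_per_agent: int,
    max_total_turns: int,
) -> Iterator[str]:
    """Yield speakers in order, one pass per allowed turn, until a cap is hit."""
    total = 0
    for _ in range(max_turns_per_agent):
        for agent in agents:
            if total >= max_total_turns:
                return  # hard cap reached, even mid-pass
            yield agent
            total += 1


# Three participants, two turns each, but only five turns allowed overall.
schedule = list(
    round_robin_turns(["pm", "dev", "qa"], max_turns_per_agent=2, max_total_turns=5)
)
```

With the values from the YAML above (2 turns per agent, 16 total), the total cap only starts to bind once a meeting has more than eight participants.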
- **Best for**: Complex tasks, multi-file refactoring, tasks requiring both planning and adaptivity

> **Auto-selection (optional):** When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules`.
auto_loop_rules is referenced here as a configuration mechanism, but it isn’t defined anywhere else in the spec. Either add a short definition (expected shape + where it lives in config) or remove the reference to avoid introducing an undocumented config surface.
Suggested change:
**`auto_loop_rules` configuration**
- Optional top-level config key, defined alongside `execution_loop`.
- Controls how `estimated_complexity` values are mapped to loop implementations when `execution_loop: "auto"`.
- Recommended shape:
```yaml
# Example agent/config YAML
execution_loop: "auto"
auto_loop_rules:
  thresholds:
    simple_max_tokens: 500      # up to this → "simple"
    medium_max_tokens: 3000     # up to this → "medium"; above → "complex"
  mapping:
    simple: "react"             # use ReAct loop
    medium: "plan_and_execute"  # use Plan-and-Execute loop
    complex: "hybrid"           # use Hybrid loop
```
DESIGN_SPEC.md (outdated)
```yaml
entity_extraction: "auto" # auto-extract entities from ADRs and policies
```

- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
The “3.4x accuracy improvement” claim is very specific but has no citation or qualifier, which makes it read like a measured guarantee. Consider adding a source link/footnote or rephrasing to a qualitative statement (e.g., “can improve accuracy vs vector-only retrieval in some benchmarks”) to avoid misleading readers.
Suggested change:
Current: - 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
Suggested: - Can improve accuracy over vector-only retrieval in some scenarios; multi-hop reasoning captures policy relationships
auto_downgrade:
  enabled: true
  threshold: 85 # percent of budget used
  boundary: "task_assignment" # task_assignment only — NEVER mid-execution
This adds budget.auto_downgrade.boundary, but the current config model AutoDowngradeConfig in src/ai_company/budget/config.py only defines enabled, threshold, and downgrade_map. Either (a) remove boundary from the YAML example and keep the boundary behavior as a documented invariant, or (b) update the config schema/implementation to accept and validate this new field.
boundary: "task_assignment" # task_assignment only — NEVER mid-execution
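If option (b) were chosen, the schema change could look roughly like the Python sketch below. This is a hypothetical stand-in written with standard-library dataclasses, not the actual `AutoDowngradeConfig` in `src/ai_company/budget/config.py`; the field types, defaults, validation rules, and the `should_downgrade` helper are all assumptions. Only the field names `enabled`, `threshold`, `downgrade_map`, and `boundary` come from the comment above.

```python
from dataclasses import dataclass, field

# Only one boundary is allowed: downgrades never happen mid-execution.
ALLOWED_BOUNDARIES = ("task_assignment",)


@dataclass(frozen=True)
class AutoDowngradeConfig:
    """Hypothetical stand-in for the real config model."""
    enabled: bool = False
    threshold: int = 85  # percent of budget used
    downgrade_map: dict[str, str] = field(default_factory=dict)  # assumed: model -> cheaper model
    boundary: str = "task_assignment"

    def __post_init__(self) -> None:
        if not 0 <= self.threshold <= 100:
            raise ValueError(f"threshold must be 0-100, got {self.threshold}")
        if self.boundary not in ALLOWED_BOUNDARIES:
            raise ValueError(
                f"boundary must be one of {ALLOWED_BOUNDARIES}, got {self.boundary!r}"
            )


def should_downgrade(
    config: AutoDowngradeConfig,
    budget_used_percent: float,
    at_task_boundary: bool,
) -> bool:
    """Downgrade only when enabled, over threshold, and between tasks."""
    return (
        config.enabled
        and at_task_boundary
        and budget_used_percent >= config.threshold
    )
```

Option (a) would drop the `boundary` field and keep the `at_task_boundary` check as a documented invariant in code instead.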
DESIGN_SPEC.md (outdated)
When an action requires human approval (per autonomy level in §12.2), the agent must wait. The framework provides configurable timeout policies that determine what happens when a human doesn't respond. All policies implement a `TimeoutPolicy` protocol. The policy is configurable per autonomy level and per action risk tier.

During any wait — regardless of policy — the agent **parks** the blocked task (saving its full `AgentContext` snapshot: conversation, progress, accumulated cost, turn count) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.
This section uses “AgentContext snapshot” to mean the full persisted execution state (including conversation), but in the current codebase AgentContextSnapshot is a compact reporting/logging snapshot and does not include the conversation contents. To avoid confusion/incorrect implementation, consider explicitly distinguishing between a persisted/serialized AgentContext (full state) vs AgentContextSnapshot (telemetry), or rename the persisted artifact in the spec.
Suggested change:
Current: During any wait — regardless of policy — the agent **parks** the blocked task (saving its full `AgentContext` snapshot: conversation, progress, accumulated cost, turn count) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.
Suggested: During any wait — regardless of policy — the agent **parks** the blocked task (saving its full serialized `AgentContext` state: conversation, progress, accumulated cost, turn count — i.e., the complete persisted context, not just the compact `AgentContextSnapshot` used for telemetry) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.
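The park/resume behavior described in this thread can be sketched in a few lines of Python. The `AgentContext` below is a hypothetical stand-in carrying only the four pieces of state the spec text lists (conversation, progress, accumulated cost, turn count), and `ParkedTaskStore` with its methods is invented for illustration. It keeps state in memory, whereas the spec talks about persisted state.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical full execution state (not the telemetry snapshot)."""
    task_id: str
    conversation: list[str] = field(default_factory=list)
    progress: str = ""
    accumulated_cost: float = 0.0
    turn_count: int = 0


class ParkedTaskStore:
    """Holds full contexts of tasks that are waiting on human approval."""

    def __init__(self) -> None:
        self._parked: dict[str, AgentContext] = {}

    def park(self, context: AgentContext) -> None:
        # Deep copy so later mutation by the agent cannot alter the parked state.
        self._parked[context.task_id] = copy.deepcopy(context)

    def resume(self, task_id: str) -> AgentContext:
        # Remove and return the saved state; raises KeyError if not parked.
        return self._parked.pop(task_id)

    def parked_ids(self) -> list[str]:
        return sorted(self._parked)


store = ParkedTaskStore()
ctx = AgentContext(
    task_id="t1",
    conversation=["deploy to staging?"],
    progress="awaiting approval",
    accumulated_cost=0.42,
    turn_count=3,
)
store.park(ctx)
ctx.conversation.append("agent moved on to another task")  # does not affect parked copy
resumed = store.resume("t1")
```

The deep copy matters for the "exactly where it left off" guarantee: the agent is free to keep working with its live objects while the parked state stays frozen.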
…, Copilot, and Greptile
- README.md: update 2 Mem0 references to TBD (memory layer undecided)
- DESIGN_SPEC.md §15.2: add missing Cognee to memory candidate list
- DESIGN_SPEC.md §15.4: clarify OrgMemoryBackend vs agent memory types
- DESIGN_SPEC.md §11.3: make weighted trust human approval gate a structured field (consistent with other strategies)
- DESIGN_SPEC.md §5.6: add CEO to debate judge options
- DESIGN_SPEC.md §6.5: add "auto" to execution_loop YAML values
- DESIGN_SPEC.md §7.4: fix MD028 blank line inside blockquote
- DESIGN_SPEC.md §7.4: qualify 3.4x accuracy claim
- DESIGN_SPEC.md §6.5: define auto_loop_rules inline
- DESIGN_SPEC.md §12.4: clarify AgentContext vs AgentContextSnapshot

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Most complex to implement. Plan granularity needs tuning per task type
- **Best for**: Complex tasks, multi-file refactoring, tasks requiring both planning and adaptivity

> **Auto-selection (optional):** When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules` — a mapping of complexity thresholds to loop implementations (e.g., `{simple_max_tokens: 500, medium_max_tokens: 3000}` with corresponding loop assignments).
Path: DESIGN_SPEC.md, line 847
The auto-selection note mentions `auto_loop_rules` as "a mapping of complexity thresholds to loop implementations" and includes an example with threshold keys (`simple_max_tokens: 500, medium_max_tokens: 3000`) but omits the loop type assignments that determine which loop handles each threshold range. This leaves the config structure ambiguous for implementers.
Consider completing the example inline to show the full structure:
```yaml
execution_loop: "auto"
auto_loop_rules:
  simple_max_tokens: 500    # context <= 500 tokens → react
  medium_max_tokens: 3000   # context <= 3000 tokens → plan_execute
                            # context > 3000 tokens → hybrid
  simple_loop: "react"
  medium_loop: "plan_execute"
  complex_loop: "hybrid"
```
This clarifies the mapping between thresholds and loop implementations.
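The threshold-to-loop mapping in the flat `auto_loop_rules` shape proposed in this comment can be illustrated in Python as follows. This is only a sketch: the `AutoLoopRules` class and `select_loop` function are hypothetical names, and it assumes the thresholds are inclusive upper bounds on a context token count, as the YAML comments suggest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoLoopRules:
    """Flat rule set mirroring the keys in the proposed YAML (assumed shape)."""
    simple_max_tokens: int = 500
    medium_max_tokens: int = 3000
    simple_loop: str = "react"
    medium_loop: str = "plan_execute"
    complex_loop: str = "hybrid"


def select_loop(context_tokens: int, rules: AutoLoopRules = AutoLoopRules()) -> str:
    """Pick a loop name for execution_loop: "auto" from the context size."""
    if context_tokens <= rules.simple_max_tokens:
        return rules.simple_loop
    if context_tokens <= rules.medium_max_tokens:
        return rules.medium_loop
    return rules.complex_loop


# Boundary values fall into the lower tier because the bounds are inclusive.
choices = [select_loop(n) for n in (500, 501, 3000, 3001)]
```

Keeping the loop names as plain strings means a config loader can validate them against the documented value set (`react`, `plan_execute`, `hybrid`) in one place.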
### 5.6 Conflict Resolution Protocol

When two or more agents disagree on an approach (architecture, implementation, priority, etc.), the framework provides multiple configurable resolution strategies behind a `ConflictResolver` protocol. New strategies can be added without modifying existing ones. The strategy is configurable per company, per department, or per conflict type.
Five of the six new protocols introduced in this PR lack explicit method signatures. In contrast, `OrgMemoryBackend` (§7.4, line 954) defines its interface clearly:
> `query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`
The following protocols are mentioned by name but have no method signatures defined:
- `ConflictResolver` (§5.6, line 517)
- `MeetingProtocol` (§5.7, line 581)
- `ExecutionLoop` (§6.5, line 749)
- `TrustStrategy` (§11.3, line 1250)
- `TimeoutPolicy` (§12.4, line 1421)
Since a core design principle of this PR is "pluggable strategies behind protocol interfaces," adding signature blocks to each protocol (even if provisional) would make the spec actionable for implementers and more consistent with the `OrgMemoryBackend` example.
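As an illustration of what a provisional signature block could look like, here is the one interface the spec does spell out, `OrgMemoryBackend`, written as a Python `typing.Protocol` with a toy in-memory backend. The method names and the `list[OrgFact]` return type of `query` come from the quoted spec line; the parameter types, the fields of `OrgFact`, the return type of `list_policies`, and the keyword-overlap matching are assumptions made for this sketch.

```python
from dataclasses import dataclass, replace
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class OrgFact:
    """Hypothetical fact record; the spec does not define its fields."""
    text: str
    is_policy: bool = False
    author: str = ""


@runtime_checkable
class OrgMemoryBackend(Protocol):
    def query(self, context: str) -> list[OrgFact]: ...
    def write(self, fact: OrgFact, author: str) -> None: ...
    def list_policies(self) -> list[OrgFact]: ...


class InMemoryOrgMemory:
    """Toy backend: stores facts in a list and matches on shared words."""

    def __init__(self) -> None:
        self._facts: list[OrgFact] = []

    def query(self, context: str) -> list[OrgFact]:
        words = set(context.lower().split())
        return [f for f in self._facts if words & set(f.text.lower().split())]

    def write(self, fact: OrgFact, author: str) -> None:
        # Record who wrote the fact alongside its content.
        self._facts.append(replace(fact, author=author))

    def list_policies(self) -> list[OrgFact]:
        return [f for f in self._facts if f.is_policy]


backend = InMemoryOrgMemory()
backend.write(OrgFact("no commits to main", is_policy=True), author="cto")
backend.write(OrgFact("staging uses SQLite"), author="dev")
hits = backend.query("commits on main branch")
```

The other five protocols could follow the same pattern, with one `Protocol` class per subsystem and structural typing so that strategies need no shared base class.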
Path: DESIGN_SPEC.md, line 517

README.md
- **Smart Cost Management** - Per-agent budget tracking, auto model routing, CFO agent optimization
- **Configurable Autonomy** - From fully autonomous to human-approves-everything, with a Security Ops agent in between
- **Persistent Memory** - Agents remember past decisions, code, relationships (via Mem0)
- **Persistent Memory** - Agents remember past decisions, code, relationships (memory layer TBD)
Path: README.md, line 18
The change from `(via Mem0)` to `(memory layer TBD)` accurately reflects the current state but exposes internal uncertainty to users and contributors reading the README. Since the underlying design commitment (a pluggable memory layer) is firm, consider phrasing that conveys the design decision without exposing the unresolved vendor choice:
```suggestion
- **Persistent Memory** - Agents remember past decisions, code, relationships (via pluggable memory layer)
```
This keeps the README confident and user-facing while remaining accurate about the architecture.
🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider 
resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * 
implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs 
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, 
contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, 
loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive 
trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry 
([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into 
observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### 
Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree 
management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
Major expansion of `DESIGN_SPEC.md` based on external review feedback (3 independent reviews evaluated). Adds 7 new/rewritten sections covering previously underspecified areas. All new subsystems follow a consistent pattern: pluggable strategies behind protocol interfaces for maximum extensibility.

New sections added
- `ConflictResolver` protocol
- `MeetingProtocol` protocol
- `ExecutionLoop` protocol with auto-select by task complexity
- `OrgMemoryBackend` protocol
- `TrustStrategy` protocol
- `TimeoutPolicy` protocol, with task park/resume via `AgentContext` snapshots

Other updates
- `memory/` package description updated to reflect TBD status

Design principles

Test plan
No code changes — docs only.