docs: address external review feedback with MVP scope and new protocols#128
… and inline markers

- Add §1.4 MVP Definition and §1.5 Configuration Philosophy to clearly separate the full vision from what M3 actually builds.
- Add §6.7 Graceful Shutdown Protocol (`ShutdownStrategy` protocol) and §6.8 Concurrent Workspace Isolation (`WorkspaceIsolationStrategy` protocol), addressing reviewer concerns about missing shutdown handling and multi-agent file conflicts.
- Relabel §7.4 GraphRAG/Temporal KG backends as Research Directions.
- Make progressive trust default to "disabled", with a prominent security invariant.
- Add 12 inline MVP markers across all deferred feature sections.
- Add minimal config examples and document sensible defaults.
- Update the §15.5 conventions table with state coordination, workspace isolation, and graceful shutdown decisions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the design specification based on external review feedback, primarily by clarifying the Minimum Viable Product (MVP) scope, introducing new essential protocols for system robustness and multi-agent coordination, and addressing concerns about over-engineering. The changes aim to provide a clearer roadmap for implementation, ensure a more robust system architecture, and improve the overall readability and understanding of the framework's design.
📝 Walkthrough

DESIGN_SPEC.md is restructured with a renumbered table of contents, introduces formal MVP definitions and scope (sections 1.4–1.5), and adds MVP annotations throughout multiple sections (conflict resolution, execution loop, crash recovery, sandbox backends, trust mechanisms), alongside a new INTERRUPTED task state.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 of 3 passed
Code Review
This pull request significantly improves the design specification by clarifying the MVP scope, adding new protocols for graceful shutdown and workspace isolation, and introducing a configuration philosophy of progressive disclosure. The changes are well-structured and address the feedback from external reviews effectively. I have one minor suggestion to improve the consistency of the 'State Coordination vs Workspace Isolation' table.
Note: Security Review has been skipped due to the limited scope of the PR.
DESIGN_SPEC.md (outdated)
| Framework state (tasks, assignments, budget) | Centralized single-writer (`TaskEngine`) | `model_copy(update=...)` via async queue |
| Code and files (agent work output) | Workspace isolation (`WorkspaceIsolationStrategy`) | Git worktrees / branches |
| Agent memory (personal) | Per-agent ownership | Each agent owns its memory exclusively |
| Org memory (shared knowledge) | Single-writer (`OrgMemoryBackend`) | Write-access controlled by role level |
In the 'State Coordination vs Workspace Isolation' table, the 'Mechanism' for 'Org memory' seems to describe a policy rather than an implementation mechanism, which makes it inconsistent with other rows. For clarity and consistency, consider describing the technical mechanism, similar to how it's done for 'Framework state'.
Suggested change. Replace:

| Org memory (shared knowledge) | Single-writer (`OrgMemoryBackend`) | Write-access controlled by role level |

with:

| Org memory (shared knowledge) | Single-writer (`OrgMemoryBackend`) | `OrgMemoryBackend` protocol with role-based access control |
Greptile Summary

This PR addresses three rounds of external review feedback by clarifying MVP scope and adding two new protocol sections to the design spec. It introduces §1.4 (explicit M3 MVP definition with deferred-feature list), §1.5 (progressive-disclosure config philosophy), §6.7 (pluggable …

Key changes and observations:
Confidence Score: 4/5
Sequence Diagram

sequenceDiagram
participant OS as OS Signal
participant Engine as TaskEngine
participant Agent as In-flight Agent(s)
participant Storage as Storage / Connections
OS->>Engine: SIGTERM / SIGINT
Engine->>Engine: Set shutdown_event (asyncio.Event)
Engine->>Engine: Close drain gate — stop accepting new tasks
Note over Agent: Checks shutdown_event at turn boundaries
Agent-->>Engine: Exit cooperatively (within grace_seconds)
alt All agents exit within grace_seconds (30s)
Agent->>Engine: Cooperative exit
else grace_seconds exceeded
Engine->>Agent: task.cancel()
Agent-->>Engine: Task → INTERRUPTED
end
Engine->>Storage: Persist cost records
Engine->>Storage: Close provider connections
Engine->>Storage: Flush logs
Note over Engine,Storage: cleanup_seconds (5s) then process exits
Last reviewed commit: 072a6f2
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 971-974: Remove the blank line between the two consecutive
blockquote lines so they remain a single blockquote; locate the paragraph
containing the `INTERRUPTED` TaskStatus note and the Windows compatibility note
referencing `loop.add_signal_handler()` and `signal.signal()` and merge them
into one continuous blockquote (or convert both to normal paragraphs) so MD028
is not triggered.
- Around line 1369-1375: The fenced YAML examples in the "Minimal config"
blockquote are flush with surrounding text and must be padded with blank quoted
lines: insert a standalone quoted blank line (a line containing just ">")
immediately before the opening ```yaml fence and another immediately after the
closing ``` fence in the "Minimal config" callout (the block containing the
"budget: total_monthly: 100.00" example), and do the same for the second
occurrence mentioned (the callout around lines 1712-1718) so both fenced blocks
have a blank `>` line above and below them.
- Around line 1125-1129: The current "Research Directions (M5+)" and "Backend 2:
GraphRAG Knowledge Graph (Research)" headings conflict with the later
extensibility note by calling Backends 2–3 “not planned implementations”; update
the wording under the "Research Directions (M5+)" section (and any Backend 2/3
subheadings) to match the later phrase "planned extensions" (or otherwise
normalize to the approved terminology used in the extensibility note) so the
intent for OrgMemoryBackend (the protocol) and future backend work is consistent
across the document.
- Around line 946-976: The Task lifecycle docs must reflect the new INTERRUPTED
state: add INTERRUPTED as a first-class recoverable TaskStatus in the canonical
status list/enum (TaskStatus) and update all references/diagrams in the
lifecycle/state model (sections §6.1 and §6.6) to include INTERRUPTED on the
non-happy path (distinct from FAILED and CANCELLED), adjust the state
transitions to show that shutdown leads to INTERRUPTED (and that INTERRUPTED is
eligible for reassignment on restart), and either remove the “defer
enum/lifecycle updates” note or mark those sections explicitly as
“proposed-only” so there is a single source of truth for M3 task-state
semantics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 709fdccc-cee0-483d-94b9-86e72a121c62
📒 Files selected for processing (1)
DESIGN_SPEC.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T11:57:38.189Z
Learning: Use `/aurelio-review-pr` to handle external reviewer feedback on created PRs
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T11:57:38.189Z
Learning: Update DESIGN_SPEC.md to reflect approved deviations from the original specification
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T11:57:38.189Z
Learning: Always read DESIGN_SPEC.md before implementing any feature or planning any issue — it is the mandatory starting point for architecture, data models, and behavior
📚 Learning: 2026-03-06T11:57:38.189Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T11:57:38.189Z
Learning: Update DESIGN_SPEC.md to reflect approved deviations from the original specification
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-06T11:57:38.189Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T11:57:38.189Z
Learning: Always read DESIGN_SPEC.md before implementing any feature or planning any issue — it is the mandatory starting point for architecture, data models, and behavior
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes
Applied to files:
DESIGN_SPEC.md
🪛 markdownlint-cli2 (0.21.0)
DESIGN_SPEC.md
[warning] 972-972: Blank line inside blockquote
(MD028, no-blanks-blockquote)
[warning] 974-974: Blank line inside blockquote
(MD028, no-blanks-blockquote)
[warning] 1370-1370: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 1373-1373: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 1713-1713: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 1716-1716: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
### 6.7 Graceful Shutdown Protocol

When the process receives SIGTERM/SIGINT (user Ctrl+C, Docker stop, systemd shutdown), the framework needs to stop cleanly without losing work or leaking costs. Shutdown strategies are implemented behind a `ShutdownStrategy` protocol, making the system pluggable — new strategies can be added without modifying existing ones.

> **MVP: Cooperative with Timeout only (Strategy 1).** Other strategies are future options enabled by the protocol interface.

#### Strategy 1: Cooperative with Timeout (Default / MVP)

The engine sets a shutdown event, stops accepting new tasks, and gives in-flight agents a grace period to finish their current turn. Agents check the shutdown event at turn boundaries (between LLM calls, before tool invocations) and exit cooperatively. After the grace period, remaining agents are force-cancelled and their tasks marked `INTERRUPTED`.

```yaml
graceful_shutdown:
  strategy: "cooperative_timeout"  # cooperative_timeout, immediate, finish_tool, checkpoint
  cooperative_timeout:
    grace_seconds: 30    # time for agents to finish cooperatively
    cleanup_seconds: 5   # time for final cleanup (persist cost records, close connections)
```

On shutdown signal:
1. Set `shutdown_event` (`asyncio.Event`) — agents check this at turn boundaries
2. Stop accepting new tasks (drain gate closes)
3. Wait up to `grace_seconds` for agents to exit cooperatively
4. Force-cancel remaining agents (`task.cancel()`) — tasks transition to `INTERRUPTED`
5. Cleanup phase (`cleanup_seconds`): persist cost records, close provider connections, flush logs
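The five shutdown steps map naturally onto `asyncio`. What follows is a minimal sketch of Strategy 1, not the framework's `ShutdownStrategy` implementation: `CooperativeTimeoutConfig`, the two agent coroutines, and `shutdown_cooperatively` are hypothetical names, the drain gate (step 2) is assumed to be any admission check that reads the same event, and plain strings stand in for `TaskStatus` values.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class CooperativeTimeoutConfig:
    grace_seconds: float = 30.0   # time for agents to finish cooperatively
    cleanup_seconds: float = 5.0  # time for final cleanup


async def cooperative_agent(shutdown: asyncio.Event) -> str:
    # Checks the event at every turn boundary (each sleep stands in for one turn).
    while not shutdown.is_set():
        await asyncio.sleep(0.01)
    return "exited cooperatively"


async def stubborn_agent(shutdown: asyncio.Event) -> str:
    # Never reaches a turn boundary, e.g. stuck in a long tool call.
    await asyncio.sleep(3600)
    return "completed"


async def shutdown_cooperatively(
    agents: list[asyncio.Task[str]],
    shutdown: asyncio.Event,
    config: CooperativeTimeoutConfig,
    cleanup: Callable[[], Awaitable[None]],
) -> dict[str, str]:
    """Run steps 1 and 3-5; return a status string per agent task name."""
    shutdown.set()  # step 1 (a drain gate for step 2 would check the same event)
    statuses: dict[str, str] = {}
    if agents:  # asyncio.wait() rejects an empty collection
        # Step 3: wait up to grace_seconds for cooperative exits.
        done, pending = await asyncio.wait(agents, timeout=config.grace_seconds)
        for task in done:
            statuses[task.get_name()] = task.result()
        for task in pending:  # step 4: force-cancel stragglers
            task.cancel()
            statuses[task.get_name()] = "INTERRUPTED"
        await asyncio.gather(*pending, return_exceptions=True)
    # Step 5: cleanup, bounded by cleanup_seconds.
    await asyncio.wait_for(cleanup(), timeout=config.cleanup_seconds)
    return statuses


async def main() -> dict[str, str]:
    shutdown = asyncio.Event()
    agents = [
        asyncio.create_task(cooperative_agent(shutdown), name="agent-a"),
        asyncio.create_task(stubborn_agent(shutdown), name="agent-b"),
    ]
    await asyncio.sleep(0.05)  # let both agents start working

    async def cleanup() -> None:
        pass  # persist cost records, close provider connections, flush logs

    config = CooperativeTimeoutConfig(grace_seconds=0.2, cleanup_seconds=1.0)
    return await shutdown_cooperatively(agents, shutdown, config, cleanup)


statuses = asyncio.run(main())
print(statuses)
```

The cooperative agent notices the event within one turn, while the stubborn one is cancelled once the (shortened) grace period expires.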

> **New non-terminal status:** `INTERRUPTED` is a new `TaskStatus` variant. Unlike `FAILED` (eligible for automatic reassignment) or `CANCELLED` (terminal), `INTERRUPTED` indicates the task was stopped due to process shutdown and is eligible for manual or automatic reassignment on restart.

> **Windows compatibility:** `loop.add_signal_handler()` is not supported on Windows. The implementation uses `signal.signal()` as a fallback. SIGINT (Ctrl+C) works cross-platform; SIGTERM on Windows requires `os.kill()`.

> **In-flight LLM calls:** Non-streaming API calls that are interrupted result in tokens billed but no response received (silent cost leak). The engine logs request start (with input token count) before each provider call, so interrupted calls have at minimum an input-cost audit record. Streaming calls are charged only for tokens sent before disconnect.
Reflect INTERRUPTED in the canonical task lifecycle/state model.
§6.7 makes INTERRUPTED a first-class recoverable status, but §6.1 still shows only BLOCKED / CANCELLED on the non-happy path and §6.6 still defers enum/lifecycle updates to “when implemented.” That leaves two sources of truth for M3 task-state semantics. Please update the lifecycle/status definitions now, or explicitly mark this section as proposed-only.
Based on learnings: Update DESIGN_SPEC.md to reflect approved deviations from the original specification.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 972-972: Blank line inside blockquote
(MD028, no-blanks-blockquote)
[warning] 974-974: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DESIGN_SPEC.md` around lines 946 - 976, The Task lifecycle docs must reflect
the new INTERRUPTED state: add INTERRUPTED as a first-class recoverable
TaskStatus in the canonical status list/enum (TaskStatus) and update all
references/diagrams in the lifecycle/state model (sections §6.1 and §6.6) to
include INTERRUPTED on the non-happy path (distinct from FAILED and CANCELLED),
adjust the state transitions to show that shutdown leads to INTERRUPTED (and
that INTERRUPTED is eligible for reassignment on restart), and either remove the
“defer enum/lifecycle updates” note or mark those sections explicitly as
“proposed-only” so there is a single source of truth for M3 task-state
semantics.
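This comment asks for `INTERRUPTED` to appear in the canonical lifecycle. The sketch below shows one way an enum plus a transition map could encode it. The member values and every edge in `VALID_TRANSITIONS` are hypothetical; they only follow the semantics stated in this PR: `CANCELLED` is terminal, while `FAILED` and `INTERRUPTED` are non-terminal and can be reassigned.

```python
from enum import Enum


class TaskStatus(Enum):
    # Members the Copilot review lists as present in code today.
    CREATED = "created"
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    IN_REVIEW = "in_review"
    COMPLETED = "completed"
    BLOCKED = "blocked"
    CANCELLED = "cancelled"
    # Planned additions discussed in this PR (not yet in code).
    FAILED = "failed"
    INTERRUPTED = "interrupted"


S = TaskStatus
# Hypothetical transition map; an empty set marks a terminal state.
VALID_TRANSITIONS: dict[TaskStatus, frozenset[TaskStatus]] = {
    S.CREATED: frozenset({S.ASSIGNED, S.CANCELLED}),
    S.ASSIGNED: frozenset({S.IN_PROGRESS, S.BLOCKED, S.CANCELLED, S.INTERRUPTED}),
    S.IN_PROGRESS: frozenset(
        {S.IN_REVIEW, S.BLOCKED, S.FAILED, S.CANCELLED, S.INTERRUPTED}
    ),
    S.IN_REVIEW: frozenset({S.COMPLETED, S.IN_PROGRESS, S.CANCELLED, S.INTERRUPTED}),
    S.BLOCKED: frozenset({S.IN_PROGRESS, S.CANCELLED}),
    S.FAILED: frozenset({S.ASSIGNED}),       # eligible for automatic reassignment
    S.INTERRUPTED: frozenset({S.ASSIGNED}),  # reassigned on restart after shutdown
    S.COMPLETED: frozenset(),
    S.CANCELLED: frozenset(),
}


def is_terminal(status: TaskStatus) -> bool:
    return not VALID_TRANSITIONS[status]


def transition(current: TaskStatus, target: TaskStatus) -> TaskStatus:
    """Return `target` if the move is allowed, else raise ValueError."""
    if target not in VALID_TRANSITIONS[current]:
        raise ValueError(f"invalid transition {current.name} -> {target.name}")
    return target


# Shutdown interrupts an in-flight task; a restart reassigns it.
status = transition(TaskStatus.IN_PROGRESS, TaskStatus.INTERRUPTED)
status = transition(status, TaskStatus.ASSIGNED)
print(status.name)
```

Keeping the map as data gives a single source of truth that both the engine and the spec's lifecycle diagram can be checked against.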
DESIGN_SPEC.md (outdated)
> **New non-terminal status:** `INTERRUPTED` is a new `TaskStatus` variant. Unlike `FAILED` (eligible for automatic reassignment) or `CANCELLED` (terminal), `INTERRUPTED` indicates the task was stopped due to process shutdown and is eligible for manual or automatic reassignment on restart.

> **Windows compatibility:** `loop.add_signal_handler()` is not supported on Windows. The implementation uses `signal.signal()` as a fallback. SIGINT (Ctrl+C) works cross-platform; SIGTERM on Windows requires `os.kill()`.
Remove the blank lines inside this blockquote callout.
Lines 972 and 974 trigger MD028, and several renderers will split these notes into separate quote blocks. Keep the consecutive notes in one blockquote or convert them into normal paragraphs.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 972-972: Blank line inside blockquote
(MD028, no-blanks-blockquote)
[warning] 974-974: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DESIGN_SPEC.md` around lines 971 - 974, Remove the blank line between the two
consecutive blockquote lines so they remain a single blockquote; locate the
paragraph containing the `INTERRUPTED` TaskStatus note and the Windows
compatibility note referencing `loop.add_signal_handler()` and `signal.signal()`
and merge them into one continuous blockquote (or convert both to normal
paragraphs) so MD028 is not triggered.
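The Windows compatibility note describes falling back from `loop.add_signal_handler()` to `signal.signal()`. A minimal sketch of that fallback, assuming the §6.7 shutdown event is an `asyncio.Event`; `install_shutdown_handlers` is a hypothetical helper. Event loops that lack signal support raise `NotImplementedError` from `add_signal_handler`, and because a `signal.signal` handler runs outside the loop's callback machinery, it hands the event update to the loop with `call_soon_threadsafe`.

```python
import asyncio
import signal
from types import FrameType


def install_shutdown_handlers(
    loop: asyncio.AbstractEventLoop, shutdown: asyncio.Event
) -> str:
    """Make SIGINT/SIGTERM set `shutdown`; return the mechanism that was used."""
    signals = (signal.SIGINT, signal.SIGTERM)
    try:
        for sig in signals:
            loop.add_signal_handler(sig, shutdown.set)
        return "loop.add_signal_handler"
    except NotImplementedError:
        # Windows event loops do not support add_signal_handler().
        def handler(signum: int, frame: FrameType | None) -> None:
            # Not a loop callback, so schedule the Event update on the loop.
            loop.call_soon_threadsafe(shutdown.set)

        for sig in signals:
            signal.signal(sig, handler)
        return "signal.signal"


async def main() -> tuple[str, bool]:
    shutdown = asyncio.Event()
    mechanism = install_shutdown_handlers(asyncio.get_running_loop(), shutdown)
    signal.raise_signal(signal.SIGINT)  # simulate Ctrl+C
    await asyncio.wait_for(shutdown.wait(), timeout=1.0)
    return mechanism, shutdown.is_set()


mechanism, fired = asyncio.run(main())
print(mechanism, fired)
```

Either path ends in the same place: the shutdown event is set on the loop, and the cooperative shutdown sequence takes over from there.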
DESIGN_SPEC.md (outdated)
### Research Directions (M5+)

The following backends illustrate why `OrgMemoryBackend` is a protocol — the architecture supports future upgrades without modifying existing code. These are **not planned implementations**; they are research directions that may inform future work if/when organizational memory needs outgrow the Hybrid Prompt + Retrieval approach.

#### Backend 2: GraphRAG Knowledge Graph (Research)
Align the “research directions” wording with the later backend note.
This new heading says Backends 2–3 are “not planned implementations,” but the later extensibility note in the same section still describes them as “planned extensions.” That changes the roadmap meaning materially, so the terminology should be normalized here.
Based on learnings: Update DESIGN_SPEC.md to reflect approved deviations from the original specification.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DESIGN_SPEC.md` around lines 1125 - 1129, The current "Research Directions
(M5+)" and "Backend 2: GraphRAG Knowledge Graph (Research)" headings conflict
with the later extensibility note by calling Backends 2–3 “not planned
implementations”; update the wording under the "Research Directions (M5+)"
section (and any Backend 2/3 subheadings) to match the later phrase "planned
extensions" (or otherwise normalize to the approved terminology used in the
extensibility note) so the intent for OrgMemoryBackend (the protocol) and future
backend work is consistent across the document.
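The "Research Directions" text rests on `OrgMemoryBackend` being a protocol, so a GraphRAG or temporal-KG backend could later replace the default without touching callers. A minimal sketch using `typing.Protocol`; the `write`/`query` methods, the numeric role level, and `InMemoryOrgMemory` are hypothetical, chosen only to show structural typing plus the role-gated write access mentioned in the conventions table.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@runtime_checkable
class OrgMemoryBackend(Protocol):
    """Hypothetical protocol shape; any class with these methods conforms."""

    def write(self, author_level: int, fact: str) -> None: ...

    def query(self, text: str) -> list[str]: ...


@dataclass
class InMemoryOrgMemory:
    """Toy backend: substring search plus a role-level gate on writes."""

    min_write_level: int = 2
    facts: list[str] = field(default_factory=list)

    def write(self, author_level: int, fact: str) -> None:
        if author_level < self.min_write_level:
            raise PermissionError("role level too low to write org memory")
        self.facts.append(fact)

    def query(self, text: str) -> list[str]:
        needle = text.lower()
        return [fact for fact in self.facts if needle in fact.lower()]


def remember(backend: OrgMemoryBackend, author_level: int, fact: str) -> None:
    # Callers depend only on the protocol, never on a concrete backend.
    backend.write(author_level, fact)


memory = InMemoryOrgMemory()
remember(memory, author_level=3, fact="Deploys happen on Tuesdays")
print(memory.query("deploy"))
```

A research backend would only need to provide the same two methods; `remember` and every other caller stay unchanged.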
> **Minimal config:**
> ```yaml
> budget:
>   total_monthly: 100.00
> ```
> All other fields below have sensible defaults.
Add blank quoted lines around these fenced YAML examples.
Both minimal-config callouts trip MD031 because the fenced blocks are flush with the surrounding blockquote text. Please pad each fence with blank > lines so they render consistently across markdown viewers.
Also applies to: 1712-1718
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 1370-1370: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 1373-1373: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DESIGN_SPEC.md` around lines 1369 - 1375, The fenced YAML examples in the
"Minimal config" blockquote are flush with surrounding text and must be padded
with blank quoted lines: insert a standalone quoted blank line (a line
containing just ">") immediately before the opening ```yaml fence and another
immediately after the closing ``` fence in the "Minimal config" callout (the
block containing the "budget: total_monthly: 100.00" example), and do the same
for the second occurrence mentioned (the callout around lines 1712-1718) so both
fenced blocks have a blank `>` line above and below them.
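The minimal-config callout depends on every omitted field having a default. One common way to get that behaviour is to merge the user's parsed YAML over built-in defaults recursively. A minimal sketch, assuming the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); the `graceful_shutdown` defaults mirror §6.7, while `alert_threshold` is a hypothetical placeholder for "all other fields".

```python
import copy
from typing import Any

# Built-in defaults. The graceful_shutdown values mirror §6.7; the budget
# key is a hypothetical stand-in for "all other fields".
DEFAULTS: dict[str, Any] = {
    "budget": {"alert_threshold": 0.8},
    "graceful_shutdown": {
        "strategy": "cooperative_timeout",
        "cooperative_timeout": {"grace_seconds": 30, "cleanup_seconds": 5},
    },
}


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `overrides` win and nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The dict yaml.safe_load would produce for the minimal config.
user_config = {"budget": {"total_monthly": 100.00}}
config = deep_merge(DEFAULTS, user_config)
print(config["budget"], config["graceful_shutdown"]["cooperative_timeout"])
```

Copying instead of mutating keeps `DEFAULTS` intact, so the same defaults can be reused for every config file that is loaded.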
Pull request overview
Updates DESIGN_SPEC.md to incorporate external review feedback by clarifying MVP (M3) scope, introducing new protocol-based designs (graceful shutdown + workspace isolation), and tightening several “future vs now” boundaries across the spec.
Changes:
- Adds explicit M3 MVP definition and configuration philosophy (incl. minimal config examples).
- Introduces new protocol sections for graceful shutdown (`ShutdownStrategy`) and concurrent workspace isolation (`WorkspaceIsolationStrategy`), with MVP callouts.
- Reframes org-memory backends 2–3 as “research directions” and expands engineering conventions to include new coordination/isolation/shutdown rows.
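For `WorkspaceIsolationStrategy`, the conventions table names "Git worktrees / branches" as the mechanism. A minimal sketch under assumptions: the protocol methods `setup_workspace`/`teardown_workspace`, the `agent/<agent>/<task>` branch naming, and the injectable `run` callable are all hypothetical. It issues only real git subcommands (`git worktree add -b <branch> <path> <base>` and `git worktree remove <path>`), routed through `run` so the example can be exercised without a repository.

```python
import subprocess
from collections.abc import Callable, Sequence
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


def run_git(args: Sequence[str]) -> None:
    """Default runner: execute `git <args>` and raise on failure."""
    subprocess.run(["git", *args], check=True)


class WorkspaceIsolationStrategy(Protocol):
    def setup_workspace(self, agent_id: str, task_id: str) -> Path: ...

    def teardown_workspace(self, agent_id: str, task_id: str) -> None: ...


@dataclass
class GitWorktreeIsolation:
    """One worktree (and branch) per agent task, so agents never share files."""

    root: Path
    base_ref: str = "main"
    run: Callable[[Sequence[str]], None] = run_git

    def _path(self, agent_id: str, task_id: str) -> Path:
        return self.root / f"{agent_id}-{task_id}"

    def setup_workspace(self, agent_id: str, task_id: str) -> Path:
        path = self._path(agent_id, task_id)
        branch = f"agent/{agent_id}/{task_id}"
        self.run(["worktree", "add", "-b", branch, str(path), self.base_ref])
        return path

    def teardown_workspace(self, agent_id: str, task_id: str) -> None:
        # Removes the worktree directory; the branch is kept for review/merge.
        self.run(["worktree", "remove", str(self._path(agent_id, task_id))])


# Record the commands instead of running git.
calls: list[list[str]] = []
isolation: WorkspaceIsolationStrategy = GitWorktreeIsolation(
    root=Path("worktrees"), run=lambda args: calls.append(list(args))
)
workspace = isolation.setup_workspace("agent-1", "task-42")
isolation.teardown_workspace("agent-1", "task-42")
print(calls)
```

Keeping each agent's output on its own branch moves conflict handling to merge time instead of letting concurrent agents overwrite each other's files.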
DESIGN_SPEC.md (outdated)
4. Force-cancel remaining agents (`task.cancel()`) — tasks transition to `INTERRUPTED`
5. Cleanup phase (`cleanup_seconds`): persist cost records, close provider connections, flush logs

> **New non-terminal status:** `INTERRUPTED` is a new `TaskStatus` variant. Unlike `FAILED` (eligible for automatic reassignment) or `CANCELLED` (terminal), `INTERRUPTED` indicates the task was stopped due to process shutdown and is eligible for manual or automatic reassignment on restart.
INTERRUPTED is described as a new TaskStatus variant, but the current TaskStatus enum in code (src/ai_company/core/enums.py) only includes CREATED/ASSIGNED/IN_PROGRESS/IN_REVIEW/COMPLETED/BLOCKED/CANCELLED (and the spec already calls out that FAILED will be added later). To keep the spec consistent with the repo’s current state, consider adding the same kind of “will be added when implemented” note here (and/or referencing the transition map that will need updating) rather than stating it as an already-existing variant.
Suggested change. Replace:

> **New non-terminal status:** `INTERRUPTED` is a new `TaskStatus` variant. Unlike `FAILED` (eligible for automatic reassignment) or `CANCELLED` (terminal), `INTERRUPTED` indicates the task was stopped due to process shutdown and is eligible for manual or automatic reassignment on restart.

with:

> **Planned non-terminal status:** `INTERRUPTED` will be introduced as a new `TaskStatus` variant once implemented in code (and the task status transition map is updated). Unlike `FAILED` (eligible for automatic reassignment) or `CANCELLED` (terminal), `INTERRUPTED` will indicate the task was stopped due to process shutdown and is eligible for manual or automatic reassignment on restart.
DESIGN_SPEC.md (outdated)
#### Backend 2: GraphRAG Knowledge Graph (Future)
---

### Research Directions (M5+)
This introduces an unnumbered level-3 heading (### Research Directions (M5+)) inside the numbered §7.4 area, which breaks the otherwise consistent “### 7.x …” structure and also makes the Table of Contents note (“7.4 … (Research Directions)”) a bit misleading since Research Directions is now a sibling section rather than clearly part of 7.4. Consider making this a #### subsection under 7.4, or renumbering it to ### 7.5 … if it’s meant to be a peer section.
Suggested change: `### Research Directions (M5+)` → `#### Research Directions (M5+)`
DESIGN_SPEC.md (outdated)
### Research Directions (M5+)

The following backends illustrate why `OrgMemoryBackend` is a protocol — the architecture supports future upgrades without modifying existing code. These are **not planned implementations**; they are research directions that may inform future work if/when organizational memory needs outgrow the Hybrid Prompt + Retrieval approach.

#### Backend 2: GraphRAG Knowledge Graph (Research)
This section says Backends 2–3 are “not planned implementations”, but later in the same §7.4 area the spec describes them as “planned extensions.” Please align the wording so it’s unambiguous whether GraphRAG/Temporal KG are only research ideas or actual roadmap items.
DESIGN_SPEC.md (outdated)
> **Minimal config:**
> ```yaml
> autonomy:
>   level: "semi"
> ```
> All presets below are built-in. Most users only set the level.
The “Minimal config” callout suggests users can configure autonomy.level: "semi", but the current validated YAML schema uses config.autonomy as a float (0.0–1.0) in RootConfig/CompanyConfig (see src/ai_company/config/schema.py + src/ai_company/core/company.py) and does not define an autonomy: object with presets. Either mark this autonomy-preset structure as future/planned (separate from today’s config), or update the minimal example to match the current config shape.
Suggested change. Replace:

> **Minimal config:**
> ```yaml
> autonomy:
>   level: "semi"
> ```
> All presets below are built-in. Most users only set the level.

with:

> **Planned minimal config (future, not yet implemented):**
> ```yaml
> autonomy:
>   level: "semi"
> ```
> This `autonomy.level` + presets structure is a design target for a future config schema.
>
> **Current implementation (v0.x) uses a numeric flag:**
> ```yaml
> config:
>   autonomy: 0.5  # float between 0.0 (no autonomy) and 1.0 (full autonomy)
> ```
**Minimal custom company** (all other settings use defaults):

```yaml
company:
  name: "Acme Corp"
  template: "startup"
```
The “Minimal custom company” YAML snippet doesn’t match any current config shape in the repo: built-in templates are defined under a top-level template: key (with template.company.type, template.company.budget_monthly, etc.), while validated runtime config uses company_name/company_type plus config.budget_monthly (see src/ai_company/config/schema.py). Please either (a) update the snippet to a real, loadable template/config example consistent with the existing schemas, or (b) explicitly label it as a conceptual/future higher-level config format so readers don’t try to use it as-is.
Suggested change. Replace:

**Minimal custom company** (all other settings use defaults):
```yaml
company:
  name: "Acme Corp"
  template: "startup"
```

with:

**Minimal custom company (runtime config example)** (all other settings use defaults):
```yaml
company_name: "Acme Corp"
company_type: "startup"
config:
  # …
```
…le, Copilot, and Gemini

- Fix "not planned" vs "planned extensions" contradiction in §7.4
- Fix FAILED labeled "terminal" in §6.6 (should be non-terminal)
- Fix INTERRUPTED wording to "Planned non-terminal status"
- Fix §5.6 MVP marker to say "Not in M3" (single-agent, no conflicts)
- Add missing SUSPENDED "new status" callout in §6.7 Strategy 4
- Remove mid-section `---` and fix Research Directions heading level
- Mark §12.2 autonomy minimal config as planned (not yet implemented)
- Fix Progressive Trust deferred list annotation pattern
- Fix MD028: merge consecutive blockquotes in §6.7
- Fix MD031: add blank `>` lines around fenced code in §10.4 and §12.2
- Update org memory mechanism column for clarity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Memory layer integration, org memory backends (M5)
- Web UI, WebSocket real-time updates (M6)
- CLI commands beyond basic `start` (M6)
- Security ops agent, approval workflows (M7)
§1.4 "approval workflows (M7)" contradicts §12.4 MVP marker
Line 78 places "approval workflows" in the M7 deferred bucket, but §12.4 (line 1766) adds:
MVP: Wait Forever only (Policy 1). Other timeout policies are M5+.
This means the human-approval waiting mechanism (parking a task, receiving approval, resuming) is explicitly planned for M3 — not M7. A reader parsing §1.4 will conclude that no approval integration exists until M7, then reach §12.4 and find a contradicting MVP callout. This is especially likely to confuse implementors scoping M3 work.
The §1.4 entry most likely intends to defer the Security ops agent and the full approval workflow UI / escalation policies, while the basic "wait for human, resume on approval" plumbing ships in M3. Splitting the entry or adjusting the milestone annotation would remove the ambiguity:
Suggested change. Replace:

- Security ops agent, approval workflows (M7)

with:

- Security ops agent (M7)
- Approval workflows: full timeout/escalation policies (M5+); basic wait-and-resume ships in M3 (§12.4 Policy 1)
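Greptile's point hinges on what Policy 1 ("Wait Forever") involves in M3: a task that needs human approval is parked with no timeout and resumes only when a decision arrives. A minimal `asyncio` sketch of that wait-and-resume plumbing; `ApprovalGate`, `request`, and `decide` are hypothetical names, not the spec's API.

```python
import asyncio


class ApprovalGate:
    """Parks callers until a human decision arrives (no timeout, per Policy 1)."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future[bool]] = {}

    async def request(self, task_id: str) -> bool:
        future: asyncio.Future[bool] = asyncio.get_running_loop().create_future()
        self._pending[task_id] = future
        # Deliberately no asyncio.wait_for(): the task waits indefinitely.
        return await future

    def decide(self, task_id: str, approved: bool) -> None:
        future = self._pending.pop(task_id)
        if not future.done():
            future.set_result(approved)


async def main() -> bool:
    gate = ApprovalGate()
    waiter = asyncio.create_task(gate.request("task-7"))
    await asyncio.sleep(0)  # let the waiter register and park itself
    gate.decide("task-7", approved=True)  # e.g. a human approves in the UI
    return await waiter


approved = asyncio.run(main())
print(approved)
```

Other timeout policies would wrap the same `await future` in a deadline, which is why the basic gate can ship before they do.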
Prompt To Fix With AI
This is a comment left during a code review.
Path: DESIGN_SPEC.md
Line: 78
Comment:
**§1.4 "approval workflows (M7)" contradicts §12.4 MVP marker**
Line 78 places "approval workflows" in the M7 deferred bucket, but §12.4 (line 1766) adds:
> **MVP: Wait Forever only (Policy 1).** Other timeout policies are M5+.
This means the human-approval waiting mechanism (parking a task, receiving approval, resuming) is explicitly planned for M3 — not M7. A reader parsing §1.4 will conclude that no approval integration exists until M7, then reach §12.4 and find a contradicting MVP callout. This is especially likely to confuse implementors scoping M3 work.
The §1.4 entry most likely intends to defer the **Security ops agent** and the full **approval workflow UI / escalation policies**, while the basic "wait for human, resume on approval" plumbing ships in M3. Splitting the entry or adjusting the milestone annotation would remove the ambiguity:
```suggestion
- Security ops agent (M7)
- Approval workflows: full timeout/escalation policies (M5+); basic wait-and-resume ships in M3 (§12.4 Policy 1)
```
How can I resolve this? If you propose a fix, please make it concise.

#### Research Directions (M5+)

The following backends illustrate why `OrgMemoryBackend` is a protocol — the architecture supports future upgrades without modifying existing code. These are **not planned implementations**; they are research directions that may inform future work if/when organizational memory needs outgrow the Hybrid Prompt + Retrieval approach.

#### Backend 2: GraphRAG Knowledge Graph (Research)
Heading hierarchy makes "Research Directions" a sibling of the backends it introduces
#### Research Directions (M5+) is at the same #### level as #### Backend 2: and #### Backend 3:. In Markdown's heading hierarchy these are document siblings, not a parent containing the two backends. A reader navigating via rendered headings or a document outline sees all three at equal depth, which undermines the intent of "Research Directions" as a label/container for the subsequent entries.
Consider promoting the container to ### so that Backends 2 and 3 sit under it as #### children:
Suggested change: `#### Research Directions (M5+)` becomes `### Research Directions (M5+)`. The introductory paragraph and `#### Backend 2: GraphRAG Knowledge Graph (Research)` are unchanged.
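The sketch below shows why the heading level matters. It assumes an outline builder, such as a rendered table of contents, that nests ATX headings in the usual way: each heading becomes a child of the nearest preceding heading with a smaller level. `build_outline` is a hypothetical helper written for illustration. It is not part of GitHub or any Markdown library.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def build_outline(markdown: str) -> list[dict]:
    """Nest headings as a document outline would (hypothetical helper)."""
    root: dict = {"level": 0, "title": "<root>", "children": []}
    stack = [root]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        node = {"level": len(match.group(1)), "title": match.group(2), "children": []}
        # Pop same-or-deeper headings: the remaining top is this heading's parent.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

With the three original `####` headings, `build_outline` returns three top-level nodes. After the container is promoted to `###`, it returns a single node whose children are Backend 2 and Backend 3.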
Path: `DESIGN_SPEC.md`, lines 1125-1129
🤖 I have created a release *beep* *boop*

---

## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop*

---

## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
Addresses feedback from 3 external reviews of the design spec. All reviewers flagged scope communication, missing protocols, and over-engineering concerns.
Changes
- `ShutdownStrategy` protocol with 4 strategies (cooperative + timeout default), `INTERRUPTED` status, Windows compatibility notes, in-flight LLM cost handling
- `WorkspaceIsolationStrategy` protocol with 3 strategies (planner + worktrees default), state coordination table distinguishing framework state vs. code/files

Design Decisions (from external review analysis)
- State coordination (`TaskEngine`)
- Graceful shutdown (`ShutdownStrategy`)
- Workspace isolation (`WorkspaceIsolationStrategy`)

Test plan
🤖 Generated with Claude Code