
docs: expand design spec with pluggable strategy protocols #121

Merged
Aureliolo merged 2 commits into main from docs/design-spec-expansion on Mar 6, 2026

Conversation

@Aureliolo
Owner

Summary

Major expansion of DESIGN_SPEC.md based on external review feedback (3 independent reviews evaluated). Adds 7 new/rewritten sections covering previously underspecified areas. All new subsystems follow a consistent pattern: pluggable strategies behind protocol interfaces for maximum extensibility.

New sections added

  • §5.6 Conflict Resolution Protocol — 4 strategies (authority+dissent log, structured debate+judge, human escalation, hybrid with review agent) behind ConflictResolver protocol
  • §5.7 Meeting Protocol — 3 protocols (round-robin transcript, async position papers+synthesizer, structured phases) behind MeetingProtocol protocol
  • §6.5 Agent Execution Loop — 3 architectures (ReAct, Plan-and-Execute, Hybrid Plan+ReAct) behind ExecutionLoop protocol with auto-select by task complexity
  • §7.4 Shared Organizational Memory — 3 backends (hybrid prompt+retrieval for MVP, GraphRAG, temporal knowledge graph) behind OrgMemoryBackend protocol
  • §11.3 Progressive Trust (rewritten) — 4 strategies (disabled, weighted score, per-category tracks, milestone gates with trust decay) behind TrustStrategy protocol
  • §12.4 Approval Timeout Policy — 4 policies (wait forever, deny on timeout, tiered by risk, escalation chain) behind TimeoutPolicy protocol, with task park/resume via AgentContext snapshots
  • §10.4 Auto-downgrade boundary — Clarified as task-assignment only, never mid-execution
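The task park/resume idea from §12.4 can be sketched with immutable snapshots. This is only an illustration: the fields of `AgentContext`, the `ParkedTasks` store, and the status strings are all hypothetical, and `dataclasses.replace` stands in for whatever copy-with-update mechanism the project actually uses.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class AgentContext:
    """Hypothetical minimal context; the real AgentContext fields are not given in this PR."""
    task_id: str
    step: int
    status: str = "running"
    scratchpad: Tuple[str, ...] = ()


class ParkedTasks:
    """Hypothetical in-memory store of snapshots taken while a task awaits approval."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, AgentContext] = {}

    def park(self, ctx: AgentContext) -> AgentContext:
        # replace() builds a new frozen instance, so the caller's context is untouched.
        snapshot = replace(ctx, status="awaiting_approval")
        self._snapshots[ctx.task_id] = snapshot
        return snapshot

    def resume(self, task_id: str) -> AgentContext:
        # Remove the snapshot and hand back a running copy at the same step.
        snapshot = self._snapshots.pop(task_id)
        return replace(snapshot, status="running")
```

Because every transition produces a new object, a parked snapshot cannot drift while the agent works on other tasks.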

Other updates

  • §17.1 Open Questions: 5 questions marked resolved (Q2, Q5, Q7, Q11-Q13), 4 new questions added (Q11-Q14)
  • §18.1 Backlog: Conflict resolution protocol moved from backlog to core (§5.6)
  • Memory layer references: All Mem0 references updated to "candidate" status — memory layer library is TBD with candidates: Mem0, Zep, Letta, Cognee, custom
  • CLAUDE.md: memory/ package description updated to reflect TBD status

Design principles

  • Everything is a pluggable strategy behind a protocol interface
  • All strategies configurable per company, department, or context
  • MVP defaults identified for each subsystem
  • Extensible by design — new strategies addable without modifying existing ones
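A minimal sketch of what "pluggable strategy behind a protocol interface" could look like in Python. The PR does not define method signatures, so `resolve`, the `Conflict` record, the registry, and both example strategies are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


@dataclass(frozen=True)
class Conflict:
    """Hypothetical conflict record: two agents and their competing proposals."""
    agent_a: str
    agent_b: str
    proposal_a: str
    proposal_b: str


class ConflictResolver(Protocol):
    """Assumed protocol shape; not taken from DESIGN_SPEC.md."""

    def resolve(self, conflict: Conflict) -> str: ...


# Registry keyed by the strategy name that would appear in configuration.
_RESOLVERS: Dict[str, Callable[[], ConflictResolver]] = {}


def register_resolver(name: str) -> Callable[[type], type]:
    """Class decorator: adds a strategy without modifying existing ones."""
    def decorator(cls: type) -> type:
        _RESOLVERS[name] = cls
        return cls
    return decorator


@register_resolver("authority")
class AuthorityResolver:
    """Illustrative stand-in: agent_a is treated as the senior agent."""

    def resolve(self, conflict: Conflict) -> str:
        return conflict.proposal_a


@register_resolver("human_escalation")
class HumanEscalationResolver:
    """Illustrative stand-in: defers the decision to a human."""

    def resolve(self, conflict: Conflict) -> str:
        return "ESCALATED_TO_HUMAN"


def get_resolver(name: str) -> ConflictResolver:
    """Look up the configured strategy by name."""
    try:
        return _RESOLVERS[name]()
    except KeyError:
        raise ValueError(f"unknown conflict resolution strategy: {name!r}") from None
```

Adding a fifth strategy would then be one new decorated class; neither the registry nor the existing resolvers change.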

Test plan

  • Verify DESIGN_SPEC.md renders correctly on GitHub (tables, YAML blocks, diagrams)
  • Verify all internal cross-references (§5.6, §5.7, §6.5, §7.4, §12.4) link correctly
  • Verify no Mem0 reference implies it's a decided technology
  • Verify CLAUDE.md memory package description is consistent

No code changes — docs only.

Add 7 new/rewritten sections to DESIGN_SPEC.md based on external
review feedback. All new subsystems are designed as pluggable
strategies behind protocol interfaces for maximum extensibility:

- §5.6 Conflict Resolution: 4 strategies (authority+dissent, debate+judge,
  human escalation, hybrid with review agent) behind ConflictResolver protocol
- §5.7 Meeting Protocol: 3 protocols (round-robin, position papers,
  structured phases) behind MeetingProtocol protocol
- §6.5 Agent Execution Loop: 3 architectures (ReAct, Plan-and-Execute,
  Hybrid) behind ExecutionLoop protocol with auto-select by complexity
- §7.4 Shared Organizational Memory: 3 backends (hybrid prompt+retrieval,
  GraphRAG, temporal KG) behind OrgMemoryBackend protocol
- §11.3 Progressive Trust: rewritten with 4 strategies (disabled, weighted,
  per-category, milestone gates) behind TrustStrategy protocol
- §12.4 Approval Timeout: 4 policies (wait forever, deny, tiered,
  escalation chain) behind TimeoutPolicy protocol with task park/resume
- §10.4 Auto-downgrade: clarified as task-boundary only, never mid-execution

Also updates:
- §17.1 Open Questions: 5 resolved, 4 new questions added
- §18.1 Backlog: conflict resolution moved from backlog to core
- All Mem0 references changed to "candidate" (memory layer TBD)
- CLAUDE.md: memory/ package description updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 10:09
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e2268180-bca4-4723-aa50-2bd2e498fe4d

📥 Commits

Reviewing files that changed from the base of the PR and between 68f368a and c5952d8.

📒 Files selected for processing (2)
  • DESIGN_SPEC.md
  • README.md

📝 Walkthrough

Summary by CodeRabbit

  • Documentation
    • Renamed memory component to "Memory layer TBD" across docs and README.
    • Expanded design spec with conflict resolution strategies, meeting protocols, multiple agent execution loop architectures, shared organizational memory/backends, approval timeout policies, and progressive trust enhancements.

Walkthrough

Updated wording of the memory component in CLAUDE.md/README.md from "Mem0 adapter" to "memory layer TBD." Substantially expanded DESIGN_SPEC.md with new sections: Conflict Resolution Protocol, Meeting Protocols, Agent Execution Loop options, Shared Organizational Memory, Approval Timeout Policy, and Progressive Trust, plus YAML examples and cross-reference updates.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Memory wording updates (CLAUDE.md, README.md) | Replaced "Mem0 adapter" / "Persistent Memory" wording with "memory layer TBD" and listed candidate memory-layer options; wording/status updates only. |
| Design specification additions (DESIGN_SPEC.md) | Added new sections: Conflict Resolution Protocol (ConflictResolver + strategies), Meeting Protocols, Agent Execution Loop (ReAct, Plan-and-Execute, Hybrid, auto-selection), Shared Organizational Memory (OrgMemoryBackend variants), Approval Timeout Policy, Progressive Trust; included YAML examples, diagrams, and cross-reference updates. |

Sequence Diagram(s)

sequenceDiagram
participant AgentA as Agent
participant AgentB as Agent (peer)
participant Resolver as ConflictResolver
participant Log as DissentLog
participant Authority as Authority/Judge/Human

AgentA->>AgentB: Propose action / decision
AgentB-->>AgentA: Dissent / counter-proposal
AgentA->>Resolver: Submit proposals + context
Resolver->>Log: Record dissent entries (YAML metadata)
Resolver->>Authority: Escalate if configured (or run structured debate)
Authority-->>Resolver: Decision (accept/override)
Resolver-->>AgentA: Resolved decision
Resolver-->>AgentB: Resolved decision
sequenceDiagram
participant Agent as Agent (loop)
participant Planner as Planner
participant LM as LanguageModel
participant Executor as Executor
participant Env as Environment
participant Memory as MemoryLayerTBD

Agent->>Planner: Generate plan (when Plan-and-Execute)
Planner->>LM: Draft steps / subgoals
LM->>Memory: Retrieve relevant context
Planner->>Agent: Plan ready
Agent->>Executor: Execute step
Executor->>LM: Request action/content
Executor->>Env: Apply action / observe
Env-->>Executor: Observation/result
Executor->>Agent: Step result
Agent->>Planner: Re-plan or continue (hybrid / ReAct)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #121: Appears to add the same DESIGN_SPEC.md sections and the CLAUDE.md memory wording change — strong overlap in documentation edits.
  • PR #2: Overlaps on DESIGN_SPEC.md edits and memory-layer terminology; likely touches the same design additions.
  • PR #114: Modifies CLAUDE.md and DESIGN_SPEC.md memory-related wording consistent with this PR's changes.
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: a documentation expansion that adds pluggable strategy protocols to the design spec. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing new sections, design principles, and specific updates across the documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's design specification by integrating a robust framework for pluggable strategies across several critical agent subsystems. The changes introduce detailed protocols for conflict resolution, multi-agent meetings, agent execution loops, and organizational memory, alongside refined policies for progressive trust and approval timeouts. This expansion aims to provide maximum extensibility and configurability, ensuring the system can adapt to diverse operational needs and future developments.

Highlights

  • Expanded Design Specification: Significantly expanded DESIGN_SPEC.md with 7 new or rewritten sections, incorporating feedback from 3 independent external reviews.
  • Pluggable Strategy Protocols: Introduced a consistent design pattern across new subsystems, emphasizing pluggable strategies behind protocol interfaces for maximum extensibility and configurability.
  • Conflict Resolution Protocol: Defined four strategies (Authority + Dissent Log, Structured Debate + Judge, Human Escalation, Hybrid) for resolving disagreements between agents.
  • Meeting Protocol: Outlined three configurable protocols (Round-Robin Transcript, Async Position Papers + Synthesizer, Structured Phases) for multi-agent conversations.
  • Agent Execution Loop: Detailed three architectures (ReAct, Plan-and-Execute, Hybrid Plan + ReAct) for agent task processing, with an optional auto-selection based on task complexity.
  • Shared Organizational Memory: Described three backends (Hybrid Prompt + Retrieval, GraphRAG, Temporal Knowledge Graph) for managing company-wide institutional knowledge.
  • Progressive Trust (Rewritten): Updated the progressive trust section with four distinct strategies (Disabled, Weighted Score, Per-Category Tracks, Milestone Gates) for agents to earn higher tool access.
  • Approval Timeout Policy: Introduced four policies (Wait Forever, Deny on Timeout, Tiered Timeout, Escalation Chain) for handling human approval requests, including mechanisms for task parking and resumption.
  • Auto-downgrade Boundary Clarification: Clarified that model downgrades due to budget apply only at task assignment time, preventing mid-execution model switches.
  • Memory Layer Status Update: Updated all references to 'Mem0' to reflect a 'TBD' status for the memory layer, with multiple candidates currently under evaluation.
  • Open Questions & Backlog Updates: Marked 5 open questions as resolved, added 4 new ones, and moved the Conflict Resolution Protocol from the backlog to a core section.
Changelog
  • CLAUDE.md
    • Updated the description of the memory/ package to indicate that the memory layer is 'TBD' instead of specifically 'Mem0 adapter'.
  • DESIGN_SPEC.md
    • Added a new section '5.6 Conflict Resolution Protocol' detailing four strategies for resolving agent disagreements.
    • Added a new section '5.7 Meeting Protocol' outlining three configurable protocols for multi-agent meetings.
    • Added a new section '6.5 Agent Execution Loop' describing three architectures for agent task processing.
    • Added a new section '7.4 Shared Organizational Memory' presenting three backends for organizational knowledge.
    • Rewrote section '11.3 Progressive Trust' to include four distinct trust strategies.
    • Added a new section '12.4 Approval Timeout Policy' defining four policies for handling human approval timeouts, including task parking and resumption.
    • Clarified the 'Auto-downgrade boundary' in section '10.4' to apply only at task assignment.
    • Updated references to 'Mem0' throughout the document to reflect a 'TBD' memory layer status with multiple candidates.
    • Updated section '17.1 Open Questions' by marking 5 questions as resolved and adding 4 new ones.
    • Updated section '18.1 Backlog' to reflect the Conflict Resolution Protocol moving to core.
Activity
  • The DESIGN_SPEC.md was significantly expanded based on feedback gathered from three independent external reviews.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request provides a major and well-structured expansion of the DESIGN_SPEC.md document. It introduces several new sections detailing pluggable strategies for key subsystems like conflict resolution, meeting protocols, and agent execution loops, all following a consistent and extensible pattern. The updates to generalize the memory layer by removing specific Mem0 references are applied consistently throughout the documentation. My review includes a couple of minor suggestions to further improve the clarity and credibility of the design document. Overall, this is a high-quality update that significantly matures the project's design specification.

Note: Security Review has been skipped due to the limited scope of the PR.

DESIGN_SPEC.md Outdated

#### Strategy 2: Structured Debate + Judge

Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.
medium

For clarity and consistency, consider updating the description of the 'judge' to include all options mentioned in the YAML configuration comment. The text currently mentions 'their shared manager, or a configurable arbitrator agent', but the comment for the judge key also lists 'ceo' as a valid option.

Suggested change
- Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.
+ Both agents present arguments (1 round each, capped at `max_tokens_per_argument`). A judge — their shared manager, the CEO, or a configurable arbitrator agent — evaluates both positions and decides. The judge's reasoning and both arguments are logged as a dissent record.
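The Structured Debate + Judge flow described above (one capped argument per agent, a judge decision, a logged dissent record) might be sketched like this. It is a hedged illustration: `Argument`, `DissentRecord`, the `judge` callable, and the whitespace-based token count are assumptions, not the spec's definitions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Argument:
    agent: str
    text: str


@dataclass(frozen=True)
class DissentRecord:
    """Hypothetical log entry holding both arguments and the judge's reasoning."""
    arguments: Tuple[Argument, Argument]
    winner: str
    reasoning: str


# A judge takes both capped arguments and returns (winning agent, reasoning).
Judge = Callable[[Argument, Argument], Tuple[str, str]]


def cap_tokens(text: str, max_tokens: int) -> str:
    """Truncate to max_tokens whitespace-separated words (a crude token proxy)."""
    return " ".join(text.split()[:max_tokens])


def structured_debate(
    a: Argument, b: Argument, judge: Judge, max_tokens_per_argument: int
) -> DissentRecord:
    """Run one round: cap each argument, ask the judge, record the outcome."""
    capped_a = Argument(a.agent, cap_tokens(a.text, max_tokens_per_argument))
    capped_b = Argument(b.agent, cap_tokens(b.text, max_tokens_per_argument))
    winner, reasoning = judge(capped_a, capped_b)
    if winner not in (a.agent, b.agent):
        raise ValueError(f"judge picked unknown agent: {winner!r}")
    return DissentRecord((capped_a, capped_b), winner, reasoning)
```

Whether the judge is the shared manager, the CEO, or an arbitrator agent only changes which callable is passed in.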

DESIGN_SPEC.md Outdated
```yaml
entity_extraction: "auto" # auto-extract entities from ADRs and policies
```

- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
medium

The claim of a '3.4x accuracy improvement' is very specific. To enhance the credibility of the design specification, it would be beneficial to either add a citation for this metric or rephrase it to be more qualitative if a source isn't readily available. For example: 'Offers significant accuracy improvements over vector-only retrieval...'

Suggested change
- - 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
+ - Offers significant accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 770-772: The documented allowed values for execution_loop are
missing "auto" while the spec later uses execution_loop: "auto"; update the
documentation for the execution_loop setting (the YAML line showing
execution_loop: "react") to include "auto" in the value set and adjust its
descriptive text accordingly so the config contract lists "react, plan_execute,
hybrid, auto" and explains what "auto" does; target the execution_loop
declaration and its description to ensure consistency with the later use of
execution_loop: "auto".
- Around line 1703-1707: The Agent Memory row omits "Cognee" compared to later
sections; update the table entry (the cell labeled "Agent Memory") to list the
standardized candidates "Mem0, Zep, Letta, Cognee, custom" and retain the "+
SQLite" note so it matches the later §15.2 references and other occurrences of
the candidate list.
- Around line 954-956: The blockquote containing the OrgMemoryBackend
description and the "Write access control" note has an extra blank line breaking
the quoted callout (triggers MD028); remove the empty line so the two sections
remain part of the same blockquote and ensure both lines begin with '>'
(references: OrgMemoryBackend, query, write, list_policies), then re-run
markdownlint to confirm MD028 is resolved.
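The first finding above lists the intended value set for `execution_loop` as "react, plan_execute, hybrid, auto". A small validation sketch follows; the function name, the flat config mapping, and the `"react"` fallback when the key is absent are assumptions for illustration.

```python
from typing import Any, Mapping

# Value set taken from the review finding; everything else here is hypothetical.
ALLOWED_EXECUTION_LOOPS = frozenset({"react", "plan_execute", "hybrid", "auto"})


def validate_execution_loop(config: Mapping[str, Any]) -> str:
    """Return the configured loop name, rejecting values outside the contract."""
    value = config.get("execution_loop", "react")  # assumed default
    if not isinstance(value, str) or value not in ALLOWED_EXECUTION_LOOPS:
        allowed = ", ".join(sorted(ALLOWED_EXECUTION_LOOPS))
        raise ValueError(f"execution_loop must be one of: {allowed}; got {value!r}")
    return value
```

Keeping the allowed values in one constant is one way to stop the documented set and the accepted set from diverging again.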

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ce35d684-7a4e-471f-ada6-fa490095d3c2

📥 Commits

Reviewing files that changed from the base of the PR and between ef89b90 and 68f368a.

📒 Files selected for processing (2)
  • CLAUDE.md
  • DESIGN_SPEC.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Update DESIGN_SPEC.md to reflect approved deviations from the original specification
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Always read DESIGN_SPEC.md before implementing any feature or planning any issue
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T10:01:15.539Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T10:01:15.539Z
Learning: Update DESIGN_SPEC.md to reflect approved deviations from the original specification

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Ollama Integration - all AI agents use Ollama for local LLM serving with default endpoint `http://localhost:11434`

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Configure appropriate temperature settings based on agent role: Writer (0.9), Editor (0.6), Continuity (0.3), Architect (0.85), Interviewer (0.7)

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/agents/*.py : Agent temperature settings: Writer (0.9), Editor (0.6), Continuity (0.3), Architect (0.85), Interviewer (0.7)

Applied to files:

  • DESIGN_SPEC.md
🪛 LanguageTool
DESIGN_SPEC.md

[grammar] ~776-~776: Please add a punctuation mark at the end of paragraph.
Context: ...s, quick fixes, single-file changes, M3 MVP #### Loop 2: Plan-and-Execute A two-p...

(PUNCTUATION_PARAGRAPH_END)


[typographical] ~780-~780: In American English, use a period after an abbreviation.
Context: ...fferent models can be used for planning vs execution (e.g., Opus for planning, Hai...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[grammar] ~806-~806: Please add a punctuation mark at the end of paragraph.
Context: ...pic-level work, tasks spanning multiple files #### Loop 3: Hybrid Plan + ReAct Steps...

(PUNCTUATION_PARAGRAPH_END)


[style] ~810-~810: Since ownership is already implied, this phrasing may be redundant.
Context: ...p is executed as a mini-ReAct loop with its own turn limit. After each step, the agent ...

(PRP_OWN)


[grammar] ~845-~845: Please add a punctuation mark at the end of paragraph.
Context: ...ring, tasks requiring both planning and adaptivity > Auto-selection (optional): When ...

(PUNCTUATION_PARAGRAPH_END)


[style] ~903-~903: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...s, e.g., "no commits to main," "all PRs need 2 approvals") are injected into every a...

(EN_REPEATEDWORDS_NEED)


[grammar] ~922-~922: Please add a punctuation mark at the end of paragraph.
Context: ...may miss relational connections between policies #### Backend 2: GraphRAG Knowledge Gra...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~937-~937: Please add a punctuation mark at the end of paragraph.
Context: ...Entity extraction can be noisy. Heavier setup #### Backend 3: Temporal Knowledge Gra...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~952-~952: Please add a punctuation mark at the end of paragraph.
Context: ...kill for small companies or local-first use > Extensibility: All backends impl...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~1283-~1283: Please add a punctuation mark at the end of paragraph.
Context: ...ile edits shouldn't auto-get deployment access #### Strategy: Per-Category Trust Trac...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~1312-~1312: Please add a punctuation mark at the end of paragraph.
Context: ...rust state is a matrix per agent, not a scalar #### Strategy: Milestone Gates (ATF-In...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~1343-~1343: Please add a punctuation mark at the end of paragraph.
Context: ...ay may need tuning to avoid frustrating users --- ## 12. Security & Approval System...

(PUNCTUATION_PARAGRAPH_END)


[style] ~1426-~1426: ‘in the meantime’ might be wordy. Consider a shorter alternative.
Context: ...iting approval and works on other tasks in the meantime. ```yaml approval_timeout: policy: "...

(EN_WORDINESS_PREMIUM_IN_THE_MEANTIME)


[grammar] ~1434-~1434: Please add a punctuation mark at the end of paragraph.
Context: ...if human is unavailable. Queue can grow unbounded #### Policy 2: Deny on Timeout All un...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~1447-~1447: Please add a punctuation mark at the end of paragraph.
Context: ...egitimate work if human is consistently slow #### Policy 3: Tiered Timeout Differe...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~1472-~1472: Please add a punctuation mark at the end of paragraph.
Context: ...s risk. Tuning tier boundaries requires experience #### Policy 4: Escalation Chain On ti...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~1492-~1492: Please add a punctuation mark at the end of paragraph.
Context: ...chain. More humans involved. Complex to implement > Task Suspension and Resumption: ...

(PUNCTUATION_PARAGRAPH_END)


[style] ~1494-~1494: Consider using the typographical ellipsis character here instead.
Context: ...spension. This works naturally with the model_copy(update=...) immutability pattern — the snapshot i...

(ELLIPSIS)


[typographical] ~1960-~1960: In American English, use a period after an abbreviation.
Context: ...cture? | Medium | Open | asyncio queues vs Redis vs embedded broker | | 9 | How to...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[typographical] ~1960-~1960: In American English, use a period after an abbreviation.
Context: ...Medium | Open | asyncio queues vs Redis vs embedded broker | | 9 | How to handle c...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[typographical] ~1961-~1961: In American English, use a period after an abbreviation.
Context: ...gh | Open | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the ...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[typographical] ~1961-~1961: In American English, use a period after an abbreviation.
Context: ...n | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the minimum ...

(MISSING_PERIOD_AFTER_ABBREVIATION)

🪛 markdownlint-cli2 (0.21.0)
DESIGN_SPEC.md

[warning] 955-955: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🔇 Additional comments (1)
CLAUDE.md (1)

51-51: Good terminology update.

This wording matches the current state better than a concrete Mem0 reference and stays aligned with the spec’s “candidate memory layer” framing.

@greptile-apps

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR significantly expands DESIGN_SPEC.md with 7 new/rewritten sections, introducing a consistent architectural pattern: pluggable strategies behind protocol interfaces. The additions cover conflict resolution (§5.6), meeting protocols (§5.7), agent execution loops (§6.5), shared organizational memory (§7.4), progressive trust (§11.3), approval timeout policies (§12.4), and auto-downgrade boundary clarification (§10.4). Supporting files README.md and CLAUDE.md are updated to reflect the TBD memory layer decision.

Key strengths:

  • The pluggable strategy pattern is applied consistently across all new subsystems, with MVP defaults identified and clear extensibility notes.
  • All design decisions align with the stated principles; the spec is well-structured and internally coherent.
  • All Mem0-specific references have been appropriately generalized to "TBD" with candidates listed.

Minor improvements identified:

  • Five new protocols (ConflictResolver, MeetingProtocol, ExecutionLoop, TrustStrategy, TimeoutPolicy) lack explicit method signatures, creating inconsistency with OrgMemoryBackend which does define them. Adding provisional signatures would make the spec more actionable for implementers.
  • The auto_loop_rules config example is incomplete — it shows threshold keys but omits the loop type assignments that would clarify the mapping for implementers.
  • The README feature bullet wording (memory layer TBD) exposes internal uncertainty to users. Reframing as (via pluggable memory layer) would communicate the design decision more confidently.
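To make the first point concrete, here is one way a provisional signature for `TimeoutPolicy` could look. Nothing below comes from the spec: the `decide` method, the `Decision` enum, the `ApprovalRequest` fields, and the risk tier names are hypothetical, and the tiered policy simply denies after a per-tier deadline. The escalation chain policy is omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Protocol


class Decision(Enum):
    WAIT = "wait"
    DENY = "deny"


@dataclass(frozen=True)
class ApprovalRequest:
    action: str
    risk: str  # tier names such as "low" / "high" are assumptions


class TimeoutPolicy(Protocol):
    """Provisional, assumed signature for the approval timeout protocol."""

    def decide(self, request: ApprovalRequest, waited_seconds: float) -> Decision: ...


class WaitForever:
    """Never times out; the request stays queued until a human answers."""

    def decide(self, request: ApprovalRequest, waited_seconds: float) -> Decision:
        return Decision.WAIT


@dataclass(frozen=True)
class DenyOnTimeout:
    timeout_seconds: float

    def decide(self, request: ApprovalRequest, waited_seconds: float) -> Decision:
        return Decision.DENY if waited_seconds >= self.timeout_seconds else Decision.WAIT


@dataclass(frozen=True)
class TieredTimeout:
    """Different deadlines per risk tier; unknown tiers keep waiting (assumption)."""
    timeouts_by_risk: Mapping[str, float]

    def decide(self, request: ApprovalRequest, waited_seconds: float) -> Decision:
        limit = self.timeouts_by_risk.get(request.risk)
        if limit is None:
            return Decision.WAIT
        return Decision.DENY if waited_seconds >= limit else Decision.WAIT
```

A single small method per protocol, as here, would be enough for implementers to start writing strategies against.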

Confidence Score: 4/5

  • Docs-only PR with no code changes — safe to merge with minor documentation clarity improvements suggested.
  • This is a documentation-only PR with no runtime impact. All changes are confined to design and supporting documentation files (DESIGN_SPEC.md, README.md, CLAUDE.md). The new design sections are well-structured, internally consistent, and follow the established pluggable-strategy pattern. Three minor style-level issues were identified: incomplete config example (line 847), missing protocol method signatures (5 protocols), and public-facing wording that could be more confident (README line 18). These are all non-blocking documentation gaps that improve clarity for implementers and users without affecting the validity or safety of the design.
  • DESIGN_SPEC.md requires minor improvements to method signatures and config example clarity; README.md feature bullet phrasing could be stronger.

Last reviewed commit: c5952d8


Copilot AI left a comment

Pull request overview

Expands the design documentation to formalize several previously underspecified subsystems using a consistent “pluggable strategy behind protocol interface” pattern (conflict resolution, meetings, execution loops, shared org memory, trust, approval timeouts), and updates memory-layer references to reflect a TBD vendor decision.

Changes:

  • Add new protocol/strategy sections for conflict resolution, meeting coordination, and agent execution loop architectures.
  • Introduce a shared organizational memory backend abstraction and rewrite progressive trust/approval-timeout policy sections.
  • Update memory-layer references (Mem0 → “candidates/TBD”) and align CLAUDE.md package description accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| DESIGN_SPEC.md | Major spec expansion with new protocol-based subsystems; updates memory-layer positioning to TBD/candidates. |
| CLAUDE.md | Updates memory/ package description to reflect memory layer is TBD. |

Comment on lines +587 to +592
```yaml
meeting_protocol: "round_robin"
round_robin:
  max_turns_per_agent: 2
  max_total_turns: 16
  leader_summarizes: true
```

Copilot AI Mar 6, 2026

The Meeting Protocol YAML examples use top-level keys (meeting_protocol, round_robin, etc.), but the existing meeting configuration in §5.4 nests meetings under communication.meetings.types. To avoid readers configuring this incorrectly, show these protocol settings in the same structure (e.g., per communication.meetings.types[].protocol / protocol-specific options) or explicitly state where meeting_protocol lives in the config schema.
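
One way to picture the nested layout this comment asks for is the sketch below. It is only an illustration: the `name` field, the `daily_standup` value, and the convention of storing options under a key named after the protocol are assumptions, not part of the spec or the existing §5.4 schema.

```python
from typing import Any

# Hypothetical config, shaped like the nested layout the comment proposes.
CONFIG: dict[str, Any] = {
    "communication": {
        "meetings": {
            "types": [
                {
                    "name": "daily_standup",  # assumed identifier field
                    "protocol": "round_robin",
                    "round_robin": {
                        "max_turns_per_agent": 2,
                        "max_total_turns": 16,
                        "leader_summarizes": True,
                    },
                },
            ],
        },
    },
}


def protocol_settings(config: dict[str, Any], meeting_name: str) -> tuple[str, dict[str, Any]]:
    """Return (protocol name, protocol-specific options) for one meeting type."""
    for meeting in config["communication"]["meetings"]["types"]:
        if meeting["name"] == meeting_name:
            protocol = meeting["protocol"]
            # Options are assumed to live under a key named after the protocol.
            return protocol, meeting.get(protocol, {})
    raise KeyError(f"unknown meeting type: {meeting_name}")
```

With this shape, each meeting type carries its own protocol choice, which matches the per-type structure the comment refers to.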

- **Best for**: Complex tasks, multi-file refactoring, tasks requiring both planning and adaptivity

> **Auto-selection (optional):** When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules`.

Copilot AI Mar 6, 2026

auto_loop_rules is referenced here as a configuration mechanism, but it isn’t defined anywhere else in the spec. Either add a short definition (expected shape + where it lives in config) or remove the reference to avoid introducing an undocumented config surface.

Suggested change
**`auto_loop_rules` configuration**
- Optional top-level config key, defined alongside `execution_loop`.
- Controls how `estimated_complexity` values are mapped to loop implementations when `execution_loop: "auto"`.
- Recommended shape:
```yaml
# Example agent/config YAML
execution_loop: "auto"
auto_loop_rules:
  thresholds:
    simple_max_tokens: 500 # up to this → "simple"
    medium_max_tokens: 3000 # up to this → "medium"; above → "complex"
  mapping:
    simple: "react" # use ReAct loop
    medium: "plan_and_execute" # use Plan-and-Execute loop
    complex: "hybrid" # use Hybrid loop
```

DESIGN_SPEC.md Outdated

```yaml
entity_extraction: "auto" # auto-extract entities from ADRs and policies
```

- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships

Copilot AI Mar 6, 2026

The “3.4x accuracy improvement” claim is very specific but has no citation or qualifier, which makes it read like a measured guarantee. Consider adding a source link/footnote or rephrasing to a qualitative statement (e.g., “can improve accuracy vs vector-only retrieval in some benchmarks”) to avoid misleading readers.

Suggested change
- 3.4x accuracy improvement over vector-only retrieval. Multi-hop reasoning captures policy relationships
- Can improve accuracy over vector-only retrieval in some scenarios; multi-hop reasoning captures policy relationships

auto_downgrade:
  enabled: true
  threshold: 85 # percent of budget used
  boundary: "task_assignment" # task_assignment only — NEVER mid-execution

Copilot AI Mar 6, 2026

This adds budget.auto_downgrade.boundary, but the current config model AutoDowngradeConfig in src/ai_company/budget/config.py only defines enabled, threshold, and downgrade_map. Either (a) remove boundary from the YAML example and keep the boundary behavior as a documented invariant, or (b) update the config schema/implementation to accept and validate this new field.

Suggested change
boundary: "task_assignment" # task_assignment only — NEVER mid-execution
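
If option (b) were preferred over deleting the line, the schema change might look roughly like the sketch below. It uses a plain dataclass as a stand-in, since the real `AutoDowngradeConfig` is not shown in this thread; the class name, field types, defaults, and the set of allowed boundary values are all assumptions.

```python
from dataclasses import dataclass, field

# Assumed: "task_assignment" is the only boundary the spec permits.
ALLOWED_BOUNDARIES = frozenset({"task_assignment"})


@dataclass(frozen=True)
class AutoDowngradeConfigSketch:
    """Stand-in for AutoDowngradeConfig with the proposed `boundary` field."""

    enabled: bool = False
    threshold: int = 85  # percent of budget used
    downgrade_map: dict[str, str] = field(default_factory=dict)  # type is a guess
    boundary: str = "task_assignment"

    def __post_init__(self) -> None:
        if not 0 <= self.threshold <= 100:
            raise ValueError("threshold must be a percentage in [0, 100]")
        if self.boundary not in ALLOWED_BOUNDARIES:
            raise ValueError(f"unsupported boundary: {self.boundary!r}")
```

Rejecting unknown values at load time would keep the "never mid-execution" rule enforceable in config rather than only documented as an invariant.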

DESIGN_SPEC.md Outdated

When an action requires human approval (per autonomy level in §12.2), the agent must wait. The framework provides configurable timeout policies that determine what happens when a human doesn't respond. All policies implement a `TimeoutPolicy` protocol. The policy is configurable per autonomy level and per action risk tier.

During any wait — regardless of policy — the agent **parks** the blocked task (saving its full `AgentContext` snapshot: conversation, progress, accumulated cost, turn count) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.

Copilot AI Mar 6, 2026

This section uses “AgentContext snapshot” to mean the full persisted execution state (including conversation), but in the current codebase AgentContextSnapshot is a compact reporting/logging snapshot and does not include the conversation contents. To avoid confusion/incorrect implementation, consider explicitly distinguishing between a persisted/serialized AgentContext (full state) vs AgentContextSnapshot (telemetry), or rename the persisted artifact in the spec.

Suggested change
During any wait — regardless of policy — the agent **parks** the blocked task (saving its full `AgentContext` snapshot: conversation, progress, accumulated cost, turn count) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.
During any wait — regardless of policy — the agent **parks** the blocked task (saving its full serialized `AgentContext` state: conversation, progress, accumulated cost, turn count — i.e., the complete persisted context, not just the compact `AgentContextSnapshot` used for telemetry) and picks up other available tasks from its queue. When approval eventually arrives, the agent **resumes** the original context exactly where it left off. This mirrors real company behavior: a junior developer starts another task while waiting for a code review, then returns to the original work when feedback arrives.
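
The park/resume behavior described here could be sketched as follows. This is a minimal illustration, not the project's implementation: `AgentContextState` and `ParkedTaskStore` are hypothetical names, the fields simply mirror the list in the quoted paragraph, and an in-memory dict stands in for real persistence.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentContextState:
    """Hypothetical full execution state (not the compact telemetry snapshot)."""

    task_id: str
    conversation: list[dict[str, str]] = field(default_factory=list)
    progress: str = ""
    accumulated_cost: float = 0.0
    turn_count: int = 0


class ParkedTaskStore:
    """In-memory stand-in for whatever persistence layer holds parked tasks."""

    def __init__(self) -> None:
        self._parked: dict[str, str] = {}

    def park(self, state: AgentContextState) -> None:
        # Serialize the whole state so resume does not depend on live objects.
        self._parked[state.task_id] = json.dumps(asdict(state))

    def resume(self, task_id: str) -> AgentContextState:
        # Pop so a task cannot be resumed twice from the same parked copy.
        return AgentContextState(**json.loads(self._parked.pop(task_id)))
```

Serializing the conversation along with cost and turn count is what separates this from a reporting-only snapshot: resuming needs everything the loop had when it stopped.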

…, Copilot, and Greptile

- README.md: update 2 Mem0 references to TBD (memory layer undecided)
- DESIGN_SPEC.md §15.2: add missing Cognee to memory candidate list
- DESIGN_SPEC.md §15.4: clarify OrgMemoryBackend vs agent memory types
- DESIGN_SPEC.md §11.3: make weighted trust human approval gate a
  structured field (consistent with other strategies)
- DESIGN_SPEC.md §5.6: add CEO to debate judge options
- DESIGN_SPEC.md §6.5: add "auto" to execution_loop YAML values
- DESIGN_SPEC.md §7.4: fix MD028 blank line inside blockquote
- DESIGN_SPEC.md §7.4: qualify 3.4x accuracy claim
- DESIGN_SPEC.md §6.5: define auto_loop_rules inline
- DESIGN_SPEC.md §12.4: clarify AgentContext vs AgentContextSnapshot

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit 6832db6 into main Mar 6, 2026
8 of 9 checks passed
@Aureliolo Aureliolo deleted the docs/design-spec-expansion branch March 6, 2026 10:22
- Most complex to implement. Plan granularity needs tuning per task type
- **Best for**: Complex tasks, multi-file refactoring, tasks requiring both planning and adaptivity

> **Auto-selection (optional):** When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules` — a mapping of complexity thresholds to loop implementations (e.g., `{simple_max_tokens: 500, medium_max_tokens: 3000}` with corresponding loop assignments).

The auto-selection note mentions auto_loop_rules as "a mapping of complexity thresholds to loop implementations" and includes an example with threshold keys (simple_max_tokens: 500, medium_max_tokens: 3000) but omits the loop type assignments that determine which loop handles each threshold range. This leaves the config structure ambiguous for implementers.

Consider completing the example inline to show the full structure:

```yaml
execution_loop: "auto"
auto_loop_rules:
  simple_max_tokens: 500      # context <= 500 tokens → react
  medium_max_tokens: 3000     # context <= 3000 tokens → plan_execute
                               # context > 3000 tokens → hybrid
  simple_loop: "react"
  medium_loop: "plan_execute"
  complex_loop: "hybrid"
```

This clarifies the mapping between thresholds and loop implementations.
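
To show how the flat keys above might be consumed, here is a small sketch of the selection logic. `AutoLoopRules` and `select_loop` are hypothetical names. The sketch follows the token-based thresholds in this example rather than the spec's `estimated_complexity` labels, and it treats both upper bounds as inclusive, as the YAML comments suggest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoLoopRules:
    """Hypothetical container mirroring the flat YAML keys suggested above."""

    simple_max_tokens: int = 500
    medium_max_tokens: int = 3000
    simple_loop: str = "react"
    medium_loop: str = "plan_execute"
    complex_loop: str = "hybrid"

    def __post_init__(self) -> None:
        if not 0 < self.simple_max_tokens < self.medium_max_tokens:
            raise ValueError("need 0 < simple_max_tokens < medium_max_tokens")


def select_loop(context_tokens: int, rules: AutoLoopRules) -> str:
    """Return the loop name for a context size (upper bounds are inclusive)."""
    if context_tokens <= rules.simple_max_tokens:
        return rules.simple_loop
    if context_tokens <= rules.medium_max_tokens:
        return rules.medium_loop
    return rules.complex_loop
```

Validating the threshold ordering up front avoids a silent misconfiguration where the medium range is empty.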


### 5.6 Conflict Resolution Protocol

When two or more agents disagree on an approach (architecture, implementation, priority, etc.), the framework provides multiple configurable resolution strategies behind a `ConflictResolver` protocol. New strategies can be added without modifying existing ones. The strategy is configurable per company, per department, or per conflict type.

Five of the six new protocols introduced in this PR lack explicit method signatures. In contrast, OrgMemoryBackend (§7.4, line 954) defines its interface clearly:

> `query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`

The following protocols are mentioned by name but have no method signatures defined:

  • ConflictResolver (§5.6, line 517)
  • MeetingProtocol (§5.7, line 581)
  • ExecutionLoop (§6.5, line 749)
  • TrustStrategy (§11.3, line 1250)
  • TimeoutPolicy (§12.4, line 1421)

Since a core design principle of this PR is "pluggable strategies behind protocol interfaces," adding signature blocks to each protocol (even if provisional) would make the spec actionable for implementers and more consistent with the OrgMemoryBackend example.
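
As an illustration of what a provisional signature block could look like, the sketch below defines one of the five protocols with `typing.Protocol`. Everything in it is an assumption: the spec names `ConflictResolver` but not its methods, so `resolve`, `Conflict`, `Resolution`, and the toy `AuthorityResolver` are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Conflict:
    """Hypothetical input: the disagreeing agents and their positions."""

    topic: str
    positions: dict[str, str]  # agent id -> proposed approach


@dataclass(frozen=True)
class Resolution:
    """Hypothetical output: the chosen position plus a dissent record."""

    winner: str
    rationale: str
    dissenting: tuple[str, ...]


@runtime_checkable
class ConflictResolver(Protocol):
    """Provisional signature; the spec does not define this method."""

    def resolve(self, conflict: Conflict) -> Resolution: ...


class AuthorityResolver:
    """Toy 'authority + dissent log' strategy: the highest-ranked agent wins."""

    def __init__(self, rank: dict[str, int]) -> None:
        self._rank = rank

    def resolve(self, conflict: Conflict) -> Resolution:
        winner = max(conflict.positions, key=lambda a: self._rank.get(a, 0))
        dissenting = tuple(sorted(a for a in conflict.positions if a != winner))
        return Resolution(winner, f"{winner} has highest authority", dissenting)
```

Because `Protocol` uses structural typing, `AuthorityResolver` satisfies `ConflictResolver` without inheriting from it, which fits the goal of adding strategies without touching existing ones.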

- **Smart Cost Management** - Per-agent budget tracking, auto model routing, CFO agent optimization
- **Configurable Autonomy** - From fully autonomous to human-approves-everything, with a Security Ops agent in between
- **Persistent Memory** - Agents remember past decisions, code, relationships (via Mem0)
- **Persistent Memory** - Agents remember past decisions, code, relationships (memory layer TBD)

The change from (via Mem0) to (memory layer TBD) accurately reflects the current state but exposes internal uncertainty to users and contributors reading the README. Since the underlying design commitment (a pluggable memory layer) is firm, consider phrasing that conveys the design decision without exposing the unresolved vendor choice:

Suggested change
- **Persistent Memory** - Agents remember past decisions, code, relationships (memory layer TBD)
- **Persistent Memory** - Agents remember past decisions, code, relationships (via pluggable memory layer)

This keeps the README confident and user-facing while remaining accurate about the architecture.

Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
