docs: finalize 23 design decisions (ADR-002)#190

Merged
Aureliolo merged 2 commits into main from
docs/adr-002-design-decisions
Mar 10, 2026

Conversation

@Aureliolo
Owner

Summary

  • Rename PENDING-DECISIONS.md → ADR-002-design-decisions-batch-1.md (status: DECIDED)
  • Add 12 decision annotation blocks to DESIGN_SPEC.md across 8 sections
  • Create 2 new subsections: §11.1.3 MCP Integration, §11.1.4 Action Type System
  • All 23 decisions follow pluggable protocol pattern with one initial implementation

Decisions covered

| Group | Decisions | Issues |
|---|---|---|
| Cross-cutting | D1 (action types), D2 (quality scoring), D3 (collaboration scoring) | #40, #42, #43, #47, #49, #126 |
| SecOps | D4 (hybrid rules+LLM), D5 (integration point) | #40 |
| Autonomy | D6 (3-level scope), D7 (human-only promotion) | #42 |
| HR | D8 (instantiation), D9 (reassignment), D10 (archival) | #45 |
| Performance | D11 (rolling windows), D12 (trend detection) | #47 |
| Promotions | D13 (criteria), D14 (approval), D15 (model mapping) | #49 |
| Sandbox | D16 (Docker MVP) | #50 |
| MCP | D17 (SDK), D18 (result mapping) | #53 |
| Timeout | D19 (risk tiers), D20 (serialization), D21 (resume) | #126 |
| System prompt | D22 (remove tools section), D23 (memory filter) | #188 |

GitHub issue bodies for all 10 issues (#40, #42, #43, #45, #47, #49, #50, #53, #126, #188) have been updated with their respective decisions.

Test plan

  • Verify DESIGN_SPEC.md renders correctly (all blockquotes, tables, code blocks)
  • Verify ADR-002 file renders correctly
  • Verify all ADR-002 links in DESIGN_SPEC.md resolve correctly
  • Confirm no existing spec content was modified (only additions)

🤖 Generated with Claude Code

Rename PENDING-DECISIONS.md to ADR-002-design-decisions-batch-1.md
(status: DECIDED). Add decision annotations to DESIGN_SPEC.md across
8 sections with 2 new subsections (§11.1.3 MCP Integration, §11.1.4
Action Type System). All decisions follow pluggable protocol pattern
with one initial implementation.

Decisions cover: action types (D1), quality/collaboration scoring
(D2-D3), SecOps approach and integration (D4-D5), autonomy scope and
changes (D6-D7), HR instantiation/reassignment/archival (D8-D10),
performance metrics windows and trends (D11-D12), promotion criteria/
approval/model-mapping (D13-D15), sandbox backend (D16), MCP SDK and
result mapping (D17-D18), timeout risk tiers/serialization/resume
(D19-D21), system prompt tools removal (D22), memory filter (D23).

GitHub issues updated: #40, #42, #43, #45, #47, #49, #50, #53, #126, #188
Copilot AI review requested due to automatic review settings March 9, 2026 21:45
@github-actions
Contributor

github-actions bot commented Mar 9, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 9, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c15418e-fe1e-4e68-8755-5b8d2510be1e

📥 Commits

Reviewing files that changed from the base of the PR and between 92e2e54 and a92b80c.

📒 Files selected for processing (3)
  • DESIGN_SPEC.md
  • README.md
  • docs/decisions/ADR-002-design-decisions-batch-1.md

📝 Walkthrough

Summary by CodeRabbit

  • Documentation
    • Added comprehensive architecture decision records (23 ADRs) covering action taxonomy, memory and persistence, security, autonomy, HR/workflow, performance, MCP integration, and rollout guidance.
  • New Features
    • Public-facing design sections for MCP integration and an action-type system; planned public data-plane declarations for parked context and agent state.
  • Behavior Change
    • System prompts will no longer embed tool definitions; tools are supplied via API parameters.
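
The behavior change above (D22) can be illustrated with a minimal sketch. It assumes a simplified, hypothetical signature for `build_system_prompt()` and a provider-agnostic request dictionary; the `ToolDefinition` fields and the `build_request` helper are illustrative names, not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToolDefinition:
    """Hypothetical tool schema; field names are illustrative only."""
    name: str
    description: str
    parameters: dict[str, Any] = field(default_factory=dict)


def build_system_prompt(agent_name: str, task: str) -> str:
    """Build the prompt from identity and task only; no tool listing (D22)."""
    return f"You are {agent_name}.\nCurrent task: {task}"


def build_request(agent_name: str, task: str, tools: list[ToolDefinition]) -> dict[str, Any]:
    """Assemble a request in which tools travel in their own `tools` field."""
    return {
        "system": build_system_prompt(agent_name, task),
        "tools": [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in tools
        ],
    }
```

Keeping tool schemas out of the prompt text means they are stated once, in the structured field the provider already expects, rather than duplicated in prose.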

Walkthrough

Adds a batch of ADRs (D1–D23), new public data-plane declarations (ParkedContext, ParkedContextRepository, AgentStateRepository), and new public design subsections for MCP integration and an Action Type System; updates system-prompt/tool delivery, memory/filtering, HR/devops, and project ADR references. (≈+93/−11 lines)

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Design spec + ADRs | DESIGN_SPEC.md, docs/decisions/ADR-002-design-decisions-batch-1.md | Introduces ADR-002 with 23 design decisions (D1–D23), adds ADR-driven decision blocks throughout DESIGN_SPEC, formalizes choices for tools-in-prompt (D22), memory filtering (D23), MCP, autonomy, HR, security, performance, and rollout guidance. |
| Public data-plane declarations (docs) | DESIGN_SPEC.md | Adds planned public data-plane declarations: ParkedContext, ParkedContextRepository, and AgentStateRepository (marked M7) and documents their intended query keys and roles. |
| README | README.md | Renames milestone item to "Memory Backend Adapter (M5)" and updates milestone description to reflect completed protocols and remaining adapter/backends. |

Sequence Diagram(s)

sequenceDiagram
participant Client
participant MCP as "MCP / Orchestration"
participant Engine as "Agent Engine"
participant Store as "Persistence (ParkedContextRepo)"

Client->>MCP: Submit task / request
MCP->>Engine: Dispatch to Agent (includes action-type + tool params via API)
Engine->>Store: If needs park -> persist ParkedContext (execution_id, agent_id, task_id)
Store-->>Engine: Ack
Engine-->>MCP: Parked (park_id) / status
MCP->>Client: Return parked status

%% resume flow
Client->>MCP: Resume request (park_id)
MCP->>Store: Query ParkedContext by park_id
Store-->>MCP: ParkedContext payload
MCP->>Engine: Rehydrate execution (inject resume context)
Engine->>MCP: Execution result / tool outputs
MCP->>Client: Deliver final result
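
The park and resume flow in the diagram can be sketched as follows. This is an illustration under stated assumptions: the PR describes `ParkedContext` as Pydantic JSON, but the sketch swaps in a standard-library dataclass with `json` so it runs without dependencies. The `payload` field, the in-memory repository, and the `park`/`resume` helpers are hypothetical; only `park_id`, `execution_id`, `agent_id`, and `task_id` come from the diagram.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ParkedContext:
    park_id: str
    execution_id: str
    agent_id: str
    task_id: str
    payload: dict[str, Any]  # assumed shape: whatever state is needed to resume


class InMemoryParkedContextRepository:
    """Stores each context as a JSON string, keyed by park_id."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def save(self, ctx: ParkedContext) -> None:
        self._rows[ctx.park_id] = json.dumps(asdict(ctx))

    def get(self, park_id: str) -> Optional[ParkedContext]:
        raw = self._rows.get(park_id)
        return ParkedContext(**json.loads(raw)) if raw is not None else None

    def delete(self, park_id: str) -> None:
        self._rows.pop(park_id, None)


def park(repo: InMemoryParkedContextRepository, execution_id: str,
         agent_id: str, task_id: str, payload: dict[str, Any]) -> str:
    """Persist the context and return the park_id handed back to the client."""
    park_id = uuid.uuid4().hex
    repo.save(ParkedContext(park_id, execution_id, agent_id, task_id, payload))
    return park_id


def resume(repo: InMemoryParkedContextRepository, park_id: str) -> ParkedContext:
    """Load the parked context for rehydration and remove it from the store."""
    ctx = repo.get(park_id)
    if ctx is None:
        raise KeyError(f"no parked context for park_id={park_id!r}")
    repo.delete(park_id)
    return ctx
```

Deleting on resume is a design choice of this sketch, so that a `park_id` cannot be replayed; the PR does not say whether the real repository does this.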

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'docs: finalize 23 design decisions (ADR-002)' clearly and specifically summarizes the main change: the documentation finalization of 23 design decisions in an ADR document. |
| Description check | ✅ Passed | The description provides a detailed summary of the pull request changes, including file renames, added decision blocks, new subsections, and issue references, all directly related to the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 1665-1670: Update the DESIGN_SPEC.md §7.6 persistence contract to
include the newly introduced runtime repositories and interfaces referenced by
ADR-002 D8 (e.g., add descriptions and method contracts for PersistenceBackend
extensions such as ParkedContextRepository, AgentRepository/AgentStore (for
runtime agent state and hot-plugging via AgentEngine.add_agent/remove_agent),
and any repositories for runtime message/task/cost-record linkage), specifying
expected CRUD/transaction semantics, thread-safety and ownership rules, and how
YAML bootstrap seeds interact with the operational store; ensure the new
repository names (ParkedContextRepository, PersistenceBackend extensions,
AgentRepository or equivalent) are explicitly listed in the inventory and
cross-referenced with ADR-002 and §14.1 template behavior so the spec and
decisions are consistent.
- Around line 974-975: Update the DESIGN_SPEC.md step 3 to stop stating that
build_system_prompt() is called "with agent identity, task, and available tool
definitions"; either remove "available tool definitions" from the call signature
or explicitly document that build_system_prompt() ignores tool definitions and
only accepts non-redundant behavioral guidance, per decision D22. Mention
build_system_prompt() by name and reference AgentContext.from_identity() remains
unchanged for step 4; ensure the text notes that tool definitions are supplied
via the API `tools` parameter (not the system prompt) and that any approved
deviations must be reflected in DESIGN_SPEC.md.
- Around line 2395-2400: The autonomy preset examples in the DESIGN_SPEC.md
presets still use legacy permission strings (e.g., code_changes, file_edits,
git_push); update the auto_approve and human_approval lists in those preset
blocks to use the new category:action taxonomy defined in §11.1.4 (e.g.,
code:changes, file:edit, git:push or comms:internal) or explicitly mark those
example blocks as "legacy" if you intend to keep them unchanged; change the
values in the preset definitions (auto_approve, human_approval) to the canonical
category:action form so readers copy the correct format.
- Around line 2146-2155: TOC for Section 11 is missing entries for the newly
added subsections 11.1.3 (MCP Integration) and 11.1.4 (Action Type System);
update the Table of Contents so Section 11 lists 11.1.3 and 11.1.4 with matching
titles and links, ensuring their anchor/numbering matches the headings ("11.1.3
MCP Integration" and "11.1.4 Action Type System") and adjust any downstream TOC
numbering or anchor targets so navigation remains consistent.

In `@docs/decisions/ADR-002-design-decisions-batch-1.md`:
- Around line 233-237: The review flags that lifecycle APIs
(AgentEngine.add_agent()/remove_agent()) are attached to the wrong abstraction;
remove or relocate agent registry/HR responsibilities from AgentEngine and
implement them on a dedicated company/registry/HR service (or similarly named
Service class) that owns HiringRequest persistence via PersistenceBackend and
wires templates/overrides, ensuring AgentEngine remains a per-agent task runner;
if you intentionally redefine AgentEngine's scope instead, update DESIGN_SPEC.md
and surface that deviation to reviewers/users rather than silently diverging.
- Around line 45-57: The markdown fenced code blocks under the "Proposed
taxonomy (~25 leaf types):" section (the plain list block) and the subsequent
code sample block are missing leading/trailing blank lines and a language tag,
which triggers MD031/MD040; fix both blocks by adding a blank line before and
after each fence and change the opening fence to include a language (e.g.,
```text) so the taxonomy list and the collaboration_score snippet render
consistently and satisfy markdownlint.
- Around line 177-186: The document is inconsistent: the Decision text uses
resolution "agent.autonomy_level or company.autonomy.level" but D6 and
DESIGN_SPEC.md describe a three-level chain including department overrides;
update ADR-002 to match the mandated three-level precedence (agent -> department
-> company) or explicitly justify and change D6/DESIGN_SPEC.md if you intend to
keep two-level resolution. Concretely, change the Decision and Resolution lines
to reference the three-level rule (e.g., resolve as agent.autonomy_level or
department.autonomy.level or company.autonomy.level), update the sentence
introducing the resolution, and add a cross-reference to DESIGN_SPEC.md and D6
so readers see the authoritative precedence and validation rules (e.g.,
seniority-based constraints on AgentIdentity/autonomy_level).
- Around line 193-202: The ADR currently declares "Decision: **(a+c hybrid)**
Human-only for promotion. Automatic downgrade..." but the DESIGN_SPEC.md and PR
objective expect human-only runtime changes now with auto-downgrade as future
work; pick one and make both sources agree. Either (A) change this ADR's
"Decision" paragraph, the table row for "(a+c hybrid)", and the "Key insight"
and "D7" references to state that only human-only runtime changes are
implemented now and auto-downgrade is deferred to future work, or (B) update
DESIGN_SPEC.md/PR objective to accept the (a+c hybrid) choice; ensure the text
around "Decision: **(a+c hybrid)**", the pros/cons table, and the
recovery/auto-downgrade bullets are updated consistently and add a clear note
referencing DESIGN_SPEC.md if you defer (so readers know mandatory spec must be
updated when implementation diverges).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a4dc0e4-208a-4340-9d43-e0745848571e

📥 Commits

Reviewing files that changed from the base of the PR and between 29fcd08 and 92e2e54.

📒 Files selected for processing (2)
  • DESIGN_SPEC.md
  • docs/decisions/ADR-002-design-decisions-batch-1.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

DESIGN_SPEC.md is MANDATORY reading before implementing any feature. The spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do not silently diverge. When spec sections are referenced (e.g. 'Section 10.2'), read that section verbatim before coding. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality

Files:

  • docs/decisions/ADR-002-design-decisions-batch-1.md
  • DESIGN_SPEC.md
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to **/*.md : DESIGN_SPEC.md is MANDATORY reading before implementing any feature. The spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do not silently diverge. When spec sections are referenced (e.g. 'Section 10.2'), read that section verbatim before coding. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality

Applied to files:

  • docs/decisions/ADR-002-design-decisions-batch-1.md
  • DESIGN_SPEC.md
🪛 LanguageTool
docs/decisions/ADR-002-design-decisions-batch-1.md

[typographical] ~25-~25: In American English, use a period after an abbreviation.
Context: ...ve trust. Sub-question 1: Fixed enum vs open/extensible registry? | Option |...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[grammar] ~148-~148: Please add a punctuation mark at the end of paragraph.
Context: ...ntial exposure, data destruction) never bypass Decision: (c) Hybrid. Rule eng...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~165-~165: Ensure spelling is correct
Context: ...ble. Even OPA sidecar evaluations are 1-5ms. Total security overhead: milliseconds ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~308-~308: Redundant conjunctions can lead to confusion; consider removing a conjunction here.
Context: ...15) ### D13: Promotion Criteria Logic (AND/OR) Unblocks: #49 | Option | Pros | ...

(AND_OR)


[style] ~316-~316: Redundant conjunctions can lead to confusion; consider removing a conjunction here.
Context: ...ll). Single ThresholdEvaluator covers AND/OR/threshold | More configuration | **Pre...

(AND_OR)


[typographical] ~375-~375: Insert a space between the numerical value and the unit symbol.
Context: ... via docker.image config - Keep under 500MB; users add Go/Rust via custom images *...

(UNIT_SPACE)


[typographical] ~458-~458: In American English, use a period after an abbreviation.
Context: ...tion needed | Sub-decision: Verbatim vs summarized conversation? - **Verbatim...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[typographical] ~525-~525: In American English, use a period after an abbreviation.
Context: ...ry | #47, #43, #49 | | D4 | SecOps: LLM vs rules | Hybrid: rule engine + LLM | — |...

(MISSING_PERIOD_AFTER_ABBREVIATION)

🪛 markdownlint-cli2 (0.21.0)
docs/decisions/ADR-002-design-decisions-batch-1.md

[warning] 46-46: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 46-46: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 117-117: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 117-117: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 126-126: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on documenting and integrating a batch of 23 key design decisions into the agent framework's documentation. It covers a wide range of topics, solidifying architectural patterns and providing clear guidance for future development.

Highlights

  • Design Decisions: This PR finalizes 23 design decisions (ADR-002) related to various aspects of the agent framework, including action types, quality scoring, collaboration scoring, SecOps, autonomy, HR, performance, promotions, sandboxing, MCP integration, timeouts, and system prompts.
  • Architectural Patterns: The decisions largely follow a pluggable protocol pattern, providing flexibility and extensibility for future implementations.
  • Documentation: The pull request includes updates to DESIGN_SPEC.md with decision annotations and creates a new ADR-002 document to capture the finalized decisions.
Changelog
  • DESIGN_SPEC.md
    • Added decision annotations throughout the document.
    • Created new subsections for MCP Integration and Action Type System.
    • Updated content to reflect finalized design decisions.
  • docs/decisions/ADR-002-design-decisions-batch-1.md
    • Created a new ADR document to capture the details of the 23 finalized design decisions.
Activity
  • Renamed PENDING-DECISIONS.md to ADR-002-design-decisions-batch-1.md.
  • Added 12 decision annotation blocks to DESIGN_SPEC.md.
  • Created 2 new subsections in DESIGN_SPEC.md: §11.1.3 MCP Integration, §11.1.4 Action Type System.
  • Updated GitHub issue bodies for 10 issues with their respective decisions.

@greptile-apps

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR finalizes 23 design decisions (ADR-002) accumulated across 10 open issues, renames the pending-decisions file to a proper ADR, and back-annotates DESIGN_SPEC.md with decision blocks in the relevant sections. The changes are documentation-only and follow the project's "protocol-everywhere" philosophy: each decision codifies a pluggable protocol interface with one initial implementation shipped.
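
As a rough illustration of the "pluggable protocol with one initial implementation" pattern, the sketch below uses the `RiskTierClassifier` name from D19. The method name `classify`, the `MappingRiskTierClassifier` class, and the choice to treat unmapped actions as `high_risk` are assumptions made for the example, not details confirmed by the ADR.

```python
from typing import Mapping, Protocol


class RiskTierClassifier(Protocol):
    """The pluggable seam: anything with this method can be swapped in."""

    def classify(self, action_type: str) -> str: ...


class MappingRiskTierClassifier:
    """Initial implementation: a static mapping, e.g. one loaded from YAML."""

    def __init__(self, tiers: Mapping[str, str], default: str = "high_risk") -> None:
        self._tiers = dict(tiers)
        self._default = default  # assumption: unmapped actions get the strictest tier

    def classify(self, action_type: str) -> str:
        return self._tiers.get(action_type, self._default)


def timeout_tier(classifier: RiskTierClassifier, action_type: str) -> str:
    """Callers depend only on the protocol, never on a concrete class."""
    return classifier.classify(action_type)
```

A later strategy (for example, one that consults history) would only need to provide the same `classify` method; no caller changes.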

Key changes:

  • docs/decisions/ADR-002-design-decisions-batch-1.md created — full decision record for D1–D23 including options, pros/cons, precedents, and chosen approaches
  • DESIGN_SPEC.md annotated with 12 decision blocks across 8 sections; action-type strings in autonomy presets and timeout tiers updated from informal strings (e.g. code_changes, deployment) to the new formal category:action taxonomy (e.g. code:write, deploy:production)
  • 2 new subsections added: §11.1.3 MCP Integration (D17, D18) and §11.1.4 Action Type System (D1)
  • README.md updated to reflect completed memory infrastructure

Issues found:

  • D18 metadata destination ambiguous (logic): The D18 mapping bullets and the corresponding DESIGN_SPEC §11.1.3 annotation reference metadata["attachments"] / metadata["structured_content"] without specifying that the owner is ToolExecutionResult (not ToolResult). This directly contradicts the option text which explicitly notes ToolResult has no metadata field. An implementer reading only the mapping could incorrectly add a metadata field to ToolResult.
  • D7 summary table truncated (style): The ADR summary table "Initial Impl" column for D7 records only "Human-only promotion", omitting the automatic-downgrade leg (error rate / budget exhausted / security incident) which is part of the decided (a+c hybrid) initial implementation.
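
The D18 ambiguity is easier to see in code. The sketch below assumes a hypothetical `ToolExecutionResult` shape (only the `metadata` ownership and the two metadata keys come from the review text) and models MCP content blocks as plain dictionaries rather than using any SDK types.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolExecutionResult:
    """Hypothetical shape: the type that owns `metadata`, unlike ToolResult."""
    content: str
    is_error: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)


def map_mcp_result(
    blocks: list[dict[str, Any]],
    structured: Optional[dict[str, Any]] = None,
    is_error: bool = False,
) -> ToolExecutionResult:
    """Text blocks become content; everything else goes into metadata."""
    texts: list[str] = []
    attachments: list[dict[str, Any]] = []
    for block in blocks:
        if block.get("type") == "text":
            texts.append(block.get("text", ""))
        else:
            attachments.append(block)  # e.g. image or other binary blocks

    metadata: dict[str, Any] = {}
    if attachments:
        metadata["attachments"] = attachments
    if structured is not None:
        metadata["structured_content"] = structured
    return ToolExecutionResult("\n".join(texts), is_error, metadata)
```

Writing the target type into the function signature removes the risk the review describes: nobody reading this mapping would conclude that `ToolResult` needs a `metadata` field.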

Confidence Score: 4/5

  • Safe to merge — documentation-only PR with no code changes; two minor doc inconsistencies are fixable post-merge or before.
  • All changes are documentation. The D18 metadata ambiguity is a real implementer risk but does not break any existing functionality; it only affects future implementation of the MCP bridge. The D7 summary table truncation is a clarity issue with low impact since the ADR body and DESIGN_SPEC annotation are both correct. No code, tests, or runtime behaviour is affected by this PR.
  • docs/decisions/ADR-002-design-decisions-batch-1.md — D18 mapping bullets (lines 424–426) and D7 summary table (line 531) need corrections before the file is used as an implementation reference.

Important Files Changed

| Filename | Overview |
|---|---|
| docs/decisions/ADR-002-design-decisions-batch-1.md | New ADR file documenting 23 finalized design decisions. Two issues found: (1) D18 mapping bullets drop the ToolExecutionResult qualifier, making the metadata storage destination ambiguous vs ToolResult; (2) D7 summary table "Initial Impl" column truncates the decided (a+c hybrid) to "Human-only promotion", omitting the automatic-downgrade leg. |
| DESIGN_SPEC.md | 12 decision annotation blocks added across 8 sections, plus 2 new subsections (§11.1.3 MCP Integration, §11.1.4 Action Type System). Action type references updated from informal strings to the new category:action taxonomy. The D7 annotation correctly captures the (a+c hybrid) including auto-downgrade, a direct improvement over the pre-PR state flagged in prior review threads. The D18 annotation inherits the same metadata-field ambiguity noted in ADR-002. |
| README.md | Minor clarification: "Memory Backends (M5)" renamed to "Memory Backend Adapter (M5)" with an expanded description reflecting completed memory infrastructure. No issues. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Action triggered] --> B{ActionTypeRegistry\nlookup}
    B -->|Unknown type| C[Reject at config\nload time]
    B -->|Known type| D[ToolPermissionChecker]
    D --> E{SecurityInterceptionStrategy\nD5: before every tool call}
    E -->|Rules fast path ~95%| F[Rule Engine\nsub-ms]
    E -->|Uncertain ~5%| G[LLM SecOps\n0.5-2s]
    F -->|Blocked| H[Deny / Escalate]
    G -->|Blocked| H
    F -->|Allowed| I{Autonomy Preset\nD6: 3-level resolution}
    G -->|Allowed| I
    I -->|auto_approve| J[Execute Tool]
    I -->|human_approval| K{RiskTierClassifier\nD19: YAML mapping}
    K -->|low_risk| L[Park context\nD20: ParkedContext\nPydantic JSON]
    K -->|medium_risk| L
    K -->|high_risk| L
    L --> M[Human Review Queue]
    M -->|Approved| N[Resume via\ntool result injection\nD21]
    M -->|Denied / Timeout| O[Deny reason\nto agent]
    N --> J
    J --> P[ToolResult]
    P -->|MCP tool| Q[MCPBridgeTool\nD18: adapter\ntext→content\nbinary→ToolExecutionResult.metadata]
    Q --> P
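
The hybrid security step in the flowchart (D4/D5: rules first, LLM only when the rules are uncertain) might look roughly like this. The specific rules, the `ToolCall` type, the `code:read` action string, and the `llm_review` callable are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    action_type: str
    arguments: str


def rule_engine(call: ToolCall) -> Optional[Verdict]:
    """Fast path. Returns None when no rule is confident either way."""
    # Hard-deny patterns are checked first so nothing can bypass them.
    if "rm -rf /" in call.arguments:
        return Verdict.DENY
    # Illustrative allow-list of low-risk action types.
    if call.action_type in {"code:read"}:
        return Verdict.ALLOW
    return None


def intercept(call: ToolCall, llm_review: Callable[[ToolCall], Verdict]) -> Verdict:
    """Run before every tool call: rules decide if they can, else ask the LLM."""
    verdict = rule_engine(call)
    if verdict is not None:
        return verdict
    return llm_review(call)
```

Because the slow reviewer is passed in as a callable, tests can substitute a stub and confirm it is only invoked on the uncertain path.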

Last reviewed commit: a92b80c

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request does a great job of formalizing and documenting 23 key design decisions in a new Architecture Decision Record (ADR-002) and integrating these decisions back into the main DESIGN_SPEC.md. The structure of the ADR is clear and well-reasoned, covering options, precedents, and final choices for each decision. The changes significantly improve the clarity and completeness of the project's design documentation. I have one minor suggestion to ensure consistency between the ADR and the design spec regarding the autonomy scope resolution.

Note: Security Review has been skipped due to the limited scope of the PR.


**Precedents:** CrewAI has 24 per-agent attributes. AutoGen has per-agent `human_input_mode`. LangGraph has per-node `interrupt_before`/`interrupt_after`. CSA Agentic Trust Framework requires per-agent identity and trust level.

**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).

medium

There's a small inconsistency in the resolution logic for autonomy scope. The decision text here mentions a two-level resolution (agent.autonomy_level or company.autonomy.level), but the corresponding update in DESIGN_SPEC.md and the summary table at the end of this ADR describe a more robust three-level chain that includes the department level (agent.autonomy_level or department.autonomy_level or company.autonomy.level). To ensure consistency across the design documentation, it would be best to update this section to reflect the three-level resolution chain.

Suggested change
- **Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).
+ **Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` and department config (default: None = use company default). Resolution: `agent.autonomy_level or department.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).


Copilot AI left a comment

Pull request overview

This PR documents the final design decisions for 23 architectural items (D1–D23) covering 10 feature areas across the codebase. It creates a new ADR file (ADR-002-design-decisions-batch-1.md) containing the full decision records with options/pros/cons analysis, and backfills 12 blockquote annotations into the relevant sections of DESIGN_SPEC.md linking to the ADR.

Changes:

  • Create docs/decisions/ADR-002-design-decisions-batch-1.md: 544-line ADR documenting D1–D23 with options tables, precedents, and a summary table
  • Add 12 decision annotation blockquotes to DESIGN_SPEC.md across §6.5, §7.7, §8.1–8.4, §11.1.2, §12.2, §12.3, and §12.4
  • Add two new subsections to DESIGN_SPEC.md: §11.1.3 (MCP Integration) and §11.1.4 (Action Type System)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| docs/decisions/ADR-002-design-decisions-batch-1.md | New ADR with 23 architecture decisions, options analysis, precedents, and a summary table |
| DESIGN_SPEC.md | 12 ADR annotation blockquotes added inline; 2 new subsections for MCP and action type system |


**Precedents:** CrewAI has 24 per-agent attributes. AutoGen has per-agent `human_input_mode`. LangGraph has per-node `interrupt_before`/`interrupt_after`. CSA Agentic Trust Framework requires per-agent identity and trust level.

**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).
Copilot AI Mar 9, 2026

The D6 decision body at line 185 describes only a two-level resolution chain — agent.autonomy_level or company.autonomy.level — with no department level. However, both the summary table (line 527) and the DESIGN_SPEC.md annotation (line 2397) describe a three-level chain: agent.autonomy_level or department.autonomy_level or company.autonomy.level. This inconsistency within the ADR itself, and between the ADR body and the DESIGN_SPEC annotation, could lead to implementation confusion: does D6 add a per-department layer or not? The decision body should be updated to include the department-level resolution (matching the summary table, DESIGN_SPEC annotation, and issue #42).
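
For reference, the three-level chain that the summary table and DESIGN_SPEC annotation describe could be sketched as below. The function names and the seniority labels are assumptions; only the precedence order (agent, then department, then company) and the rule that Juniors/Interns cannot be set to `full` come from the reviewed text.

```python
from typing import Optional

# Assumed spellings of the seniority labels named in the decision text.
RESTRICTED_SENIORITIES = {"intern", "junior"}


def resolve_autonomy(
    agent_level: Optional[str],
    department_level: Optional[str],
    company_level: str,
) -> str:
    """Return the most specific level that is set: agent, department, company."""
    for level in (agent_level, department_level):
        if level is not None:
            return level
    return company_level


def validate_agent_autonomy(seniority: str, agent_level: Optional[str]) -> None:
    """Reject `full` autonomy on Junior/Intern agents at config load time."""
    if agent_level == "full" and seniority.lower() in RESTRICTED_SENIORITIES:
        raise ValueError(f"{seniority} agents cannot be set to 'full' autonomy")
```

The sketch compares against `None` explicitly instead of chaining `or`, so an override is skipped only when it is genuinely unset rather than merely falsy.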

Comment on lines +198 to +202
| **(a+c hybrid) Human-only promotion + automatic downgrade (CHOSEN)** | Asymmetric trust: gaining trust is hard, losing it is easy; matches Azure Conditional Access (only restricts, never loosens) | Two code paths |

**Key insight:** No real-world security system automatically grants higher privileges. Conditional access only steps UP requirements, never DOWN. The SEAgent MAC framework explicitly prevents agents from self-modifying policies.

**Decision:** **(a+c hybrid)** Human-only for promotion. Automatic downgrade on: high error rate → downgrade one level, budget exhausted → supervised, security incident → locked. Recovery from auto-downgrade: human-only.

Copilot AI Mar 9, 2026

The D7 decision body (lines 198–202) marks the (a+c hybrid) as CHOSEN and explicitly states "Automatic downgrade on: high error rate → downgrade one level, budget exhausted → supervised, security incident → locked" as part of the decided implementation. However, the summary table (line 528) lists only "Human-only promotion" as the initial implementation, and the DESIGN_SPEC.md annotation (line 2398) puts the auto-downgrade rules in "Future strategies." This three-way contradiction needs to be resolved: either the initial implementation includes both human-only promotion AND automatic downgrade (as the D7 decision body states), or auto-downgrade is deferred to future (as the summary table and DESIGN_SPEC annotation indicate). This directly impacts which code gets written for the initial milestone.
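
To make the scope question concrete, here is a rough sketch of what the (a+c hybrid) would involve if both legs ship together. The trigger strings, the `semi` level, the ordering of levels, and the choice to let a high error rate step all the way down to `locked` are assumptions for illustration; the ADR only names the three triggers and their targets.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    # Ordered from least to most trusted. "locked", "supervised" and "full"
    # come from the ADR text; "semi" stands in for any intermediate level.
    LOCKED = 0
    SUPERVISED = 1
    SEMI = 2
    FULL = 3


def auto_downgrade(current: Autonomy, trigger: str) -> Autonomy:
    """Apply an automatic downgrade; the result is never above `current`."""
    if trigger == "security_incident":
        target = Autonomy.LOCKED
    elif trigger == "budget_exhausted":
        target = Autonomy.SUPERVISED
    elif trigger == "high_error_rate":
        # One level down, floored at the lowest level.
        target = Autonomy(max(current - 1, Autonomy.LOCKED))
    else:
        raise ValueError(f"unknown trigger: {trigger}")
    # min() guards against a trigger accidentally raising a lower level.
    return min(current, target)


def promote(requested: Autonomy, requested_by_human: bool) -> Autonomy:
    """Promotion and recovery from a downgrade are both human-only."""
    if not requested_by_human:
        raise PermissionError("only a human may raise an autonomy level")
    return requested
```

The two functions correspond to the "two code paths" listed as the con of the chosen option, which is why the initial-versus-future split matters for the first milestone.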


**Precedents:** AWS IAM uses open namespaced strings (`s3:GetObject`). Kubernetes RBAC uses semi-open verbs. GitHub uses closed scopes. OPA/Rego uses open policy strings. Every production security system validates action strings against a known set.

**Decision:** **(c) Enum core + validated registry.** StrEnum for built-in types (~20), plus an `ActionTypeRegistry` that accepts custom strings only if explicitly registered. Unknown strings rejected at config load time. Critical for security — a typo in `human_approval` list silently means "skip approval."

Copilot AI Mar 9, 2026

Line 35 describes the StrEnum as covering "built-in types (~20)", but the proposed taxonomy listed in lines 46–57 contains 24 distinct leaf types (5 code:* + 2 test:* + 1 docs:* + 3 vcs:* + 2 deploy:* + 2 comms:* + 2 budget:* + 3 org:* + 3 db:* + 1 arch:*). The DESIGN_SPEC.md annotation at line 2159 also says "~20 built-in action types." These estimates should be reconciled with the actual taxonomy: the count is 24, not ~20.

DESIGN_SPEC.md Outdated
> **Decisions ([ADR-002](docs/decisions/ADR-002-design-decisions-batch-1.md) D6, D7):**
>
> - **D6 — Autonomy Scope:** Three-level resolution chain: per-agent → per-department → company default. Optional `autonomy_level` on `AgentIdentity` and department config. Resolution: `agent.autonomy_level or department.autonomy_level or company.autonomy.level`. Seniority validation: Juniors/Interns cannot be set to `full`.
> - **D7 — Autonomy Changes at Runtime:** Pluggable `AutonomyChangeStrategy` protocol. Initial: human-only promotion via REST API. No agent (including CEO) can escalate privileges. Future strategies: human-only + auto-downgrade (on high error rate → one level down, budget exhausted → supervised, security incident → locked; recovery from auto-downgrade: human-only). Precedent: no real-world security system automatically grants higher privileges.

Copilot AI Mar 9, 2026

The D7 annotation here lists the auto-downgrade rules (on high error rate → one level down, budget exhausted → supervised, security incident → locked) as "Future strategies", but the ADR-002 D7 decision body (lines 198–202) explicitly marks (a+c hybrid) (human-only promotion + automatic downgrade) as CHOSEN and describes the auto-downgrade triggers as part of the decided implementation. This inconsistency between the ADR body and the DESIGN_SPEC annotation needs to be resolved: if auto-downgrade is intended for the initial implementation, this annotation should list it as the initial impl; if it's deferred, the ADR decision body should be corrected to reflect that.

DESIGN_SPEC.md Outdated
>
> Action types classify agent actions for use by autonomy presets (§12.2), SecOps validation (§12.3), tiered timeout policies (§12.4), and progressive trust (§11.3). Three sub-decisions:
>
> - **D1.1 — Registry:** `StrEnum` for ~20 built-in action types (type safety, autocomplete, typos caught at compile time) + `ActionTypeRegistry` for custom types via explicit registration. Unknown strings rejected at config load time. Critical for security — a typo in `human_approval` list silently means "skip approval."

Copilot AI Mar 9, 2026

Line 2159 states "~20 built-in action types" but the proposed taxonomy listed just below (and in the ADR at lines 46–57) contains 24 distinct leaf types. This count should be updated to "~25" (consistent with the taxonomy header at line 2162) or the actual number of 24.

…ewers

- D6: Fix two-level → three-level autonomy resolution chain (agent → department → company)
- D7: Move auto-downgrade from "Future" to "Initial" in DESIGN_SPEC annotation
- D1: Reconcile ~20 → ~25 built-in action type count
- D5: Add SecurityInterceptionStrategy protocol name to decision body
- D8.3: Replace AgentEngine.add_agent() with company/registry service
- D18: Clarify metadata lives on ToolExecutionResult, not ToolResult
- D22: Extract inline annotation to blockquote; remove "tool definitions" from step 3
- Update autonomy preset + tiered timeout YAML to category:action taxonomy
- Add 11.1.3/11.1.4 to ToC, ADR-002 to project structure
- Add ParkedContext + AgentState to §7.6 entity table
- Fix M4 status label, README memory description, markdown lint issues
- Align sub-decision numbering (D1.1/D8.1 cross-refs)
- Add "estimated" qualifier to D4 percentages
- Explain "11 agents" count in ADR-002 header
@Aureliolo Aureliolo merged commit 8c39742 into main Mar 10, 2026
8 of 9 checks passed
@Aureliolo Aureliolo deleted the docs/adr-002-design-decisions branch March 10, 2026 06:14
Comment on lines +418 to +428
| Option | Pros | Cons |
|--------|------|------|
| **(a) Extend ToolResult to support multi-modal** | Native support for images/resources | Cascading changes across entire codebase; LLM providers consume tool results as text anyway |
| **(b) Adapter in MCPBridgeTool; keep ToolResult as-is (CHOSEN)** | Zero disruption; text concatenation for LLM path; rich content stored in `ToolExecutionResult.metadata` (not `ToolResult`, which has no metadata field); MCP spec requires TextContent block alongside structured content | Non-text content requires metadata extraction |

**Mapping:**
- Text blocks → concatenate into `content: str`
- Image/audio → `[image: {mimeType}]` placeholder in content; base64 in `metadata["attachments"]`
- `structuredContent` → `metadata["structured_content"]`
- `isError` → `is_error` (direct 1:1)
- `tool_call_id` assigned by our framework, associated back after MCP response

D18 metadata destination is ambiguous — ToolResult vs ToolExecutionResult

The option (b) row (line 421) explicitly states that rich content is stored in ToolExecutionResult.metadata with the parenthetical "(not ToolResult, which has no metadata field)". However, the Mapping bullets immediately below drop that qualifier and just write metadata["attachments"] and metadata["structured_content"] — leaving it ambiguous which model's metadata is being referenced.

The DESIGN_SPEC §11.1.3 annotation (line 2155) copies the same bare metadata[...] phrasing without the qualifier.

An implementer reading only the mapping bullets (or only the spec annotation) has two plausible interpretations:

  1. Add a metadata field to ToolResult — which directly contradicts the "keep ToolResult as-is" decision and the parenthetical in option (b).
  2. Store on ToolExecutionResult — which is what option (b) intends, but is not stated in the actionable part of the decision.

The mapping bullets should explicitly name the owner:

Suggested change
Current:

| Option | Pros | Cons |
|--------|------|------|
| **(a) Extend ToolResult to support multi-modal** | Native support for images/resources | Cascading changes across entire codebase; LLM providers consume tool results as text anyway |
| **(b) Adapter in MCPBridgeTool; keep ToolResult as-is (CHOSEN)** | Zero disruption; text concatenation for LLM path; rich content stored in `ToolExecutionResult.metadata` (not `ToolResult`, which has no metadata field); MCP spec requires TextContent block alongside structured content | Non-text content requires metadata extraction |

**Mapping:**
- Text blocks → concatenate into `content: str`
- Image/audio → `[image: {mimeType}]` placeholder in content; base64 in `metadata["attachments"]`
- `structuredContent` → `metadata["structured_content"]`
- `isError` → `is_error` (direct 1:1)
- `tool_call_id` assigned by our framework, associated back after MCP response

Proposed:

**Mapping:**
- Text blocks → concatenate into `content: str`
- Image/audio → `[image: {mimeType}]` placeholder in content; base64 in `ToolExecutionResult.metadata["attachments"]`
- `structuredContent` → `ToolExecutionResult.metadata["structured_content"]`
- `isError` → `is_error` (direct 1:1)
- `tool_call_id` assigned by our framework, associated back after MCP response

The same fix should be applied to the D18 annotation in DESIGN_SPEC §11.1.3 (line 2155).
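
As an illustration of the second interpretation, a minimal adapter sketch is below. It works on plain dicts shaped like an MCP tool call result (`content`, `structuredContent`, `isError`) rather than on SDK types, and the `ToolExecutionResult` dataclass is a hypothetical stand-in: its field layout, the newline separator, and the `[audio: ...]` label for audio blocks are assumptions, not the project's actual model.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolExecutionResult:
    # Hypothetical stand-in; only the fields named in the mapping are modelled.
    content: str
    is_error: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)


def map_mcp_result(raw: dict[str, Any]) -> ToolExecutionResult:
    """Flatten an MCP-style result into text plus metadata on ToolExecutionResult."""
    parts: list[str] = []
    attachments: list[dict[str, str]] = []
    for block in raw.get("content", []):
        kind = block.get("type")
        if kind == "text":
            parts.append(block["text"])
        elif kind in ("image", "audio"):
            # Placeholder goes into the text path; base64 payload goes to metadata.
            parts.append(f"[{kind}: {block['mimeType']}]")
            attachments.append({"mimeType": block["mimeType"], "data": block["data"]})

    metadata: dict[str, Any] = {}
    if attachments:
        metadata["attachments"] = attachments
    if "structuredContent" in raw:
        metadata["structured_content"] = raw["structuredContent"]

    return ToolExecutionResult(
        content="\n".join(parts),
        is_error=bool(raw.get("isError", False)),
        metadata=metadata,
    )
```

Nothing in this sketch touches `ToolResult`, which is the point of naming the owner explicitly in the mapping bullets.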


| D4 | SecOps: LLM vs rules | Hybrid: rule engine + LLM | — | Rule fast path (~95%) + LLM slow path (~5%) | #40 |
| D5 | SecOps: integration point | Pluggable + configurable | `SecurityInterceptionStrategy` | Before every tool invocation | #40 |
| D6 | Autonomy: scope | Three-level chain | — | Agent → department → company default | #42 |
| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion | #42 |

D7 summary table "Initial Impl" omits automatic downgrade

The D7 body (line 205) decides the (a+c hybrid): human-only promotion plus automatic downgrade on specific conditions (high error rate → one level down, budget exhausted → supervised, security incident → locked). The summary table's "Initial Impl" column only records "Human-only promotion", which misrepresents the decided initial implementation by omitting the entire automatic-downgrade leg.

This is distinct from the previously flagged DESIGN_SPEC annotation issue — the DESIGN_SPEC annotation in this PR now correctly reflects the hybrid. The summary table itself is the remaining inconsistency.

Suggested change
Current:

| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion | #42 |

Proposed:

| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion + auto-downgrade (error rate/budget/security) | #42 |

Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>