docs: finalize 23 design decisions (ADR-002) by Aureliolo · Pull Request #190 · Aureliolo/synthorg

Aureliolo · 2026-03-09T21:45:14Z

Summary

Rename PENDING-DECISIONS.md → ADR-002-design-decisions-batch-1.md (status: DECIDED)
Add 12 decision annotation blocks to DESIGN_SPEC.md across 8 sections
Create 2 new subsections: §11.1.3 MCP Integration, §11.1.4 Action Type System
All 23 decisions follow pluggable protocol pattern with one initial implementation
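
As a rough illustration of the "pluggable protocol pattern with one initial implementation" named above, the following Python sketch shows the general shape. The names `QualityScorer`, `HeuristicQualityScorer`, and `score` are hypothetical and chosen only for this example; the actual interfaces are the ones defined in ADR-002 and DESIGN_SPEC.md.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class QualityScorer(Protocol):
    """Hypothetical protocol: any object with a matching score() plugs in."""

    def score(self, output: str) -> float:
        ...


class HeuristicQualityScorer:
    """Hypothetical initial implementation; it does not inherit from the protocol."""

    def score(self, output: str) -> float:
        # Toy heuristic: longer output scores higher, capped at 1.0.
        return min(len(output) / 100.0, 1.0)


def evaluate(scorer: QualityScorer, output: str) -> float:
    # Callers depend on the protocol only, so implementations can be swapped.
    return scorer.score(output)


result = evaluate(HeuristicQualityScorer(), "x" * 50)
```

Because the protocol is structural, a later implementation (for example an LLM-based scorer) could replace the heuristic one without changing `evaluate`.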

Decisions covered

Group	Decisions	Issues
Cross-cutting	D1 (action types), D2 (quality scoring), D3 (collaboration scoring)	#40, #42, #43, #47, #49, #126
SecOps	D4 (hybrid rules+LLM), D5 (integration point)	#40
Autonomy	D6 (3-level scope), D7 (human-only promotion)	#42
HR	D8 (instantiation), D9 (reassignment), D10 (archival)	#45
Performance	D11 (rolling windows), D12 (trend detection)	#47
Promotions	D13 (criteria), D14 (approval), D15 (model mapping)	#49
Sandbox	D16 (Docker MVP)	#50
MCP	D17 (SDK), D18 (result mapping)	#53
Timeout	D19 (risk tiers), D20 (serialization), D21 (resume)	#126
System prompt	D22 (remove tools section), D23 (memory filter)	#188

GitHub issue bodies for all 10 issues (#40, #42, #43, #45, #47, #49, #50, #53, #126, #188) have been updated with their respective decisions.

Test plan

Verify DESIGN_SPEC.md renders correctly (all blockquotes, tables, code blocks)
Verify ADR-002 file renders correctly
Verify all ADR-002 links in DESIGN_SPEC.md resolve correctly
Confirm no existing spec content was modified (only additions)

🤖 Generated with Claude Code

Rename PENDING-DECISIONS.md to ADR-002-design-decisions-batch-1.md (status: DECIDED). Add decision annotations to DESIGN_SPEC.md across 8 sections with 2 new subsections (§11.1.3 MCP Integration, §11.1.4 Action Type System). All decisions follow pluggable protocol pattern with one initial implementation. Decisions cover: action types (D1), quality/collaboration scoring (D2-D3), SecOps approach and integration (D4-D5), autonomy scope and changes (D6-D7), HR instantiation/reassignment/archival (D8-D10), performance metrics windows and trends (D11-D12), promotion criteria/approval/model-mapping (D13-D15), sandbox backend (D16), MCP SDK and result mapping (D17-D18), timeout risk tiers/serialization/resume (D19-D21), system prompt tools removal (D22), memory filter (D23). GitHub issues updated: #40, #42, #43, #45, #47, #49, #50, #53, #126, #188.

github-actions · 2026-03-09T21:45:24Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-03-09T21:45:36Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c15418e-fe1e-4e68-8755-5b8d2510be1e

📥 Commits

Reviewing files that changed from the base of the PR and between 92e2e54 and a92b80c.

📒 Files selected for processing (3)

DESIGN_SPEC.md
README.md
docs/decisions/ADR-002-design-decisions-batch-1.md

📝 Walkthrough

Summary by CodeRabbit

Documentation
- Added comprehensive architecture decision records (23 ADRs) covering action taxonomy, memory and persistence, security, autonomy, HR/workflow, performance, MCP integration, and rollout guidance.
New Features
- Public-facing design sections for MCP integration and an action-type system; planned public data-plane declarations for parked context and agent state.
Behavior Change
- System prompts will no longer embed tool definitions; tools are supplied via API parameters.

Walkthrough

Adds a batch of ADRs (D1–D23), new public data-plane declarations (ParkedContext, ParkedContextRepository, AgentStateRepository), and new public design subsections for MCP integration and an Action Type System; updates system-prompt/tool delivery, memory/filtering, HR/devops, and project ADR references. (≈+93/−11 lines)

Changes

Cohort / File(s)	Summary
Design spec + ADRs `DESIGN_SPEC.md`, `docs/decisions/ADR-002-design-decisions-batch-1.md`	Introduces ADR-002 with 23 design decisions (D1–D23), adds ADR-driven decision blocks throughout DESIGN_SPEC, formalizes choices for tools-in-prompt (D22), memory filtering (D23), MCP, autonomy, HR, security, performance, and rollout guidance.
Public data-plane declarations (docs) `DESIGN_SPEC.md`	Adds planned public data-plane declarations: `ParkedContext`, `ParkedContextRepository`, and `AgentStateRepository` (marked M7) and documents their intended query keys and roles.
README `README.md`	Renames milestone item to "Memory Backend Adapter (M5)" and updates milestone description to reflect completed protocols and remaining adapter/backends.

Sequence Diagram(s)

sequenceDiagram
participant Client
participant MCP as "MCP / Orchestration"
participant Engine as "Agent Engine"
participant Store as "Persistence (ParkedContextRepo)"

Client->>MCP: Submit task / request
MCP->>Engine: Dispatch to Agent (includes action-type + tool params via API)
Engine->>Store: If needs park -> persist ParkedContext (execution_id, agent_id, task_id)
Store-->>Engine: Ack
Engine-->>MCP: Parked (park_id) / status
MCP->>Client: Return parked status

%% resume flow
Client->>MCP: Resume request (park_id)
MCP->>Store: Query ParkedContext by park_id
Store-->>MCP: ParkedContext payload
MCP->>Engine: Rehydrate execution (inject resume context)
Engine->>MCP: Execution result / tool outputs
MCP->>Client: Deliver final result
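
The park and resume steps in the diagram above can be sketched in a few lines of Python. This is a minimal in-memory stand-in, not the planned implementation: the `payload` field, the `save`/`pop` method names, and the choice to consume the entry on resume are assumptions for illustration, and a plain dataclass is used here to keep the example dependency-free.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from uuid import uuid4


@dataclass(frozen=True)
class ParkedContext:
    # The three keys named in the diagram; `payload` is an assumed field.
    execution_id: str
    agent_id: str
    task_id: str
    payload: dict = field(default_factory=dict)
    park_id: str = field(default_factory=lambda: uuid4().hex)


class InMemoryParkedContextRepository:
    """Hypothetical stand-in for a ParkedContextRepository backend."""

    def __init__(self) -> None:
        self._items: Dict[str, ParkedContext] = {}

    def save(self, ctx: ParkedContext) -> str:
        self._items[ctx.park_id] = ctx
        return ctx.park_id

    def pop(self, park_id: str) -> Optional[ParkedContext]:
        # Assumption: resuming consumes the entry so it cannot be resumed twice.
        return self._items.pop(park_id, None)


repo = InMemoryParkedContextRepository()
park_id = repo.save(ParkedContext("exec-1", "agent-7", "task-42", {"step": 3}))
resumed = repo.pop(park_id)
```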

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

docs: finalize 23 design decisions (ADR-002) #190: Adds ADR-002 and overlapping ADR sections (D22/D23, MCP/Action Type subsections) — likely direct overlap.
docs: expand design spec with pluggable strategy protocols #121: Changes DESIGN_SPEC around park/resume semantics and memory-layer wording; overlaps ParkedContext and resume behavior.
docs: update project docs for M2.5 conventions and add docs-consistency review agent #114: Modifies public design surface (agent runtime/config split, tool execution model) and may intersect with tool delivery and MCP SDK decisions.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'docs: finalize 23 design decisions (ADR-002)' clearly and specifically summarizes the main change—the documentation finalization of 23 design decisions in an ADR document.
Description check	✅ Passed	The description provides a detailed summary of the pull request changes, including file renames, added decision blocks, new subsections, and issue references, all directly related to the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 1665-1670: Update the DESIGN_SPEC.md §7.6 persistence contract to
include the newly introduced runtime repositories and interfaces referenced by
ADR-002 D8 (e.g., add descriptions and method contracts for PersistenceBackend
extensions such as ParkedContextRepository, AgentRepository/AgentStore (for
runtime agent state and hot-plugging via AgentEngine.add_agent/remove_agent),
and any repositories for runtime message/task/cost-record linkage), specifying
expected CRUD/transaction semantics, thread-safety and ownership rules, and how
YAML bootstrap seeds interact with the operational store; ensure the new
repository names (ParkedContextRepository, PersistenceBackend extensions,
AgentRepository or equivalent) are explicitly listed in the inventory and
cross-referenced with ADR-002 and §14.1 template behavior so the spec and
decisions are consistent.
- Around line 974-975: Update the DESIGN_SPEC.md step 3 to stop stating that
build_system_prompt() is called "with agent identity, task, and available tool
definitions"; either remove "available tool definitions" from the call signature
or explicitly document that build_system_prompt() ignores tool definitions and
only accepts non-redundant behavioral guidance, per decision D22. Mention
build_system_prompt() by name and reference AgentContext.from_identity() remains
unchanged for step 4; ensure the text notes that tool definitions are supplied
via the API `tools` parameter (not the system prompt) and that any approved
deviations must be reflected in DESIGN_SPEC.md.
- Around line 2395-2400: The autonomy preset examples in the DESIGN_SPEC.md
presets still use legacy permission strings (e.g., code_changes, file_edits,
git_push); update the auto_approve and human_approval lists in those preset
blocks to use the new category:action taxonomy defined in §11.1.4 (e.g.,
code:changes, file:edit, git:push or comms:internal) or explicitly mark those
example blocks as "legacy" if you intend to keep them unchanged; change the
values in the preset definitions (auto_approve, human_approval) to the canonical
category:action form so readers copy the correct format.
- Around line 2146-2155: TOC for Section 11 is missing entries for the newly
added subsections 11.1.3 (MCP Integration) and 11.1.4 (Action Type System);
update the Table of Contents so Section 11 lists 11.1.3 and 11.1.4 with matching
titles and links, ensuring their anchor/numbering matches the headings ("11.1.3
MCP Integration" and "11.1.4 Action Type System") and adjust any downstream TOC
numbering or anchor targets so navigation remains consistent.

In `@docs/decisions/ADR-002-design-decisions-batch-1.md`:
- Around line 233-237: The review flags that lifecycle APIs
(AgentEngine.add_agent()/remove_agent()) are attached to the wrong abstraction;
remove or relocate agent registry/HR responsibilities from AgentEngine and
implement them on a dedicated company/registry/HR service (or similarly named
Service class) that owns HiringRequest persistence via PersistenceBackend and
wires templates/overrides, ensuring AgentEngine remains a per-agent task runner;
if you intentionally redefine AgentEngine's scope instead, update DESIGN_SPEC.md
and surface that deviation to reviewers/users rather than silently diverging.
- Around line 45-57: The markdown fenced code blocks under the "Proposed
taxonomy (~25 leaf types):" section (the plain list block) and the subsequent
code sample block are missing leading/trailing blank lines and a language tag,
which triggers MD031/MD040; fix both blocks by adding a blank line before and
after each fence and change the opening fence to include a language (e.g.,
```text) so the taxonomy list and the collaboration_score snippet render
consistently and satisfy markdownlint.
- Around line 177-186: The document is inconsistent: the Decision text uses
resolution "agent.autonomy_level or company.autonomy.level" but D6 and
DESIGN_SPEC.md describe a three-level chain including department overrides;
update ADR-002 to match the mandated three-level precedence (agent -> department
-> company) or explicitly justify and change D6/DESIGN_SPEC.md if you intend to
keep two-level resolution. Concretely, change the Decision and Resolution lines
to reference the three-level rule (e.g., resolve as agent.autonomy_level or
department.autonomy.level or company.autonomy.level), update the sentence
introducing the resolution, and add a cross-reference to DESIGN_SPEC.md and D6
so readers see the authoritative precedence and validation rules (e.g.,
seniority-based constraints on AgentIdentity/autonomy_level).
- Around line 193-202: The ADR currently declares "Decision: **(a+c hybrid)**
Human-only for promotion. Automatic downgrade..." but the DESIGN_SPEC.md and PR
objective expect human-only runtime changes now with auto-downgrade as future
work; pick one and make both sources agree. Either (A) change this ADR's
"Decision" paragraph, the table row for "(a+c hybrid)", and the "Key insight"
and "D7" references to state that only human-only runtime changes are
implemented now and auto-downgrade is deferred to future work, or (B) update
DESIGN_SPEC.md/PR objective to accept the (a+c hybrid) choice; ensure the text
around "Decision: **(a+c hybrid)**", the pros/cons table, and the
recovery/auto-downgrade bullets are updated consistently and add a clear note
referencing DESIGN_SPEC.md if you defer (so readers know mandatory spec must be
updated when implementation diverges).
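
To make the D22 finding above concrete (tool definitions travel in the API `tools` parameter rather than in the system prompt), here is a small Python sketch. The signature of `build_system_prompt()`, the `build_request` helper, and the request dictionary shape are assumptions for illustration only; the spec defines the real call.

```python
from typing import Any, Dict, List


def build_system_prompt(agent_name: str, role: str, task: str) -> str:
    # Per D22 the prompt carries identity and task only, no tool definitions.
    return f"You are {agent_name}, a {role}.\nCurrent task: {task}"


def build_request(agent_name: str, role: str, task: str,
                  tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Hypothetical request shape: tools go in a dedicated `tools` field.
    return {
        "system": build_system_prompt(agent_name, role, task),
        "tools": tools,
    }


read_file_tool = {"name": "read_file", "description": "Read a file from the workspace"}
request = build_request("Ada", "backend engineer", "fix the failing test", [read_file_tool])
```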

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a4dc0e4-208a-4340-9d43-e0745848571e

📥 Commits

Reviewing files that changed from the base of the PR and between 29fcd08 and 92e2e54.

📒 Files selected for processing (2)

DESIGN_SPEC.md
docs/decisions/ADR-002-design-decisions-batch-1.md

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Agent
GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (1)

**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

DESIGN_SPEC.md is MANDATORY reading before implementing any feature. The spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do not silently diverge. When spec sections are referenced (e.g. 'Section 10.2'), read that section verbatim before coding. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality

Files:

docs/decisions/ADR-002-design-decisions-batch-1.md
DESIGN_SPEC.md

🧠 Learnings (2)

📚 Learning: 2026-03-09T20:32:32.549Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to **/*.md : DESIGN_SPEC.md is MANDATORY reading before implementing any feature. The spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do not silently diverge. When spec sections are referenced (e.g. 'Section 10.2'), read that section verbatim before coding. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality

Applied to files:

docs/decisions/ADR-002-design-decisions-batch-1.md
DESIGN_SPEC.md

🪛 LanguageTool

docs/decisions/ADR-002-design-decisions-batch-1.md

[typographical] ~25-~25: In American English, use a period after an abbreviation.
Context: ...ve trust. Sub-question 1: Fixed enum vs open/extensible registry? | Option |...

(MISSING_PERIOD_AFTER_ABBREVIATION)

[grammar] ~148-~148: Please add a punctuation mark at the end of paragraph.
Context: ...ntial exposure, data destruction) never bypass Decision: (c) Hybrid. Rule eng...

(PUNCTUATION_PARAGRAPH_END)

[grammar] ~165-~165: Ensure spelling is correct
Context: ...ble. Even OPA sidecar evaluations are 1-5ms. Total security overhead: milliseconds ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~308-~308: Redundant conjunctions can lead to confusion; consider removing a conjunction here.
Context: ...15) ### D13: Promotion Criteria Logic (AND/OR) Unblocks: #49 | Option | Pros | ...

(AND_OR)

[style] ~316-~316: Redundant conjunctions can lead to confusion; consider removing a conjunction here.
Context: ...ll). Single ThresholdEvaluator covers AND/OR/threshold | More configuration | **Pre...

(AND_OR)

[typographical] ~375-~375: Insert a space between the numerical value and the unit symbol.
Context: ... via docker.image config - Keep under 500MB; users add Go/Rust via custom images *...

(UNIT_SPACE)

[typographical] ~458-~458: In American English, use a period after an abbreviation.
Context: ...tion needed | Sub-decision: Verbatim vs summarized conversation? - **Verbatim...

(MISSING_PERIOD_AFTER_ABBREVIATION)

[typographical] ~525-~525: In American English, use a period after an abbreviation.
Context: ...ry | #47, #43, #49 | | D4 | SecOps: LLM vs rules | Hybrid: rule engine + LLM | — |...

(MISSING_PERIOD_AFTER_ABBREVIATION)

🪛 markdownlint-cli2 (0.21.0)

docs/decisions/ADR-002-design-decisions-batch-1.md

[warning] 46-46: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

[warning] 46-46: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 117-117: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

[warning] 117-117: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 126-126: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

DESIGN_SPEC.md

docs/decisions/ADR-002-design-decisions-batch-1.md

gemini-code-assist · 2026-03-09T21:49:19Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on documenting and integrating a batch of 23 key design decisions into the agent framework's documentation. It covers a wide range of topics, solidifying architectural patterns and providing clear guidance for future development.

Highlights

Design Decisions: This PR finalizes 23 design decisions (ADR-002) related to various aspects of the agent framework, including action types, quality scoring, collaboration scoring, SecOps, autonomy, HR, performance, promotions, sandboxing, MCP integration, timeouts, and system prompts.
Architectural Patterns: The decisions largely follow a pluggable protocol pattern, providing flexibility and extensibility for future implementations.
Documentation: The pull request includes updates to DESIGN_SPEC.md with decision annotations and creates a new ADR-002 document to capture the finalized decisions.

Changelog

DESIGN_SPEC.md
- Added decision annotations throughout the document.
- Created new subsections for MCP Integration and Action Type System.
- Updated content to reflect finalized design decisions.
docs/decisions/ADR-002-design-decisions-batch-1.md
- Created a new ADR document to capture the details of the 23 finalized design decisions.

Activity

Renamed PENDING-DECISIONS.md to ADR-002-design-decisions-batch-1.md.
Added 12 decision annotation blocks to DESIGN_SPEC.md.
Created 2 new subsections in DESIGN_SPEC.md: §11.1.3 MCP Integration, §11.1.4 Action Type System.
Updated GitHub issue bodies for 10 issues with their respective decisions.


greptile-apps · 2026-03-09T21:49:48Z

Greptile Summary

This PR finalizes 23 design decisions (ADR-002) accumulated across 10 open issues, renames the pending-decisions file to a proper ADR, and back-annotates DESIGN_SPEC.md with decision blocks in the relevant sections. The changes are documentation-only and follow the project's "protocol-everywhere" philosophy: each decision codifies a pluggable protocol interface with one initial implementation shipped.

Key changes:

docs/decisions/ADR-002-design-decisions-batch-1.md created — full decision record for D1–D23 including options, pros/cons, precedents, and chosen approaches
DESIGN_SPEC.md annotated with 12 decision blocks across 8 sections; action-type strings in autonomy presets and timeout tiers updated from informal strings (e.g. code_changes, deployment) to the new formal category:action taxonomy (e.g. code:write, deploy:production)
2 new subsections added: §11.1.3 MCP Integration (D17, D18) and §11.1.4 Action Type System (D1)
README.md updated to reflect completed memory infrastructure

Issues found:

D18 metadata destination ambiguous (logic): The D18 mapping bullets and the corresponding DESIGN_SPEC §11.1.3 annotation reference metadata["attachments"] / metadata["structured_content"] without specifying that the owner is ToolExecutionResult (not ToolResult). This directly contradicts the option text which explicitly notes ToolResult has no metadata field. An implementer reading only the mapping could incorrectly add a metadata field to ToolResult.
D7 summary table truncated (style): The ADR summary table "Initial Impl" column for D7 records only "Human-only promotion", omitting the automatic-downgrade leg (error rate / budget exhausted / security incident) which is part of the decided (a+c hybrid) initial implementation.
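
For the D18 issue above, a short Python sketch may help show where the metadata is meant to live. The dataclass fields and the `map_mcp_result` helper are hypothetical, and the content-block dictionaries are a simplified stand-in for MCP results; the only points taken from the review are that text maps to content, that attachments go under `metadata["attachments"]`, and that the metadata belongs to `ToolExecutionResult`, not `ToolResult`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolResult:
    # Per the review, ToolResult has no metadata field.
    content: str


@dataclass
class ToolExecutionResult:
    # Assumed shape: carries the content and owns the metadata dict.
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def map_mcp_result(blocks: List[Dict[str, Any]]) -> ToolExecutionResult:
    """Map simplified content blocks: text -> content, everything else -> metadata."""
    texts = [b["text"] for b in blocks if b.get("type") == "text"]
    attachments = [b for b in blocks if b.get("type") != "text"]
    result = ToolExecutionResult(content="\n".join(texts))
    if attachments:
        result.metadata["attachments"] = attachments
    return result


mapped = map_mcp_result([
    {"type": "text", "text": "done"},
    {"type": "image", "data": "aGVsbG8=", "mimeType": "image/png"},
])
```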

Confidence Score: 4/5

Safe to merge — documentation-only PR with no code changes; two minor doc inconsistencies are fixable post-merge or before.
All changes are documentation. The D18 metadata ambiguity is a real implementer risk but does not break any existing functionality; it only affects future implementation of the MCP bridge. The D7 summary table truncation is a clarity issue with low impact since the ADR body and DESIGN_SPEC annotation are both correct. No code, tests, or runtime behaviour is affected by this PR.
docs/decisions/ADR-002-design-decisions-batch-1.md — D18 mapping bullets (lines 424–426) and D7 summary table (line 531) need corrections before the file is used as an implementation reference.

Important Files Changed

Filename	Overview
docs/decisions/ADR-002-design-decisions-batch-1.md	New ADR file documenting 23 finalized design decisions. Two issues found: (1) D18 mapping bullets drop the `ToolExecutionResult` qualifier, making the metadata storage destination ambiguous vs `ToolResult`; (2) D7 summary table "Initial Impl" column truncates the decided (a+c hybrid) to "Human-only promotion", omitting the automatic-downgrade leg.
DESIGN_SPEC.md	12 decision annotation blocks added across 8 sections, plus 2 new subsections (§11.1.3 MCP Integration, §11.1.4 Action Type System). Action type references updated from informal strings to the new `category:action` taxonomy. The D7 annotation correctly captures the (a+c hybrid) including auto-downgrade — a direct improvement over the pre-PR state flagged in prior review threads. The D18 annotation inherits the same metadata-field ambiguity noted in ADR-002.
README.md	Minor clarification: "Memory Backends (M5)" renamed to "Memory Backend Adapter (M5)" with an expanded description reflecting completed memory infrastructure. No issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Action triggered] --> B{ActionTypeRegistry\nlookup}
    B -->|Unknown type| C[Reject at config\nload time]
    B -->|Known type| D[ToolPermissionChecker]
    D --> E{SecurityInterceptionStrategy\nD5: before every tool call}
    E -->|Rules fast path ~95%| F[Rule Engine\nsub-ms]
    E -->|Uncertain ~5%| G[LLM SecOps\n0.5-2s]
    F -->|Blocked| H[Deny / Escalate]
    G -->|Blocked| H
    F -->|Allowed| I{Autonomy Preset\nD6: 3-level resolution}
    G -->|Allowed| I
    I -->|auto_approve| J[Execute Tool]
    I -->|human_approval| K{RiskTierClassifier\nD19: YAML mapping}
    K -->|low_risk| L[Park context\nD20: ParkedContext\nPydantic JSON]
    K -->|medium_risk| L
    K -->|high_risk| L
    L --> M[Human Review Queue]
    M -->|Approved| N[Resume via\ntool result injection\nD21]
    M -->|Denied / Timeout| O[Deny reason\nto agent]
    N --> J
    J --> P[ToolResult]
    P -->|MCP tool| Q[MCPBridgeTool\nD18: adapter\ntext→content\nbinary→ToolExecutionResult.metadata]
    Q --> P
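
The hybrid security step in the flowchart above (rules as the fast path, LLM review only for uncertain cases) can be sketched as follows. The rule tables, function names, and the stubbed LLM check are assumptions for illustration; real rules would come from configuration and the real check would call a model.

```python
from typing import Callable, List, Optional

# Hypothetical rule tables; the two action types are examples from this thread.
HARD_DENY = {"deploy:production"}
HARD_ALLOW = {"code:write"}


def rule_engine(action_type: str) -> Optional[bool]:
    """Fast path: True = allow, False = deny, None = uncertain."""
    if action_type in HARD_DENY:
        return False
    if action_type in HARD_ALLOW:
        return True
    return None


def evaluate(action_type: str, llm_check: Callable[[str], bool]) -> bool:
    verdict = rule_engine(action_type)
    if verdict is not None:
        return verdict
    # Slow path, reached only when the rules cannot decide.
    return llm_check(action_type)


llm_calls: List[str] = []


def fake_llm(action_type: str) -> bool:
    llm_calls.append(action_type)
    return False  # conservative stub: deny anything the rules cannot decide
```

Calling `evaluate("code:write", fake_llm)` never touches the stub, while an action type that matches neither table (for example a hypothetical `"db:migrate"`) falls through to it.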

Last reviewed commit: a92b80c

docs/decisions/ADR-002-design-decisions-batch-1.md

DESIGN_SPEC.md

gemini-code-assist

Code Review

This pull request does a great job of formalizing and documenting 23 key design decisions in a new Architecture Decision Record (ADR-002) and integrating these decisions back into the main DESIGN_SPEC.md. The structure of the ADR is clear and well-reasoned, covering options, precedents, and final choices for each decision. The changes significantly improve the clarity and completeness of the project's design documentation. I have one minor suggestion to ensure consistency between the ADR and the design spec regarding the autonomy scope resolution.

Note: Security Review has been skipped due to the limited scope of the PR.

gemini-code-assist · 2026-03-09T21:50:49Z

docs/decisions/ADR-002-design-decisions-batch-1.md

+
+**Precedents:** CrewAI has 24 per-agent attributes. AutoGen has per-agent `human_input_mode`. LangGraph has per-node `interrupt_before`/`interrupt_after`. CSA Agentic Trust Framework requires per-agent identity and trust level.
+
+**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).


There's a small inconsistency in the resolution logic for autonomy scope. The decision text here mentions a two-level resolution (agent.autonomy_level or company.autonomy.level), but the corresponding update in DESIGN_SPEC.md and the summary table at the end of this ADR describe a more robust three-level chain that includes the department level (agent.autonomy_level or department.autonomy_level or company.autonomy.level). To ensure consistency across the design documentation, it would be best to update this section to reflect the three-level resolution chain.

Suggested change

-**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).
+**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` and department config (default: None = use company default). Resolution: `agent.autonomy_level or department.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).
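
The three-level chain in the suggestion above could be implemented along these lines. This is a sketch under stated assumptions: the `resolve_autonomy` function, the seniority strings, and the choice to validate the resolved level (rather than only an explicitly set agent level) are illustrative, and only the level names that appear in this thread are modelled.

```python
from enum import Enum
from typing import Optional


class AutonomyLevel(str, Enum):
    # Only level names mentioned in this thread; the real enum may have more.
    FULL = "full"
    SUPERVISED = "supervised"
    LOCKED = "locked"


def resolve_autonomy(agent_level: Optional[AutonomyLevel],
                     department_level: Optional[AutonomyLevel],
                     company_level: AutonomyLevel,
                     seniority: str) -> AutonomyLevel:
    # First non-None value wins: agent -> department -> company.
    level = company_level
    for candidate in (agent_level, department_level):
        if candidate is not None:
            level = candidate
            break
    # Assumed validation rule: juniors and interns may not end up at `full`.
    if level is AutonomyLevel.FULL and seniority in {"junior", "intern"}:
        raise ValueError("junior/intern agents cannot have full autonomy")
    return level


resolved = resolve_autonomy(None, AutonomyLevel.FULL, AutonomyLevel.SUPERVISED, "senior")
```

Explicit `is not None` checks are used instead of a bare `or` chain so the result does not depend on the truthiness of the level values.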

Copilot

Pull request overview

This PR documents the final design decisions for 23 architectural items (D1–D23) covering 10 feature areas across the codebase. It creates a new ADR file (ADR-002-design-decisions-batch-1.md) containing the full decision records with options/pros/cons analysis, and backfills 12 blockquote annotations into the relevant sections of DESIGN_SPEC.md linking to the ADR.

Changes:

Create docs/decisions/ADR-002-design-decisions-batch-1.md: 544-line ADR documenting D1–D23 with options tables, precedents, and a summary table
Add 12 decision annotation blockquotes to DESIGN_SPEC.md across §6.5, §7.7, §8.1–8.4, §11.1.2, §12.2, §12.3, and §12.4
Add two new subsections to DESIGN_SPEC.md: §11.1.3 (MCP Integration) and §11.1.4 (Action Type System)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
`docs/decisions/ADR-002-design-decisions-batch-1.md`	New ADR with 23 architecture decisions, options analysis, precedents, and a summary table
`DESIGN_SPEC.md`	12 ADR annotation blockquotes added inline; 2 new subsections for MCP and action type system


Copilot · 2026-03-09T21:51:53Z

docs/decisions/ADR-002-design-decisions-batch-1.md

+
+**Precedents:** CrewAI has 24 per-agent attributes. AutoGen has per-agent `human_input_mode`. LangGraph has per-node `interrupt_before`/`interrupt_after`. CSA Agentic Trust Framework requires per-agent identity and trust level.
+
+**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).


The D6 decision body at line 185 describes only a two-level resolution chain — agent.autonomy_level or company.autonomy.level — with no department level. However, both the summary table (line 527) and the DESIGN_SPEC.md annotation (line 2397) describe a three-level chain: agent.autonomy_level or department.autonomy_level or company.autonomy.level. This inconsistency within the ADR itself, and between the ADR body and the DESIGN_SPEC annotation, could lead to implementation confusion: does D6 add a per-department layer or not? The decision body should be updated to include the department-level resolution (matching the summary table, DESIGN_SPEC annotation, and issue #42).

Copilot · 2026-03-09T21:51:54Z

docs/decisions/ADR-002-design-decisions-batch-1.md

+| **(a+c hybrid) Human-only promotion + automatic downgrade (CHOSEN)** | Asymmetric trust: gaining trust is hard, losing it is easy; matches Azure Conditional Access (only restricts, never loosens) | Two code paths |
+
+**Key insight:** No real-world security system automatically grants higher privileges. Conditional access only steps UP requirements, never DOWN. The SEAgent MAC framework explicitly prevents agents from self-modifying policies.
+
+**Decision:** **(a+c hybrid)** Human-only for promotion. Automatic downgrade on: high error rate → downgrade one level, budget exhausted → supervised, security incident → locked. Recovery from auto-downgrade: human-only.


The D7 decision body (lines 198–202) marks the (a+c hybrid) as CHOSEN and explicitly states "Automatic downgrade on: high error rate → downgrade one level, budget exhausted → supervised, security incident → locked" as part of the decided implementation. However, the summary table (line 528) lists only "Human-only promotion" as the initial implementation, and the DESIGN_SPEC.md annotation (line 2398) puts the auto-downgrade rules in "Future strategies." This three-way contradiction needs to be resolved: either the initial implementation includes both human-only promotion AND automatic downgrade (as the D7 decision body states), or auto-downgrade is deferred to future (as the summary table and DESIGN_SPEC annotation indicate). This directly impacts which code gets written for the initial milestone.
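
If the hybrid is indeed the initial implementation, the "two code paths" could look roughly like the sketch below. It assumes an ordered set of levels (the `SEMI` member and the exact ordering are guesses) and hypothetical names `Trigger`, `auto_downgrade` and `promote`; none of these come from the ADR or from the `AutonomyChangeStrategy` protocol itself.

```python
from enum import Enum, IntEnum


class AutonomyLevel(IntEnum):
    # Assumed ordering; only `locked`, `supervised` and `full` are named in the thread.
    LOCKED = 0
    SUPERVISED = 1
    SEMI = 2
    FULL = 3


class Trigger(Enum):
    HIGH_ERROR_RATE = "high_error_rate"
    BUDGET_EXHAUSTED = "budget_exhausted"
    SECURITY_INCIDENT = "security_incident"


def auto_downgrade(current: AutonomyLevel, trigger: Trigger) -> AutonomyLevel:
    """Automatic path: may only keep or lower the level, never raise it."""
    if trigger is Trigger.SECURITY_INCIDENT:
        return AutonomyLevel.LOCKED
    if trigger is Trigger.BUDGET_EXHAUSTED:
        # min() ensures an already locked agent is not lifted to supervised.
        return min(current, AutonomyLevel.SUPERVISED)
    # HIGH_ERROR_RATE: one level down, floored at LOCKED.
    return AutonomyLevel(max(int(current) - 1, int(AutonomyLevel.LOCKED)))


def promote(current: AutonomyLevel, target: AutonomyLevel, *, requested_by_human: bool) -> AutonomyLevel:
    """Manual path: promotion and recovery from a downgrade are human-only."""
    if not requested_by_human:
        raise PermissionError("only a human operator may raise autonomy")
    if target <= current:
        raise ValueError("promote() only raises the level")
    return target
```

If auto-downgrade is deferred instead, only `promote` would ship in the first milestone, which is exactly the scope question the three documents currently answer differently.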

Copilot · 2026-03-09T21:51:54Z

docs/decisions/ADR-002-design-decisions-batch-1.md

+
+**Precedents:** AWS IAM uses open namespaced strings (`s3:GetObject`). Kubernetes RBAC uses semi-open verbs. GitHub uses closed scopes. OPA/Rego uses open policy strings. Every production security system validates action strings against a known set.
+
+**Decision:** **(c) Enum core + validated registry.** StrEnum for built-in types (~20), plus an `ActionTypeRegistry` that accepts custom strings only if explicitly registered. Unknown strings rejected at config load time. Critical for security — a typo in `human_approval` list silently means "skip approval."


Line 35 describes the StrEnum as covering "built-in types (~20)", but the proposed taxonomy listed in lines 46–57 contains 24 distinct leaf types (5 code:* + 2 test:* + 1 docs:* + 3 vcs:* + 2 deploy:* + 2 comms:* + 2 budget:* + 3 org:* + 3 db:* + 1 arch:*). The DESIGN_SPEC.md annotation at line 2159 also says "~20 built-in action types." These estimates should be reconciled with the actual taxonomy: the count is 24, not ~20.
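
The tally can be checked mechanically. The per-category counts below are copied from the comment above; the dictionary name is only for illustration.

```python
# Leaf-type counts per category, as listed in the review comment.
leaf_counts = {
    "code": 5,
    "test": 2,
    "docs": 1,
    "vcs": 3,
    "deploy": 2,
    "comms": 2,
    "budget": 2,
    "org": 3,
    "db": 3,
    "arch": 1,
}

total = sum(leaf_counts.values())
print(total)  # 24
```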

Copilot · 2026-03-09T21:51:54Z

DESIGN_SPEC.md

+> **Decisions ([ADR-002](docs/decisions/ADR-002-design-decisions-batch-1.md) D6, D7):**
+>
+> - **D6 — Autonomy Scope:** Three-level resolution chain: per-agent → per-department → company default. Optional `autonomy_level` on `AgentIdentity` and department config. Resolution: `agent.autonomy_level or department.autonomy_level or company.autonomy.level`. Seniority validation: Juniors/Interns cannot be set to `full`.
+> - **D7 — Autonomy Changes at Runtime:** Pluggable `AutonomyChangeStrategy` protocol. Initial: human-only promotion via REST API. No agent (including CEO) can escalate privileges. Future strategies: human-only + auto-downgrade (on high error rate → one level down, budget exhausted → supervised, security incident → locked; recovery from auto-downgrade: human-only). Precedent: no real-world security system automatically grants higher privileges.


The D7 annotation here lists the auto-downgrade rules (on high error rate → one level down, budget exhausted → supervised, security incident → locked) as "Future strategies", but the ADR-002 D7 decision body (lines 198–202) explicitly marks (a+c hybrid) (human-only promotion + automatic downgrade) as CHOSEN and describes the auto-downgrade triggers as part of the decided implementation. This inconsistency between the ADR body and the DESIGN_SPEC annotation needs to be resolved: if auto-downgrade is intended for the initial implementation, this annotation should list it as the initial impl; if it's deferred, the ADR decision body should be corrected to reflect that.

Copilot · 2026-03-09T21:51:54Z

DESIGN_SPEC.md

+>
+> Action types classify agent actions for use by autonomy presets (§12.2), SecOps validation (§12.3), tiered timeout policies (§12.4), and progressive trust (§11.3). Three sub-decisions:
+>
+> - **D1.1 — Registry:** `StrEnum` for ~20 built-in action types (type safety, autocomplete, typos caught at compile time) + `ActionTypeRegistry` for custom types via explicit registration. Unknown strings rejected at config load time. Critical for security — a typo in `human_approval` list silently means "skip approval."


Line 2159 states "~20 built-in action types" but the proposed taxonomy listed just below (and in the ADR at lines 46–57) contains 24 distinct leaf types. This count should be updated to "~25" (consistent with the taxonomy header at line 2162) or the actual number of 24.

…ewers

- D6: Fix two-level → three-level autonomy resolution chain (agent → department → company)
- D7: Move auto-downgrade from "Future" to "Initial" in DESIGN_SPEC annotation
- D1: Reconcile ~20 → ~25 built-in action type count
- D5: Add SecurityInterceptionStrategy protocol name to decision body
- D8.3: Replace AgentEngine.add_agent() with company/registry service
- D18: Clarify metadata lives on ToolExecutionResult, not ToolResult
- D22: Extract inline annotation to blockquote; remove "tool definitions" from step 3
- Update autonomy preset + tiered timeout YAML to category:action taxonomy
- Add 11.1.3/11.1.4 to ToC, ADR-002 to project structure
- Add ParkedContext + AgentState to §7.6 entity table
- Fix M4 status label, README memory description, markdown lint issues
- Align sub-decision numbering (D1.1/D8.1 cross-refs)
- Add "estimated" qualifier to D4 percentages
- Explain "11 agents" count in ADR-002 header

greptile-apps · 2026-03-10T06:18:32Z

docs/decisions/ADR-002-design-decisions-batch-1.md

+| Option | Pros | Cons |
+|--------|------|------|
+| **(a) Extend ToolResult to support multi-modal** | Native support for images/resources | Cascading changes across entire codebase; LLM providers consume tool results as text anyway |
+| **(b) Adapter in MCPBridgeTool; keep ToolResult as-is (CHOSEN)** | Zero disruption; text concatenation for LLM path; rich content stored in `ToolExecutionResult.metadata` (not `ToolResult`, which has no metadata field); MCP spec requires TextContent block alongside structured content | Non-text content requires metadata extraction |
+
+**Mapping:**
+- Text blocks → concatenate into `content: str`
+- Image/audio → `[image: {mimeType}]` placeholder in content; base64 in `metadata["attachments"]`
+- `structuredContent` → `metadata["structured_content"]`
+- `isError` → `is_error` (direct 1:1)
+- `tool_call_id` assigned by our framework, associated back after MCP response


D18 metadata destination is ambiguous — ToolResult vs ToolExecutionResult

The option (b) row (line 421) explicitly states that rich content is stored in `ToolExecutionResult.metadata`, with the parenthetical "(not `ToolResult`, which has no metadata field)". However, the **Mapping** bullets immediately below drop that qualifier and just write `metadata["attachments"]` and `metadata["structured_content"]`, leaving it ambiguous which model's `metadata` is being referenced.

The DESIGN_SPEC §11.1.3 annotation (line 2155) copies the same bare `metadata[...]` phrasing without the qualifier.

An implementer reading only the mapping bullets (or only the spec annotation) has two plausible interpretations:

1. Add a `metadata` field to `ToolResult`, which directly contradicts the "keep `ToolResult` as-is" decision and the parenthetical in option (b).
2. Store on `ToolExecutionResult`, which is what option (b) intends, but is not stated in the actionable part of the decision.

The mapping bullets should explicitly name the owner:

Suggested change

-| Option | Pros | Cons |
-|--------|------|------|
-| **(a) Extend ToolResult to support multi-modal** | Native support for images/resources | Cascading changes across entire codebase; LLM providers consume tool results as text anyway |
-| **(b) Adapter in MCPBridgeTool; keep ToolResult as-is (CHOSEN)** | Zero disruption; text concatenation for LLM path; rich content stored in `ToolExecutionResult.metadata` (not `ToolResult`, which has no metadata field); MCP spec requires TextContent block alongside structured content | Non-text content requires metadata extraction |
-
-**Mapping:**
-- Text blocks → concatenate into `content: str`
-- Image/audio → `[image: {mimeType}]` placeholder in content; base64 in `metadata["attachments"]`
-- `structuredContent` → `metadata["structured_content"]`
-- `isError` → `is_error` (direct 1:1)
-- `tool_call_id` assigned by our framework, associated back after MCP response
+**Mapping:**
+- Text blocks → concatenate into `content: str`
+- Image/audio → `[image: {mimeType}]` placeholder in content; base64 in `ToolExecutionResult.metadata["attachments"]`
+- `structuredContent` → `ToolExecutionResult.metadata["structured_content"]`
+- `isError` → `is_error` (direct 1:1)
+- `tool_call_id` assigned by our framework, associated back after MCP response

The same fix should be applied to the D18 annotation in DESIGN_SPEC §11.1.3 (line 2155).
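
As an illustration of the second interpretation, the mapping could be sketched as follows. This assumes the MCP result arrives as a plain dict with the spec's `content`, `structuredContent` and `isError` keys, and uses a stand-in dataclass for `ToolExecutionResult`; the real model, the adapter name `map_mcp_result` and the newline separator are assumptions, and `tool_call_id` handling is left out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToolExecutionResult:
    # Stand-in for the project's model; only the fields discussed in D18.
    content: str
    is_error: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)


def map_mcp_result(mcp_result: dict[str, Any]) -> ToolExecutionResult:
    text_parts: list[str] = []
    attachments: list[dict[str, str]] = []

    for block in mcp_result.get("content", []):
        kind = block.get("type")
        if kind == "text":
            text_parts.append(block["text"])
        elif kind in ("image", "audio"):
            # Placeholder stays in the text path; base64 payload goes to metadata.
            # Using the block type as the label (so audio reads "[audio: ...]") is an assumption.
            text_parts.append(f"[{kind}: {block['mimeType']}]")
            attachments.append(
                {"type": kind, "mimeType": block["mimeType"], "data": block["data"]}
            )

    metadata: dict[str, Any] = {}
    if attachments:
        metadata["attachments"] = attachments
    if "structuredContent" in mcp_result:
        metadata["structured_content"] = mcp_result["structuredContent"]

    return ToolExecutionResult(
        content="\n".join(text_parts),
        is_error=bool(mcp_result.get("isError", False)),
        metadata=metadata,
    )


result = map_mcp_result(
    {
        "content": [
            {"type": "text", "text": "hello"},
            {"type": "image", "mimeType": "image/png", "data": "AAAA"},
        ],
        "structuredContent": {"rows": 1},
    }
)
print(result.content)
```

Nothing in this sketch touches `ToolResult`, which is the point of option (b): the LLM path sees only the concatenated string, and anything richer lives on the execution-level result.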

greptile-apps · 2026-03-10T06:18:33Z

docs/decisions/ADR-002-design-decisions-batch-1.md

+| D4 | SecOps: LLM vs rules | Hybrid: rule engine + LLM | — | Rule fast path (~95%) + LLM slow path (~5%) | #40 |
+| D5 | SecOps: integration point | Pluggable + configurable | `SecurityInterceptionStrategy` | Before every tool invocation | #40 |
+| D6 | Autonomy: scope | Three-level chain | — | Agent → department → company default | #42 |
+| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion | #42 |


D7 summary table "Initial Impl" omits automatic downgrade

The D7 body (line 205) decides the (a+c hybrid): human-only promotion plus automatic downgrade on specific conditions (high error rate → one level down, budget exhausted → supervised, security incident → locked). The summary table's "Initial Impl" column only records "Human-only promotion", which misrepresents the decided initial implementation by omitting the entire automatic-downgrade leg.

This is distinct from the previously flagged DESIGN_SPEC annotation issue — the DESIGN_SPEC annotation in this PR now correctly reflects the hybrid. The summary table itself is the remaining inconsistency.

Suggested change

-| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion | #42 |
+| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion + auto-downgrade (error rate/budget/security) | #42 |

🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider 
resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * 
implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs 
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, 
contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, 
loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive 
trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry 
([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into 
observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 9, 2026 21:45

Copilot started reviewing on behalf of Aureliolo March 9, 2026 21:45

coderabbitai bot reviewed Mar 9, 2026

greptile-apps bot reviewed Mar 9, 2026

docs/decisions/ADR-002-design-decisions-batch-1.md (outdated)

DESIGN_SPEC.md (outdated)

gemini-code-assist bot reviewed Mar 9, 2026

Copilot AI reviewed Mar 9, 2026

Aureliolo merged commit 8c39742 into main Mar 10, 2026
8 of 9 checks passed

Aureliolo deleted the docs/adr-002-design-decisions branch March 10, 2026 06:14

greptile-apps bot reviewed Mar 10, 2026

coderabbitai bot mentioned this pull request Mar 10, 2026

perf: harden non-inferable principle implementation #195

Merged

10 tasks

Aureliolo mentioned this pull request Mar 10, 2026

chore(main): release ai-company 0.1.1 #282

Merged

Aureliolo mentioned this pull request Mar 10, 2026

chore(main): release 0.1.0 #283

Merged

This was referenced Mar 15, 2026

chore(main): release 0.2.4 #431

Merged

chore(main): release 0.2.0 #442

Closed

chore(main): release 0.2.5 #447

Merged

chore(main): release 0.2.0 #460

Closed

chore(main): release 0.2.0 #471

Closed


		Precedents: CrewAI has 24 per-agent attributes. AutoGen has per-agent `human_input_mode`. LangGraph has per-node `interrupt_before`/`interrupt_after`. CSA Agentic Trust Framework requires per-agent identity and trust level.

		Decision: (b) Per-agent override. Add an optional `autonomy_level` field on `AgentIdentity` (default: `None`, meaning use the company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors and Interns cannot be set to `full`).


		Precedents: AWS IAM uses open namespaced strings (`s3:GetObject`). Kubernetes RBAC uses semi-open verbs. GitHub uses closed scopes. OPA/Rego uses open policy strings. Every production security system validates action strings against a known set.

		Decision: (c) Enum core + validated registry. A `StrEnum` covers the built-in types (~20), plus an `ActionTypeRegistry` that accepts custom strings only if they are explicitly registered. Unknown strings are rejected at config load time. This is critical for security: a typo in the `human_approval` list would otherwise silently mean "skip approval."

	| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion | #42 |
	| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion + auto-downgrade (error rate/budget/security) | #42 |
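
One way the D7 row could translate into a pluggable protocol is sketched below. Only the name `AutonomyChangeStrategy` and the three downgrade triggers (error rate, budget, security) come from the row; the method names, the `AgentHealth` record, the `HumanOnlyPromotion` class, and the threshold value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AgentHealth:
    # Hypothetical signals mirroring the three triggers named in the D7 row.
    error_rate: float
    budget_exceeded: bool
    security_incident: bool


class AutonomyChangeStrategy(Protocol):
    def may_promote(self, requested_by_human: bool) -> bool: ...

    def should_downgrade(self, health: AgentHealth) -> bool: ...


class HumanOnlyPromotion:
    """Promotion needs a human; downgrade can be triggered automatically."""

    def __init__(self, max_error_rate: float = 0.2) -> None:
        # Illustrative threshold, not a value taken from the ADR.
        self._max_error_rate = max_error_rate

    def may_promote(self, requested_by_human: bool) -> bool:
        # Agents and automated processes can never raise autonomy themselves.
        return requested_by_human

    def should_downgrade(self, health: AgentHealth) -> bool:
        # Any single trigger is enough to reduce autonomy.
        return (
            health.error_rate > self._max_error_rate
            or health.budget_exceeded
            or health.security_incident
        )
```

Because `AutonomyChangeStrategy` is a structural `Protocol`, `HumanOnlyPromotion` satisfies it without inheriting from it, which fits the "pluggable protocol with one initial implementation" pattern described in the PR summary.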

greptile-apps bot commented Mar 9, 2026 (edited)

Greptile Summary

Confidence Score: 4/5
