docs: finalize 23 design decisions (ADR-002) #190
Rename PENDING-DECISIONS.md to ADR-002-design-decisions-batch-1.md (status: DECIDED). Add decision annotations to DESIGN_SPEC.md across 8 sections, with 2 new subsections (§11.1.3 MCP Integration, §11.1.4 Action Type System). All decisions follow the pluggable protocol pattern with one initial implementation. Decisions cover: action types (D1), quality/collaboration scoring (D2-D3), SecOps approach and integration (D4-D5), autonomy scope and changes (D6-D7), HR instantiation/reassignment/archival (D8-D10), performance metrics windows and trends (D11-D12), promotion criteria/approval/model-mapping (D13-D15), sandbox backend (D16), MCP SDK and result mapping (D17-D18), timeout risk tiers/serialization/resume (D19-D21), system prompt tools removal (D22), memory filter (D23). GitHub issues updated: #40, #42, #43, #45, #47, #49, #50, #53, #126, #188.
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Caution: Review failed. The pull request is closed.
Recent review info: configuration used: Organization UI; review profile: ASSERTIVE; plan: Pro; files selected for processing: 3.
📝 Walkthrough (summary by CodeRabbit)
Adds a batch of ADRs (D1–D23), new public data-plane declarations (ParkedContext, ParkedContextRepository, AgentStateRepository), and new public design subsections for MCP integration and an Action Type System; updates system-prompt/tool delivery, memory/filtering, HR/devops, and project ADR references. (≈+93/−11 lines)
Sequence Diagram(s)
sequenceDiagram
participant Client
participant MCP as "MCP / Orchestration"
participant Engine as "Agent Engine"
participant Store as "Persistence (ParkedContextRepo)"
Client->>MCP: Submit task / request
MCP->>Engine: Dispatch to Agent (includes action-type + tool params via API)
Engine->>Store: If needs park -> persist ParkedContext (execution_id, agent_id, task_id)
Store-->>Engine: Ack
Engine-->>MCP: Parked (park_id) / status
MCP->>Client: Return parked status
%% resume flow
Client->>MCP: Resume request (park_id)
MCP->>Store: Query ParkedContext by park_id
Store-->>MCP: ParkedContext payload
MCP->>Engine: Rehydrate execution (inject resume context)
Engine->>MCP: Execution result / tool outputs
MCP->>Client: Deliver final result
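The park and resume flow in the diagram can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it uses stdlib dataclasses and `json` in place of the Pydantic model mentioned for D20 so that it runs without dependencies, and the `park_id` and `payload` fields, the `InMemoryParkedContextRepository` class, and its `save`/`get` methods are assumptions made for the example.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class ParkedContext:
    """Snapshot of a paused execution.

    execution_id, agent_id and task_id come from the diagram; park_id and
    payload are hypothetical fields added for this sketch.
    """

    execution_id: str
    agent_id: str
    task_id: str
    park_id: str = field(default_factory=lambda: uuid4().hex)
    payload: dict = field(default_factory=dict)


class InMemoryParkedContextRepository:
    """Dict-backed stand-in for ParkedContextRepository (assumed interface)."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def save(self, ctx: ParkedContext) -> str:
        # Store the context as a JSON string, as a real backend would persist it.
        self._rows[ctx.park_id] = json.dumps(asdict(ctx))
        return ctx.park_id

    def get(self, park_id: str) -> ParkedContext | None:
        # Rehydrate the context from JSON, or return None for an unknown id.
        raw = self._rows.get(park_id)
        return ParkedContext(**json.loads(raw)) if raw is not None else None


repo = InMemoryParkedContextRepository()
original = ParkedContext("exec-1", "agent-7", "task-42", payload={"pending_tool": "vcs:push"})
park_id = repo.save(original)   # Engine ->> Store: persist ParkedContext
resumed = repo.get(park_id)     # MCP ->> Store: query ParkedContext by park_id
```

Round-tripping through JSON keeps the parked record independent of the process that created it, which is what lets a later resume request rehydrate the execution.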
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 1665-1670: Update the DESIGN_SPEC.md §7.6 persistence contract to
include the newly introduced runtime repositories and interfaces referenced by
ADR-002 D8 (e.g., add descriptions and method contracts for PersistenceBackend
extensions such as ParkedContextRepository, AgentRepository/AgentStore (for
runtime agent state and hot-plugging via AgentEngine.add_agent/remove_agent),
and any repositories for runtime message/task/cost-record linkage), specifying
expected CRUD/transaction semantics, thread-safety and ownership rules, and how
YAML bootstrap seeds interact with the operational store; ensure the new
repository names (ParkedContextRepository, PersistenceBackend extensions,
AgentRepository or equivalent) are explicitly listed in the inventory and
cross-referenced with ADR-002 and §14.1 template behavior so the spec and
decisions are consistent.
- Around line 974-975: Update DESIGN_SPEC.md step 3 so it no longer states that
build_system_prompt() is called "with agent identity, task, and available tool
definitions"; either remove "available tool definitions" from the call signature
or explicitly document that build_system_prompt() ignores tool definitions and
only accepts non-redundant behavioral guidance, per decision D22. Mention
build_system_prompt() by name and note that AgentContext.from_identity() remains
unchanged for step 4; ensure the text states that tool definitions are supplied
via the API `tools` parameter (not the system prompt) and that any approved
deviations must be reflected in DESIGN_SPEC.md.
- Around line 2395-2400: The autonomy preset examples in the DESIGN_SPEC.md
presets still use legacy permission strings (e.g., code_changes, file_edits,
git_push); update the auto_approve and human_approval lists in those preset
blocks to use the new category:action taxonomy defined in §11.1.4 (e.g.,
code:changes, file:edit, git:push or comms:internal) or explicitly mark those
example blocks as "legacy" if you intend to keep them unchanged; change the
values in the preset definitions (auto_approve, human_approval) to the canonical
category:action form so readers copy the correct format.
- Around line 2146-2155: TOC for Section 11 is missing entries for the newly
added subsections 11.1.3 (MCP Integration) and 11.1.4 (Action Type System);
update the Table of Contents so Section 11 lists 11.1.3 and 11.1.4 with matching
titles and links, ensuring their anchor/numbering matches the headings ("11.1.3
MCP Integration" and "11.1.4 Action Type System") and adjust any downstream TOC
numbering or anchor targets so navigation remains consistent.
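To make the D22 item in the list above concrete, here is a small sketch of keeping tool definitions out of the system prompt and passing them only through a `tools` field. The `build_system_prompt(identity, task)` signature, the `ToolDefinition` dataclass, `build_request`, and the request dictionary layout are all assumptions for illustration; the spec only says that tool definitions travel via the API `tools` parameter.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDefinition:
    """Hypothetical tool description; field names are assumptions."""

    name: str
    description: str
    parameters: dict


def build_system_prompt(identity: str, task: str) -> str:
    """Assumed signature: accepts no tool definitions and renders none."""
    return f"You are {identity}.\nCurrent task: {task}"


def build_request(identity: str, task: str, tools: list[ToolDefinition]) -> dict:
    """Assemble a provider-agnostic request; tool schemas appear only under 'tools'."""
    return {
        "system": build_system_prompt(identity, task),
        "tools": [
            {"name": t.name, "description": t.description, "input_schema": t.parameters}
            for t in tools
        ],
    }


read_file = ToolDefinition(
    name="read_file",
    description="Read a file from the workspace",
    parameters={"type": "object", "properties": {"path": {"type": "string"}}},
)
request = build_request("a backend developer", "fix the failing test", [read_file])
```

With this split, the system prompt carries only identity and task guidance, so the tool list is never described twice.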
In `@docs/decisions/ADR-002-design-decisions-batch-1.md`:
- Around line 233-237: The review flags that lifecycle APIs
(AgentEngine.add_agent()/remove_agent()) are attached to the wrong abstraction;
remove or relocate agent registry/HR responsibilities from AgentEngine and
implement them on a dedicated company/registry/HR service (or similarly named
Service class) that owns HiringRequest persistence via PersistenceBackend and
wires templates/overrides, ensuring AgentEngine remains a per-agent task runner;
if you intentionally redefine AgentEngine's scope instead, update DESIGN_SPEC.md
and surface that deviation to reviewers/users rather than silently diverging.
- Around line 45-57: The markdown fenced code blocks under the "Proposed
taxonomy (~25 leaf types):" section (the plain list block) and the subsequent
code sample block are missing leading/trailing blank lines and a language tag,
which triggers MD031/MD040; fix both blocks by adding a blank line before and
after each fence and changing the opening fence to include a language (e.g.,
`text`) so the taxonomy list and the collaboration_score snippet render
consistently and satisfy markdownlint.
- Around line 177-186: The document is inconsistent: the Decision text uses
resolution "agent.autonomy_level or company.autonomy.level" but D6 and
DESIGN_SPEC.md describe a three-level chain including department overrides;
update ADR-002 to match the mandated three-level precedence (agent -> department
-> company) or explicitly justify and change D6/DESIGN_SPEC.md if you intend to
keep two-level resolution. Concretely, change the Decision and Resolution lines
to reference the three-level rule (e.g., resolve as agent.autonomy_level or
department.autonomy.level or company.autonomy.level), update the sentence
introducing the resolution, and add a cross-reference to DESIGN_SPEC.md and D6
so readers see the authoritative precedence and validation rules (e.g.,
seniority-based constraints on AgentIdentity/autonomy_level).
- Around line 193-202: The ADR currently declares "Decision: **(a+c hybrid)**
Human-only for promotion. Automatic downgrade..." but the DESIGN_SPEC.md and PR
objective expect human-only runtime changes now with auto-downgrade as future
work; pick one and make both sources agree. Either (A) change this ADR's
"Decision" paragraph, the table row for "(a+c hybrid)", and the "Key insight"
and "D7" references to state that only human-only runtime changes are
implemented now and auto-downgrade is deferred to future work, or (B) update
DESIGN_SPEC.md/PR objective to accept the (a+c hybrid) choice; ensure the text
around "Decision: **(a+c hybrid)**", the pros/cons table, and the
recovery/auto-downgrade bullets are updated consistently and add a clear note
referencing DESIGN_SPEC.md if you defer (so readers know mandatory spec must be
updated when implementation diverges).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1a4dc0e4-208a-4340-9d43-e0745848571e
📒 Files selected for processing (2)
- DESIGN_SPEC.md
- docs/decisions/ADR-002-design-decisions-batch-1.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
DESIGN_SPEC.md is MANDATORY reading before implementing any feature. The spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do not silently diverge. When spec sections are referenced (e.g. 'Section 10.2'), read that section verbatim before coding. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality
Files:
- docs/decisions/ADR-002-design-decisions-batch-1.md
- DESIGN_SPEC.md
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to **/*.md : DESIGN_SPEC.md is MANDATORY reading before implementing any feature. The spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do not silently diverge. When spec sections are referenced (e.g. 'Section 10.2'), read that section verbatim before coding. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality
Applied to files:
- docs/decisions/ADR-002-design-decisions-batch-1.md
- DESIGN_SPEC.md
🪛 LanguageTool
docs/decisions/ADR-002-design-decisions-batch-1.md
[typographical] ~25-~25: In American English, use a period after an abbreviation.
Context: ...ve trust. Sub-question 1: Fixed enum vs open/extensible registry? | Option |...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[grammar] ~148-~148: Please add a punctuation mark at the end of paragraph.
Context: ...ntial exposure, data destruction) never bypass Decision: (c) Hybrid. Rule eng...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~165-~165: Ensure spelling is correct
Context: ...ble. Even OPA sidecar evaluations are 1-5ms. Total security overhead: milliseconds ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~308-~308: Redundant conjunctions can lead to confusion; consider removing a conjunction here.
Context: ...15) ### D13: Promotion Criteria Logic (AND/OR) Unblocks: #49 | Option | Pros | ...
(AND_OR)
[style] ~316-~316: Redundant conjunctions can lead to confusion; consider removing a conjunction here.
Context: ...ll). Single ThresholdEvaluator covers AND/OR/threshold | More configuration | **Pre...
(AND_OR)
[typographical] ~375-~375: Insert a space between the numerical value and the unit symbol.
Context: ... via docker.image config - Keep under 500MB; users add Go/Rust via custom images *...
(UNIT_SPACE)
[typographical] ~458-~458: In American English, use a period after an abbreviation.
Context: ...tion needed | Sub-decision: Verbatim vs summarized conversation? - **Verbatim...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~525-~525: In American English, use a period after an abbreviation.
Context: ...ry | #47, #43, #49 | | D4 | SecOps: LLM vs rules | Hybrid: rule engine + LLM | — |...
(MISSING_PERIOD_AFTER_ABBREVIATION)
🪛 markdownlint-cli2 (0.21.0)
docs/decisions/ADR-002-design-decisions-batch-1.md
[warning] 46-46: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 46-46: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 117-117: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 117-117: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 126-126: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on documenting and integrating a batch of 23 key design decisions into the agent framework's documentation. It covers a wide range of topics, solidifying architectural patterns and providing clear guidance for future development.
Greptile Summary
This PR finalizes 23 design decisions (ADR-002) accumulated across 10 open issues, renames the pending-decisions file to a proper ADR, and back-annotates
Key changes:
Issues found:
Confidence Score: 4/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Action triggered] --> B{ActionTypeRegistry\nlookup}
B -->|Unknown type| C[Reject at config\nload time]
B -->|Known type| D[ToolPermissionChecker]
D --> E{SecurityInterceptionStrategy\nD5: before every tool call}
E -->|Rules fast path ~95%| F[Rule Engine\nsub-ms]
E -->|Uncertain ~5%| G[LLM SecOps\n0.5-2s]
F -->|Blocked| H[Deny / Escalate]
G -->|Blocked| H
F -->|Allowed| I{Autonomy Preset\nD6: 3-level resolution}
G -->|Allowed| I
I -->|auto_approve| J[Execute Tool]
I -->|human_approval| K{RiskTierClassifier\nD19: YAML mapping}
K -->|low_risk| L[Park context\nD20: ParkedContext\nPydantic JSON]
K -->|medium_risk| L
K -->|high_risk| L
L --> M[Human Review Queue]
M -->|Approved| N[Resume via\ntool result injection\nD21]
M -->|Denied / Timeout| O[Deny reason\nto agent]
N --> J
J --> P[ToolResult]
P -->|MCP tool| Q[MCPBridgeTool\nD18: adapter\ntext→content\nbinary→ToolExecutionResult.metadata]
Q --> P
Last reviewed commit: a92b80c
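The rule fast path and LLM slow path in the flowchart (D4, D5) can be sketched as a two-stage check. This is only an illustration under stated assumptions: the `Verdict` enum, the `RULES` table, the action type names `code:read` and `db:drop`, and the `intercept` function are hypothetical, and the LLM reviewer is modeled as a plain callable.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class Verdict(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    UNCERTAIN = "uncertain"


# Hypothetical rule table: action type -> verdict decided without an LLM call.
RULES: dict[str, Verdict] = {
    "code:read": Verdict.ALLOW,
    "db:drop": Verdict.DENY,
}


def rule_engine(action_type: str) -> Verdict:
    """Fast path: a dictionary lookup; anything not listed is uncertain."""
    return RULES.get(action_type, Verdict.UNCERTAIN)


def intercept(action_type: str, llm_review: Callable[[str], bool]) -> Verdict:
    """Run the rules first and call the LLM reviewer only when they cannot decide."""
    verdict = rule_engine(action_type)
    if verdict is not Verdict.UNCERTAIN:
        return verdict
    return Verdict.ALLOW if llm_review(action_type) else Verdict.DENY
```

Because the reviewer is only invoked on the uncertain branch, the common case stays a cheap lookup, which matches the estimated split shown in the flowchart.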
Code Review
This pull request does a great job of formalizing and documenting 23 key design decisions in a new Architecture Decision Record (ADR-002) and integrating these decisions back into the main DESIGN_SPEC.md. The structure of the ADR is clear and well-reasoned, covering options, precedents, and final choices for each decision. The changes significantly improve the clarity and completeness of the project's design documentation. I have one minor suggestion to ensure consistency between the ADR and the design spec regarding the autonomy scope resolution.
Note: Security Review has been skipped due to the limited scope of the PR.
**Precedents:** CrewAI has 24 per-agent attributes. AutoGen has per-agent `human_input_mode`. LangGraph has per-node `interrupt_before`/`interrupt_after`. CSA Agentic Trust Framework requires per-agent identity and trust level.

**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).
There's a small inconsistency in the resolution logic for autonomy scope. The decision text here mentions a two-level resolution (agent.autonomy_level or company.autonomy.level), but the corresponding update in DESIGN_SPEC.md and the summary table at the end of this ADR describe a more robust three-level chain that includes the department level (agent.autonomy_level or department.autonomy_level or company.autonomy.level). To ensure consistency across the design documentation, it would be best to update this section to reflect the three-level resolution chain.
Suggested change, from:
**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).
to:
**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` and department config (default: None = use company default). Resolution: `agent.autonomy_level or department.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).
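The three-level chain discussed in this comment is short enough to sketch directly. The code below is an illustration under assumptions: the `Agent` dataclass, the `resolve_autonomy` and `validate_agent` helpers, and the lowercase seniority names are hypothetical; only the `or` chain, the `full` level, and the Junior/Intern restriction come from the decision text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Seniorities that may not be given the `full` level (names assumed lowercase).
RESTRICTED_SENIORITIES = {"junior", "intern"}


@dataclass(frozen=True)
class Agent:
    """Hypothetical stand-in for AgentIdentity with only the relevant fields."""

    seniority: str
    autonomy_level: Optional[str] = None


def resolve_autonomy(
    agent_level: Optional[str],
    department_level: Optional[str],
    company_level: str,
) -> str:
    """Most specific setting wins: agent, then department, then company default."""
    return agent_level or department_level or company_level


def validate_agent(agent: Agent) -> None:
    """Reject `full` autonomy for restricted seniorities."""
    if agent.autonomy_level == "full" and agent.seniority in RESTRICTED_SENIORITIES:
        raise ValueError(f"{agent.seniority} agents cannot be set to 'full'")
```

For example, an agent with no override in a department set to `supervised` resolves to `supervised` even when the company default is `full`.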
Pull request overview
This PR documents the final design decisions for 23 architectural items (D1–D23) covering 10 feature areas across the codebase. It creates a new ADR file (ADR-002-design-decisions-batch-1.md) containing the full decision records with options/pros/cons analysis, and backfills 12 blockquote annotations into the relevant sections of DESIGN_SPEC.md linking to the ADR.
Changes:
- Create `docs/decisions/ADR-002-design-decisions-batch-1.md`: 544-line ADR documenting D1–D23 with options tables, precedents, and a summary table
- Add 12 decision annotation blockquotes to `DESIGN_SPEC.md` across §6.5, §7.7, §8.1–8.4, §11.1.2, §12.2, §12.3, and §12.4
- Add two new subsections to `DESIGN_SPEC.md`: §11.1.3 (MCP Integration) and §11.1.4 (Action Type System)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/decisions/ADR-002-design-decisions-batch-1.md | New ADR with 23 architecture decisions, options analysis, precedents, and a summary table |
| DESIGN_SPEC.md | 12 ADR annotation blockquotes added inline; 2 new subsections for MCP and action type system |
**Precedents:** CrewAI has 24 per-agent attributes. AutoGen has per-agent `human_input_mode`. LangGraph has per-node `interrupt_before`/`interrupt_after`. CSA Agentic Trust Framework requires per-agent identity and trust level.

**Decision:** **(b) Per-agent override.** Optional `autonomy_level` on `AgentIdentity` (default: None = use company default). Resolution: `agent.autonomy_level or company.autonomy.level`. Add seniority-based validation (Juniors/Interns cannot be set to `full`).
The D6 decision body at line 185 describes only a two-level resolution chain — agent.autonomy_level or company.autonomy.level — with no department level. However, both the summary table (line 527) and the DESIGN_SPEC.md annotation (line 2397) describe a three-level chain: agent.autonomy_level or department.autonomy_level or company.autonomy.level. This inconsistency within the ADR itself, and between the ADR body and the DESIGN_SPEC annotation, could lead to implementation confusion: does D6 add a per-department layer or not? The decision body should be updated to include the department-level resolution (matching the summary table, DESIGN_SPEC annotation, and issue #42).
| **(a+c hybrid) Human-only promotion + automatic downgrade (CHOSEN)** | Asymmetric trust: gaining trust is hard, losing it is easy; matches Azure Conditional Access (only restricts, never loosens) | Two code paths |

**Key insight:** No real-world security system automatically grants higher privileges. Conditional access only steps UP requirements, never DOWN. The SEAgent MAC framework explicitly prevents agents from self-modifying policies.

**Decision:** **(a+c hybrid)** Human-only for promotion. Automatic downgrade on: high error rate → downgrade one level, budget exhausted → supervised, security incident → locked. Recovery from auto-downgrade: human-only.
The D7 decision body (lines 198–202) marks the (a+c hybrid) as CHOSEN and explicitly states "Automatic downgrade on: high error rate → downgrade one level, budget exhausted → supervised, security incident → locked" as part of the decided implementation. However, the summary table (line 528) lists only "Human-only promotion" as the initial implementation, and the DESIGN_SPEC.md annotation (line 2398) puts the auto-downgrade rules in "Future strategies." This three-way contradiction needs to be resolved: either the initial implementation includes both human-only promotion AND automatic downgrade (as the D7 decision body states), or auto-downgrade is deferred to future (as the summary table and DESIGN_SPEC annotation indicate). This directly impacts which code gets written for the initial milestone.
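Whichever reading of D7 wins, the asymmetric rule itself is easy to state in code. The sketch below assumes a four-step ordering of levels (`LOCKED < SUPERVISED < SEMI < FULL`); only `full`, `supervised` and `locked` are named in the source, so `SEMI`, the ordering, the event names, and both function signatures are assumptions. It also assumes that a downgrade trigger never raises an agent's level.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    # Ordering is an assumption; SEMI is a hypothetical intermediate level.
    LOCKED = 0
    SUPERVISED = 1
    SEMI = 2
    FULL = 3


def auto_downgrade(current: AutonomyLevel, event: str) -> AutonomyLevel:
    """Apply the three automatic downgrade triggers; unknown events change nothing."""
    if event == "security_incident":
        return AutonomyLevel.LOCKED
    if event == "budget_exhausted":
        # Cap at SUPERVISED without lifting an agent that is already lower.
        return min(current, AutonomyLevel.SUPERVISED)
    if event == "high_error_rate":
        # Step down one level, stopping at LOCKED.
        return AutonomyLevel(max(current - 1, AutonomyLevel.LOCKED))
    return current


def promote(requested: AutonomyLevel, approved_by_human: bool) -> AutonomyLevel:
    """Promotion, including recovery from a downgrade, requires human approval."""
    if not approved_by_human:
        raise PermissionError("autonomy can only be raised by a human")
    return requested
```

The two functions are the "two code paths" listed as the cost of the hybrid option: one automatic path that only lowers levels, and one human-gated path that raises them.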
**Precedents:** AWS IAM uses open namespaced strings (`s3:GetObject`). Kubernetes RBAC uses semi-open verbs. GitHub uses closed scopes. OPA/Rego uses open policy strings. Every production security system validates action strings against a known set.

**Decision:** **(c) Enum core + validated registry.** StrEnum for built-in types (~20), plus an `ActionTypeRegistry` that accepts custom strings only if explicitly registered. Unknown strings rejected at config load time. Critical for security — a typo in `human_approval` list silently means "skip approval."
Line 35 describes the StrEnum as covering "built-in types (~20)", but the proposed taxonomy listed in lines 46–57 contains 24 distinct leaf types (5 code:* + 2 test:* + 1 docs:* + 3 vcs:* + 2 deploy:* + 2 comms:* + 2 budget:* + 3 org:* + 3 db:* + 1 arch:*). The DESIGN_SPEC.md annotation at line 2159 also says "~20 built-in action types." These estimates should be reconciled with the actual taxonomy: the count is 24, not ~20.
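The count in this comment and the "enum core + validated registry" idea can both be checked with a short sketch. The per-category counts are taken from the comment above; the three enum members, the `ActionTypeRegistry` methods `register` and `validate`, and the custom type `infra:scale` are hypothetical. A `str`-based `Enum` stands in for `StrEnum` so the example also runs on Python versions before 3.11.

```python
from enum import Enum


class BuiltinAction(str, Enum):
    # A few illustrative members only; the real taxonomy has more leaf types.
    CODE_WRITE = "code:write"
    VCS_PUSH = "vcs:push"
    DEPLOY_PRODUCTION = "deploy:production"


class ActionTypeRegistry:
    """Accepts built-in types plus explicitly registered custom strings."""

    def __init__(self) -> None:
        self._known = {action.value for action in BuiltinAction}

    def register(self, action_type: str) -> None:
        # Custom types must follow the category:action shape.
        category, sep, action = action_type.partition(":")
        if not (category and sep and action):
            raise ValueError(f"expected 'category:action', got {action_type!r}")
        self._known.add(action_type)

    def validate(self, action_types) -> None:
        # Called at config load time so a typo fails loudly instead of being skipped.
        unknown = sorted(set(action_types) - self._known)
        if unknown:
            raise ValueError(f"unknown action types: {unknown}")


# Leaf counts per category as listed in the review comment.
LEAF_COUNTS = {
    "code": 5, "test": 2, "docs": 1, "vcs": 3, "deploy": 2,
    "comms": 2, "budget": 2, "org": 3, "db": 3, "arch": 1,
}
total_leaf_types = sum(LEAF_COUNTS.values())
```

Summing the ten categories gives 24, which supports the comment's point that "~20" understates the taxonomy.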
DESIGN_SPEC.md (outdated)
> **Decisions ([ADR-002](docs/decisions/ADR-002-design-decisions-batch-1.md) D6, D7):**
>
> - **D6 — Autonomy Scope:** Three-level resolution chain: per-agent → per-department → company default. Optional `autonomy_level` on `AgentIdentity` and department config. Resolution: `agent.autonomy_level or department.autonomy_level or company.autonomy.level`. Seniority validation: Juniors/Interns cannot be set to `full`.
> - **D7 — Autonomy Changes at Runtime:** Pluggable `AutonomyChangeStrategy` protocol. Initial: human-only promotion via REST API. No agent (including CEO) can escalate privileges. Future strategies: human-only + auto-downgrade (on high error rate → one level down, budget exhausted → supervised, security incident → locked; recovery from auto-downgrade: human-only). Precedent: no real-world security system automatically grants higher privileges.
The D7 annotation here lists the auto-downgrade rules (on high error rate → one level down, budget exhausted → supervised, security incident → locked) as "Future strategies", but the ADR-002 D7 decision body (lines 198–202) explicitly marks (a+c hybrid) (human-only promotion + automatic downgrade) as CHOSEN and describes the auto-downgrade triggers as part of the decided implementation. This inconsistency between the ADR body and the DESIGN_SPEC annotation needs to be resolved: if auto-downgrade is intended for the initial implementation, this annotation should list it as the initial impl; if it's deferred, the ADR decision body should be corrected to reflect that.
DESIGN_SPEC.md (outdated)
> Action types classify agent actions for use by autonomy presets (§12.2), SecOps validation (§12.3), tiered timeout policies (§12.4), and progressive trust (§11.3). Three sub-decisions:
>
> - **D1.1 — Registry:** `StrEnum` for ~20 built-in action types (type safety, autocomplete, typos caught at compile time) + `ActionTypeRegistry` for custom types via explicit registration. Unknown strings rejected at config load time. Critical for security — a typo in `human_approval` list silently means "skip approval."
Line 2159 states "~20 built-in action types" but the proposed taxonomy listed just below (and in the ADR at lines 46–57) contains 24 distinct leaf types. This count should be updated to "~25" (consistent with the taxonomy header at line 2162) or the actual number of 24.
…ewers
- D6: Fix two-level → three-level autonomy resolution chain (agent → department → company)
- D7: Move auto-downgrade from "Future" to "Initial" in DESIGN_SPEC annotation
- D1: Reconcile ~20 → ~25 built-in action type count
- D5: Add SecurityInterceptionStrategy protocol name to decision body
- D8.3: Replace AgentEngine.add_agent() with company/registry service
- D18: Clarify metadata lives on ToolExecutionResult, not ToolResult
- D22: Extract inline annotation to blockquote; remove "tool definitions" from step 3
- Update autonomy preset + tiered timeout YAML to category:action taxonomy
- Add 11.1.3/11.1.4 to ToC, ADR-002 to project structure
- Add ParkedContext + AgentState to §7.6 entity table
- Fix M4 status label, README memory description, markdown lint issues
- Align sub-decision numbering (D1.1/D8.1 cross-refs)
- Add "estimated" qualifier to D4 percentages
- Explain "11 agents" count in ADR-002 header
| Option | Pros | Cons |
|--------|------|------|
| **(a) Extend ToolResult to support multi-modal** | Native support for images/resources | Cascading changes across entire codebase; LLM providers consume tool results as text anyway |
| **(b) Adapter in MCPBridgeTool; keep ToolResult as-is (CHOSEN)** | Zero disruption; text concatenation for LLM path; rich content stored in `ToolExecutionResult.metadata` (not `ToolResult`, which has no metadata field); MCP spec requires TextContent block alongside structured content | Non-text content requires metadata extraction |

**Mapping:**
- Text blocks → concatenate into `content: str`
- Image/audio → `[image: {mimeType}]` placeholder in content; base64 in `metadata["attachments"]`
- `structuredContent` → `metadata["structured_content"]`
- `isError` → `is_error` (direct 1:1)
- `tool_call_id` assigned by our framework, associated back after MCP response
D18 metadata destination is ambiguous — ToolResult vs ToolExecutionResult
The option (b) row (line 421) explicitly states that rich content is stored in ToolExecutionResult.metadata with the parenthetical "(not ToolResult, which has no metadata field)". However, the Mapping bullets immediately below drop that qualifier and just write metadata["attachments"] and metadata["structured_content"] — leaving it ambiguous which model's metadata is being referenced.
The DESIGN_SPEC §11.1.3 annotation (line 2155) copies the same bare metadata[...] phrasing without the qualifier.
An implementer reading only the mapping bullets (or only the spec annotation) has two plausible interpretations:
1. Add a `metadata` field to `ToolResult`, which directly contradicts the "keep `ToolResult` as-is" decision and the parenthetical in option (b).
2. Store on `ToolExecutionResult`, which is what option (b) intends, but is not stated in the actionable part of the decision.
The mapping bullets should explicitly name the owner:
Suggested change (mapping bullets only; the options table is unchanged):
**Mapping:**
- Text blocks → concatenate into `content: str`
- Image/audio → `[image: {mimeType}]` placeholder in content; base64 in `ToolExecutionResult.metadata["attachments"]`
- `structuredContent` → `ToolExecutionResult.metadata["structured_content"]`
- `isError` → `is_error` (direct 1:1)
- `tool_call_id` assigned by our framework, associated back after MCP response
The same fix should be applied to the D18 annotation in DESIGN_SPEC §11.1.3 (line 2155).
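A sketch of the adapter makes the ownership of `metadata` explicit. It is illustrative only: the MCP result is modeled as a plain dictionary rather than SDK types, the `ToolExecutionResult` dataclass is a stand-in with just the fields named in D18, `map_mcp_result` is a hypothetical function name, and joining text blocks with newlines and using an `[audio: ...]` placeholder for audio are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolExecutionResult:
    """Stand-in for the framework model; only fields named in D18 are modeled."""

    content: str
    is_error: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)


def map_mcp_result(result: dict[str, Any]) -> ToolExecutionResult:
    """Flatten an MCP-style tool result into text plus metadata."""
    texts: list[str] = []
    attachments: list[dict[str, Any]] = []
    for block in result.get("content", []):
        kind = block.get("type")
        if kind == "text":
            texts.append(block["text"])
        elif kind in ("image", "audio"):
            # The LLM sees a placeholder; the base64 payload goes to metadata.
            texts.append(f"[{kind}: {block['mimeType']}]")
            attachments.append(
                {"type": kind, "mimeType": block["mimeType"], "data": block["data"]}
            )
    metadata: dict[str, Any] = {}
    if attachments:
        metadata["attachments"] = attachments
    if "structuredContent" in result:
        metadata["structured_content"] = result["structuredContent"]
    return ToolExecutionResult(
        content="\n".join(texts),
        is_error=bool(result.get("isError", False)),
        metadata=metadata,
    )
```

Because everything non-textual lands on `ToolExecutionResult.metadata`, `ToolResult` needs no new field, which is the point of option (b).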
| D4 | SecOps: LLM vs rules | Hybrid: rule engine + LLM | — | Rule fast path (~95%) + LLM slow path (~5%) | #40 |
| D5 | SecOps: integration point | Pluggable + configurable | `SecurityInterceptionStrategy` | Before every tool invocation | #40 |
| D6 | Autonomy: scope | Three-level chain | — | Agent → department → company default | #42 |
| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion | #42 |
D7 summary table "Initial Impl" omits automatic downgrade
The D7 body (line 205) adopts the a+c hybrid: human-only promotion plus automatic downgrade on specific conditions (high error rate → one level down, budget exhausted → supervised, security incident → locked). The summary table's "Initial Impl" column records only "Human-only promotion", which misrepresents the decided initial implementation by omitting the entire automatic-downgrade leg.
This is distinct from the previously flagged DESIGN_SPEC annotation issue — the DESIGN_SPEC annotation in this PR now correctly reflects the hybrid. The summary table itself is the remaining inconsistency.
Current:
| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion | #42 |
Suggested:
| D7 | Autonomy: who changes | Pluggable strategy | `AutonomyChangeStrategy` | Human-only promotion + auto-downgrade (error rate/budget/security) | #42 |
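To make the omitted leg concrete, the three downgrade conditions from the D7 body could be sketched as below. This is only an illustration under stated assumptions: the `AutonomyLevel` names and ordering are hypothetical (D7 names only "supervised" and "locked"), the precedence between conditions is assumed, and the real `AutonomyChangeStrategy` protocol may look different.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Hypothetical ordering; only `supervised` and `locked` appear in D7."""

    LOCKED = 0
    SUPERVISED = 1
    SEMI = 2
    FULL = 3


def auto_downgrade(
    level: AutonomyLevel,
    *,
    high_error_rate: bool = False,
    budget_exhausted: bool = False,
    security_incident: bool = False,
) -> AutonomyLevel:
    """Apply the automatic-downgrade leg; promotion stays human-only."""
    if security_incident:
        # Security incident -> locked, regardless of current level.
        return AutonomyLevel.LOCKED
    if budget_exhausted:
        # Budget exhausted -> supervised, but never raise a lower level.
        return min(level, AutonomyLevel.SUPERVISED)
    if high_error_rate:
        # High error rate -> one level down, floored at LOCKED.
        return AutonomyLevel(max(level - 1, AutonomyLevel.LOCKED))
    return level
```

Note that the function can only return a level equal to or below its input, which is the property "Human-only promotion + auto-downgrade" is meant to capture.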
How can I resolve this? If you propose a fix, please make it concise.🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) 
([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement 
meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs 
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, 
contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, 
loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive 
trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry 
([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into 
observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### 
Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree 
management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
- Rename PENDING-DECISIONS.md → ADR-002-design-decisions-batch-1.md (status: DECIDED)
- Add decision annotations to DESIGN_SPEC.md across 8 sections

Decisions covered
GitHub issue bodies for all 10 issues (#40, #42, #43, #45, #47, #49, #50, #53, #126, #188) have been updated with their respective decisions.
Test plan
🤖 Generated with Claude Code