docs: add agent scaling research findings to DESIGN_SPEC #145
Integrate findings from Kim et al., "Towards a Science of Scaling Agent Systems" (arXiv:2512.08296), a 180-experiment study across 3 LLM families and 4 benchmarks.

Changes:

- §6.2: Add `task_structure` field (sequential/parallel/mixed)
- §6.9: New section, Task Decomposability & Coordination Topology
  - Task structure classification with MAS effect data
  - Per-task coordination topology selection (M4+)
  - Auto topology selector concept with config schema
- §10.5: Add M4 Coordination Metrics Suite (5 empirically validated metrics: Ec, Ae, O%, message density, redundancy), configurable and opt-in
- §10.5: Tiered `orchestration_ratio` alerts (info/warn/critical) replacing the single threshold
- §10.5: Add M4/M5 Coordination Error Taxonomy (4 categories, configurable and opt-in for data gathering)
- §16.3: New section, Agent Scaling Research, with key findings and how they inform our design

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review: ✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found. Scanned files: none.
Caution: Review failed. The pull request is closed.
📝 Walkthrough

Adds M4/M5 multi-agent coordination content to the design spec: task decomposability (`task_structure`), coordination topologies, an Auto Topology Selector, coordination metrics and taxonomy, and related research citations and YAML examples. All changes are confined to DESIGN_SPEC.md.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Selector as Topology Selector
    participant Orchestrator as Orchestrator
    participant Agents as Agent Pool
    participant Telemetry as Analytics/Telemetry
    Client->>Selector: submit Task (includes `task_structure`)
    Selector->>Selector: evaluate task_structure + policies
    Selector->>Orchestrator: chosen topology (sequential/parallel/mixed)
    Orchestrator->>Agents: dispatch sub-tasks per topology
    Agents->>Orchestrator: sub-task results
    Orchestrator->>Telemetry: emit coordination metrics (Ec, O%, Ae, c, R)
    Telemetry->>Selector: provide analytics for auto-topology feedback
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request adds several new sections to the design specification based on agent scaling research. The changes introduce concepts like task decomposability, coordination metrics, and an error taxonomy, all of which are well-documented and cross-referenced. My review focuses on ensuring the clarity and consistency of these new documentation sections. I've suggested a minor refactoring to merge two separate configuration blocks into one for better readability.
Note: Security Review has been skipped due to the limited scope of the PR.
```diff
+#### M4/M5: Coordination Error Taxonomy
+
+When coordination metrics collection is enabled, the system can optionally classify coordination errors into structured categories. This enables targeted diagnosis — e.g., if coordination failures spike, the topology may be too complex; if context omissions spike, the orchestrator's synthesis is insufficient.
+
+| Error Category | Description | Detection Method |
+|---------------|-------------|-----------------|
+| **Logical contradiction** | Agent asserts both "X is true" and "X is false", or derives conclusions violating its stated premises | Semantic contradiction detection on agent outputs |
+| **Numerical drift** | Accumulated computational errors from cascading rounding or unit conversion (>5% deviation) | Numerical comparison against ground truth or cross-agent verification |
+| **Context omission** | Failure to reference previously established entities, relationships, or state required for current reasoning | Missing-reference detection across agent conversation history |
+| **Coordination failure** | MAS-specific: message misinterpretation, task allocation conflicts, state synchronization errors between agents | Protocol-level error detection in orchestration layer |
+
+> **Configurable and opt-in:** Error taxonomy classification requires semantic analysis of agent outputs and is expensive. Enable via `coordination_metrics.error_taxonomy: true` only when actively gathering data for system tuning. The classification pipeline runs post-execution (never blocks agent work) and logs structured events to the observability layer.
+
+```yaml
+coordination_metrics:
+  error_taxonomy:
+    enabled: false # opt-in — enable for targeted diagnosis
+    categories:
+      - logical_contradiction
+      - numerical_drift
+      - context_omission
+      - coordination_failure
+```
+
+> **Reference:** Error categories derived from [Kim et al., 2025](https://arxiv.org/abs/2512.08296) and the Multi-Agent System Failure Taxonomy (MAST) by Cemri et al. (2025). Architecture-specific patterns: centralized coordination reduces logical contradictions by 36.4% and context omissions by 66.8% via orchestrator synthesis; hybrid topology introduces 12.4% coordination failures due to protocol complexity.
```
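The **Numerical drift** row in the hunk above gives a concrete threshold (>5% deviation) but leaves the comparison itself open. Here is a minimal Python sketch of the two detection paths the table names. It assumes relative deviation as the measure and uses the median of the agents' answers as the cross-agent reference. `relative_deviation`, `is_numerical_drift`, and `drifting_agents` are hypothetical helpers, not part of the spec.

```python
import statistics


def relative_deviation(value: float, reference: float) -> float:
    """Return |value - reference| / |reference|."""
    if reference == 0:
        raise ValueError("reference must be non-zero for a relative deviation")
    return abs(value - reference) / abs(reference)


def is_numerical_drift(value: float, reference: float, threshold: float = 0.05) -> bool:
    """Ground-truth check: flag drift when the deviation exceeds the threshold (5% per the table)."""
    return relative_deviation(value, reference) > threshold


def drifting_agents(values: dict[str, float], threshold: float = 0.05) -> list[str]:
    """Cross-agent check: compare each agent's value with the median of all agents.

    Using the median as the consensus value is an assumption of this sketch;
    a single outlier shifts it less than it would shift the mean.
    """
    if not values:
        return []
    consensus = statistics.median(values.values())
    return sorted(
        agent
        for agent, value in values.items()
        if is_numerical_drift(value, consensus, threshold)
    )
```

For example, `drifting_agents({"planner": 100.0, "analyst": 101.0, "writer": 120.0})` uses 101.0 as the consensus and flags only `"writer"`.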
The `coordination_metrics` configuration is defined in two separate YAML blocks, which can be confusing. To improve clarity and represent it as a single configuration object, I suggest merging the `error_taxonomy` configuration into the main `coordination_metrics` block from the 'M4: Coordination Metrics Suite' section and removing the redundant YAML block from this section.

The combined block would look like this:

```yaml
coordination_metrics:
  enabled: false
  collect:
    - efficiency
    - overhead
    - error_amplification
    - message_density
    - redundancy
  baseline_window: 50
  error_taxonomy:
    enabled: false
    categories:
      - logical_contradiction
      - numerical_drift
      - context_omission
      - coordination_failure
```

Suggested change:

```suggestion
#### M4/M5: Coordination Error Taxonomy

When coordination metrics collection is enabled, the system can optionally classify coordination errors into structured categories. This enables targeted diagnosis — e.g., if coordination failures spike, the topology may be too complex; if context omissions spike, the orchestrator's synthesis is insufficient.

| Error Category | Description | Detection Method |
|---------------|-------------|-----------------|
| **Logical contradiction** | Agent asserts both "X is true" and "X is false", or derives conclusions violating its stated premises | Semantic contradiction detection on agent outputs |
| **Numerical drift** | Accumulated computational errors from cascading rounding or unit conversion (>5% deviation) | Numerical comparison against ground truth or cross-agent verification |
| **Context omission** | Failure to reference previously established entities, relationships, or state required for current reasoning | Missing-reference detection across agent conversation history |
| **Coordination failure** | MAS-specific: message misinterpretation, task allocation conflicts, state synchronization errors between agents | Protocol-level error detection in orchestration layer |

> **Configurable and opt-in:** Error taxonomy classification requires semantic analysis of agent outputs and is expensive. Enable via `coordination_metrics.error_taxonomy.enabled: true` only when actively gathering data for system tuning. The classification pipeline runs post-execution (never blocks agent work) and logs structured events to the observability layer. This configuration is part of the main `coordination_metrics` block defined in the 'M4: Coordination Metrics Suite' section.

> **Reference:** Error categories derived from [Kim et al., 2025](https://arxiv.org/abs/2512.08296) and the Multi-Agent System Failure Taxonomy (MAST) by Cemri et al. (2025). Architecture-specific patterns: centralized coordination reduces logical contradictions by 36.4% and context omissions by 66.8% via orchestrator synthesis; hybrid topology introduces 12.4% coordination failures due to protocol complexity.
```
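The merge suggestion and the enablement-key inconsistency flagged in this thread both come down to giving `coordination_metrics` one canonical shape. Below is a minimal stdlib sketch of loading the merged block. It assumes the YAML has already been parsed into a plain `dict`, and the class names are hypothetical. The project's changelog mentions Pydantic-validated config, which is where a real check would likely live. The sketch accepts only the nested `error_taxonomy.enabled` form and rejects the boolean shorthand.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical names: these classes are illustrative, not from the codebase.
KNOWN_METRICS = frozenset(
    {"efficiency", "overhead", "error_amplification", "message_density", "redundancy"}
)
KNOWN_CATEGORIES = frozenset(
    {"logical_contradiction", "numerical_drift", "context_omission", "coordination_failure"}
)


@dataclass(frozen=True)
class ErrorTaxonomyConfig:
    enabled: bool = False
    categories: tuple[str, ...] = tuple(sorted(KNOWN_CATEGORIES))


@dataclass(frozen=True)
class CoordinationMetricsConfig:
    enabled: bool = False
    collect: tuple[str, ...] = ()
    baseline_window: int = 50  # number of SAS runs used as the Ae baseline
    error_taxonomy: ErrorTaxonomyConfig = field(default_factory=ErrorTaxonomyConfig)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "CoordinationMetricsConfig":
        collect = tuple(raw.get("collect", ()))
        unknown = set(collect) - KNOWN_METRICS
        if unknown:
            raise ValueError(f"unknown metrics in collect: {sorted(unknown)}")

        taxonomy_raw = raw.get("error_taxonomy", {})
        # One canonical shape: a mapping with an `enabled` key.
        # The shorthand `error_taxonomy: true` is rejected, not coerced.
        if not isinstance(taxonomy_raw, dict):
            raise TypeError("error_taxonomy must be a mapping with an 'enabled' key")
        categories = tuple(taxonomy_raw.get("categories", sorted(KNOWN_CATEGORIES)))
        bad = set(categories) - KNOWN_CATEGORIES
        if bad:
            raise ValueError(f"unknown error categories: {sorted(bad)}")

        return cls(
            enabled=bool(raw.get("enabled", False)),
            collect=collect,
            baseline_window=int(raw.get("baseline_window", 50)),
            error_taxonomy=ErrorTaxonomyConfig(
                enabled=bool(taxonomy_raw.get("enabled", False)),
                categories=categories,
            ),
        )
```

The sketch rejects `error_taxonomy: true` outright instead of silently accepting it. That design choice keeps the prose and the YAML from drifting apart again.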
Pull request overview

Adds forward-looking (M4+) design spec content on multi-agent task decomposability, per-task coordination topology selection, and coordination analytics/error taxonomy, grounded in the cited Kim et al. (2025) scaling research.

Changes:

- Introduces §6.9 with `task_structure` classification and an auto topology-selection interface.
- Extends §10.5 with a coordination metrics suite, tiered orchestration-ratio alerts, and an opt-in coordination error taxonomy.
- Adds §16.3 "Agent Scaling Research" and renumbers the prior "Build vs Fork Decision" section to §16.4.
DESIGN_SPEC.md (outdated)
```diff
 4. [Company Structure](#4-company-structure)
 5. [Communication Architecture](#5-communication-architecture) — 5.6 Conflict Resolution, 5.7 Meeting Protocol
-6. [Task & Workflow Engine](#6-task--workflow-engine) — 6.5 Execution Loop, 6.6 Crash Recovery, **6.7 Graceful Shutdown**, **6.8 Workspace Isolation**
+6. [Task & Workflow Engine](#6-task--workflow-engine) — 6.5 Execution Loop, 6.6 Crash Recovery, **6.7 Graceful Shutdown**, **6.8 Workspace Isolation**, **6.9 Task Decomposability & Topology**
```
TOC entry calls the new subsection “6.9 Task Decomposability & Topology”, but the actual heading is “6.9 Task Decomposability & Coordination Topology”. Consider making the TOC wording match the section title to avoid confusion when scanning/searching the doc.
Suggested change:

```suggestion
6. [Task & Workflow Engine](#6-task--workflow-engine) — 6.5 Execution Loop, 6.6 Crash Recovery, **6.7 Graceful Shutdown**, **6.8 Workspace Isolation**, **6.9 Task Decomposability & Coordination Topology**
```
DESIGN_SPEC.md (outdated)
```diff
+topology: "auto" # auto, sas, centralized, decentralized
+auto_topology_rules:
+  # sequential tasks → always single-agent
+  sequential_override: "sas"
+  # parallel tasks → select based on domain structure
+  parallel_default: "centralized"
+  # mixed tasks → hybrid approach (SAS backbone + delegation)
+  mixed_default: "context_dependent"
```
In the auto-topology YAML example, `mixed_default: "context_dependent"` introduces a value that's not listed in the allowed topology set (`auto`, `sas`, `centralized`, `decentralized`) and isn't defined elsewhere in the spec. Either define `context_dependent` as a valid topology value (and document its semantics) or change the example to use an existing value while describing the hybrid behavior in prose.
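The follow-up commit in this thread resolves the issue by adding `context_dependent` to the topology enum. Here is a sketch of a load-time check that would have caught the mismatch. It assumes the rules arrive as a plain `dict`. `Topology` and `validate_auto_rules` are hypothetical names, and the enum values are the ones discussed here.

```python
from enum import Enum


class Topology(str, Enum):
    """Hypothetical enum: the four documented values plus context_dependent."""

    AUTO = "auto"
    SAS = "sas"
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"
    CONTEXT_DEPENDENT = "context_dependent"


# Keys taken from the auto_topology_rules example in the spec.
RULE_KEYS = ("sequential_override", "parallel_default", "mixed_default")


def validate_auto_rules(rules: dict[str, str]) -> dict[str, Topology]:
    """Resolve each rule to a Topology member, failing fast on unknown or missing values."""
    allowed = {member.value for member in Topology}
    resolved: dict[str, Topology] = {}
    for key in RULE_KEYS:
        value = rules.get(key)  # None (missing key) is also rejected below
        if value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
        resolved[key] = Topology(value)
    return resolved
```

A check like this can run against every YAML example in the spec, so an undocumented value is caught before it reaches readers.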
```diff
+> **Configurable collection:** All 5 metrics are opt-in via `coordination_metrics.enabled` in analytics config. `Ec` and `O%` are cheap (turn counting). `Ae` requires baseline comparison data. `c` and `R` require semantic analysis of agent outputs (embedding computation). Enable selectively based on data-gathering needs.
+
+```yaml
+coordination_metrics:
+  enabled: false # opt-in — enable for data gathering
+  collect:
+    - efficiency # cheap — turn counting
+    - overhead # cheap — turn counting
+    - error_amplification # requires SAS baseline data
+    - message_density # requires message counting infrastructure
+    - redundancy # requires embedding computation on outputs
+  baseline_window: 50 # number of SAS runs to establish baseline for Ae
+```
```
The text says coordination metrics are enabled via `coordination_metrics.enabled` "in analytics config", but the surrounding config schema in this section is `call_analytics:` and there's no other analytics block in the document. Consider nesting this under the existing `call_analytics` config (or explicitly stating the full config path) so readers know where it belongs.
DESIGN_SPEC.md (outdated)
```diff
+> **Configurable and opt-in:** Error taxonomy classification requires semantic analysis of agent outputs and is expensive. Enable via `coordination_metrics.error_taxonomy: true` only when actively gathering data for system tuning. The classification pipeline runs post-execution (never blocks agent work) and logs structured events to the observability layer.
+
+```yaml
+coordination_metrics:
+  error_taxonomy:
+    enabled: false # opt-in — enable for targeted diagnosis
+    categories:
```
The enablement key for error taxonomy is inconsistent: the prose says to enable via `coordination_metrics.error_taxonomy: true`, but the YAML example uses `coordination_metrics.error_taxonomy.enabled: false`. Please align the documented config shape (either a boolean at `error_taxonomy` or an `enabled` field) so it's unambiguous.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 2475-2476: The evidence about “Centralized verification” reducing
error amplification should be reclassified to support topology selection
(referencing §6.9 and the Ae metric in §10.5) rather than authority-based
conflict resolution (referencing §5.6); update the sentence that ties lower
error amplification to §5.6 so it instead cites §6.9 and §10.5 (Ae), and add a
short clarifying clause that authority/dissent strategies in §5.6 remain
distinct from the topology guidance.
- Around line 1607-1618: The doc uses two different schema forms for enabling
the error taxonomy — a boolean flag `coordination_metrics.error_taxonomy: true`
in the prose and a nested object with
`coordination_metrics.error_taxonomy.enabled` in the YAML; pick one canonical
config path and make both prose and YAML consistent. Either change the prose to
reference `coordination_metrics.error_taxonomy.enabled: true` to match the YAML,
or flatten the YAML to `coordination_metrics.error_taxonomy: true` (removing the
nested `enabled` key) and adjust the example categories accordingly; update all
occurrences of `coordination_metrics.error_taxonomy` and
`coordination_metrics.error_taxonomy.enabled` so they match the chosen schema.
- Around line 2471-2473: The bullet conflates "coordination overhead" (defined
as O% vs SAS baseline in §10.5) with the existing tiered alerts that are only
for orchestration_ratio, so update the text to clearly distinguish the two:
state that the "Tiered coordination overhead" recommendation refers to
coordination overhead (O%) metrics and not the orchestration_ratio alerting
scheme, and either add a separate set of tier thresholds for orchestration_ratio
or explicitly note that orchestration_ratio alerts remain unchanged; reference
the terms "coordination overhead (O%)", "orchestration_ratio", and "§10.5" in
the revised sentence to make the distinction unambiguous.
- Around line 1133-1143: The example uses mixed_default: "context_dependent"
which is not a member of the documented topology enum (topology: "auto", "sas",
"centralized", "decentralized"), so update the spec to keep the public config
consistent: either add "context_dependent" to the topology enum or change
mixed_default to one of the existing enum values; edit the coordination block
and the topology enum declaration so they match (referencing coordination,
topology, auto_topology_rules, and mixed_default) and run schema/validation to
ensure no other docs or examples use the old value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d077441e-a54c-4d21-b231-dc0f9811271b
📒 Files selected for processing (1)
DESIGN_SPEC.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T21:51:55.175Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue; the design spec is the starting point for architecture, data models, and behavior
Greptile Summary

This docs-only PR enriches

Issues identified:

Confidence Score: 4/5

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Task Submitted] --> B{task_structure field set?}
    B -- Yes, explicit --> C{task_structure value}
    B -- No --> D[Infer from task properties\ntool count, dependency graph,\nacceptance criteria]
    D --> C
    C -- sequential --> E[Sequential Override:\nForce SAS topology\nCoordination overhead\nfragments reasoning]
    C -- parallel --> F{Domain structure?}
    C -- mixed --> G[Context-Dependent:\nSAS for sequential backbone\nDelegate parallel sub-tasks]
    F -- structured domain --> H[Centralized Topology\nOrchestrator decomposes\n→ sub-agents execute\n→ orchestrator synthesizes\nAe ≈ 4.4×]
    F -- exploratory / open-ended --> I[Decentralized Topology\nPeer debate for\nhigh-entropy search spaces]
    E --> J[Execute Task]
    H --> J
    I --> J
    G --> J
    J --> K{coordination_metrics\n.enabled?}
    K -- Yes --> L[Collect Ec, O%, Ae, c, R\npost-execution]
    K -- No --> M[Task Complete]
    L --> N{error_taxonomy\n.enabled?}
    N -- Yes --> O[Classify errors:\nlogical_contradiction\nnumerical_drift\ncontext_omission\ncoordination_failure]
    N -- No --> M
    O --> M
```

Last reviewed commit: 366342d
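The flowchart's decision logic is compact enough to sketch directly. The Python version below assumes `task_structure` is already known. It leaves out the inference step from tool count, dependency graph, and acceptance criteria, because the thread doesn't specify it. It also reduces "domain structure" to a hypothetical `exploratory_domain` flag. All function names are illustrative.

```python
from enum import Enum


class TaskStructure(str, Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"
    MIXED = "mixed"


def select_topology(structure: str, exploratory_domain: bool = False) -> str:
    """Map a task_structure value to a topology name, following the flowchart."""
    kind = TaskStructure(structure)  # accepts the enum member or its string value
    if kind is TaskStructure.SEQUENTIAL:
        # Sequential override: coordination overhead fragments reasoning.
        return "sas"
    if kind is TaskStructure.PARALLEL:
        # Structured domains: orchestrator decomposes, sub-agents execute, orchestrator synthesizes.
        # Exploratory / open-ended domains: peer debate over a high-entropy search space.
        return "decentralized" if exploratory_domain else "centralized"
    # Mixed: SAS for the sequential backbone, delegate parallel sub-tasks.
    return "context_dependent"


def post_execution_steps(metrics_enabled: bool, taxonomy_enabled: bool) -> list[str]:
    """Post-execution work per the flowchart; the taxonomy runs only when metrics do."""
    steps: list[str] = []
    if metrics_enabled:
        steps.append("collect_metrics")  # Ec, O%, Ae, c, R
        if taxonomy_enabled:
            steps.append("classify_errors")
    return steps
```

The nesting in `post_execution_steps` mirrors the flowchart: when `coordination_metrics.enabled` is off, the error taxonomy is skipped even if its own flag is set.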
```diff
-### 16.3 Build vs Fork Decision
+### 16.3 Agent Scaling Research
+
+[Kim et al., "Towards a Science of Scaling Agent Systems" (2025)](https://arxiv.org/abs/2512.08296) — 180 controlled experiments across 3 LLM families (OpenAI, Google, Anthropic), 4 agentic benchmarks, 5 coordination topologies. Key findings informing our design:
```
**Vendor names in research citation** (DESIGN_SPEC.md, line 2468)

`CLAUDE.md` enforces a vendor-agnostic rule: vendor names (`Anthropic`, `OpenAI`, `Google`, etc.) may only appear in the DESIGN_SPEC.md provider list (§9). This line is in the Research & Prior Art section, not the provider list, so it technically violates the rule.

Since this is a direct citation of a paper's experimental setup, consider either:

- Redacting the specific vendor names to a neutral description, e.g. `"3 LLM families (frontier commercial providers)"`, or
- Treating this as an explicit, documented exception given the academic citation context.

Suggested change:

```suggestion
[Kim et al., "Towards a Science of Scaling Agent Systems" (2025)](https://arxiv.org/abs/2512.08296) — 180 controlled experiments across 3 LLM families (frontier commercial providers), 4 agentic benchmarks, 5 coordination topologies. Key findings informing our design:
```

**Rule Used:** CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=6816cd03-d0e1-4fd0-9d04-2417487a584c))
…, Gemini, Greptile

- Add Kim et al. and Cemri et al. to Appendix B bibliography
- Add `context_dependent` to topology enum, clarify as hybrid per-phase strategy
- Fix error taxonomy enablement key: prose now matches YAML (`.enabled`)
- Merge two `coordination_metrics` YAML blocks into single combined block
- Clarify `task_structure` field is M4+ (future tense, not present)
- Fix TOC: match heading "Coordination Topology", add §16 subsections
- Distinguish coordination overhead (O%) from `orchestration_ratio` alerts
- Update centralized verification reference: §6.9/§10.5 (not §5.6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```diff
 - [Microsoft Multi-Agent Reference Architecture](https://microsoft.github.io/multi-agent-reference-architecture/) - Enterprise patterns
 - [OpenRouter](https://openrouter.ai/) - Multi-model API gateway
+- [Kim et al., "Towards a Science of Scaling Agent Systems" (2025)](https://arxiv.org/abs/2512.08296) - Empirical agent scaling research (180 experiments, 3 LLM families)
+- [Cemri et al., "Multi-Agent System Failure Taxonomy (MAST)" (2025)] - MAS coordination error classification
```
**Malformed markdown link missing URL** (DESIGN_SPEC.md, line 2610)

The Cemri et al. entry is not a valid markdown hyperlink: `[text]` without a following `(url)` renders as literal square-bracketed text, not a clickable link. The PR test plan explicitly includes "Verify DESIGN_SPEC.md renders correctly on GitHub", and this entry will fail that check. A URL is needed, or the entry should be formatted as plain text until a URL is available.

If the arXiv URL is known, use the full link format:

```suggestion
- [Cemri et al., "Multi-Agent System Failure Taxonomy (MAST)" (2025)](https://arxiv.org/abs/XXXX.XXXXX) - MAS coordination error classification
```

If the URL isn't available yet, use plain text to avoid a broken-looking entry:

```suggestion
- Cemri et al., "Multi-Agent System Failure Taxonomy (MAST)" (2025) - MAS coordination error classification
```
```diff
+#### Task Structure Classification
+
+Each task will carry a `task_structure` field (to be added to §6.2 Task Definition at M4) classifying its decomposability:
```
**Stale forward reference: field already added in this PR** (DESIGN_SPEC.md, line 1106)

This sentence describes `task_structure` as something that will be added to §6.2 in the future ("to be added to §6.2 Task Definition at M4"), but this same PR already adds that field to the §6.2 task config schema (at line 747 with `task_structure: "parallel" # M4 — see §6.9`). The forward reference is now stale and mildly misleading.

Suggested change:

```suggestion
Each task carries a `task_structure` field (§6.2 Task Definition) classifying its decomposability:
```
```diff
+| Metric | Symbol | Definition | What It Signals |
+|--------|--------|------------|-----------------|
+| **Coordination efficiency** | `Ec` | `success_rate / (turns / turns_sas)` — success normalized by relative turn count vs single-agent baseline | Overall coordination ROI. Low Ec = coordination costs exceed benefits |
```
**Inconsistent variable notation in `Ec` formula** (DESIGN_SPEC.md, line 1546)

The `Ec` formula uses the bare name `turns`, while every other metric in this table explicitly qualifies the variable (`O%` uses `turns_mas` and `turns_sas`). A reader interpreting the table in isolation won't know whether `turns` means MAS turns, SAS turns, or total turns. Using `turns_mas` here aligns with the rest of the table and removes the ambiguity.

Suggested change:

```suggestion
| **Coordination efficiency** | `Ec` | `success_rate / (turns_mas / turns_sas)` — success normalized by relative turn count vs single-agent baseline | Overall coordination ROI. Low Ec = coordination costs exceed benefits |
```
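With the suggested naming, `Ec` is a direct computation. The minimal sketch below assumes `success_rate` is a fraction in [0, 1] and that both turn counts come from comparable runs of the same task. The function name is hypothetical.

```python
def coordination_efficiency(success_rate: float, turns_mas: int, turns_sas: int) -> float:
    """Ec = success_rate / (turns_mas / turns_sas).

    turns_mas: turns taken by the multi-agent run.
    turns_sas: turns taken by the single-agent baseline.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be a fraction in [0, 1]")
    if turns_mas <= 0 or turns_sas <= 0:
        raise ValueError("turn counts must be positive")
    relative_turns = turns_mas / turns_sas  # >1 means the MAS spent more turns
    return success_rate / relative_turns
```

For example, a multi-agent run that succeeds 80% of the time but uses three times as many turns as the single-agent baseline scores `0.8 / 3 ≈ 0.27`. At equal turn counts, `Ec` reduces to the raw success rate.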
Summary
- New §6.9 (Task Decomposability & Coordination Topology): task structure classification (sequential/parallel/mixed), per-task coordination topology selection, and auto topology selector rules based on empirical research
- New `task_structure` field in the task config schema (§6.2)

All additions are M4+ forward-looking design sections. No code changes.
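To illustrate how the `task_structure` field could feed an auto topology selector, here is a minimal rule-based sketch. Everything beyond the three `task_structure` values is an assumption: the `Topology` names, the `TopologyRules` config class, `select_topology`, and the default mapping (sequential to single agent, parallel to centralized, mixed to hybrid) are hypothetical placeholders, not the rule set defined in the design spec.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TaskStructure(str, Enum):
    """Allowed values of the task config's `task_structure` field."""

    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"
    MIXED = "mixed"


class Topology(str, Enum):
    """Hypothetical coordination topology identifiers (illustrative only)."""

    SINGLE_AGENT = "single_agent"
    CENTRALIZED = "centralized"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class TopologyRules:
    """Hypothetical selector config: one default topology per task structure."""

    sequential: Topology = Topology.SINGLE_AGENT
    parallel: Topology = Topology.CENTRALIZED
    mixed: Topology = Topology.HYBRID


def select_topology(
    structure: TaskStructure, rules: TopologyRules | None = None
) -> Topology:
    """Return the topology configured for a task's declared structure."""
    if rules is None:
        rules = TopologyRules()
    mapping = {
        TaskStructure.SEQUENTIAL: rules.sequential,
        TaskStructure.PARALLEL: rules.parallel,
        TaskStructure.MIXED: rules.mixed,
    }
    return mapping[structure]


# A task declared as `task_structure: parallel` in config.
print(select_topology(TaskStructure("parallel")).value)  # centralized
```

Keeping the mapping in a frozen config object rather than hard-coding it lets operators override a single rule (for example `TopologyRules(mixed=Topology.CENTRALIZED)`) without touching the selector logic.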
Research Source
Kim et al., "Towards a Science of Scaling Agent Systems" (arXiv:2512.08296, 2025): 180 controlled experiments across 3 LLM families, 4 benchmarks, and 5 coordination topologies.
Test Plan
🤖 Generated with Claude Code