
Add design specification and project setup #1

Closed
Aureliolo wants to merge 2 commits into main from feat/design-spec

Conversation

@Aureliolo
Owner

Summary

  • DESIGN_SPEC.md: Comprehensive high-level design specification (17 sections) covering agent identity system, company structure templates, communication architecture, task/workflow engine, memory system, multi-provider LLM layer, budget management, tool/capability system, security & approval workflows, API-first architecture, and template/builder system
  • LICENSE: BSL 1.1 (non-commercial use, converts to Apache 2.0 after 4 years)
  • README.md: Updated with project overview, planned features, and tech stack

Research Included

Analysis of MetaGPT, ChatDev, CrewAI, AutoGen, LangGraph, LiteLLM, Mem0, MCP, A2A Protocol, and event-driven multi-agent architecture patterns. Recommendation: build from scratch leveraging existing libraries (LiteLLM, Mem0, FastAPI, MCP).

Test plan

  • Review DESIGN_SPEC.md for completeness and feasibility
  • Verify BSL 1.1 license parameters are correct
  • Validate README accurately represents the project

Comprehensive design spec covering:
- Agent identity system with roles, personalities, skills, seniority
- Company structure templates (solo founder to full enterprise)
- Communication architecture (event-driven bus, hierarchical, meetings, hybrid)
- Task/workflow engine with multiple execution patterns
- Memory system (working, episodic, semantic, procedural, social)
- Multi-provider LLM layer via LiteLLM (Claude, OpenRouter, Ollama)
- Budget management with CFO agent and cost optimization
- Tool/capability system with progressive trust
- Security operations agent and configurable autonomy levels
- API-first architecture (FastAPI + WebSocket) with web UI
- Template system and company builder
- Research analysis of MetaGPT, ChatDev, CrewAI, AutoGen, LangGraph
- Build-from-scratch recommendation leveraging LiteLLM, Mem0, MCP, A2A
- Open questions, risks, backlog, and scaling path
Business Source License 1.1 with parameters:
- Non-commercial use only (personal, education, research, evaluation)
- Commercial/production use requires separate license
- Converts to Apache 2.0 after 4 years per version
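The agent identity bullets above can be made concrete. A minimal sketch in Python; every field name and value here is hypothetical, chosen to illustrate the identity axes (role, personality, skills, seniority) rather than taken from DESIGN_SPEC.md:

```python
from dataclasses import dataclass, field

@dataclass
class AgentIdentity:
    """Hypothetical agent card illustrating the spec's identity axes."""
    name: str                        # display name
    role: str                        # e.g. "backend_developer"
    seniority: str                   # "junior" | "mid" | "senior"
    skills: list[str] = field(default_factory=list)
    personality: dict[str, str] = field(default_factory=dict)  # traits, style

sarah = AgentIdentity(
    name="Sarah Chen",
    role="backend_developer",
    seniority="senior",
    skills=["python", "postgres"],
    personality={"style": "direct", "risk": "cautious"},
)
```

In a real implementation these records would presumably be loaded from the YAML agent configs the spec describes rather than constructed in code.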
Copilot AI review requested due to automatic review settings (February 27, 2026, 08:36)
@coderabbitai

coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Documentation

    • Added comprehensive design specification document detailing framework architecture and core concepts
    • Updated README with expanded sections including project concept, key features, status, and tech stack information
  • Chores

    • Added Business Source License 1.1 with usage terms and conditions

Walkthrough

Added a comprehensive design specification document (DESIGN_SPEC.md) outlining the architecture for an AI company framework, covering agent systems, company structure, task workflows, memory, model providers, cost management, and security. Added Business Source License 1.1 file. Updated README with expanded sections.

Changes

  • Documentation (DESIGN_SPEC.md, README.md): Added comprehensive design specification detailing system vision, architecture across multiple domains (agent system, company structure, workflows, memory, providers, costs, tools, security), and configuration principles. Restructured README with expanded sections including concept, key features, status, and tech stack.
  • Licensing (LICENSE): Added Business Source License 1.1 granting non-production use for personal/educational/research purposes, with four-year change date to Apache 2.0 for production/commercial use rights.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: Passed. The title 'Add design specification and project setup' accurately summarizes the main changes: adding DESIGN_SPEC.md, LICENSE, and updating README.md for initial project documentation and licensing.
  • Description check: Passed. The description is well-structured and directly related to the changeset, detailing the three files modified/added (DESIGN_SPEC.md, LICENSE, README.md) with relevant context about research and test plan.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.



@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the foundational documentation for the 'AI Company' project, a framework designed to orchestrate autonomous AI agents within a virtual company structure. It introduces a comprehensive design specification detailing the project's vision, architecture, and planned features, along with a new license. The updated README provides a clear, high-level overview of the project's goals and technical direction, setting the stage for future development.

Highlights

  • Comprehensive Design Specification: A detailed DESIGN_SPEC.md document was added, outlining the high-level architecture, core concepts, agent system, communication, task management, memory, model provider layer, cost management, tool system, security, human interaction, templates, and technical stack for the 'AI Company' framework. This document also includes extensive research and a build-vs-fork decision.
  • Project Licensing Established: The project's licensing was defined with the addition of a LICENSE file, specifying a Business Source License (BSL 1.1) that permits non-production use and converts to Apache License 2.0 after four years.
  • Updated Project Overview: The README.md file was significantly updated to reflect the project's new vision, planned key features, current design phase status, and the proposed technical stack, providing a clear introduction to the 'AI Company' concept.
  • Research-Driven Architecture Decisions: The design specification includes a thorough analysis of existing multi-agent frameworks (e.g., MetaGPT, CrewAI, AutoGen) and concludes with a recommendation to build from scratch while leveraging key libraries like LiteLLM, Mem0, and FastAPI.
Changelog
  • DESIGN_SPEC.md
    • Added a new document detailing the high-level design specification for the AI Company project.
    • Included sections on vision, core concepts, agent system, company structure, communication, task/workflow engine, memory, model provider layer, cost management, tools, security, human interaction, templates, technical architecture, research, risks, and future vision.
  • LICENSE
    • Added the Business Source License (BSL 1.1) for the project.
    • Specified non-commercial use, with conversion to Apache License 2.0 after four years.
  • README.md
    • Updated the project title and concept description.
    • Added a 'Key Features (Planned)' section.
    • Included a 'Status' section indicating the design phase.
    • Detailed the 'Tech Stack (Planned)'.
    • Updated the 'Documentation' section to link to the new design specification.
Activity
  • No human activity has been recorded on this pull request since its creation.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the foundational documents for the 'AI Company' project, including a comprehensive design specification, the project license, and an updated README. The design specification is incredibly detailed and well-thought-out, covering a wide range of architectural and conceptual aspects. My review focuses on improving clarity and consistency in a few key areas of the design to ensure a smoother implementation. Specifically, I've pointed out inconsistencies in model identifiers, ambiguity in how agents are referenced, and minor omissions in the documentation.

Comment on lines +500 to +501
assigned_to: "sarah_chen"
reviewers: ["engineering_lead", "security_engineer"]

Severity: high

Throughout the document, identifiers for agents and roles are used inconsistently. For example, tasks are assigned to "sarah_chen", but the agent card shows name: "Sarah Chen" and id: "uuid". Other fields use role-based identifiers like "product_manager_1" and "engineering_lead". For a robust implementation, the design should specify a single, consistent identification scheme. Using unique agent IDs (like the uuid from the agent card) is generally recommended over names or roles to avoid ambiguity.
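The single-ID scheme this comment recommends can be sketched as a registry that stores agents under their UUID and resolves names or roles only as a fallback. All names, fields, and the registry API below are hypothetical, not from DESIGN_SPEC.md:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str   # display name, e.g. "Sarah Chen"
    role: str   # role identifier, e.g. "engineering_lead"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # canonical ID

class AgentRegistry:
    """Resolve human-friendly names or roles to one canonical agent."""

    def __init__(self) -> None:
        self._by_id: dict[str, Agent] = {}

    def hire(self, agent: Agent) -> str:
        self._by_id[agent.id] = agent
        return agent.id

    def resolve(self, ref: str) -> Agent:
        # Accept a canonical ID directly; otherwise try name/role lookup,
        # which must match exactly one agent to avoid ambiguity.
        if ref in self._by_id:
            return self._by_id[ref]
        matches = [a for a in self._by_id.values() if ref in (a.name, a.role)]
        if len(matches) != 1:
            raise LookupError(f"ambiguous or unknown agent reference: {ref!r}")
        return matches[0]
```

With this shape, config fields like `assigned_to` can store the UUID while still accepting a role string during authoring, as long as the role maps to exactly one agent.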

Comment on lines +137 to +140
model_id: "claude-sonnet-4-6"
temperature: 0.3
max_tokens: 8192
fallback_model: "openrouter/anthropic/claude-haiku"

Severity: medium

The fallback_model is specified as openrouter/anthropic/claude-haiku. However, in the openrouter provider configuration (Section 8.2), there is no model defined with this ID or a similar alias. The model IDs under OpenRouter follow a provider/model-name format (e.g., anthropic/claude-sonnet-4-6). Please ensure consistency between the agent's model configuration and the provider definitions.
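This kind of mismatch can be caught mechanically at config-load time. A hedged sketch; the config shape is inferred from the YAML quoted in this review, and the `"<provider>/<model-id>"` reference format is an assumption:

```python
def undefined_fallbacks(providers: dict, agents: list[dict]) -> list[str]:
    """List fallback_model values that no provider config actually defines.

    Assumes fallback references use the "<provider>/<model-id>" format,
    matching the openrouter/anthropic/... style in the agent config.
    """
    known = {
        f"{provider}/{model['id']}"
        for provider, conf in providers.items()
        for model in conf.get("models", [])
    }
    return [
        agent["fallback_model"]
        for agent in agents
        if agent.get("fallback_model") and agent["fallback_model"] not in known
    ]

# Mirrors the inconsistency flagged above: claude-haiku is never defined.
providers = {"openrouter": {"models": [{"id": "anthropic/claude-sonnet-4-6"}]}}
agents = [{"model_id": "claude-sonnet-4-6",
           "fallback_model": "openrouter/anthropic/claude-haiku"}]
print(undefined_fallbacks(providers, agents))
```

Running a check like this in CI against the YAML configs would turn the documentation inconsistency into a hard validation error.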

Comment on lines +684 to +690
models:
- id: "anthropic/claude-sonnet-4-6"
alias: "or-sonnet"
- id: "google/gemini-2.5-pro"
alias: "or-gemini-pro"
- id: "deepseek/deepseek-r1"
alias: "or-deepseek"

Severity: medium

In the provider configuration (Section 8.2), cost details (cost_per_1k_input, cost_per_1k_output) are provided for anthropic and ollama models, but are omitted for openrouter models. For consistency in the design document, it would be clearer to either include example costs for all providers or state explicitly why they are omitted for some (e.g., due to dynamic pricing).

├── DESIGN_SPEC.md # This document
├── README.md
├── pyproject.toml
└── CLAUDE.md

Severity: medium

The proposed project structure in section 14.3 includes a CLAUDE.md file at the root level. The purpose and content of this file are not explained anywhere in the design specification. Please add a description for this file to clarify its role in the project.


Copilot AI left a comment


Pull request overview

Adds initial project scaffolding documentation and licensing to establish the project’s direction and usage terms.

Changes:

  • Introduces a comprehensive high-level design specification (DESIGN_SPEC.md) for the planned architecture and systems.
  • Adds a Business Source License 1.1 (LICENSE) with a planned conversion to Apache 2.0.
  • Replaces the minimal README with a fuller project overview, planned features, and planned tech stack.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • README.md: New project overview, planned feature list, and planned tech stack, linking to the design spec.
  • LICENSE: Adds BSL 1.1 text with project-specific parameters and planned conversion license/date.
  • DESIGN_SPEC.md: Adds a detailed design spec covering architecture, systems, workflows, and research references.


"Non-production use" means use solely for personal learning,
academic research, testing, development, and evaluation in
non-commercial settings.
Change Date: Four years from the date the Licensed Work is published.

Copilot AI Feb 27, 2026


The BSL 1.1 “Change Date” parameter is typically a specific calendar date (used by the license text as a fixed conversion trigger). Using a relative phrase like “Four years from the date the Licensed Work is published” is ambiguous (publication date isn’t defined here) and can make the conversion date hard to determine. Consider setting an explicit ISO-8601 date (YYYY-MM-DD) tied to the first public release, and updating it per version if needed.

Suggested change
Change Date: Four years from the date the Licensed Work is published.
Change Date: 2030-01-01
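Deriving the explicit date from the first public release can even be scripted so it stays consistent per version. A minimal sketch; the release date used below is illustrative, not the project's actual publication date:

```python
from datetime import date

def bsl_change_date(first_release: date, years: int = 4) -> str:
    """Explicit ISO-8601 BSL Change Date, N years after the first release."""
    try:
        converted = first_release.replace(year=first_release.year + years)
    except ValueError:
        # A Feb 29 release with a non-leap target year: clamp to Feb 28.
        converted = first_release.replace(year=first_release.year + years, day=28)
    return converted.isoformat()

print(bsl_change_date(date(2026, 2, 27)))  # → 2030-02-27
```

Whatever date is chosen, it should be written into LICENSE as a literal string; the computation only helps keep per-version change dates consistent.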

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +22
Additional Use Grant: You may make non-production use of the Licensed Work for
personal, educational, research, and evaluation purposes only.
Production use and commercial use of the Licensed Work require
a separate commercial license from the Licensor.

"Production use" means using the Licensed Work or any
derivative work in a live, revenue-generating, or
business-critical environment, including but not limited to:
deploying the Licensed Work as part of a commercial product or
service, using it to generate revenue directly or indirectly,
or offering it to third parties on a hosted or embedded basis.

"Non-production use" means use solely for personal learning,
academic research, testing, development, and evaluation in
non-commercial settings.

Copilot AI Feb 27, 2026


The “Additional Use Grant” section currently restates non-production use restrictions and introduces custom definitions of “Production use” / “Non-production use”. This can create confusion or conflict with the standard BSL 1.1 wording/definitions, and it doesn’t actually grant any additional production rights beyond the base license. Consider either (a) removing the Additional Use Grant entirely, or (b) using it only to explicitly allow the specific limited production uses you intend, while keeping the standard BSL definitions unchanged.

Suggested change (replacing the full custom grant and definitions quoted above):
Additional Use Grant: None.

│ │
│ ┌──────────────────────┐ ┌─────────────────────────────┐ │
│ │ Web UI (Local) │ │ CLI Tool │ │
│ │ React/Vue Dashboard │ │ ai-company <command> │ │

Copilot AI Feb 27, 2026


The tech stack is inconsistent about the planned web UI framework: the architecture diagram calls it a “React/Vue Dashboard” while elsewhere (including README) it’s framed as Vue 3 (or at least “React or Vue 3”). Consider picking a single planned option or explicitly marking React vs Vue as an unresolved decision so readers don’t infer conflicting commitments.

Suggested change
│ │ React/Vue Dashboard │ │ ai-company <command> │ │
│ │ Vue 3 Dashboard │ │ ai-company <command> │ │


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 523-544: Update the markdown code fences for the workflow diagrams
to include a language spec (e.g., ```text) so linters/renderers treat them as
plain text; specifically modify the code blocks under the "Sequential Pipeline",
"Parallel Execution", "Kanban Board", and "Agile Sprints" headings in
DESIGN_SPEC.md to use a fenced code block with a language identifier such as
text.
- Around line 393-396: Update the protocol attributions in the DESIGN_SPEC where
A2A Protocol and MCP are listed: change the A2A attribution to state it is
governed by the Linux Foundation via the Agent2Agent (A2A) project (remove any
"Google/Linux Foundation" wording) and change the MCP attribution to indicate it
is stewarded by the Agentic AI Foundation (a directed fund under the Linux
Foundation), replacing any "Anthropic/Linux Foundation" phrasing; ensure the
lines mention the exact names "Agent2Agent (A2A) project" and "Agentic AI
Foundation (a directed fund under the Linux Foundation)".

In `@LICENSE`:
- Line 23: Update the "Change Date" entry to remove ambiguity by either
inserting an explicit calendar date (e.g., "Change Date: 2030-02-27" or "Change
Date: February 27, 2030") or by defining what "published" means (e.g., "Change
Date: Four years from the first public release of version 1.0" or "four years
from the date of the first public commit/tag"). Edit the existing "Change Date:
Four years from the date the Licensed Work is published." line to one of these
clear alternatives so the timing is unambiguous and enforceable.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e040da and 3537f9c.

📒 Files selected for processing (3)
  • DESIGN_SPEC.md
  • LICENSE
  • README.md
🪛 LanguageTool

DESIGN_SPEC.md and README.md: style and typographical suggestions only, with no substantive findings. Recurring hints to use em dashes on hyphenated list items (DASH_RULE), missing periods after abbreviations such as "vs", and assorted passive-voice, ellipsis, and redundancy style notes.

(DASH_RULE)

🪛 markdownlint-cli2 (0.21.0)
DESIGN_SPEC.md

[warning] 76-76: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 178-178: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 185-185: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 192-192: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 201-201: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 207-207: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 212-212: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 218-218: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 256-256: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 332-332: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 352-352: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 365-365: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 461-461: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 523-523: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 524-524: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 524-524: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 528-528: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 529-529: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 529-529: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 535-535: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 536-536: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 536-536: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 543-543: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 544-544: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 544-544: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 567-567: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 610-610: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 621-621: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 627-627: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 628-628: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 645-645: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 747-747: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 894-894: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 967-967: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 986-986: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1080-1080: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1127-1127: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1185-1185: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 1434-1434: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (8)
README.md (1)

1-37: LGTM! Clear and well-structured README for the design phase.

The README accurately represents the project's current design phase status with appropriate "Planned" labels throughout. The tech stack aligns with the detailed specifications in DESIGN_SPEC.md, and the documentation structure provides a clear entry point for understanding the project's ambitious vision.

The static analysis hints about em dashes and punctuation are stylistic preferences for markdown lists and can be safely ignored - your current formatting is conventional and readable.

LICENSE (1)

1-72: Solid license choice with clear use restrictions.

Business Source License 1.1 is a well-established choice for commercial open-source projects. The Additional Use Grant clearly defines production vs. non-production boundaries, and the four-year conversion to Apache 2.0 provides a clear path to fully open-source status. This aligns well with your stated PR objectives.

DESIGN_SPEC.md (6)

1311-1364: Excellent research section with sound build-from-scratch rationale.

The framework comparison and build-vs-fork analysis is thorough and well-reasoned. The decision to build from scratch while leveraging battle-tested libraries (LiteLLM, Mem0, FastAPI, MCP) is pragmatic and aligns well with the unique requirements of this project.

The feature comparison clearly demonstrates the gaps in existing frameworks that justify a new implementation.


1367-1404: Outstanding risk awareness and honest assessment of open questions.

The Open Questions & Risks section demonstrates excellent technical foresight. Highlighting challenges like context window exhaustion, agent communication loops, and conflicting agent opinions shows mature architectural thinking.

The proposed mitigations are practical:

  • Context window: Memory summarization + task decomposition
  • Cost explosion: Budget hard stops + loop detection
  • Over-engineering: Start with 3-5 agents, iterate

This honest assessment of risks and uncertainties significantly strengthens the credibility of the design specification.


1-1471: Exceptionally comprehensive and well-architected design specification.

This 1471-line design document demonstrates remarkable depth and breadth of thought. Key strengths include:

  1. Innovative unique features: HR system with hiring/firing, CFO agent for cost management, Security Operations agent, progressive trust system - none of these exist in current frameworks
  2. Multiple architecture patterns: Event-driven, hierarchical, meeting-based, and hybrid communication patterns provide flexibility
  3. Mature risk awareness: Section 16's open questions and risks show realistic assessment of implementation challenges
  4. Well-researched technology stack: Sound choices (Python, FastAPI, LiteLLM, Mem0, MCP) with clear rationales
  5. Clear MVP path: Starting with 3-5 agents and iterating avoids over-engineering
  6. Extensibility by design: Plugin architecture, custom roles, configurable autonomy levels

The comparison with existing frameworks (MetaGPT, CrewAI, AutoGen, etc.) clearly justifies building from scratch while leveraging battle-tested libraries.

Minor formatting improvements suggested by static analysis (missing language specs for code blocks, blank lines around headings) can be addressed during implementation but don't detract from the specification quality.


705-712: The LiteLLM integration section is accurate. As of February 2026, LiteLLM supports all features claimed in the design: multi-provider support with unified API, built-in cost tracking (with recent additions for agent-level cost tracking per query/token as of Dec 2025), automatic fallback chains and load balancing via the Router, and an OpenAI-compatible gateway that normalizes provider interfaces. The design's reliance on LiteLLM for these capabilities is well-founded.


135-140: No action needed. The model IDs referenced (claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5) are valid Anthropic API identifiers as of February 2026 and match current naming conventions. The DESIGN_SPEC.md file contains accurate, current model references.

Likely an incorrect or invalid review comment.


1303-1303: The "26% accuracy boost" claim for Mem0 is documented in peer-reviewed research and Mem0's official materials. It refers to a 26% relative improvement on the LoCoMo (Long Conversation Memory) benchmark's LLM-as-a-Judge metric over OpenAI Memory (66.9% vs 52.9%), published in Mem0's arXiv paper (arXiv:2504.19413). A public LoCoMo benchmark repository provides reproducible evaluation code. This claim has reliable sources and does not require additional verification.

Likely an incorrect or invalid review comment.

Comment on lines +393 to +396

- **A2A Protocol** (Agent-to-Agent, Google/Linux Foundation) - For inter-agent task delegation, capability discovery via Agent Cards, and structured task lifecycle management
- **MCP** (Model Context Protocol, Anthropic/Linux Foundation) - For agent-to-tool integration, providing standardized tool discovery and invocation


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

A2A Agent-to-Agent Protocol governance standard body 2025 2026

💡 Result:

In 2025–2026, the governance / standards steward for the A2A (Agent2Agent / Agent-to-Agent) Protocol is the Linux Foundation, via the Agent2Agent (A2A) project (formed after Google transferred/donated the spec, SDKs, and tooling in June 2025). The project is run under Linux Foundation neutral governance, with a Technical Steering Committee (TSC) establishing the protocol’s governance structure and working groups. [1], [2], [3]

Answer: The Linux Foundation (Agent2Agent/A2A project, governed via a TSC). [1], [2]

Sources
[1] Google Developers Blog (Jun 23, 2025), “Google Cloud donates A2A to Linux Foundation” (developers.googleblog.com)
[2] A2A Protocol Roadmap (Governance section; “donated to the Linux Foundation”, TSC) (a2a-protocol.org)
[3] IANA well-known URI registration issue (lists “Change controller: Linux Foundation” + governance doc link) (github.com)

Citations:


🌐 Web query:

Model Context Protocol MCP governance Anthropic Linux Foundation 2025 2026

💡 Result:

  • What changed (governance): On December 9, 2025, Anthropic transferred stewardship of the Model Context Protocol (MCP) into the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation, alongside other founding project contributions (Block’s goose and OpenAI’s AGENTS.md). [1][2]
  • How governance is structured: AAIF has a Governing Board responsible for strategic investments, budget allocation, member recruitment, and approving new projects, while each project (including MCP) retains autonomy over technical direction and day-to-day operations; the MCP project stated its maintainer-led governance continues and the Linux Foundation does not dictate MCP’s technical direction. [1]
  • 2026 continuity / community operations: AAIF’s launch materials also point to ongoing community activities, including the next MCP Dev Summit in New York City on April 2–3, 2026. [2]

Sources: [1] MCP project blog (Dec 9, 2025) [2] AAIF / Linux Foundation press release (Dec 9, 2025)


Update A2A and MCP protocol attribution to reflect current governance structures.

As of February 2026, both protocol governance structures have changed:

  • A2A Protocol: Following Google's donation in June 2025, it is now governed by the Linux Foundation via the Agent2Agent (A2A) project (not "Google/Linux Foundation").
  • MCP: Following Anthropic's transfer in December 2025, it is now stewarded by the Agentic AI Foundation (a directed fund under the Linux Foundation), not directly by "Anthropic/Linux Foundation".

Update lines 393–396 to reflect current governance attribution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` around lines 393 - 396, Update the protocol attributions in
the DESIGN_SPEC where A2A Protocol and MCP are listed: change the A2A
attribution to state it is governed by the Linux Foundation via the Agent2Agent
(A2A) project (remove any "Google/Linux Foundation" wording) and change the MCP
attribution to indicate it is stewarded by the Agentic AI Foundation (a directed
fund under the Linux Foundation), replacing any "Anthropic/Linux Foundation"
phrasing; ensure the lines mention the exact names "Agent2Agent (A2A) project"
and "Agentic AI Foundation (a directed fund under the Linux Foundation)".

Comment on lines +523 to +544
#### Sequential Pipeline
```
Requirements ──▶ Design ──▶ Implementation ──▶ Review ──▶ Testing ──▶ Deploy
```

#### Parallel Execution
```
         ┌──▶ Frontend Dev ──┐
Task ───┤                    ├──▶ Integration ──▶ QA
         └──▶ Backend Dev ───┘
```

#### Kanban Board
```
Backlog │ Ready │ In Progress │ Review │ Done
   ○    │   ○   │      ●      │   ○    │ ●●●
   ○    │   ○   │      ●      │        │ ●●
   ○    │       │             │        │ ●
```

#### Agile Sprints

🧹 Nitpick | 🔵 Trivial

Consider adding language specifications to code blocks for better rendering.

Several workflow diagram code blocks (lines 524, 529, 536, 544) don't specify a language identifier. While the ASCII art renders fine, adding a language spec like ```text would satisfy markdown linters and improve consistency.

This is a minor formatting suggestion and doesn't affect the technical content.

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 523-523: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 524-524: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 524-524: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 528-528: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 529-529: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 529-529: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 535-535: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 536-536: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 536-536: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 543-543: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 544-544: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 544-544: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` around lines 523 - 544, Update the markdown code fences for
the workflow diagrams to include a language spec (e.g., ```text) so
linters/renderers treat them as plain text; specifically modify the code blocks
under the "Sequential Pipeline", "Parallel Execution", "Kanban Board", and
"Agile Sprints" headings in DESIGN_SPEC.md to use a fenced code block with a
language identifier such as text.

"Non-production use" means use solely for personal learning,
academic research, testing, development, and evaluation in
non-commercial settings.
Change Date: Four years from the date the Licensed Work is published.

🧹 Nitpick | 🔵 Trivial

Consider specifying an explicit Change Date.

The current wording "Four years from the date the Licensed Work is published" leaves the publication date ambiguous. For clarity and enforceability, consider specifying an explicit date or defining what "published" means (e.g., "first public release", "first GitHub commit", etc.).

Example alternatives:

  • Change Date: 2030-02-27 (explicit date, four years from now)
  • Change Date: Four years from the first public release of version 1.0
  • Change Date: February 27, 2030
📅 Proposed fix for explicit date
-Change Date:          Four years from the date the Licensed Work is published.
+Change Date:          February 27, 2030
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-Change Date:          Four years from the date the Licensed Work is published.
+Change Date:          February 27, 2030
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@LICENSE` at line 23, Update the "Change Date" entry to remove ambiguity by
either inserting an explicit calendar date (e.g., "Change Date: 2030-02-27" or
"Change Date: February 27, 2030") or by defining what "published" means (e.g.,
"Change Date: Four years from the first public release of version 1.0" or "four
years from the date of the first public commit/tag"). Edit the existing "Change
Date: Four years from the date the Licensed Work is published." line to one of
these clear alternatives so the timing is unambiguous and enforceable.

@Aureliolo Aureliolo closed this Feb 27, 2026
Aureliolo added a commit that referenced this pull request Mar 5, 2026
…eviewers

Source changes (tracker.py):
- Fix budget_used_percent/alert_level threshold disagreement (#1)
- Add logger.warning before ValueError in _validate_time_range (#2)
- Split build_summary into _build_agent_spendings, _build_dept_spendings,
  _build_budget_context helpers to meet <50 line guideline (#3)
- Replace _aggregate tuple return with _AggregateResult NamedTuple (#4)
- Rewrite _aggregate as single-pass loop (#5)
- Aggregate departments from AgentSpending objects, not raw records (#6)
- Add DEBUG log in __init__ for object creation (#7)
- Fix module docstring "Section 10.2 service layer" accuracy (#15)
- Add DEBUG entry logs for get_total_cost/get_agent_cost (#16)

Event constants (events.py):
- Add BUDGET_TRACKER_CREATED, BUDGET_TOTAL_COST_QUERIED,
  BUDGET_AGENT_COST_QUERIED, BUDGET_TIME_RANGE_INVALID

Test improvements (test_tracker.py):
- Assert budget_used_percent in all alert-level tests (#8)
- Assert budget_total_monthly in configured summary test (#9)
- Add start==end raises ValueError test (#10)
- Add end-only time filter test for get_total_cost (#11)
- Add multi-record token aggregation assertions (#12)
- Add pytestmark = pytest.mark.timeout(30) (#13)
- Move @pytest.mark.unit to class level (#14)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
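For context, the single-pass aggregation with a NamedTuple return (#4, #5 above) might look roughly like this; the field and record names are illustrative, not the actual tracker.py code:

```python
from typing import NamedTuple


class _AggregateResult(NamedTuple):
    """Hypothetical shape of a single-pass aggregation result."""

    total_cost: float
    cost_by_agent: dict[str, float]
    cost_by_department: dict[str, float]


def aggregate(records: list[dict]) -> _AggregateResult:
    """Aggregate cost records in one pass instead of three separate scans."""
    total = 0.0
    by_agent: dict[str, float] = {}
    by_dept: dict[str, float] = {}
    for rec in records:
        cost = rec["cost_usd"]
        total += cost
        by_agent[rec["agent_id"]] = by_agent.get(rec["agent_id"], 0.0) + cost
        by_dept[rec["department"]] = by_dept.get(rec["department"], 0.0) + cost
    return _AggregateResult(total, by_agent, by_dept)
```

A NamedTuple keeps the three results positionally compatible with the old tuple return while giving call sites readable field access.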
Aureliolo added a commit that referenced this pull request Mar 6, 2026
…t, Gemini, and greptile

- Add produced_artifacts field to AgentRunResult (#1)
- Wrap _log_completion in try/except to preserve valid results (#2)
- Add test for inner TimeoutError propagation without engine timeout (#3)
- Extract _run_loop_with_timeout from _execute (50-line limit) (#4)
- Extract _validate_run_inputs from run() (50-line limit) (#5)
- Rename metrics docstrings from "completed task" to "agent run" + add termination_reason to metrics event (#6)
- Fix raise exc from build_exc chain direction (#7)
- Replace asyncio.wait_for with asyncio.wait for timeout disambiguation (#8)
- Add test for _apply_post_execution_transitions failure resilience (#9)
- Add test for timeout cost recording behavior (#10)
- Fix hardcoded from_status in transition logs (#11)
- Add agent-task ownership check in _validate_task (#12)
- Split test_invalid_timeout_raises into two test methods (#13)
- Add negative validation tests for cost_per_task/duration_seconds (#14)
- Add test_blank_task_id_rejected (#15)
- Update _execute docstring to mention timeout, transitions, metrics (#16)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
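The asyncio.wait-based timeout disambiguation (#8 above) addresses a real ambiguity: `asyncio.wait_for` raises `TimeoutError` both when the outer deadline fires and when the wrapped coroutine itself raises one. A simplified illustration, not the engine's actual implementation:

```python
import asyncio


async def run_with_timeout(coro, timeout: float):
    """Distinguish an engine-imposed timeout from a TimeoutError raised
    inside the task itself; asyncio.wait_for would conflate the two."""
    task = asyncio.ensure_future(coro)
    done, pending = await asyncio.wait({task}, timeout=timeout)
    if task in pending:
        task.cancel()  # engine timeout: the task never finished
        try:
            await task
        except asyncio.CancelledError:
            pass
        return "engine_timeout"
    # The task finished on its own; any TimeoutError here came from inside it.
    try:
        return task.result()
    except TimeoutError:
        return "inner_timeout"


async def main():
    async def slow():
        await asyncio.sleep(10)

    async def raises_inner():
        raise TimeoutError("downstream call timed out")

    a = await run_with_timeout(slow(), 0.05)
    b = await run_with_timeout(raises_inner(), 1.0)
    return a, b
```

Because `asyncio.wait` never raises on timeout, the caller can inspect which set the task landed in and react accordingly.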
Aureliolo added a commit that referenced this pull request Mar 7, 2026
…, and Copilot

Source fixes:
- Move asyncio/types out of TYPE_CHECKING for PEP 649 compatibility (#1, #2)
- Guard is_closing() inside try/except in _process_cleanup.py (#4)
- Normalize all control chars (incl. newlines/tabs) in _sanitize_stderr (#6)
- Apply stderr sanitization to sandbox git path too (#3)
- Fix list_directory truncation to use raw scan cap, not post-classification count (#7)
- Narrow except to InvalidStateError + log task identity in shutdown (#8)
- Add loop.stop() fallback when request_shutdown() fails in signal handlers (#9)
- Include zombie diagnostic in _drain_after_kill stderr output (#10)
- Remove @staticmethod from _log_post_cancel_exceptions (#17)
- Add _process_cleanup.py to DESIGN_SPEC §15.3 (#12) and §11.1.1 (#16)

Test additions:
- New test_process_cleanup.py: 7 tests covering all transport states (#5)
- Add _sanitize_stderr truncation test (#14)
- Add _log_post_cancel_exceptions tests (4 tests) (#11)
- Add signal handler recovery tests (3 tests) (#15)
- Use ValidationError instead of ValueError in config tests (#13)
- Update existing tests for new sanitization behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
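A minimal sketch of the control-character normalization described for `_sanitize_stderr` (#6 above); the replacement character and truncation limit here are assumptions, not the actual values used:

```python
import re

# Matches C0 control characters (including \n, \t, \r) and DEL.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_stderr(raw: str, max_len: int = 200) -> str:
    """Replace every control character with a space and truncate, so
    multi-line subprocess stderr cannot forge extra log lines."""
    cleaned = _CONTROL_CHARS.sub(" ", raw)
    if len(cleaned) > max_len:
        cleaned = cleaned[:max_len] + "...[truncated]"
    return cleaned
```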
Aureliolo added a commit that referenced this pull request Mar 8, 2026
Fixes from code-reviewer, docs-consistency, issue-resolution-verifier agents
plus Gemini, Greptile, Copilot, and CodeRabbit external reviewers:

- Fix ADR "custom stack is the initial backend" contradiction (→ Mem0)
- Update 6 stale "TBD" references in DESIGN_SPEC.md to reflect ADR-001
- Update CLAUDE.md package structure memory/ comment
- Update README.md memory layer and milestone status
- Clarify architecture constraint #1 (MVP exception for in-process)
- Add Zep→Graphiti pivot context note
- Fix "procedur" typo in ASCII diagram
- Clarify Kuzu concurrency issues (architectural, not bugs)
- Clarify S10/S11 scores and G3 gate for in-process vs full stack
- Add docs/decisions/ to project structure listing
- Fix minor typography (vs. abbreviation, unit spacing, paragraph endings)
- Update swappability table to reflect Mem0-first ordering
- Resolve Open Question #14 as resolved
- Update risk mitigation and extensibility notes
Aureliolo added a commit that referenced this pull request Mar 9, 2026
- Add routing optimization feature (#1): new suggest_routing_optimizations()
  method, RoutingSuggestion and RoutingOptimizationAnalysis models
- Add negative estimated_cost_usd validation (#2)
- Fix double snapshot in generate_report (#3)
- Fix deviation_factor to use spike_ratio when stddev=0 (#4)
- Convert DowngradeAnalysis.total_estimated_savings_per_1k to @computed_field (#5)
- Change str to NotBlankStr in SpendingReport tuple fields (#6)
- Add window_count upper bound validation (#7)
- Pre-group records by agent for O(N+M) complexity (#8)
- Update DESIGN_SPEC.md implementation snapshot (#9)
- Use projected alert level for auto-deny check (#11)
- Move approval log after ApprovalDecision construction (#12)
- Add ReportGenerator.__init__ debug log + event constant (#13)
- Fix _ALERT_LEVEL_ORDER comment (#14)
- Fix _classify_severity docstring for dual-use (#15)
- Add WARNING logs before ValueError raises (#16)
- Update evaluate_operation docstring (#17)
- Add sort-order validator to EfficiencyAnalysis.agents (#18)
- Add debug log when _find_most_used_model returns None (#19)
- Remove redundant stddev > 0 check in is_sigma_anomaly (#20)
- Document approval_warn_threshold_usd=0.0 behavior (#21)
- Extract helpers to _optimizer_helpers.py to stay under 800-line limit
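The pre-grouping optimization (#8 above) is the standard index-then-lookup pattern: build a per-agent index in one pass over the N records, then answer M agent queries from the index, for O(N + M) instead of O(N × M). A hypothetical sketch:

```python
from collections import defaultdict


def spend_per_agent(agents: list[str], records: list[dict]) -> dict[str, float]:
    """Group N records once, then answer M agent lookups from the index."""
    by_agent: dict[str, list[dict]] = defaultdict(list)
    for rec in records:  # single pass over all records
        by_agent[rec["agent_id"]].append(rec)
    return {
        agent: sum(r["cost_usd"] for r in by_agent.get(agent, []))
        for agent in agents
    }
```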
Aureliolo added a commit that referenced this pull request Mar 10, 2026
…reptile

- Cap prompt_tokens instead of rejecting when heuristic exceeds actual (#2)
- Log policy_length instead of policy content to avoid leaks (#6)
- Sort _ACTION_VERBS for deterministic regex alternation (#8)
- Use PROMPT_POLICY_VALIDATION_FAILED event for advisory failures (#12)
- Add isinstance check and strip whitespace in TagBasedMemoryFilter (#13)
- Use MEMORY_FILTER_INIT event for filter-init log paths (#14, #16)
- Remove content_preview from store_guard warning log (#18)
- Track tools section conditionally for custom templates (#10)
- Reconcile enforced vs advisory wording in DESIGN_SPEC (#1)
- Inject fixed estimated_tokens in prompt-ratio test (#19)
- Parametrize timeout and org_policies tests (#20, #22)
- Add prompt_token_ratio assertion in lifecycle test (#21)
Aureliolo added a commit that referenced this pull request Mar 16, 2026
Security fixes:
- Mask sensitive values in validation error messages (#2)
- Sanitize SettingsEncryptionError in API response (#3)
- Don't cache decrypted plaintext secrets (#4)
- Validate namespace against SettingNamespace enum at controller (#8)
- Error on empty SYNTHORG_SETTINGS_KEY instead of silently ignoring (#9)

Persistence fixes:
- V9 migration: prefer settings_old as copy source when both exist (#1)
- Remove redundant idx_settings_namespace index (PK covers it) (#15)
- Fix updated_at format consistency (ISO 8601 with timezone) (#17)
- Align get_setting/set_setting types with protocol (NotBlankStr) (#10)

Performance:
- Use batch repo methods in get_all/get_namespace (eliminate N+1) (#6)

Code quality:
- Fix Any type to SettingsService on create_app parameter (#5)
- Add SETTINGS_NOTIFICATION_FAILED + SETTINGS_FETCH_FAILED events (#7,#16)
- Fix wrong event constants in repo error paths and notification (#7,#16)
- Remove duplicate SettingNotFoundError guard in get_entry (#21)
- Log SETTINGS_VALUE_RESOLVED for all resolution sources (#22)

Tests:
- Add V9 migration crash-safety tests (#11)
- Add tests for sensitive read without encryptor (#12)
- Add tests for get_all/get_namespace service methods (#13)
- Add tests for notification exception handling (#14)
- Use pydantic.ValidationError instead of broad Exception (#18)
- Add INTEGER, JSON, and sensitive-masking validation tests (#19)
- Use actual Unicode characters in encryption roundtrip test (#20)
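The sensitive-value masking (#2 above) presumably follows the usual keep-a-short-suffix pattern; a hypothetical sketch, not the actual settings code:

```python
def mask_sensitive(value: str, visible: int = 4) -> str:
    """Mask a secret for error messages and logs, keeping only a short
    suffix; very short secrets are masked entirely."""
    if len(value) <= visible:
        return "****"
    return "****" + value[-visible:]


def validation_error(name: str, value: str, sensitive: bool) -> str:
    """Build an error message that never echoes a sensitive value verbatim."""
    shown = mask_sensitive(value) if sensitive else value
    return f"invalid value for {name}: {shown!r}"
```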
Aureliolo added a commit that referenced this pull request Mar 17, 2026
… reviewers

Source fixes:
- Use NotBlankStr | None for DualModeConfig.summarization_model (#1)
- Parallelize LLM calls with asyncio.TaskGroup in summarize_batch
  and _build_content (#2)
- Remove dead-code guards in _build_anchors (#3)
- Narrow except Exception to re-raise non-retryable ProviderErrors (#4)
- Fix double-logging on abstractive fallback (#9)
- Remove unnecessary import builtins (#10)
- Preserve key-value pairs verbatim in extractive mode (#5)
- Emit extracted facts one per line (#6)
- Strengthen ConsolidationResult validator with cross-field checks (#7)
- Check _backend.delete() return value in _process_group (#8)
- Fix mode_map type to dict[NotBlankStr, ArchivalMode] (#11)
- Move tie-breaking comment to _determine_group_mode (#12)
- Fix misleading DualModeConfig docstring (#13)
- Add missing mkdocstrings entries for retention/archival/simple_strategy (#14)
- Use O(M) lookup dict in _archive_entries (#15)
- Document 1000-entry query limit in run_consolidation docstring (#16)
- Add Raises section to AbstractiveSummarizer docstring (#17)

Test fixes:
- Fix imports in test_density.py to module level (#18)
- Strengthen fallback assertion to verify exact content (#19)
- Use exact call counts for summarizer/extractor (#20)
- Add tests: blank model rejection, MemoryError/RecursionError propagation (#21, #22)
- Add tests: validator rejects invalid archival state (#23)
- Add tests: 50/50 tie-breaking, None relevance handling (#24, #25)
- Assert actual preserved facts in extractive tests (#26)
- Prove archival index keyed by original_id not position (#27)
- Add test: empty string classifies as SPARSE (#28)
Aureliolo added a commit that referenced this pull request Mar 19, 2026
… and Gemini

- Fix resume path to call _resolve_loop instead of using static self._loop (#1)
- Validate loop_type/hybrid_fallback against _KNOWN_LOOP_TYPES at config time (#3)
- Fix redundant any() scan producing false-positive NO_RULE_MATCH warning (#4)
- Downgrade EXECUTION_LOOP_BUDGET_UNAVAILABLE to DEBUG to avoid log noise (#5)
- Add auto_loop_config to AgentEngine class docstring (#6)
- Reduce enforcer.py to 799 lines (was 806, limit 800) (#7)
- Fix select_loop_type Returns docstring accuracy (#8)
- Fix build_execution_loop docstring to mention hybrid (#9)
- Add EXECUTION_LOOP_BUDGET_UNAVAILABLE assertion in budget-error test (#10)
- Add resume path test for _resolve_loop (#11)
- Add test: rule mapping to react does not trigger NO_RULE_MATCH (#12)
- Add _resolve_loop docstring note about compaction/plan_execute_config (#13)
- Update module docstring to mention AutoLoopConfig/AutoLoopRule (#14)
- Simplify verbose log note string (#15)
- Add configurable default_loop_type to AutoLoopConfig (Gemini enhancement)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
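The config-time validation in #3 above (rejecting unknown loop types before execution rather than at resume time) can be sketched roughly like this; `_KNOWN_LOOP_TYPES` and the `AutoLoopConfig` fields here mirror the changelog's names but are simplified guesses at the real definitions:

```python
from dataclasses import dataclass

# Assumed set of loop types; the real project may define more.
_KNOWN_LOOP_TYPES = frozenset({"react", "plan_execute", "hybrid"})


@dataclass(frozen=True)
class AutoLoopConfig:
    loop_type: str = "react"
    hybrid_fallback: str = "react"
    default_loop_type: str = "react"

    def __post_init__(self) -> None:
        # Fail fast when the config is built, so a typo surfaces
        # immediately instead of mid-run or on the resume path.
        for value in (self.loop_type, self.hybrid_fallback, self.default_loop_type):
            if value not in _KNOWN_LOOP_TYPES:
                raise ValueError(f"unknown loop type: {value!r}")
```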
Aureliolo added a commit that referenced this pull request Mar 20, 2026
Security hardening:
- Service catch-all now respects configured error policy (DENY/ESCALATE)
  instead of unconditionally returning rule verdict (#1)
- Default error policy changed from USE_RULE_VERDICT to ESCALATE (#30)
- XML-delimited prompt to resist prompt injection (#6)
- Configurable reason visibility (full/generic/category) to prevent
  adversarial feedback loop via LLM reason leakage (#7)
- Configurable argument truncation strategy (whole_string/per_value/
  keys_and_values) to prevent padding bypass (#28)
- Reason field sanitized for newlines/control chars (#16)
- MemoryError/RecursionError guard added to engine._safe_evaluate (#2)

Code quality:
- Enum values and mappings derived from source enums (no drift) (#9)
- MappingProxyType for read-only lookup dicts (#10)
- _call_llm and _select_provider refactored under 50-line limit (#11)
- Warning log when _select_model falls back to provider name (#13)
- Debug log on JSON serialization fallback (#14)
- error_type/error_message in structured exception logging (#15)
- USE_RULE_VERDICT annotates reason with failure context (#8)
- Em-dashes replaced with ASCII dashes (#5)
- Module/method docstrings corrected (#4, #12, #24, #25)
- LlmFallbackConfig/LlmFallbackErrorPolicy re-exported (#3)
- agent_visible_reason field on SecurityVerdict for invoker (#7)

Tests:
- Parametrized verdict parsing and error policy tests (#27)
- Timeout test uses asyncio.Future instead of real sleep (#23)
- Truncation assertion tightened from 10000 to 600 (#20)
- Fixed mismatched matched_rules in integration helper (#21)
- Added: DENY+LOW confidence safety net test (#18)
- Added: MemoryError propagation test
- Added: reason sanitization test
- Added: USE_RULE_VERDICT annotation test
- Added: reason visibility config test
- Added: per-value truncation test
- Added: XML delimiter test
- Added: audit entry confidence assertion (#26)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
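The reason sanitization in #16 above (stripping newlines and control characters from a model-written reason before it reaches logs or the invoker) might look something like this sketch; the function name is illustrative, not the project's actual API:

```python
import re

# C0 control characters plus DEL; \n and \x1b (ANSI escape) fall in this range.
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_reason(reason: str) -> str:
    # Replace control characters with spaces so an adversarial reason
    # cannot forge extra log lines or emit terminal escape sequences.
    return _CONTROL.sub(" ", reason).strip()
```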
Aureliolo added a commit that referenced this pull request Mar 21, 2026
… Gemini

- Fix step order: tame third-party loggers before applying per-logger
  levels so user config overrides take precedence (#1)
- Replace contextlib.suppress(Exception) with try/except + stderr
  warning to match _clear_root_handlers error handling pattern (#2)
- Fix misleading docstring: clarify litellm attribute suppression is
  conditional while handler cleanup runs unconditionally (#3)
- Change PR body from Closes #72 to Related to #72 (already closed
  by PR #73) (#4)
- Add precondition to test_litellm_suppress_debug_info_enabled (#5)
- Add suppress_debug_info to CLAUDE.md observability description (#6)
- Add test for handler.close() failure warning to stderr (#7)
- Add test for removing multiple handlers from single logger (#8)
- Add autouse fixture to reset third-party logger state between
  tests (#9)
- Strengthen test_skips_litellm_when_not_imported to verify handler
  and level cleanup still works (#10)
- Add test for non-LiteLLM handler removal (httpx) (#11)
- Update docs/design/operations.md step number and override note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
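The ordering fix in #1 above boils down to applying user overrides after the third-party defaults. A minimal sketch, assuming a hypothetical `configure_logging` entry point and an illustrative list of noisy loggers:

```python
import logging

# Illustrative; the real project's list of tamed loggers may differ.
_THIRD_PARTY = ("httpx", "litellm")


def configure_logging(overrides: dict[str, int]) -> None:
    # Step 1: quiet noisy third-party loggers to WARNING by default.
    for name in _THIRD_PARTY:
        logging.getLogger(name).setLevel(logging.WARNING)
    # Step 2: apply user-configured per-logger levels LAST, so an explicit
    # override (e.g. httpx at DEBUG) wins over the defaults above.
    for name, level in overrides.items():
        logging.getLogger(name).setLevel(level)
```

Running the steps in the opposite order silently clobbers user overrides, which is the bug the commit fixes.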
Aureliolo added a commit that referenced this pull request Mar 29, 2026
…mini

Security: sanitize CSS keyframe names from dynamic edge IDs (#2), replace
hardcoded rgba() with design token (#1). Correctness: guard rollback against
stale overwrites (#6), filter orphan edges in force view (#4), use
collision-safe edge IDs (#3), restrict drag to agent nodes only (#5), fix
pagination offset to use data.length (#8), clamp dashDuration to bounds (#10).
Robustness: add AbortController for fetch cancellation (#9), move announce()
out of state updater (#7), add .catch() on fetchCompanyData (#29), log dept
health errors (#28). Docs: fix WS channel table (#11-12), classify PATCH
endpoint correctly (#13), update ux-guidelines version (#14), update
operations.md summary (#15). Tests: fix misleading test description (#16),
add stale rollback regression test (#24). Polish: extract magic numbers (#21-
22), add JSDoc on CommunicationEdgeData fields (#18-19), simplify Pick<>
type (#20), use Tailwind in stories (#17), add console.warn to empty catch
blocks (#25).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aureliolo added a commit that referenced this pull request Apr 2, 2026
## Summary

Populates the comparison page data file (`data/competitors.yaml`) with
53 competitor entries across 7 categories, replacing the original 5
proof-of-concept entries. Updates the FAQ component with more honest,
balanced content.

### Data population
- **54 total entries** (53 competitors + SynthOrg) across all 7 categories:
  - Multi-Agent Frameworks (20): CrewAI, AutoGen, LangGraph, OpenAI Agents SDK, Semantic Kernel, Google ADK, PydanticAI, CAMEL, smolagents, AG2, Agno, Letta, Agency Swarm, Mastra, Langroid, Strands Agents, BeeAI Framework, Haystack, SuperAGI, Inngest AgentKit
  - Virtual Org Simulators (3): ChatDev, MetaGPT, GPT Pilot
  - Workflow Engines (10): Temporal, n8n, Dify, Restack, Flyte, Flowise, Langflow, ControlFlow, Motia, Julep
  - Commercial Platforms (6): LangSmith, Amazon Bedrock Agents, Vertex AI Agent Builder, Agentforce, Relevance AI
  - Developer Tools (9): LlamaIndex Workflows, Composio, E2B, OpenHands, Rivet, Vercel AI SDK, Llama Stack, Atomic Agents
  - Research (3): Generative Agents (Stanford), AgentVerse, BabyAGI, AutoGPT
  - Protocols (3): MCP, A2A, Agent Protocol
- Each entry has all **14 dimensions** evaluated
(full/partial/planned/none) with sourced notes
- Every competitor was **web-searched individually** for current
capabilities (docs, GitHub, release notes)

### New metadata fields
- **`pricing`**: free | free-restrictive | depends | open-core | paid
- **`self_hosted`**: true | false | partial

### Honest SynthOrg self-assessment
- `production_ready: partial` with note: "Docker + CI/CD + cosign + SLSA
provenance; pre-alpha, API unstable, no production deployments yet"
- `memory` note updated: "Mem0 backend (sole implementation); pluggable
architecture, 5 memory types"
- Same evaluation rubric applied to SynthOrg as every competitor

### FAQ updates (`site/src/components/ComparisonFAQ.astro`)
- Softened FAQ #1 (CrewAI/AutoGen) to acknowledge their maturity and
communities
- Removed generic FAQ #6 ("How do I evaluate?")
- Added 4 new honest FAQs:
  - Commercial platforms comparison (Bedrock, Vertex, AgentForce trade-offs)
  - BSL-1.1 license explanation (what you can/can't do, 2030 Apache conversion)
  - Data accuracy methodology (invite corrections via GitHub issues)
  - Why SynthOrg shows "partial" for production readiness

### Research methodology
- 59 parallel research agents (1 per competitor + 3 discovery + 3
pricing/hosting)
- Evaluation rubric: same bar for SynthOrg and every competitor
- Code over marketing: evaluated what's shipped, not landing page claims
- Commercial add-ons rated as "partial" for the open-source project

## Test plan
- `uv run python scripts/generate_comparison.py` produces clean Markdown
with zero warnings
- All pre-commit hooks pass (YAML validation, gitleaks, em-dash check)
- Landing page table renders correctly with full dataset (54 entries, 7
categories)

## Review coverage
- Auto-detected as non-substantive code changes (YAML data + Astro
static component)
- All pre-commit and pre-push hooks passed

Closes #993
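The rubric described above (14 dimensions rated full/partial/planned/none, plus the new `pricing` and `self_hosted` fields) could be machine-checked with a small validator. This is a hypothetical sketch, not the project's actual `scripts/generate_comparison.py`:

```python
# Allowed values taken from the field descriptions above.
ALLOWED_RATING = {"full", "partial", "planned", "none"}
ALLOWED_PRICING = {"free", "free-restrictive", "depends", "open-core", "paid"}
ALLOWED_SELF_HOSTED = {True, False, "partial"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems for one competitor entry (empty if valid)."""
    problems: list[str] = []
    if entry.get("pricing") not in ALLOWED_PRICING:
        problems.append("bad pricing")
    if entry.get("self_hosted") not in ALLOWED_SELF_HOSTED:
        problems.append("bad self_hosted")
    dims = entry.get("dimensions", {})
    if len(dims) != 14:
        problems.append(f"expected 14 dimensions, got {len(dims)}")
    problems += [f"bad rating for {k}" for k, v in dims.items() if v not in ALLOWED_RATING]
    return problems
```

Running a check like this over every entry in `data/competitors.yaml` would keep the "zero warnings" test-plan item honest as the dataset grows.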