Add design specification, license, and project setup#2
Conversation
Comprehensive design spec covering: - Agent identity system with roles, personalities, skills, seniority - Company structure templates (solo founder to full enterprise) - Communication architecture (event-driven bus, hierarchical, meetings, hybrid) - Task/workflow engine with multiple execution patterns - Memory system (working, episodic, semantic, procedural, social) - Multi-provider LLM layer via LiteLLM (Claude, OpenRouter, Ollama) - Budget management with CFO agent and cost optimization - Tool/capability system with progressive trust - Security operations agent and configurable autonomy levels - API-first architecture (FastAPI + WebSocket) with web UI - Template system and company builder - Research analysis of MetaGPT, ChatDev, CrewAI, AutoGen, LangGraph - Build-from-scratch recommendation leveraging LiteLLM, Mem0, MCP, A2A - Open questions, risks, backlog, and scaling path
Business Source License 1.1 with parameters: - Non-commercial use only (personal, education, research, evaluation) - Commercial/production use requires separate license - Converts to Apache 2.0 after 4 years per version
- Fix LICENSE Change Date to explicit "February 27, 2030" - Extract HR into its own section (Section 8), renumber all sections - Reconcile budget hierarchy to include QA department - Correct A2A/MCP governance attribution to Linux Foundation - Keep SecOps agent active in locked autonomy mode for audit logging - Consolidate terminology on "Agent" (remove "Employee" as separate concept) - Add communication loop prevention mechanism (Section 5.5) - Mark model pricing and provider configs as illustrative examples - Soften specific technology commitments to "candidates - TBD after research" - Add language spec to workflow diagram code blocks - Add aurelio-review-pr skill for Claude Code
|
Caution Review failedThe pull request is closed. ℹ️ Recent review infoConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (2)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds foundational documentation and a PR review workflow: a 10-phase PR review pipeline spec, a comprehensive design specification for multi-agent orchestration, a Business Source License 1.1, and an expanded README. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request establishes the foundational elements for the "AI Company" project by introducing its core design principles, licensing, and initial development tooling. It provides a detailed high-level design specification outlining the architecture and components of the AI agent orchestration framework, defines the project's open-source license, updates the main project README with key information, and adds a sophisticated PR review automation skill. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces foundational documents for the project, including a detailed design specification for an AI Company framework, a BSL 1.1 license, and an updated project README. While I've noted minor inconsistencies within the design specification and the aurelio-review-pr skill definition, a critical concern is that the aurelio-review-pr skill contains instructions that could lead to security vulnerabilities. Specifically, it lacks explicit validation for untrusted input extracted from PR bodies, potentially leading to command injection, and does not include safeguards against prompt injection when passing GitHub issue context to review agents. Addressing these inconsistencies and, more importantly, refining the skill definition to mitigate these security risks is crucial for clarity and project security.
| **Fetch issue context.** If an issue reference was found (regardless of warnings), fetch the issue for review context. If the PR body used a full URL (`https://github.com/OWNER/REPO/issues/N`), extract both `OWNER/REPO` and `N` and pass `--repo OWNER/REPO` to query the correct repository: | ||
|
|
||
| ```bash | ||
| gh issue view N --repo OWNER/REPO --json title,body,labels,comments --jq '{title: .title, body: .body, labels: [.labels[].name], comments: [.comments[] | {author: .author.login, body: .body}]}' |
There was a problem hiding this comment.
The skill instructions direct the agent to extract OWNER/REPO and N from a URL provided in the PR body and use them in a shell command (gh issue view N --repo OWNER/REPO). Since the PR body is untrusted input, an attacker could provide a malicious URL (e.g., https://github.com/owner/repo;injection/issues/1) to perform command injection. The instructions should explicitly require the agent to validate that OWNER/REPO matches the expected repository format and that N is a numeric value before using them in any shell command.
| | **comment-analyzer** | Comments or docstrings changed | `pr-review-toolkit:comment-analyzer` | | ||
| | **type-design-analyzer** | Type annotations or classes added/modified | `pr-review-toolkit:type-design-analyzer` | | ||
|
|
||
| Each agent should receive the list of changed files and focus on reviewing them. **If issue context was collected in Phase 2, include the issue title, body, and key comments in each agent's prompt** so they can verify the PR addresses the issue's requirements. |
There was a problem hiding this comment.
The skill instructions specify that untrusted data from GitHub (issue title, body, and comments) should be included in the prompts for review agents. This data is not sanitized or wrapped in delimiters, making the agents vulnerable to prompt injection attacks. An attacker could use malicious issue comments to manipulate the behavior of the review agents. The instructions should be updated to require the use of clear delimiters (e.g., XML tags) and to explicitly instruct the sub-agents to treat this content as untrusted data.
| ```bash | ||
| gh api repos/OWNER/REPO/pulls/NUMBER/comments --paginate | ||
| ``` | ||
| Extract: author, file path, line number, body, subject_type. |
There was a problem hiding this comment.
In the list of fields to extract for inline review comments, subject_type is mentioned. However, according to the GitHub API documentation, this field is not present in the response for listing PR review comments (/repos/OWNER/REPO/pulls/NUMBER/comments). This field is associated with commit comments. To ensure the implementation is based on the correct API schema, I suggest removing subject_type from this list.
DESIGN_SPEC.md
Outdated
| ```yaml | ||
| memory: | ||
| level: "full" # none, session, project, full | ||
| backend: "sqlite" # sqlite, postgresql, mem0, file |
There was a problem hiding this comment.
There appears to be an inconsistency in how Mem0 is defined as part of the memory architecture.
- Here in Section 7.3,
mem0is listed as a value for thebackendconfiguration, suggesting it's a standalone storage option. - However, Section 15.2 describes the stack as
Mem0 + SQLiteand statesMem0is for semantic/episodic memory whileSQLiteis for structured data. This impliesMem0is a layer that works with a database, not as an alternative to it.
To avoid confusion during implementation, it would be helpful to clarify the relationship. Is Mem0 a memory system that can be configured with different storage backends (like SQLite), or is it a backend itself? The configuration schema should reflect this relationship clearly.
DESIGN_SPEC.md
Outdated
| - role_level: "c_suite" | ||
| preferred_model: "opus" | ||
| fallback: "sonnet" | ||
| - role_level: "senior" | ||
| preferred_model: "sonnet" | ||
| fallback: "haiku" | ||
| - role_level: "junior" | ||
| preferred_model: "haiku" | ||
| fallback: "local-small" |
There was a problem hiding this comment.
There's an inconsistency in the casing of agent seniority levels across the design document.
- Section 3.1 (
level) and 3.2 (Level) use capitalized values (e.g.,Senior,C-Suite). - The routing rules here in Section 9.4 use lowercase values (e.g.,
c_suite,senior,junior).
This could lead to bugs where routing rules don't match agent levels. It would be beneficial to standardize on a single casing convention (e.g., all lowercase) for these values throughout the specification.
There was a problem hiding this comment.
Pull request overview
This PR establishes the foundational documentation and legal framework for the AI Company project, a framework for orchestrating autonomous AI agents within a virtual company structure. The PR transitions the repository from a minimal placeholder to a fully specified design-phase project with clear licensing, technical vision, and development tooling.
Changes:
- Added comprehensive 18-section design specification covering agent system, company structure, communication, tasks, memory, HR, providers, budget, tools, security, human interaction, templates, and technical architecture
- Implemented Business Source License 1.1 with 4-year conversion to Apache 2.0
- Updated README with project concept, key features, tech stack, and documentation links
- Added Claude Code PR review skill for automated review workflow
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| DESIGN_SPEC.md | Comprehensive high-level design covering all major system components, with technology choices marked as candidates pending research |
| LICENSE | Business Source License 1.1 with non-production use grant, converting to Apache 2.0 on February 27, 2030 |
| README.md | Project overview with concept description, planned features, tech stack, and documentation links |
| .claude/skills/aurelio-review-pr/skill.md | PR review pipeline skill with 10-phase workflow for local agents, external feedback, triage, and fixes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/skills/aurelio-review-pr/skill.md:
- Around line 32-34: Several fenced code blocks (e.g., the block containing "gh
repo view --json nameWithOwner -q .nameWithOwner") lack surrounding blank lines
and nearby sentences lack terminal punctuation; update each affected fenced
block by inserting a blank line before and after the triple-backtick fence and
ensure the sentence immediately preceding or following each fence ends with
proper punctuation (period, question mark, or exclamation). Apply the same fixes
to the other similar fences called out (the blocks around the other listed
snippets) so all fenced code blocks have blank lines around them and all nearby
sentences end with punctuation for MD lint stability.
In `@DESIGN_SPEC.md`:
- Around line 76-97: The fenced block showing the company org tree (the
triple-backtick block starting with "Company ├── Departments[] ...") and
multiple other fenced blocks lack a language identifier and several headings
(e.g., "#### C-Suite / Executive") are missing surrounding blank lines; to fix,
add an appropriate fence language token (e.g., ```text, ```yaml, ```json, or
```bash) to every fenced code block such as the "Company" tree and the other
blocks listed, and ensure each heading has a blank line above and below it (for
example insert a blank line before "#### C-Suite / Executive" and a blank line
after the heading), applying this pattern across the noted ranges (lines with
MD040/MD022 warnings) so markdownlint MD040 and MD022 are resolved.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
.claude/skills/aurelio-review-pr/skill.mdDESIGN_SPEC.mdLICENSEREADME.md
📜 Review details
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to README.md : Update README.md for significant feature changes
Applied to files:
README.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Never defer work—do not suggest "this can be done later" or "consider for a future PR". Complete all requested changes fully.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Never defer work. Do not suggest 'this can be done later' or 'consider for a future PR'. Complete all requested changes fully.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: After every push, you MUST check that CI passes. If CI fails, fix the issue immediately and push again until all checks are green. Never walk away from a failing CI pipeline.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Always create a PR for issue work. When implementing changes for a GitHub issue, create a branch and open a pull request. Do not wait to be asked.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
🪛 LanguageTool
DESIGN_SPEC.md
[typographical] ~1-~1: To join two clauses or introduce examples, consider using an em dash.
Context: # AI Company - High-Level Design Specification > A fra...
(DASH_RULE)
[typographical] ~193-~193: To join two clauses or introduce examples, consider using an em dash.
Context: ...### Engineering - Software Architect - System design, technology decisions, pat...
(DASH_RULE)
[typographical] ~194-~194: To join two clauses or introduce examples, consider using an em dash.
Context: ...Frontend Developer** (Junior/Mid/Senior) - UI implementation, components, state man...
(DASH_RULE)
[typographical] ~195-~195: To join two clauses or introduce examples, consider using an em dash.
Context: ...Backend Developer* (Junior/Mid/Senior) - APIs, business logic, databases - **Full...
(DASH_RULE)
[typographical] ~196-~196: To join two clauses or introduce examples, consider using an em dash.
Context: ...ll-Stack Developer** (Junior/Mid/Senior) - End-to-end implementation - **DevOps/SRE...
(DASH_RULE)
[typographical] ~197-~197: To join two clauses or introduce examples, consider using an em dash.
Context: ...implementation - DevOps/SRE Engineer - Infrastructure, CI/CD, monitoring, deplo...
(DASH_RULE)
[typographical] ~198-~198: To join two clauses or introduce examples, consider using an em dash.
Context: ...ring, deployment - Database Engineer - Schema design, query optimization, migra...
(DASH_RULE)
[typographical] ~199-~199: To join two clauses or introduce examples, consider using an em dash.
Context: ...tion, migrations - Security Engineer - Security audits, vulnerability assessmen...
(DASH_RULE)
[style] ~383-~383: Consider using a different adjective to strengthen your wording.
Context: ...#### Pattern 4: Hybrid (Recommended for Full Company) Combines all three: - **Messa...
(FULL_ENTIRE)
[style] ~654-~654: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...onality would complement the team? - What model/provider fits the budget? 3. Cand...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~664-~664: You have used the passive voice repeatedly in nearby sentences. To make your writing clearer and easier to read, consider using active voice.
Context: ... 3. Active tasks are reassigned 4. Team is notified ### 8.3 Performance Tracking ```yaml ...
(REP_PASSIVE_VOICE)
[grammar] ~682-~682: Use a hyphen to join words.
Context: ...ce: - Promotion criteria: sustained high quality scores, task complexity handled,...
(QB_NEW_EN_HYPHEN)
[typographical] ~1377-~1377: In American English, use a period after an abbreviation.
Context: ...pid prototyping | ### 16.2 What Exists vs What We Need | Feature | MetaGPT | Cha...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1381-~1381: To join two clauses or introduce examples, consider using an em dash.
Context: ...ulation | Partial | Partial | No | Yes - complete | | HR (hiring/firing) | No |...
(DASH_RULE)
[typographical] ~1385-~1385: To join two clauses or introduce examples, consider using an em dash.
Context: ...alities | Basic | Basic | Basic | Deep - traits, styles, evolution | | Dynamic ...
(DASH_RULE)
[typographical] ~1386-~1386: To join two clauses or introduce examples, consider using an em dash.
Context: ... team scaling | No | No | Manual | Yes - auto + manual | | Multiple company typ...
(DASH_RULE)
[typographical] ~1387-~1387: To join two clauses or introduce examples, consider using an em dash.
Context: ...company types | No | No | Manual | Yes - templates + builder | | Security ops a...
(DASH_RULE)
[typographical] ~1389-~1389: To join two clauses or introduce examples, consider using an em dash.
Context: ...ble autonomy | No | No | Limited | Yes - full spectrum | | Local + cloud provid...
(DASH_RULE)
[typographical] ~1390-~1390: To join two clauses or introduce examples, consider using an em dash.
Context: ...rs | Partial | Partial | Partial | **Yes - unified abstraction (LiteLLM candidate)*...
(DASH_RULE)
[typographical] ~1391-~1391: To join two clauses or introduce examples, consider using an em dash.
Context: ...racking per agent | No | No | No | Yes - full budget system | | Progressive tru...
(DASH_RULE)
[typographical] ~1398-~1398: In American English, use a period after an abbreviation.
Context: ...Planned (backlog) | ### 16.3 Build vs Fork Decision **Recommendation: Build ...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[style] ~1432-~1432: Consider using the typographical ellipsis character here instead.
Context: ... | Agent A asks Agent B who asks Agent A... | | 8 | Optimal message bus for local-f...
(ELLIPSIS)
[typographical] ~1433-~1433: In American English, use a period after an abbreviation.
Context: ...architecture? | Medium | asyncio queues vs Redis vs embedded broker | | 9 | How to...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1433-~1433: In American English, use a period after an abbreviation.
Context: ...ure? | Medium | asyncio queues vs Redis vs embedded broker | | 9 | How to handle c...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1434-~1434: In American English, use a period after an abbreviation.
Context: ...y? | High | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the ...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[typographical] ~1434-~1434: In American English, use a period after an abbreviation.
Context: ...h | Sandboxing strategy, Docker vs WASM vs subprocess | | 10 | What's the minimum ...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[style] ~1509-~1509: To make your text as clear as possible to all readers, do not use this foreign term. Possible alternatives are ‘in fact’ or ‘in reality’.
Context: ...rop | | OpenAI API format | OpenAI (de facto standard) | LLM API interface | Via pro...
(DE_FACTO)
[style] ~1509-~1509: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “API”.
Context: ...at** | OpenAI (de facto standard) | LLM API interface | Via provider abstraction layer (LiteL...
(ACRONYM_TAUTOLOGY)
[typographical] ~1513-~1513: To join two clauses or introduce examples, consider using an em dash.
Context: ...s://github.com/FoundationAgents/MetaGPT) - Multi-agent SOP framework (64.5k stars) ...
(DASH_RULE)
[typographical] ~1514-~1514: To join two clauses or introduce examples, consider using an em dash.
Context: ...2.0](https://github.com/openbmb/ChatDev) - Zero-code multi-agent platform (31.2k st...
(DASH_RULE)
[typographical] ~1515-~1515: To join two clauses or introduce examples, consider using an em dash.
Context: ...AI](https://github.com/crewAIInc/crewAI) - Role-based agent collaboration framework...
(DASH_RULE)
[typographical] ~1516-~1516: To join two clauses or introduce examples, consider using an em dash.
Context: ...n](https://github.com/microsoft/autogen) - Microsoft async multi-agent framework - ...
(DASH_RULE)
[typographical] ~1517-~1517: To join two clauses or introduce examples, consider using an em dash.
Context: ...LLM](https://github.com/BerriAI/litellm) - Unified LLM API gateway (100+ providers)...
(DASH_RULE)
[typographical] ~1518-~1518: To join two clauses or introduce examples, consider using an em dash.
Context: ...- Mem0 - Universal memory layer for AI agents - [...
(DASH_RULE)
[typographical] ~1519-~1519: To join two clauses or introduce examples, consider using an em dash.
Context: ...ocol](https://github.com/a2aproject/A2A) - Agent-to-Agent protocol (Linux Foundatio...
(DASH_RULE)
[typographical] ~1520-~1520: To join two clauses or introduce examples, consider using an em dash.
Context: ...extprotocol.io/specification/2025-11-25) - Model Context Protocol - [Langfuse Agent...
(DASH_RULE)
[typographical] ~1521-~1521: To join two clauses or introduce examples, consider using an em dash.
Context: ...com/blog/2025-03-19-ai-agent-comparison) - Framework comparison - [Confluent Event-...
(DASH_RULE)
[typographical] ~1522-~1522: To join two clauses or introduce examples, consider using an em dash.
Context: .../blog/event-driven-multi-agent-systems/) - Multi-agent architecture patterns - [Mic...
(DASH_RULE)
[typographical] ~1523-~1523: To join two clauses or introduce examples, consider using an em dash.
Context: ....io/multi-agent-reference-architecture/) - Enterprise patterns - [OpenRouter](https...
(DASH_RULE)
[typographical] ~1524-~1524: To join two clauses or introduce examples, consider using an em dash.
Context: ...s - OpenRouter - Multi-model API gateway
(DASH_RULE)
README.md
[typographical] ~11-~11: To join two clauses or introduce examples, consider using an em dash.
Context: ...s (Planned) - Any Company Structure - From a 2-person startup to a 50+ enterpr...
(DASH_RULE)
[typographical] ~12-~12: To join two clauses or introduce examples, consider using an em dash.
Context: ...nfig/templates - Deep Agent Identity - Names, personalities, skills, seniority ...
(DASH_RULE)
[typographical] ~13-~13: To join two clauses or introduce examples, consider using an em dash.
Context: ...erformance tracking - Multi-Provider - Anthropic Claude, OpenRouter (400+ model...
(DASH_RULE)
[typographical] ~14-~14: To join two clauses or introduce examples, consider using an em dash.
Context: ... via LiteLLM - Smart Cost Management - Per-agent budget tracking, auto model ro...
(DASH_RULE)
[typographical] ~15-~15: To join two clauses or introduce examples, consider using an em dash.
Context: ...optimization - Configurable Autonomy - From fully autonomous to human-approves-...
(DASH_RULE)
[typographical] ~16-~16: To join two clauses or introduce examples, consider using an em dash.
Context: ...agent in between - Persistent Memory - Agents remember past decisions, code, re...
(DASH_RULE)
[typographical] ~17-~17: To join two clauses or introduce examples, consider using an em dash.
Context: ...relationships (via Mem0) - HR System - Hire, fire, promote agents. HR agent ana...
(DASH_RULE)
[typographical] ~18-~18: To join two clauses or introduce examples, consider using an em dash.
Context: ...tem, git, code execution, web, databases - role-based and sandboxed - API-First...
(DASH_RULE)
[grammar] ~20-~20: Please add a punctuation mark at the end of paragraph.
Context: ...built company templates and interactive builder ## Status Design phase. See [DESI...
(PUNCTUATION_PARAGRAPH_END)
[typographical] ~37-~37: To join two clauses or introduce examples, consider using an em dash.
Context: ...- Design Specification - Full high-level design
(DASH_RULE)
.claude/skills/aurelio-review-pr/skill.md
[style] ~107-~107: The phrase ‘Look for patterns’ is used very frequently. Consider using a less frequent alternative to set your writing apart from others.
Context: ... lines are outside the PR's diff range. Look for patterns like "Outside diff range comments (N)" ...
(LOOK_FOR_STYLE)
[style] ~121-~121: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...tionable items, not just summaries). Important: Use gh api with --jq for filteri...
(EN_REPEATEDWORDS_IMPORTANT)
[style] ~123-~123: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...omplex Python scripts to parse JSON. Important: When review bodies are large (e.g. C...
(EN_REPEATEDWORDS_IMPORTANT)
[grammar] ~139-~139: Please add a punctuation mark at the end of paragraph.
Context: ...heck against CLAUDE.md rules and actual code Deduplication: If multiple sources...
(PUNCTUATION_PARAGRAPH_END)
[grammar] ~172-~172: Please add a punctuation mark at the end of paragraph.
Context: ... rely on pre-push hooks and CI for full coverage ## Phase 8: Commit and push After all...
(PUNCTUATION_PARAGRAPH_END)
[style] ~181-~181: Consider using a different verb for a more formal wording.
Context: .... If commit or push fails due to hooks, fix the actual issue and create a NEW commi...
(FIX_RESOLVE)
🪛 markdownlint-cli2 (0.21.0)
DESIGN_SPEC.md
[warning] 76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 178-178: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 185-185: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 192-192: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 201-201: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 207-207: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 212-212: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 218-218: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 256-256: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 332-332: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 352-352: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 365-365: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 492-492: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 602-602: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 693-693: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 947-947: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1020-1020: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1039-1039: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1133-1133: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1180-1180: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1238-1238: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1487-1487: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
.claude/skills/aurelio-review-pr/skill.md
[warning] 32-32: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 102-102: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 104-104: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 110-110: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 112-112: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 116-116: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 118-118: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🔇 Additional comments (2)
README.md (1)
1-37: README update aligns well with this PR scope.The document now matches the current project state and points readers to the design spec clearly.
Based on learnings: Applies to README.md : Update README.md for significant feature changes.
LICENSE (1)
6-25: BSL parameterization looks consistent and complete.Line 23 and Line 24 correctly define the conversion trigger and target license, and the use-grant language is explicit.
…d Copilot - Add input validation for OWNER/REPO and issue number extracted from PR bodies (command injection fix) - Add XML delimiters and untrusted-data instructions for issue context passed to sub-agents (prompt injection fix) - Remove incorrect subject_type field from PR comments extraction - Clarify Mem0 relationship: it's a memory layer on top of backends, not a backend itself - Standardize agent level casing in routing rules to match Section 3.1/3.2 conventions - Add language identifiers to all fenced code blocks (MD040) - Add blank lines around headings and fenced code blocks (MD022/MD031)
…eviewers Source changes (tracker.py): - Fix budget_used_percent/alert_level threshold disagreement (#1) - Add logger.warning before ValueError in _validate_time_range (#2) - Split build_summary into _build_agent_spendings, _build_dept_spendings, _build_budget_context helpers to meet <50 line guideline (#3) - Replace _aggregate tuple return with _AggregateResult NamedTuple (#4) - Rewrite _aggregate as single-pass loop (#5) - Aggregate departments from AgentSpending objects, not raw records (#6) - Add DEBUG log in __init__ for object creation (#7) - Fix module docstring "Section 10.2 service layer" accuracy (#15) - Add DEBUG entry logs for get_total_cost/get_agent_cost (#16) Event constants (events.py): - Add BUDGET_TRACKER_CREATED, BUDGET_TOTAL_COST_QUERIED, BUDGET_AGENT_COST_QUERIED, BUDGET_TIME_RANGE_INVALID Test improvements (test_tracker.py): - Assert budget_used_percent in all alert-level tests (#8) - Assert budget_total_monthly in configured summary test (#9) - Add start==end raises ValueError test (#10) - Add end-only time filter test for get_total_cost (#11) - Add multi-record token aggregation assertions (#12) - Add pytestmark = pytest.mark.timeout(30) (#13) - Move @pytest.mark.unit to class level (#14) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t, Gemini, and greptile - Add produced_artifacts field to AgentRunResult (#1) - Wrap _log_completion in try/except to preserve valid results (#2) - Add test for inner TimeoutError propagation without engine timeout (#3) - Extract _run_loop_with_timeout from _execute (50-line limit) (#4) - Extract _validate_run_inputs from run() (50-line limit) (#5) - Rename metrics docstrings from "completed task" to "agent run" + add termination_reason to metrics event (#6) - Fix raise exc from build_exc chain direction (#7) - Replace asyncio.wait_for with asyncio.wait for timeout disambiguation (#8) - Add test for _apply_post_execution_transitions failure resilience (#9) - Add test for timeout cost recording behavior (#10) - Fix hardcoded from_status in transition logs (#11) - Add agent-task ownership check in _validate_task (#12) - Split test_invalid_timeout_raises into two test methods (#13) - Add negative validation tests for cost_per_task/duration_seconds (#14) - Add test_blank_task_id_rejected (#15) - Update _execute docstring to mention timeout, transitions, metrics (#16) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…, and Copilot Source fixes: - Move asyncio/types out of TYPE_CHECKING for PEP 649 compatibility (#1, #2) - Guard is_closing() inside try/except in _process_cleanup.py (#4) - Normalize all control chars (incl. newlines/tabs) in _sanitize_stderr (#6) - Apply stderr sanitization to sandbox git path too (#3) - Fix list_directory truncation to use raw scan cap, not post-classification count (#7) - Narrow except to InvalidStateError + log task identity in shutdown (#8) - Add loop.stop() fallback when request_shutdown() fails in signal handlers (#9) - Include zombie diagnostic in _drain_after_kill stderr output (#10) - Remove @staticmethod from _log_post_cancel_exceptions (#17) - Add _process_cleanup.py to DESIGN_SPEC §15.3 (#12) and §11.1.1 (#16) Test additions: - New test_process_cleanup.py: 7 tests covering all transport states (#5) - Add _sanitize_stderr truncation test (#14) - Add _log_post_cancel_exceptions tests (4 tests) (#11) - Add signal handler recovery tests (3 tests) (#15) - Use ValidationError instead of ValueError in config tests (#13) - Update existing tests for new sanitization behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add routing optimization feature (#1): new suggest_routing_optimizations() method, RoutingSuggestion and RoutingOptimizationAnalysis models - Add negative estimated_cost_usd validation (#2) - Fix double snapshot in generate_report (#3) - Fix deviation_factor to use spike_ratio when stddev=0 (#4) - Convert DowngradeAnalysis.total_estimated_savings_per_1k to @computed_field (#5) - Change str to NotBlankStr in SpendingReport tuple fields (#6) - Add window_count upper bound validation (#7) - Pre-group records by agent for O(N+M) complexity (#8) - Update DESIGN_SPEC.md implementation snapshot (#9) - Use projected alert level for auto-deny check (#11) - Move approval log after ApprovalDecision construction (#12) - Add ReportGenerator.__init__ debug log + event constant (#13) - Fix _ALERT_LEVEL_ORDER comment (#14) - Fix _classify_severity docstring for dual-use (#15) - Add WARNING logs before ValueError raises (#16) - Update evaluate_operation docstring (#17) - Add sort-order validator to EfficiencyAnalysis.agents (#18) - Add debug log when _find_most_used_model returns None (#19) - Remove redundant stddev > 0 check in is_sigma_anomaly (#20) - Document approval_warn_threshold_usd=0.0 behavior (#21) - Extract helpers to _optimizer_helpers.py to stay under 800-line limit
…reptile - Cap prompt_tokens instead of rejecting when heuristic exceeds actual (#2) - Log policy_length instead of policy content to avoid leaks (#6) - Sort _ACTION_VERBS for deterministic regex alternation (#8) - Use PROMPT_POLICY_VALIDATION_FAILED event for advisory failures (#12) - Add isinstance check and strip whitespace in TagBasedMemoryFilter (#13) - Use MEMORY_FILTER_INIT event for filter-init log paths (#14, #16) - Remove content_preview from store_guard warning log (#18) - Track tools section conditionally for custom templates (#10) - Reconcile enforced vs advisory wording in DESIGN_SPEC (#1) - Inject fixed estimated_tokens in prompt-ratio test (#19) - Parametrize timeout and org_policies tests (#20, #22) - Add prompt_token_ratio assertion in lifecycle test (#21)
Security fixes: - Mask sensitive values in validation error messages (#2) - Sanitize SettingsEncryptionError in API response (#3) - Don't cache decrypted plaintext secrets (#4) - Validate namespace against SettingNamespace enum at controller (#8) - Error on empty SYNTHORG_SETTINGS_KEY instead of silently ignoring (#9) Persistence fixes: - V9 migration: prefer settings_old as copy source when both exist (#1) - Remove redundant idx_settings_namespace index (PK covers it) (#15) - Fix updated_at format consistency (ISO 8601 with timezone) (#17) - Align get_setting/set_setting types with protocol (NotBlankStr) (#10) Performance: - Use batch repo methods in get_all/get_namespace (eliminate N+1) (#6) Code quality: - Fix Any type to SettingsService on create_app parameter (#5) - Add SETTINGS_NOTIFICATION_FAILED + SETTINGS_FETCH_FAILED events (#7,#16) - Fix wrong event constants in repo error paths and notification (#7,#16) - Remove duplicate SettingNotFoundError guard in get_entry (#21) - Log SETTINGS_VALUE_RESOLVED for all resolution sources (#22) Tests: - Add V9 migration crash-safety tests (#11) - Add tests for sensitive read without encryptor (#12) - Add tests for get_all/get_namespace service methods (#13) - Add tests for notification exception handling (#14) - Use pydantic.ValidationError instead of broad Exception (#18) - Add INTEGER, JSON, and sensitive-masking validation tests (#19) - Use actual Unicode characters in encryption roundtrip test (#20)
Security fixes: - Mask sensitive values in validation error messages (#2) - Sanitize SettingsEncryptionError in API response (#3) - Don't cache decrypted plaintext secrets (#4) - Validate namespace against SettingNamespace enum at controller (#8) - Error on empty SYNTHORG_SETTINGS_KEY instead of silently ignoring (#9) Persistence fixes: - V9 migration: prefer settings_old as copy source when both exist (#1) - Remove redundant idx_settings_namespace index (PK covers it) (#15) - Fix updated_at format consistency (ISO 8601 with timezone) (#17) - Align get_setting/set_setting types with protocol (NotBlankStr) (#10) Performance: - Use batch repo methods in get_all/get_namespace (eliminate N+1) (#6) Code quality: - Fix Any type to SettingsService on create_app parameter (#5) - Add SETTINGS_NOTIFICATION_FAILED + SETTINGS_FETCH_FAILED events (#7,#16) - Fix wrong event constants in repo error paths and notification (#7,#16) - Remove duplicate SettingNotFoundError guard in get_entry (#21) - Log SETTINGS_VALUE_RESOLVED for all resolution sources (#22) Tests: - Add V9 migration crash-safety tests (#11) - Add tests for sensitive read without encryptor (#12) - Add tests for get_all/get_namespace service methods (#13) - Add tests for notification exception handling (#14) - Use pydantic.ValidationError instead of broad Exception (#18) - Add INTEGER, JSON, and sensitive-masking validation tests (#19) - Use actual Unicode characters in encryption roundtrip test (#20)
… reviewers Source fixes: - Use NotBlankStr | None for DualModeConfig.summarization_model (#1) - Parallelize LLM calls with asyncio.TaskGroup in summarize_batch and _build_content (#2) - Remove dead-code guards in _build_anchors (#3) - Narrow except Exception to re-raise non-retryable ProviderErrors (#4) - Fix double-logging on abstractive fallback (#9) - Remove unnecessary import builtins (#10) - Preserve key-value pairs verbatim in extractive mode (#5) - Emit extracted facts one per line (#6) - Strengthen ConsolidationResult validator with cross-field checks (#7) - Check _backend.delete() return value in _process_group (#8) - Fix mode_map type to dict[NotBlankStr, ArchivalMode] (#11) - Move tie-breaking comment to _determine_group_mode (#12) - Fix misleading DualModeConfig docstring (#13) - Add missing mkdocstrings entries for retention/archival/simple_strategy (#14) - Use O(M) lookup dict in _archive_entries (#15) - Document 1000-entry query limit in run_consolidation docstring (#16) - Add Raises section to AbstractiveSummarizer docstring (#17) Test fixes: - Fix imports in test_density.py to module level (#18) - Strengthen fallback assertion to verify exact content (#19) - Use exact call counts for summarizer/extractor (#20) - Add tests: blank model rejection, MemoryError/RecursionError propagation (#21, #22) - Add tests: validator rejects invalid archival state (#23) - Add tests: 50/50 tie-breaking, None relevance handling (#24, #25) - Assert actual preserved facts in extractive tests (#26) - Prove archival index keyed by original_id not position (#27) - Add test: empty string classifies as SPARSE (#28)
Security hardening: - Service catch-all now respects configured error policy (DENY/ESCALATE) instead of unconditionally returning rule verdict (#1) - Default error policy changed from USE_RULE_VERDICT to ESCALATE (#30) - XML-delimited prompt to resist prompt injection (#6) - Configurable reason visibility (full/generic/category) to prevent adversarial feedback loop via LLM reason leakage (#7) - Configurable argument truncation strategy (whole_string/per_value/ keys_and_values) to prevent padding bypass (#28) - Reason field sanitized for newlines/control chars (#16) - MemoryError/RecursionError guard added to engine._safe_evaluate (#2) Code quality: - Enum values and mappings derived from source enums (no drift) (#9) - MappingProxyType for read-only lookup dicts (#10) - _call_llm and _select_provider refactored under 50-line limit (#11) - Warning log when _select_model falls back to provider name (#13) - Debug log on JSON serialization fallback (#14) - error_type/error_message in structured exception logging (#15) - USE_RULE_VERDICT annotates reason with failure context (#8) - Em-dashes replaced with ASCII dashes (#5) - Module/method docstrings corrected (#4, #12, #24, #25) - LlmFallbackConfig/LlmFallbackErrorPolicy re-exported (#3) - agent_visible_reason field on SecurityVerdict for invoker (#7) Tests: - Parametrized verdict parsing and error policy tests (#27) - Timeout test uses asyncio.Future instead of real sleep (#23) - Truncation assertion tightened from 10000 to 600 (#20) - Fixed mismatched matched_rules in integration helper (#21) - Added: DENY+LOW confidence safety net test (#18) - Added: MemoryError propagation test - Added: reason sanitization test - Added: USE_RULE_VERDICT annotation test - Added: reason visibility config test - Added: per-value truncation test - Added: XML delimiter test - Added: audit entry confidence assertion (#26) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… Gemini - Fix step order: tame third-party loggers before applying per-logger levels so user config overrides take precedence (#1) - Replace contextlib.suppress(Exception) with try/except + stderr warning to match _clear_root_handlers error handling pattern (#2) - Fix misleading docstring: clarify litellm attribute suppression is conditional while handler cleanup runs unconditionally (#3) - Change PR body from Closes #72 to Related to #72 (already closed by PR #73) (#4) - Add precondition to test_litellm_suppress_debug_info_enabled (#5) - Add suppress_debug_info to CLAUDE.md observability description (#6) - Add test for handler.close() failure warning to stderr (#7) - Add test for removing multiple handlers from single logger (#8) - Add autouse fixture to reset third-party logger state between tests (#9) - Strengthen test_skips_litellm_when_not_imported to verify handler and level cleanup still works (#10) - Add test for non-LiteLLM handler removal (httpx) (#11) - Update docs/design/operations.md step number and override note Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mini Security: sanitize CSS keyframe names from dynamic edge IDs (#2), replace hardcoded rgba() with design token (#1). Correctness: guard rollback against stale overwrites (#6), filter orphan edges in force view (#4), use collision-safe edge IDs (#3), restrict drag to agent nodes only (#5), fix pagination offset to use data.length (#8), clamp dashDuration to bounds (#10). Robustness: add AbortController for fetch cancellation (#9), move announce() out of state updater (#7), add .catch() on fetchCompanyData (#29), log dept health errors (#28). Docs: fix WS channel table (#11-12), classify PATCH endpoint correctly (#13), update ux-guidelines version (#14), update operations.md summary (#15). Tests: fix misleading test description (#16), add stale rollback regression test (#24). Polish: extract magic numbers (#21- 22), add JSDoc on CommunicationEdgeData fields (#18-19), simplify Pick<> type (#20), use Tailwind in stories (#17), add console.warn to empty catch blocks (#25). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mini Security: sanitize CSS keyframe names from dynamic edge IDs (#2), replace hardcoded rgba() with design token (#1). Correctness: guard rollback against stale overwrites (#6), filter orphan edges in force view (#4), use collision-safe edge IDs (#3), restrict drag to agent nodes only (#5), fix pagination offset to use data.length (#8), clamp dashDuration to bounds (#10). Robustness: add AbortController for fetch cancellation (#9), move announce() out of state updater (#7), add .catch() on fetchCompanyData (#29), log dept health errors (#28). Docs: fix WS channel table (#11-12), classify PATCH endpoint correctly (#13), update ux-guidelines version (#14), update operations.md summary (#15). Tests: fix misleading test description (#16), add stale rollback regression test (#24). Polish: extract magic numbers (#21- 22), add JSDoc on CommunicationEdgeData fields (#18-19), simplify Pick<> type (#20), use Tailwind in stories (#17), add console.warn to empty catch blocks (#25). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove docs version banner (user request) - Fix docs sidebar rectangles -- transparent background (user request) - Add competitor entry validation in generate_comparison.py (#2) - Fix compare.astro: import.meta.url path, type imports, validation (#3) - Export TypeScript interfaces from ComparisonTable.tsx (#4) - Update CLAUDE.md CI section with comparison generation step (#5) - Add keyboard accessibility + aria-sort to sort headers (#6) - Add role=img + aria-label to SupportIcon (#7) - Add license and feature-set filter dropdowns (#8) - Escape JSON-LD output to prevent XSS (#9) - Add aria-pressed to category filter buttons (#10) - Warn on unknown dimension/category/support keys (#11, #12, #15) - Remove unused meta.version from competitors.yaml (#13) - Fix reCAPTCHA: IntersectionObserver fallback + onerror handler (#14) - Clarify YAML comments (#16, #17) - Add 28 Python tests for generate_comparison.py (#18) - Add 13 React tests for ComparisonTable.tsx (#18) - Set up Vitest test infrastructure for site/
- Remove docs version banner (user request) - Fix docs sidebar rectangles -- transparent background (user request) - Add competitor entry validation in generate_comparison.py (#2) - Fix compare.astro: import.meta.url path, type imports, validation (#3) - Export TypeScript interfaces from ComparisonTable.tsx (#4) - Update CLAUDE.md CI section with comparison generation step (#5) - Add keyboard accessibility + aria-sort to sort headers (#6) - Add role=img + aria-label to SupportIcon (#7) - Add license and feature-set filter dropdowns (#8) - Escape JSON-LD output to prevent XSS (#9) - Add aria-pressed to category filter buttons (#10) - Warn on unknown dimension/category/support keys (#11, #12, #15) - Remove unused meta.version from competitors.yaml (#13) - Fix reCAPTCHA: IntersectionObserver fallback + onerror handler (#14) - Clarify YAML comments (#16, #17) - Add 28 Python tests for generate_comparison.py (#18) - Add 13 React tests for ComparisonTable.tsx (#18) - Set up Vitest test infrastructure for site/
Summary
DESIGN_SPEC.md) — comprehensive 18-section high-level design for the AI Company framework covering agent system, company structure, communication architecture, task engine, memory, HR, model providers, budget management, tools, security, human interaction, templates, technical architecture, research/prior art, open questions, and backlogaurelio-review-prPR review pipeline for future useAll technology choices in the design spec are marked as candidates/TBD pending further research. No final decisions on frameworks, libraries, or specific models.
Test plan
🤖 Generated with Claude Code