
docs: add crash recovery, sandboxing, analytics, and testing decisions#127

Merged
Aureliolo merged 2 commits into main from docs/design-spec-review-decisions
Mar 6, 2026

Conversation

@Aureliolo
Owner

Summary

Addresses open questions raised by three external reviews of the design spec. Adds four new sections and updates existing tables with resolved decisions.

  • §6.6 Agent Crash Recovery — pluggable RecoveryStrategy protocol: fail-and-reassign (M3 MVP), checkpoint recovery (M4/M5), with environment reconciliation on resume
  • §10.5 LLM Call Analytics — incremental build: proxy overhead metrics (M3) → call categorization + orchestration ratio (M4) → full analytics with retry/latency/cache tracking (M5+)
  • §11.1.2 Tool Sandboxing — layered SandboxBackend protocol: SubprocessSandbox for low-risk tools (file, git), DockerSandbox for code execution/terminal, K8sSandbox for future container deployments
  • §15.3 updated project tree (tools/sandbox/ directory with protocol, subprocess, docker)
  • §15.4 new design decision row for sandboxing
  • §15.5 five new convention rows: sandboxing, crash recovery, agent behavior testing, LLM call analytics
  • §17.1 resolved question #9 (sandboxing), added #15-17 as resolved (crash recovery, testing strategy, overhead tracking)
  • §17.3 updated crash risk mitigation, added orchestration overhead risk

Test plan

  • Verify all new sections have correct markdown formatting and render properly
  • Verify cross-references (§ links) point to correct sections
  • Verify §15.3 project tree matches the intended sandbox directory structure
  • Verify §17.1 open questions table has correct resolved/open status

🤖 Generated with Claude Code

…s to design spec

Address open questions raised by three external reviews of the design spec:

- §6.6: Agent Crash Recovery — pluggable RecoveryStrategy protocol with
  fail-and-reassign (M3 MVP) and checkpoint recovery (M4/M5)
- §10.5: LLM Call Analytics — incremental tracking: proxy overhead metrics
  (M3), call categorization with orchestration ratio (M4), full analytics
  layer with retry/latency/cache tracking (M5+)
- §11.1.2: Tool Sandboxing — layered SandboxBackend protocol with
  SubprocessSandbox (low-risk tools), DockerSandbox (code execution),
  and K8sSandbox (future container deployments)
- §15.3: Updated project tree (tools/sandbox/ directory)
- §15.4: Added sandboxing to key design decisions table
- §15.5: Five new convention rows (sandboxing, crash recovery, agent
  behavior testing, LLM call analytics, tool sandboxing)
- §17.1: Resolved questions #9 (sandboxing), added #15-17 as resolved
- §17.3: Updated crash risk mitigation, added orchestration overhead risk

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 10:57
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the design specification by formalizing critical architectural decisions related to agent resilience, security, and observability. It introduces detailed strategies for handling agent crashes, implementing layered tool sandboxing, and establishing an incremental LLM call analytics framework. These updates address key concerns raised during external reviews, providing clear paths for development across upcoming milestones and improving the overall robustness and manageability of the agent system.

Highlights

  • Agent Crash Recovery: Introduced a pluggable RecoveryStrategy protocol, detailing an M3 MVP 'fail-and-reassign' strategy and planned M4/M5 'checkpoint recovery' with environment reconciliation.
  • LLM Call Analytics: Defined an incremental analytics build, starting with M3 proxy overhead metrics, moving to M4 call categorization and orchestration ratio, and M5+ full analytics including retry/latency/cache tracking.
  • Tool Sandboxing: Established a layered SandboxBackend protocol, detailing SubprocessSandbox for low-risk tools, DockerSandbox for high-risk tools (code execution/terminal), and K8sSandbox for future deployments.
  • Project Structure Update: Modified the project tree to include a tools/sandbox/ directory with protocol.py, subprocess.py, and docker.py.
  • Design Decisions & Conventions: Added a new design decision row for sandboxing and five new convention rows covering sandboxing, crash recovery, agent behavior testing, and LLM call analytics.
  • Resolved Questions & Risks: Updated the open questions table, resolving question #9 (sandboxing) and adding #15-17 (crash recovery, testing strategy, overhead tracking) as resolved. Also updated crash risk mitigation and added orchestration overhead risk.

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly enhances the design specification by adding detailed sections on agent crash recovery, LLM call analytics, tool sandboxing, and agent behavior testing. The new content is well-structured, clear, and resolves several previously open questions. The updates to the project structure, design decisions, and risk tables are consistent with these additions. I have one suggestion to improve the security posture of the proposed tool sandboxing design.

Note: Security Review has been skipped due to the limited scope of the PR.

DESIGN_SPEC.md Outdated
overrides: # per-category backend overrides
file_system: "subprocess" # low risk — fast, no deps
git: "subprocess" # low risk — workspace-scoped
web: "subprocess" # medium risk — timeout + allowlist

medium

In the sandboxing.overrides configuration, assigning the web tool category to the subprocess backend might introduce security risks. A subprocess has access to the host's network stack, which could allow it to connect to internal services on localhost or the local network, even if its filesystem access is restricted.

The comment mentions a 'timeout + allowlist', but the subprocess configuration doesn't show how this network allowlist would be implemented or enforced. For better security and isolation, consider defaulting the web category to the docker backend. The docker sandbox is configured with network: "none" by default, providing strong network isolation. If network access is needed for specific web tools, a dedicated Docker network with an egress-only policy could be used.

Suggested change
- web: "subprocess" # medium risk — timeout + allowlist
+ web: "docker" # medium risk — requires network isolation
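
To make the suggested override concrete, here is a minimal Python sketch of per-category backend resolution. The names `resolve_backend`, `SUGGESTED_OVERRIDES`, and `DEFAULT_BACKEND`, and the choice to send unlisted categories to the more isolated backend, are assumptions for illustration and are not taken from the spec or the codebase.

```python
from typing import Mapping

# Hypothetical values mirroring the suggested overrides above.
KNOWN_BACKENDS = frozenset({"subprocess", "docker"})
DEFAULT_BACKEND = "docker"  # assumption: unlisted categories get the more isolated backend
SUGGESTED_OVERRIDES: Mapping[str, str] = {
    "file_system": "subprocess",  # low risk
    "git": "subprocess",          # low risk, workspace-scoped
    "web": "docker",              # medium risk, needs network isolation
}


def resolve_backend(category: str, overrides: Mapping[str, str] = SUGGESTED_OVERRIDES) -> str:
    """Return the sandbox backend name configured for a tool category."""
    backend = overrides.get(category, DEFAULT_BACKEND)
    if backend not in KNOWN_BACKENDS:
        # Reject typos in configuration instead of silently running unsandboxed.
        raise ValueError(f"unknown sandbox backend {backend!r} for category {category!r}")
    return backend
```

With this shape, moving `web` between backends stays a one-line configuration change, and a misspelled backend name fails at resolution time.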


Copilot AI left a comment


Pull request overview

Updates the design specification to resolve external review questions by documenting crash recovery, tool sandboxing, LLM call analytics, and testing conventions, plus reflecting those decisions in the architecture tables.

Changes:

  • Add new spec sections: §6.6 Agent Crash Recovery, §10.5 LLM Call Analytics, §11.1.2 Tool Sandboxing.
  • Update §15 architecture tables/project tree to include sandboxing decisions and planned directory structure.
  • Mark previously open questions as resolved in §17.1 and add related risk mitigations in §17.3.


DESIGN_SPEC.md Outdated
Comment on lines +855 to +868
The engine catches the failure at its outermost boundary, logs the error with the full `AgentContext` snapshot for debugging, transitions the task to `FAILED`, and makes it available for reassignment (manual or automatic via the task router).

```yaml
crash_recovery:
  strategy: "fail_reassign" # fail_reassign, checkpoint
```

- Simple, no persistence dependency, M3-ready
- All progress is lost on crash — acceptable for short single-agent tasks in the MVP

On crash:
1. Catch exception at the engine boundary (outermost `try/except` in the execution loop)
2. Log at ERROR with full `AgentContext` snapshot (conversation, turn count, accumulated cost)
3. Transition `TaskExecution` → `FAILED` with the exception as the failure reason

Copilot AI Mar 6, 2026


The spec says the engine should log the full AgentContext snapshot including the conversation on crash. In the current codebase, AgentContext.to_snapshot() produces an AgentContextSnapshot that intentionally excludes message contents (only message_count, turn_count, cost, etc.), which is safer and avoids leaking sensitive prompts/tool outputs into logs. Suggest updating this section to align with the existing snapshot model and explicitly call out redaction/truncation if any message content is ever logged.
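
A minimal sketch of the redacted-logging behavior described in this comment, assuming a snapshot that carries counts only. `Context`, `Snapshot`, `run_with_fail_reassign`, and the `"COMPLETED"` status string are hypothetical stand-ins, not the project's `AgentContext` or `AgentContextSnapshot` classes.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("engine")


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical redacted snapshot: counts and cost only, no message text."""
    message_count: int
    turn_count: int
    cost_usd: float


@dataclass
class Context:
    """Hypothetical stand-in for the agent's execution context."""
    messages: list[str] = field(default_factory=list)
    turn_count: int = 0
    cost_usd: float = 0.0

    def to_snapshot(self) -> Snapshot:
        # Only the number of messages is kept; their contents never reach the log.
        return Snapshot(len(self.messages), self.turn_count, self.cost_usd)


def run_with_fail_reassign(ctx: Context, step) -> str:
    """Run one step; on any exception, log a redacted snapshot and report FAILED."""
    try:
        step(ctx)
    except Exception as exc:  # outermost engine boundary
        logger.error("agent crashed: %s | snapshot=%s", exc, ctx.to_snapshot())
        return "FAILED"
    return "COMPLETED"
```

If message content ever has to be logged for debugging, truncation or redaction would belong inside `to_snapshot()` so every call site inherits it.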

DESIGN_SPEC.md Outdated
Comment on lines +1229 to +1233
Every call to `BaseCompletionProvider.complete()` already records a `CostRecord` with token counts, cost, provider, model, agent, and task. In M3, the engine additionally logs **proxy overhead metrics** at task completion:

- `turns_per_task` — number of LLM turns to complete the task (from `AgentContext.turn_count`)
- `tokens_per_task` — total tokens consumed (from `AgentContext.accumulated_cost`)
- `cost_per_task` — total USD cost (from `TaskExecution.accumulated_cost.cost_usd`)

Copilot AI Mar 6, 2026


BaseCompletionProvider.complete() does not currently record a CostRecord (and it also lacks agent/task context needed to populate one). It only logs provider call start/success/error; cost aggregation happens via TokenUsage on responses and (when wired) the budget layer. Please reword this to avoid stating it "already records a CostRecord" and instead describe where/when CostRecord entries are created (e.g., in the engine when agent_id/task_id are known, recorded into CostTracker).

DESIGN_SPEC.md Outdated
Every call to `BaseCompletionProvider.complete()` already records a `CostRecord` with token counts, cost, provider, model, agent, and task. In M3, the engine additionally logs **proxy overhead metrics** at task completion:

- `turns_per_task` — number of LLM turns to complete the task (from `AgentContext.turn_count`)
- `tokens_per_task` — total tokens consumed (from `AgentContext.accumulated_cost`)

Copilot AI Mar 6, 2026


tokens_per_task is described as coming from AgentContext.accumulated_cost, but accumulated_cost is a TokenUsage object. To avoid ambiguity, consider calling out the exact field used for the token total (e.g., accumulated_cost.total_tokens or input_tokens + output_tokens) vs cost (accumulated_cost.cost_usd).

Suggested change
- - `tokens_per_task` — total tokens consumed (from `AgentContext.accumulated_cost`)
+ - `tokens_per_task` — total tokens consumed (from `AgentContext.accumulated_cost.total_tokens`)
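
The distinction between the token total and the cost can be sketched as follows. The `TokenUsage` shape here is an assumption built from the field names mentioned in this comment (`input_tokens`, `output_tokens`, `total_tokens`, `cost_usd`); it has not been checked against the actual model, and `proxy_overhead_metrics` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Hypothetical shape of the accumulated usage object."""
    input_tokens: int
    output_tokens: int
    cost_usd: float

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def proxy_overhead_metrics(turn_count: int, accumulated_cost: TokenUsage) -> dict[str, float]:
    """Build the three M3 proxy metrics, naming the exact field behind each one."""
    return {
        "turns_per_task": turn_count,
        "tokens_per_task": accumulated_cost.total_tokens,  # token total, not the whole object
        "cost_per_task": accumulated_cost.cost_usd,        # USD cost
    }
```

Naming the field explicitly in the spec avoids implementers logging the whole usage object where a single integer is expected.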

@greptile-apps

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR addresses four open questions from external design-spec reviews by adding §6.6 (Agent Crash Recovery), §10.5 (LLM Call Analytics), §11.1.2 (Tool Sandboxing), and corresponding entries in §15–§17. The overall structure is well-thought-out — the pluggable protocol pattern (RecoveryStrategy, SandboxBackend) is consistent with the rest of the spec, and the incremental milestone approach for analytics is practical.

Key issues found:

  • egress-only is not a valid Docker network mode (DESIGN_SPEC.md §11.1.2, line 1337) — Docker has no built-in "egress-only" driver; this would fail at runtime and requires a concrete implementation plan (sidecar proxy, iptables rules, etc.)
  • Empty allowed_hosts: [] with database: "bridge" provides no host-level isolation (DESIGN_SPEC.md §11.1.2, lines 1335–1338) — without a populated allowlist and an enforcement mechanism, bridge-networked database containers have unrestricted network access
  • Checkpoint storage silently persists full AgentContext message contents (DESIGN_SPEC.md §6.6, lines 873–891) — Strategy 1 explicitly redacts message contents from crash logs, but Strategy 2 checkpoints write the full conversation history to SQLite/filesystem at rest with no mention of encryption, access controls, or whether redaction applies

Confidence Score: 3/5

  • Safe to merge for documentation purposes, but two sections contain implementation-blocking spec errors that will cause confusion or failures when developers implement them in M3.
  • The structural and organizational changes are clean and address prior review feedback well. However, §11.1.2 specifies a non-existent Docker network mode (egress-only) and an unenforced allowlist mechanism — both would mislead M3 implementers. §6.6 has a security gap (plaintext checkpoint storage of sensitive message contents) that is inconsistent with the redaction discipline established in Strategy 1. These are spec-level errors in sections that will be directly implemented in M3, so they warrant a lower confidence score despite being a docs-only PR.
  • Pay close attention to DESIGN_SPEC.md §11.1.2 (Docker network configuration) and §6.6 (checkpoint storage security).

Important Files Changed

| Filename | Overview |
|---|---|
| DESIGN_SPEC.md | Adds four new spec sections (§6.6, §10.5, §11.1.2, updates to §15–§17). Contains two concrete implementation blockers in §11.1.2 (egress-only is not a valid Docker network mode; empty allowed_hosts with bridge networking provides no isolation), and a security gap in §6.6 (checkpoint storage persists full message contents without any acknowledgement or controls, inconsistent with Strategy 1's explicit redaction). |
| CLAUDE.md | Minor package-structure comment update: moves sandboxing annotation from security/ to tools/. Accurately reflects the §15.3 project tree change. No issues. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Agent Execution Start] --> B[Engine Execution Loop]
    B --> C{Exception?}
    C -- No --> D[Turn Completed]
    D --> E{Strategy?}
    E -- fail_reassign --> F[Strategy 1: Skip checkpoint]
    E -- checkpoint --> G[Strategy 2: Persist AgentContext snapshot\nto SQLite / filesystem]
    G --> B

    C -- Yes --> H{Strategy?}
    H -- fail_reassign --> I[Catch at engine boundary\nLog redacted snapshot\nmessage contents excluded]
    H -- checkpoint --> J[Detect via exception\nor heartbeat timeout]

    J --> K[Load last checkpoint\nfull AgentContext incl. messages]
    K --> L{Resume attempts\n< max_resume_attempts?}
    L -- Yes --> M[Environment reconciliation\nsummary of changes since checkpoint]
    M --> B
    L -- No --> N[Fall back to fail_reassign]
    N --> I

    I --> O[TaskExecution → FAILED\nwith failure reason]
    O --> P[Task available for reassignment\nvia task router]

Last reviewed commit: 383125d
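
The crash branch of the flowchart reduces to one decision, sketched below under stated assumptions. The function name `next_action` and its return strings are hypothetical; only the strategy names and `max_resume_attempts` come from the spec excerpts quoted in this review.

```python
def next_action(strategy: str, resume_attempts: int, max_resume_attempts: int = 2) -> str:
    """Decide what happens after a crash, following the flowchart's two branches."""
    if strategy == "checkpoint" and resume_attempts < max_resume_attempts:
        # Load the last checkpoint, reconcile the environment, re-enter the loop.
        return "resume_from_checkpoint"
    # Either the strategy is fail_reassign or the resume budget is exhausted.
    return "fail_and_reassign"
```

Falling through to fail-and-reassign keeps Strategy 1 as the single terminal path, which matches the flowchart's fallback edge.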

…nd Greptile

- Add FAILED terminal state note to §6.6 (needs TaskStatus enum update in M3)
- Fix AgentContext snapshot to use redacted form (exclude message contents)
- Fix CostRecord attribution (engine layer, not BaseCompletionProvider)
- Fix tokens_per_task to reference accumulated_cost.total_tokens
- Move web tools from subprocess to Docker (no network controls in subprocess)
- Add database network override (needs TCP access to DB host)
- Add Docker network_overrides and allowed_hosts config
- Change Adopted→Planned for unimplemented M3 conventions (sandboxing, crash recovery, testing)
- Rename §15.5 to "Engineering Conventions" (scope expanded beyond Pydantic)
- Update CLAUDE.md: move sandboxing from security/ to tools/ description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Mar 6, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Summary by CodeRabbit

Documentation

  • Updated design specifications with agent crash-recovery system featuring configurable recovery strategies
  • Added tool sandboxing capabilities supporting multiple backends (subprocess, Docker, Kubernetes planned)
  • Extended engineering conventions documentation with new development milestones

Walkthrough

Documentation updates to CLAUDE.md with directory taxonomy and capability terminology adjustments. DESIGN_SPEC.md significantly expanded with new crash-recovery design (RecoveryStrategy pattern), tool sandboxing specification (SandboxBackend protocol), updated task failure workflows, and extended engineering conventions section.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Metadata: CLAUDE.md | Updated directory structure, adjusted security description from "sandboxing" to "audit", and added "sandboxing" to tools/capabilities list. |
| Design Specification Expansion: DESIGN_SPEC.md | Added crash-recovery design section with pluggable RecoveryStrategy implementations (Fail-and-Reassign default, Checkpoint planned), new FAILED task status variant, YAML config examples. Introduced modular Tool Sandboxing (§11.1.2) with SandboxBackend protocol supporting SubprocessSandbox, DockerSandbox, and future K8sSandbox backends. Extended Engineering Conventions (§15.5) with entries for sandboxing, crash recovery, and agent analytics. Updated risk/plan narrative and cross-references to align crash-recovery and analytics as M3–M5 milestones. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main changes: adding crash recovery, sandboxing, analytics, and testing decisions to the design specification documentation. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, providing detailed summaries of each new section and update made to the design specification. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


network: "none" # no network by default; per-category overrides below
network_overrides: # category-specific network policies
  database: "bridge" # database tools need TCP access to DB host
  web: "egress-only" # web tools need outbound HTTP; no inbound

egress-only is not a valid Docker network mode

network: "egress-only" is not a recognized Docker network driver. Docker's built-in network modes are bridge, host, overlay, macvlan, ipvlan, and none. Passing egress-only as the --network argument to docker run would fail with an error like network egress-only not found.

Restricting a container to outbound-only traffic in Docker actually requires either:

  1. A custom network + iptables rules (or nftables) to block inbound connections while permitting outbound ones
  2. Routing all traffic through a local HTTP proxy (e.g., Squid) whose allowlist controls what the container can reach
  3. A userspace sandboxing layer like gVisor (--runtime=runsc) with a custom network policy

The spec should document which of these mechanisms is intended and not use a non-existent Docker primitive. Otherwise implementers will hit a runtime error and have no guidance on what to build.

Suggested change
- web: "egress-only" # web tools need outbound HTTP; no inbound
+ web: "bridge" # web tools need outbound HTTP; enforce allowlist via proxy or iptables — see note

Note: Docker has no native "egress-only" mode. Outbound-only restriction for web tools must be implemented via a sidecar proxy (e.g., Squid + http_access rules) or iptables FORWARD rules on a custom bridge network. Document the chosen mechanism before implementing DockerSandbox for the web category.
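
One way to surface this class of error early is to validate the configuration at load time rather than at `docker run`. The sketch below is illustrative only: `validate_network_overrides` is a hypothetical helper, the allowed set mirrors the built-in names listed in this comment, and it assumes the config does not reference user-defined networks (which Docker would also accept by name).

```python
# Built-in names as listed in the review comment above.
BUILTIN_NETWORKS = frozenset({"bridge", "host", "overlay", "macvlan", "ipvlan", "none"})


def validate_network_overrides(overrides: dict[str, str]) -> list[str]:
    """Return one error message per category whose network value is not a built-in name."""
    return [
        f"{category}: {mode!r} is not a built-in Docker network"
        for category, mode in sorted(overrides.items())
        if mode not in BUILTIN_NETWORKS
    ]
```

A check like this only catches the invalid name; the outbound-only restriction itself still needs one of the mechanisms listed above.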


Comment on lines +1335 to +1338
network_overrides: # category-specific network policies
  database: "bridge" # database tools need TCP access to DB host
  web: "egress-only" # web tools need outbound HTTP; no inbound
allowed_hosts: [] # allowlist of host:port pairs (e.g. ["db:5432"])

allowed_hosts is empty by default but database: "bridge" provides no isolation without it

The network_overrides section sets database: "bridge", which correctly allows the container to reach a database over TCP. However, allowed_hosts: [] is empty by default, meaning there is no host-level allowlist enforced. A Docker container on a bridge network can reach any host reachable on that bridge — not just the database. The combination of bridge network + empty allowed_hosts offers essentially no isolation beyond the container boundary itself.

For the allowed_hosts field to be meaningful, DockerSandbox must:

  1. Enforce it via iptables rules, a sidecar proxy, or Docker's --add-host + --network combination
  2. Document that allowed_hosts: [] means "deny all outbound connections on bridge" (deny-by-default), not "allow all"

The spec currently implies the allowlist is opt-in but provides no mechanism for its enforcement, leaving a gap that implementers will silently skip. At minimum, add a clarifying note:

    allowed_hosts: []                  # REQUIRED for database category: e.g. ["db:5432"]. Empty = deny all outbound on bridge.
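
The deny-by-default semantics can be stated as a small policy check. This is a sketch of the policy only, under the assumption that entries are exact `host:port` strings as in the example above; `is_destination_allowed` is a hypothetical name, and actual enforcement would still have to happen at the network layer (iptables, proxy, or similar).

```python
def is_destination_allowed(host: str, port: int, allowed_hosts: list[str]) -> bool:
    """Deny-by-default check: only exact "host:port" entries in the allowlist pass."""
    return f"{host}:{port}" in set(allowed_hosts)
```

Under this reading, an empty list denies every destination, so a database tool cannot connect until its host is listed explicitly.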

Comment on lines +873 to +891
#### Strategy 2: Checkpoint Recovery (Planned — M4/M5)

The engine persists an `AgentContext` snapshot after each completed turn. On crash, the framework detects the failure (via heartbeat timeout or exception), loads the last checkpoint, and resumes execution from the exact turn where it left off. The immutable `model_copy(update=...)` pattern makes checkpointing trivial — each `AgentContext` is a complete, self-contained frozen state that serializes cleanly via `model_dump_json()`.

```yaml
crash_recovery:
  strategy: "checkpoint"
  checkpoint:
    persist_every_n_turns: 1 # checkpoint frequency
    storage: "sqlite" # sqlite, filesystem
    heartbeat_interval_seconds: 30 # detect unresponsive agents
    max_resume_attempts: 2 # retry limit before falling back to fail_reassign
```

- Preserves progress — critical for long tasks (multi-step plans, epic-level work)
- Requires persistence layer and environment state reconciliation on resume
- Natural fit with the existing immutable state model

> **Environment reconciliation:** When resuming from a checkpoint, the agent's tools and workspace may have changed (other agents modified files, external state drifted). The checkpoint strategy includes a reconciliation step: the resumed agent receives a summary of changes since the checkpoint timestamp and can adapt its plan accordingly. This is analogous to a developer returning to a branch after colleagues have pushed changes.

Checkpoint storage silently persists full message contents

Strategy 1 (fail-and-reassign) explicitly redacts message contents from its log entry: "excluding message contents to avoid leaking sensitive prompts/tool outputs". But Strategy 2 (checkpoint recovery) persists the full AgentContext snapshot — which includes the entire message history — to SQLite or the filesystem after every turn.

This creates an inconsistency: the same sensitive content that is deliberately excluded from crash logs in Strategy 1 is written in plaintext to a persistent checkpoint storage in Strategy 2. If the SQLite file or filesystem checkpoint directory is accessible to other agents, processes, or backup systems, sensitive prompts and tool outputs (API keys returned by tools, user PII in prompts, etc.) are silently at rest.

The spec should acknowledge this security implication and at least document the intended controls:

  • Should checkpoint storage be encrypted at rest? (e.g., SQLCipher for SQLite, or filesystem-level encryption)
  • Should AgentContext checkpoints exclude message contents (storing only tool call history + turn count) and rely on the task description for context on resume?
  • What is the access model for the checkpoint database — is it per-agent, shared, or controlled by the engine process only?

Without explicit guidance here, implementers will default to unencrypted plaintext storage, which is a meaningful downgrade from the redaction discipline already applied in Strategy 1.
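
As one illustration of the second option (checkpoints without message contents), the sketch below serializes a checkpoint with messages replaced by a count unless the caller opts in. `CheckpointState` and `dump_checkpoint` are hypothetical names using plain dataclasses; the project's `AgentContext` and `model_dump_json()` are not used here, and whether a resumed agent can work without the message history remains an open design question.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CheckpointState:
    """Hypothetical stand-in for a checkpointed agent context."""
    task_id: str
    turn_count: int
    messages: tuple[str, ...] = ()


def dump_checkpoint(state: CheckpointState, include_messages: bool = False) -> str:
    """Serialize a checkpoint; message contents are dropped unless explicitly requested."""
    payload = asdict(state)
    if not include_messages:
        # Keep only the count so the stored checkpoint holds no prompt or tool output text.
        payload["message_count"] = len(payload.pop("messages"))
    return json.dumps(payload, sort_keys=True)
```

Making redaction the default means that persisting full conversation history becomes a visible, reviewable decision at the call site.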


Aureliolo merged commit 5c11595 into main Mar 6, 2026
10 of 11 checks passed
Aureliolo deleted the docs/design-spec-review-decisions branch March 6, 2026 11:54
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Aureliolo added a commit that referenced this pull request Mar 11, 2026
…ing (#299)

## Summary

- **Upgrade `actions/upload-pages-artifact` v3 → v4** — v4.0.0 ([PR
#127](actions/upload-pages-artifact#127))
SHA-pins its internal `actions/upload-artifact` dependency, fixing the
`sha_pinning_required` conflict where the composite action's tag
reference (`@v4`) was rejected by the repo's Actions permissions policy
- **Add `zizmor` workflow security analysis** — runs on workflow file
changes (push to main + PRs), catches unpinned actions, script
injection, excessive permissions, and uploads SARIF to the Security tab
- **Add explicit failure on release retry exhaustion** — retry loop now
sets a `FOUND` flag so exhaustion surfaces a clear `::error::` instead
of falling through to a confusing `gh release edit` failure (Greptile PR
#298 finding)
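
The retry-exhaustion behaviour in the last bullet can be illustrated with a small sketch. The actual change lives in a workflow script that is not shown here, so this Python version only mirrors the described logic; `wait_for_release`, `lookup`, and the parameter names are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_release(
    lookup: Callable[[], Optional[str]],
    attempts: int = 5,
    delay_s: float = 0.0,
) -> str:
    """Poll `lookup` until it returns a release, failing loudly on exhaustion."""
    found: Optional[str] = None  # plays the role of the FOUND flag
    for attempt in range(1, attempts + 1):
        found = lookup()
        if found is not None:
            break
        if attempt < attempts:
            time.sleep(delay_s)  # wait before the next attempt
    if found is None:
        # Surface a clear error instead of letting a later step fail confusingly.
        raise RuntimeError(f"::error::release not found after {attempts} attempts")
    return found
```

The point of the explicit flag is that the "never found" case is handled where it occurs, rather than being discovered indirectly by whatever command runs next.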

## Context

After merging #298, the Pages workflow failed on main because
`upload-pages-artifact` v3 internally called
`actions/upload-artifact@v4` (tag, not SHA), violating the repo's
`sha_pinning_required: true` setting. This is a [known
limitation](actions/runner#2195) with
composite actions — GitHub enforces SHA pinning transitively but
composite action authors don't always pin their internal deps. v4.0.0
fixed this upstream.

The zizmor workflow provides CI-level enforcement of SHA pinning and
other workflow security checks, complementing the repo-level
`sha_pinning_required` setting.

## Test plan

- [ ] Pages workflow succeeds on main after merge (v4
upload-pages-artifact)
- [ ] zizmor workflow runs and uploads SARIF on this PR's workflow
changes
- [ ] Verify no breaking change from v4 dotfile exclusion (MkDocs/Astro
output has no dotfiles)
- [ ] Release retry loop fails clearly after exhaustion (manual
verification of logic)