feat: implement personality injection and system prompt construction#105
feat: implement personality injection and system prompt construction#105
Conversation
…13) Add build_system_prompt() that renders agent identity, personality, skills, authority, and seniority into structured system prompts via Jinja2 templates. Supports optional task/tool/company context, token-budget trimming, custom templates, and pluggable token estimation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 8 agents, 16 findings addressed: - Remove unused seniority_info parameter (dead code, 6 agents flagged) - Reuse single SandboxedEnvironment instead of creating two per call - Add Final annotation and import-time completeness check on AUTONOMY_INSTRUCTIONS - Add error path tests (invalid template syntax, render failures) - Add top-level error wrapping in build_system_prompt for unpaired logs - Use tuples instead of lists for template context values (immutability) - Add trimming priority order test - Expand metadata assertions to cover all 5 keys - Improve docstrings (PromptTokenEstimator, DefaultTokenEstimator, SystemPrompt) - Fix misleading comments and noqa explanations - Extract _build_core_context and _render_and_estimate helpers (polish pass) - Add missing fixture docstring - Update pre-pr-review skill: resilience audit now triggers on any src_py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🧰 Additional context used📓 Path-based instructions (4)**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/ai_company/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/ai_company/engine/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (14)📓 Common learnings📚 Learning: 2026-01-24T09:54:45.426ZApplied to files:
📚 Learning: 2026-03-05T14:30:10.714ZApplied to files:
📚 Learning: 2026-01-24T09:54:45.426ZApplied to files:
📚 Learning: 2026-01-26T08:59:32.818ZApplied to files:
📚 Learning: 2026-01-24T09:54:56.100ZApplied to files:
📚 Learning: 2026-01-24T09:54:56.100ZApplied to files:
📚 Learning: 2026-01-24T09:54:56.100ZApplied to files:
📚 Learning: 2026-01-26T08:59:32.818ZApplied to files:
📚 Learning: 2026-01-24T16:33:29.354ZApplied to files:
📚 Learning: 2026-03-05T14:30:10.714ZApplied to files:
📚 Learning: 2026-02-26T17:43:50.902ZApplied to files:
📚 Learning: 2026-03-05T14:30:10.714ZApplied to files:
📚 Learning: 2026-01-24T09:54:45.426ZApplied to files:
🔇 Additional comments (25)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a Jinja2-based system-prompt builder (with token estimation, budget-aware trimming, observability events, and error types), public engine re-exports, prompt template/constants, extensive unit tests and fixtures, and documentation/workflow updates broadening resilience-audit scope and post-implementation guidance. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Builder as build_system_prompt
participant Template as Jinja2 Renderer
participant Estimator as TokenEstimator
participant Logger as Observability
participant Error as ErrorHandler
Caller->>Builder: call(agent, role, task, tools, company, max_tokens, custom_template)
Builder->>Logger: PROMPT_BUILD_START
alt custom_template provided
Builder->>Template: validate custom template
alt validation fails
Template-->>Error: validation error
Error->>Logger: PROMPT_CUSTOM_TEMPLATE_FAILED
Error-->>Caller: raise PromptBuildError
else
Builder->>Logger: PROMPT_CUSTOM_TEMPLATE_LOADED
end
end
Builder->>Template: render template with context
alt render fails
Template-->>Error: render error
Error->>Logger: PROMPT_BUILD_ERROR
Error-->>Caller: raise PromptBuildError
else
Template-->>Builder: rendered_content
end
Builder->>Estimator: estimate_tokens(rendered_content)
Estimator-->>Builder: token_count
alt max_tokens set and token_count > max_tokens
loop trimming until within budget
Builder->>Builder: trim sections (company → tools → task)
Builder->>Template: re-render trimmed context
Builder->>Estimator: re-estimate_tokens
Builder->>Logger: PROMPT_BUILD_TOKEN_TRIMMED
end
Builder->>Logger: PROMPT_BUILD_BUDGET_EXCEEDED
end
Builder->>Builder: assemble SystemPrompt (content, version, tokens, sections, metadata)
Builder->>Logger: PROMPT_BUILD_SUCCESS
Builder-->>Caller: return SystemPrompt
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a comprehensive system for generating dynamic and context-aware system prompts for AI agents. It centralizes the logic for integrating agent identity, personality, skills, and various contextual elements into prompts, ensuring that agents receive tailored instructions. The new system also incorporates token budget management through estimation and progressive trimming, enhancing efficiency and control over prompt length. This foundational work significantly improves the flexibility and robustness of agent behavior definition. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Greptile SummaryThis PR implements Key changes:
Three issues identified:
Confidence Score: 3/5
Last reviewed commit: 07d8833 |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Code Review
This pull request introduces a well-designed and robust implementation for constructing contextual system prompts for AI agents, featuring personality injection, token budget trimming, and custom template support. However, the system is vulnerable to prompt injection as untrusted data from sources like task descriptions and agent names is rendered directly into Jinja2 templates without sanitization or clear demarcation, potentially allowing an attacker to manipulate agent behavior. Additionally, there is a minor suggestion to improve the readability of the generated tool list within the default prompt template.
| DEFAULT_TEMPLATE: Final[str] = """\ | ||
| ## Identity | ||
|
|
||
| You are **{{ agent_name }}**, a {{ agent_level }} {{ agent_role }} \ | ||
| in the {{ agent_department }} department. | ||
| {% if role_description %} | ||
| **Role**: {{ role_description }} | ||
| {% endif %} | ||
|
|
||
| ## Personality | ||
| {% if personality_description %} | ||
| {{ personality_description }} | ||
| {% endif %} | ||
| - **Communication style**: {{ communication_style }} | ||
| - **Risk tolerance**: {{ risk_tolerance }} | ||
| - **Creativity**: {{ creativity }} | ||
| {% if personality_traits %} | ||
| - **Traits**: {{ personality_traits | join(', ') }} | ||
| {% endif %} | ||
|
|
||
| ## Skills | ||
| {% if primary_skills %} | ||
| - **Primary**: {{ primary_skills | join(', ') }} | ||
| {% endif %} | ||
| {% if secondary_skills %} | ||
| - **Secondary**: {{ secondary_skills | join(', ') }} | ||
| {% endif %} | ||
|
|
||
| ## Authority | ||
| {% if can_approve %} | ||
| - **Can approve**: {{ can_approve | join(', ') }} | ||
| {% endif %} | ||
| {% if reports_to %} | ||
| - **Reports to**: {{ reports_to }} | ||
| {% endif %} | ||
| {% if can_delegate_to %} | ||
| - **Can delegate to**: {{ can_delegate_to | join(', ') }} | ||
| {% endif %} | ||
| {% if budget_limit > 0 %} | ||
| - **Budget limit**: ${{ "%.2f" | format(budget_limit) }} per task | ||
| {% endif %} | ||
|
|
||
| ## Autonomy | ||
|
|
||
| {{ autonomy_instructions }} | ||
| {% if task %} | ||
|
|
||
| ## Current Task | ||
|
|
||
| **{{ task.title }}** | ||
|
|
||
| {{ task.description }} | ||
| {% if task.acceptance_criteria %} | ||
|
|
||
| ### Acceptance Criteria | ||
| {% for criterion in task.acceptance_criteria %} | ||
| - {{ criterion.description }} | ||
| {% endfor %} | ||
| {% endif %} | ||
| {% if task.budget_limit > 0 %} | ||
|
|
||
| **Task budget**: ${{ "%.2f" | format(task.budget_limit) }} | ||
| {% endif %} | ||
| {% if task.deadline %} | ||
| **Deadline**: {{ task.deadline }} | ||
| {% endif %} | ||
| {% endif %} | ||
| {% if tools %} | ||
|
|
||
| ## Available Tools | ||
| {% for tool in tools %} | ||
| - **{{ tool.name }}**{% if tool.description %}: {{ tool.description }}{% endif %} | ||
|
|
||
| {% endfor %} | ||
| {% endif %} | ||
| {% if company %} | ||
|
|
||
| ## Company Context | ||
|
|
||
| You work at **{{ company.name }}**. | ||
| {% if company_departments %} | ||
| **Departments**: {{ company_departments | join(', ') }} | ||
| {% endif %} | ||
| {% endif %} | ||
| """ |
There was a problem hiding this comment.
The system prompt is constructed by rendering several fields (e.g., agent_name, task.description, tool.description) directly into a Jinja2 template without any sanitization or escaping. This creates a high-severity prompt injection vulnerability, as untrusted input could allow an attacker to manipulate the agent's behavior. For example, a malicious task.description could override the agent's instructions.
Additionally, there is a minor formatting issue on line 148: a blank line adds an extra newline between each item in the 'Available Tools' list, making it look sparse. For a more standard, compact list format, this line should be removed.
Remediation for prompt injection: Sanitize all user-provided inputs before rendering them into the prompt. Use delimiters to clearly separate data from instructions, and explicitly instruct the LLM to treat the content within those delimiters as untrusted data (e.g., wrap task descriptions in tags like <task_description>...</task_description>).
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/engine/prompt.py`:
- Around line 140-225: The build_system_prompt function is too large; refactor
it into smaller helpers to keep the entrypoint under 50 lines: extract input
validation and estimator/template resolution into a helper (e.g., new
_prepare_prompt_inputs(agent, role, task, available_tools, company,
custom_template, token_estimator) that returns estimator and template_str), move
the try/except orchestration and call to _render_with_trimming into a concise
runner (e.g., _execute_prompt_render(template_str, agent, role, task,
available_tools, company, max_tokens, estimator) that raises PromptBuildError on
failures), and isolate the success/error logging (use _log_prompt_start(agent,
...) and _log_prompt_result(result, ...) or similar). Update build_system_prompt
to call these helpers in sequence (logging start, calling
_prepare_prompt_inputs, then _execute_prompt_render, then logging success) so
build_system_prompt only orchestrates high-level steps while retaining existing
calls to _resolve_template, _render_with_trimming, and the same logger messages.
- Around line 422-511: The _render_with_trimming function is over the 50-line
guideline—extract three helpers to simplify it: 1) a helper that attempts to
"trim one section" given the current section name and mutable contexts (use
symbols _TRIMMABLE_SECTIONS, _SECTION_COMPANY, _SECTION_TOOLS, _SECTION_TASK and
the local variables company, available_tools, task) and returns whether it
trimmed; 2) a "re-render_and_estimate" wrapper around _render_and_estimate that
returns (content, estimated) for current
agent/role/task/available_tools/company/estimator to centralize repeated calls;
and 3) a "handle_post_trim_overflow" helper that logs PROMPT_BUILD_TOKEN_TRIMMED
and computes sections via _compute_sections and returns the final SystemPrompt
construction values (content, estimated, sections, metadata). Replace the
inlined loop and logging in _render_with_trimming with calls to these helpers
and keep SystemPrompt creation using SystemPrompt, PROMPT_TEMPLATE_VERSION, and
_build_metadata(agent).
- Around line 462-510: The function currently best-effort trims sections
(_TRIMMABLE_SECTIONS via _SECTION_COMPANY, _SECTION_TOOLS, _SECTION_TASK) using
_render_and_estimate but may still return a SystemPrompt that exceeds
max_tokens; change the logic after the trimming loop to enforce the limit: if
estimated > max_tokens then log PROMPT_BUILD_TOKEN_TRIMMED with details
(agent_id, max_tokens, estimated, trimmed_sections) and raise a clear exception
(e.g., ValueError) instead of returning success so callers cannot receive an
over-budget prompt; ensure the exception message includes agent.id, max_tokens
and estimated_tokens to aid debugging.
- Around line 181-190: The code accepts non-positive max_tokens and lets it
reach the trimming logic; add an explicit validation at the API boundary (in the
function that emits PROMPT_BUILD_START / accepts the max_tokens argument) to
reject values <= 0: check that max_tokens is an int and > 0 and raise a
ValueError with a clear message (e.g., "max_tokens must be a positive integer")
before any trimming or budget calculations that use max_tokens so invalid inputs
never reach trimming logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e8249802-6eb7-4101-b09d-e753f5ef9975
📒 Files selected for processing (10)
.claude/skills/pre-pr-review/SKILL.mdCLAUDE.mdsrc/ai_company/engine/__init__.pysrc/ai_company/engine/errors.pysrc/ai_company/engine/prompt.pysrc/ai_company/engine/prompt_template.pysrc/ai_company/observability/events.pytests/unit/engine/__init__.pytests/unit/engine/conftest.pytests/unit/engine/test_prompt.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14
Include type hints on all public functions, validated with mypy strict mode
Use Google-style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones — prioritize immutability
Use Pydantic v2 for data models withBaseModel,model_validator, andConfigDict
Enforce 88 character line length (ruff configured)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files) in Python code
Every module with business logic must import logging with:from ai_company.observability import get_loggerthenlogger = get_logger(__name__)
Always use variable namelogger(not_logger, notlog) for the logging instance
Pure data models, enums, and re-exports do not require logging
Files:
src/ai_company/engine/__init__.pysrc/ai_company/observability/events.pysrc/ai_company/engine/errors.pytests/unit/engine/test_prompt.pysrc/ai_company/engine/prompt_template.pytests/unit/engine/conftest.pysrc/ai_company/engine/prompt.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Never useimport logging,logging.getLogger(), orprint()in application code — use the project's logger
Use event name constants fromai_company.observability.eventsrather than inline strings
Use structured logging format:logger.info(EVENT, key=value)— neverlogger.info('msg %s', val)
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
All provider calls must go throughBaseCompletionProviderwhich applies retry and rate limiting automatically
Files:
src/ai_company/engine/__init__.pysrc/ai_company/observability/events.pysrc/ai_company/engine/errors.pysrc/ai_company/engine/prompt_template.pysrc/ai_company/engine/prompt.py
src/ai_company/engine/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedErrorsignals that all retries failed — the engine layer catches this to trigger fallback chains
Files:
src/ai_company/engine/__init__.pysrc/ai_company/engine/errors.pysrc/ai_company/engine/prompt_template.pysrc/ai_company/engine/prompt.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark tests with@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow
Useasyncio_mode = 'auto'for async tests — no manual@pytest.mark.asyncioneeded
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g.test-haiku-001,test-provider), never real vendor model IDs
Files:
tests/unit/engine/test_prompt.pytests/unit/engine/conftest.py
🧠 Learnings (27)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/prompts/**/*.yaml : YAML prompt templates should be loaded at runtime from `src/prompts/` directory
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` rather than inline strings
Applied to files:
src/ai_company/observability/events.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR level with context before raising
Applied to files:
src/ai_company/engine/errors.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
tests/unit/engine/test_prompt.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Applied to files:
tests/unit/engine/test_prompt.pysrc/ai_company/engine/prompt.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md.claude/skills/pre-pr-review/SKILL.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: For trivial/docs-only changes, use `/pre-pr-review quick` to skip agents but still run automated checks
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Always create a PR for issue work. When implementing changes for a GitHub issue, create a branch and open a pull request. Do not wait to be asked.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Pre-commit hooks enforce: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch (main), ruff check+format, gitleaks
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Never create a PR directly — always use `/pre-pr-review` command to create PRs with automated checks and review agents
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: No bypassing CI - never use `git push --no-verify` or modify test coverage thresholds to make tests pass. If tests fail, fix the actual issue. Pre-push hooks exist to catch problems before they reach CI.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Never defer work—do not suggest "this can be done later" or "consider for a future PR". Complete all requested changes fully.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: After a PR exists, use `/aurelio-review-pr` to handle external reviewer feedback
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Never defer work. Do not suggest 'this can be done later' or 'consider for a future PR'. Complete all requested changes fully.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Use branch naming format `<type>/<slug>` from main
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: After every push, you MUST check that CI passes. If CI fails, fix the issue immediately and push again until all checks are green. Never walk away from a failing CI pipeline.
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
.claude/skills/pre-pr-review/SKILL.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to tests/**/*.py : Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`), never real vendor model IDs
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
tests/unit/engine/conftest.py
🪛 LanguageTool
.claude/skills/pre-pr-review/SKILL.md
[style] ~203-~203: Since ownership is already implied, this phrasing may be redundant.
Context: ...layer):** 1. Driver subclass implements its own retry/backoff logic instead of relying ...
(PRP_OWN)
[style] ~207-~207: This phrase is redundant. Consider using “outside”.
Context: .... asyncio.sleep used for retry delays outside of RetryHandler (MAJOR) **Hard rules (a...
(OUTSIDE_OF)
[style] ~212-~212: Consider using the typographical ellipsis character here instead.
Context: ...8. Manual retry/backoff patterns (e.g., for attempt in range(...), while retries > 0, time.sleep in...
(ELLIPSIS)
🔇 Additional comments (17)
CLAUDE.md (1)
114-118: LGTM!The new "Post-Implementation" section provides clear workflow guidance that complements the existing "Pre-PR Review" section, ensuring developers know to branch/commit/push before using
/pre-pr-review.src/ai_company/engine/errors.py (1)
1-9: LGTM!Clean, minimal error hierarchy with appropriate docstrings. The exception definitions follow the guideline that pure data models and enums don't require logging—the calling code in
prompt.pyis responsible for logging before raising these errors..claude/skills/pre-pr-review/SKILL.md (1)
196-218: LGTM!The expanded resilience-audit scope from provider-only to "any code" is a valuable improvement. The new hard rules (6-8) for catching broad exception handling and manual retry patterns across the entire codebase will help maintain resilience consistency. The renumbered soft rules (9-12) provide good guidance for non-retryable error classification.
src/ai_company/engine/prompt_template.py (2)
68-72: LGTM!Excellent defensive programming—the runtime validation ensures
AUTONOMY_INSTRUCTIONSstays in sync with theSeniorityLevelenum. If a new level is added to the enum, this will fail fast at import time with a clear error message rather than silently producing incomplete prompts.
76-160: LGTM!The Jinja2 template is well-structured with:
- Clear section headers matching the
sectionstuple tracked inSystemPrompt- Conditional rendering with
{% if %}for optional sections (task, tools, company)- Proper budget formatting with
"%.2f" | format()- Immutable
Final[str]declaration preventing accidental modificationsrc/ai_company/engine/__init__.py (1)
1-21: LGTM!Clean public API surface with appropriate re-exports. The
__all__list is alphabetically sorted and includes the essential symbols for system prompt construction. The module correctly follows the guideline that re-export__init__.pyfiles don't require logging.tests/unit/engine/conftest.py (2)
30-33: LGTM!Fixtures correctly use vendor-agnostic fake model IDs (
"test-provider","test-model-001") as required by the coding guidelines, ensuring tests aren't coupled to external providers.
36-131: LGTM!Comprehensive fixtures with:
- Proper docstrings explaining each fixture's purpose
- Immutable tuples for collection fields (
traits,primary,acceptance_criteria,departments)- Realistic data structures that exercise all prompt template sections
- Appropriate scope (function-level by default)
src/ai_company/observability/events.py (1)
131-139: LGTM!New event constants follow the established
domain.noun.verbnaming convention and cover the complete prompt construction lifecycle (start, success, token_trimmed, error) plus custom template handling (loaded, failed). Thenoqa: S105comment on line 136 correctly suppresses a false positive for the "token" substring in "token_trimmed".tests/unit/engine/test_prompt.py (8)
37-54: LGTM!Well-structured test class with clear test method naming. The minimal agent test validates all essential properties of the returned
SystemPrompt.
71-108: LGTM!Excellent test demonstrating that different personality configurations produce distinct prompts. The inline agent creation uses vendor-agnostic model IDs (
"test","test-001") as required by coding guidelines.
256-322: LGTM!Good coverage of seniority-based autonomy instructions:
- Tests verify specific phrases for junior ("Follow instructions carefully", "seek approval")
- Tests verify senior autonomy ("Take ownership")
- Tests verify C-suite scope ("company-wide authority", "vision")
- Meta-test confirms all 8 seniority levels have unique instruction text
383-404: LGTM!Good test for custom token estimator injection. The inline
CountingEstimatorclass verifies that the protocol is correctly used during prompt construction.
450-481: LGTM!Immutability and metadata tests are well-designed:
- Frozen model test validates that mutation raises
ValidationError- Metadata test verifies all 5 expected keys with correct values from the agent
487-520: LGTM!Logging tests use
structlog.testing.capture_logs()correctly to verify thatprompt.build.start,prompt.build.success, andprompt.build.token_trimmedevents are emitted during prompt construction.
526-565: LGTM!Error handling tests cover:
- Invalid Jinja2 syntax raises
PromptBuildErrorwith descriptive message- Exception chaining preserves the original cause
- Undefined filter at render time raises
PromptBuildErrorThese tests ensure proper error propagation and debugging context.
571-613: LGTM!The trimming priority test cleverly calculates a budget between "without company" and "full" sizes to verify that company is trimmed first while tools and task are preserved. This validates the documented priority order: company → tools → task.
| if max_tokens is not None and estimated > max_tokens: | ||
| trimmed_sections: list[str] = [] | ||
|
|
||
| for section in _TRIMMABLE_SECTIONS: | ||
| if estimated <= max_tokens: | ||
| break | ||
|
|
||
| if section == _SECTION_COMPANY and company is not None: | ||
| company = None | ||
| elif section == _SECTION_TOOLS and available_tools: | ||
| available_tools = () | ||
| elif section == _SECTION_TASK and task is not None: | ||
| task = None | ||
| else: | ||
| continue | ||
|
|
||
| trimmed_sections.append(section) | ||
| content, estimated = _render_and_estimate( | ||
| template_str, | ||
| agent, | ||
| role, | ||
| task, | ||
| available_tools, | ||
| company, | ||
| estimator, | ||
| ) | ||
|
|
||
| if trimmed_sections: | ||
| logger.warning( | ||
| PROMPT_BUILD_TOKEN_TRIMMED, | ||
| agent_id=str(agent.id), | ||
| max_tokens=max_tokens, | ||
| estimated_tokens=estimated, | ||
| trimmed_sections=trimmed_sections, | ||
| ) | ||
|
|
||
| sections = _compute_sections( | ||
| task=task, | ||
| available_tools=available_tools, | ||
| company=company, | ||
| ) | ||
|
|
||
| return SystemPrompt( | ||
| content=content, | ||
| template_version=PROMPT_TEMPLATE_VERSION, | ||
| estimated_tokens=estimated, | ||
| sections=sections, | ||
| metadata=_build_metadata(agent), | ||
| ) |
There was a problem hiding this comment.
Do not return prompts that still exceed max_tokens.
At Line 462, trimming is best-effort only; if the prompt is still over budget after all trimmable sections are removed, the function still returns success. This breaks token-limit enforcement and can fail downstream provider calls.
🔧 Proposed fix
if max_tokens is not None and estimated > max_tokens:
trimmed_sections: list[str] = []
for section in _TRIMMABLE_SECTIONS:
if estimated <= max_tokens:
break
@@
if trimmed_sections:
logger.warning(
PROMPT_BUILD_TOKEN_TRIMMED,
agent_id=str(agent.id),
max_tokens=max_tokens,
estimated_tokens=estimated,
trimmed_sections=trimmed_sections,
)
+ if estimated > max_tokens:
+ logger.error(
+ PROMPT_BUILD_ERROR,
+ agent_id=str(agent.id),
+ max_tokens=max_tokens,
+ estimated_tokens=estimated,
+ trimmed_sections=trimmed_sections,
+ )
+ msg = (
+ f"Prompt exceeds max_tokens={max_tokens} after trimming "
+ f"(estimated={estimated})"
+ )
+ raise PromptBuildError(msg)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/prompt.py` around lines 462 - 510, The function
currently best-effort trims sections (_TRIMMABLE_SECTIONS via _SECTION_COMPANY,
_SECTION_TOOLS, _SECTION_TASK) using _render_and_estimate but may still return a
SystemPrompt that exceeds max_tokens; change the logic after the trimming loop
to enforce the limit: if estimated > max_tokens then log
PROMPT_BUILD_TOKEN_TRIMMED with details (agent_id, max_tokens, estimated,
trimmed_sections) and raise a clear exception (e.g., ValueError) instead of
returning success so callers cannot receive an over-budget prompt; ensure the
exception message includes agent.id, max_tokens and estimated_tokens to aid
debugging.
Source fixes: - Validate max_tokens > 0 at API boundary (raises PromptBuildError) - Add PROMPT_BUILD_BUDGET_EXCEEDED warning when prompt still exceeds max_tokens after all trimmable sections removed - Move PROMPT_CUSTOM_TEMPLATE_LOADED log after syntax validation - Add read-only docstring warning to metadata field - Add security comment explaining SandboxedEnvironment choice - Add comment explaining intentional early-fail double-parse - Extract _trim_sections helper from _render_with_trimming (now <50 lines) - Remove blank line in tool list template (sparse output) Test improvements: - Add test for catch-all exception wrapping path - Add test for all-sections-trimmed / budget-exceeded scenario - Add test for tools-trimmed-before-task cascade level - Add test for agent with default/empty personality - Add test for task with zero budget and no deadline - Add tests for max_tokens=0 and negative validation - Add test for budget-exceeded warning log - Assert company departments in prompt content - Derive budget strings from fixture values instead of hardcoding - Assert trimmed_sections content in trim log test Minor: - Fix misleading "prompt injection" wording in fixture docstring Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| You are **{{ agent_name }}**, a {{ agent_level }} {{ agent_role }} \ | ||
| in the {{ agent_department }} department. | ||
| {% if role_description %} | ||
| **Role**: {{ role_description }} |
There was a problem hiding this comment.
The trailing backslash in the template is a literal character inside a triple-quoted string, so it will likely appear in the rendered prompt and/or produce unintended formatting. Remove the literal \\ and rely on normal wrapping, or use Jinja2 whitespace control ({%- ... -%}) if the intent is to avoid newlines.
| You are **{{ agent_name }}**, a {{ agent_level }} {{ agent_role }} \ | |
| in the {{ agent_department }} department. | |
| {% if role_description %} | |
| **Role**: {{ role_description }} | |
| You are **{{ agent_name }}**, a {{ agent_level }} {{ agent_role }} in the {{ agent_department }} department. | |
| {% if role_description %} | |
| **Role**: {{ role_description }} | |
| **Role**: {{ role_description }} |
| adds additional sections. | ||
|
|
There was a problem hiding this comment.
The trimming implementation removes the entire tools section by setting available_tools = (), not just 'tool descriptions'. Either update the docstring to match current behavior (company → tools → task), or implement a tools-only description trim (e.g., keep tool names but blank descriptions) to align with the documented contract.
| adds additional sections. | |
| sections are progressively trimmed in order: company context, tools | |
| section, task details. |
| from ai_company.providers.models import ToolDefinition | ||
|
|
||
| logger = get_logger(__name__) | ||
|
|
There was a problem hiding this comment.
With the default Jinja2 Undefined, custom templates that reference missing variables typically render empty strings instead of failing, which can silently produce incomplete system prompts. Consider configuring the environment to raise on undefined variables (e.g., undefined=StrictUndefined) so template typos reliably surface as PromptBuildError.
|
|
||
| def _resolve_template(custom_template: str | None) -> str: | ||
| """Resolve the template string to use for rendering. | ||
|
|
There was a problem hiding this comment.
The custom-template loaded/failed events don’t include correlation fields (e.g., agent_id, agent_name), making them harder to tie back to a specific build attempt. Consider binding agent context at the start of build_system_prompt() (e.g., logger.bind(...)) or passing agent_id/agent_name into _resolve_template() and including them in these log calls.
| The template string to render. | ||
|
|
||
| Raises: | ||
| PromptBuildError: If custom template syntax is invalid. |
There was a problem hiding this comment.
The custom-template loaded/failed events don’t include correlation fields (e.g., agent_id, agent_name), making them harder to tie back to a specific build attempt. Consider binding agent context at the start of build_system_prompt() (e.g., logger.bind(...)) or passing agent_id/agent_name into _resolve_template() and including them in these log calls.
|
|
||
| def _render_template(template_str: str, context: dict[str, Any]) -> str: | ||
| """Render a Jinja2 template string with the given context. | ||
|
|
There was a problem hiding this comment.
Render failures are logged without agent context, and build_system_prompt() intentionally re-raises PromptBuildError without adding another log, so the only error event may lack agent_id/agent_name. To improve observability/debugging, include agent identifiers in this log entry (by binding logger context earlier or by passing identifiers into _render_template()).
| # Include agent identifiers in the log when available in the context | |
| metadata = context.get("metadata", {}) if isinstance(context, dict) else {} | |
| agent_id = metadata.get("agent_id") | |
| agent_name = metadata.get("name") | |
| logger.exception( | |
| PROMPT_BUILD_ERROR, | |
| error=str(exc), | |
| agent_id=agent_id, | |
| agent_name=agent_name, | |
| ) |
| metadata: dict[str, str] = Field( | ||
| description="Agent identity metadata (treat as read-only)", | ||
| ) |
There was a problem hiding this comment.
SystemPrompt declares model_config = ConfigDict(frozen=True) and is documented as an "immutable result type." However, metadata: dict[str, str] is a mutable dict. Pydantic's frozen=True prevents field reassignment but does not prevent in-place mutation like prompt.metadata["agent_id"] = "hijacked".
For consistent immutability, use a read-only mapping type:
from collections.abc import Mapping
metadata: Mapping[str, str] = Field(
description="Agent identity metadata (treat as read-only)",
)Then update _build_metadata() to return dict (which is a Mapping), and the call at line 594 will work unchanged since Mapping is the interface.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/prompt.py
Line: 79-81
Comment:
`SystemPrompt` declares `model_config = ConfigDict(frozen=True)` and is documented as an "immutable result type." However, `metadata: dict[str, str]` is a mutable dict. Pydantic's `frozen=True` prevents field reassignment but does **not** prevent in-place mutation like `prompt.metadata["agent_id"] = "hijacked"`.
For consistent immutability, use a read-only mapping type:
```python
from collections.abc import Mapping
metadata: Mapping[str, str] = Field(
description="Agent identity metadata (treat as read-only)",
)
```
Then update `_build_metadata()` to return `dict` (which is a `Mapping`), and the call at line 594 will work unchanged since `Mapping` is the interface.
How can I resolve this? If you propose a fix, please make it concise.| _, estimated = _render_and_estimate( | ||
| template_str, | ||
| agent, | ||
| role, | ||
| task, | ||
| available_tools, | ||
| company, | ||
| estimator, | ||
| ) | ||
|
|
||
| if trimmed_sections: | ||
| logger.warning( | ||
| PROMPT_BUILD_TOKEN_TRIMMED, | ||
| agent_id=str(agent.id), | ||
| max_tokens=max_tokens, | ||
| estimated_tokens=estimated, | ||
| trimmed_sections=trimmed_sections, | ||
| ) | ||
|
|
||
| if estimated > max_tokens: | ||
| logger.warning( | ||
| PROMPT_BUILD_BUDGET_EXCEEDED, | ||
| agent_id=str(agent.id), | ||
| max_tokens=max_tokens, | ||
| estimated_tokens=estimated, | ||
| ) |
There was a problem hiding this comment.
_trim_sections calls _render_and_estimate at lines 492–500 solely to compute estimated for the warning logs. Then _render_with_trimming calls _render_and_estimate again at lines 573–580 with the exact same inputs to get the content for the returned SystemPrompt.
This causes a Jinja2 re-render on every trimming path. The first loop iteration (lines 469–477) also re-renders the same unchanged state that was already computed before trimming.
Return estimated as part of _trim_sections's return tuple to eliminate the redundant render:
def _trim_sections(...) -> tuple[Task | None, tuple[ToolDefinition, ...], Company | None, int]:
# ... (loop as before)
_, estimated = _render_and_estimate(...)
# ... (logging as before)
return task, available_tools, company, estimated # Return estimatedThen in _render_with_trimming (lines 562–581):
if max_tokens is not None and estimated > max_tokens:
task, available_tools, company, estimated = _trim_sections(...)
# (no second _render_and_estimate call needed)Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/prompt.py
Line: 492-517
Comment:
`_trim_sections` calls `_render_and_estimate` at lines 492–500 solely to compute `estimated` for the warning logs. Then `_render_with_trimming` calls `_render_and_estimate` **again** at lines 573–580 with the exact same inputs to get the `content` for the returned `SystemPrompt`.
This causes a Jinja2 re-render on every trimming path. The first loop iteration (lines 469–477) also re-renders the same unchanged state that was already computed before trimming.
Return `estimated` as part of `_trim_sections`'s return tuple to eliminate the redundant render:
```python
def _trim_sections(...) -> tuple[Task | None, tuple[ToolDefinition, ...], Company | None, int]:
# ... (loop as before)
_, estimated = _render_and_estimate(...)
# ... (logging as before)
return task, available_tools, company, estimated # Return estimated
```
Then in `_render_with_trimming` (lines 562–581):
```python
if max_tokens is not None and estimated > max_tokens:
task, available_tools, company, estimated = _trim_sections(...)
# (no second _render_and_estimate call needed)
```
How can I resolve this? If you propose a fix, please make it concise.| if max_tokens is not None and max_tokens <= 0: | ||
| msg = f"max_tokens must be > 0, got {max_tokens}" | ||
| logger.error( | ||
| PROMPT_BUILD_ERROR, | ||
| agent_id=str(agent.id), | ||
| agent_name=agent.name, | ||
| max_tokens=max_tokens, | ||
| ) | ||
| raise PromptBuildError(msg) | ||
|
|
||
| logger.info( | ||
| PROMPT_BUILD_START, | ||
| agent_id=str(agent.id), | ||
| agent_name=agent.name, | ||
| has_task=task is not None, | ||
| tool_count=len(available_tools), | ||
| has_company=company is not None, | ||
| has_custom_template=custom_template is not None, | ||
| ) |
There was a problem hiding this comment.
PROMPT_BUILD_ERROR is logged at line 186–191 before PROMPT_BUILD_START is reached at line 194. If max_tokens <= 0, the exception at line 192 prevents PROMPT_BUILD_START from being emitted, creating an orphaned error event in the log stream.
Log-based monitoring and dashboards that correlate start → error pairs will silently miss this failure mode. Move the start event before the max_tokens validation check to ensure all error paths are paired with their start event.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/prompt.py
Line: 184-202
Comment:
`PROMPT_BUILD_ERROR` is logged at line 186–191 before `PROMPT_BUILD_START` is reached at line 194. If `max_tokens <= 0`, the exception at line 192 prevents `PROMPT_BUILD_START` from being emitted, creating an orphaned error event in the log stream.
Log-based monitoring and dashboards that correlate start → error pairs will silently miss this failure mode. Move the start event before the max_tokens validation check to ensure all error paths are paired with their start event.
How can I resolve this? If you propose a fix, please make it concise.🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
build_system_prompt()— constructs contextually rich system prompts from agent identity, personality, skills, authority, and autonomy levelSystemPromptfrozen Pydantic model as the immutable result typePromptTokenEstimatorprotocol +DefaultTokenEstimator(len//4 heuristic)SandboxedEnvironment-based template rendering with custom template supportEngineError/PromptBuildErrorerror hierarchy for the engine layerAUTONOMY_INSTRUCTIONSmapping — seniority-level-specific autonomy text for all 8 levelsobservability.eventsCLAUDE.mdpost-implementation workflow sectionTest Plan
prompt.pyat 100% line coverageReview Coverage
Pre-reviewed by 8 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit). 16 findings addressed:
seniority_infoparameter (dead code, flagged by 6 agents)SandboxedEnvironmentinstead of creating two per callFinal+ import-time completeness check onAUTONOMY_INSTRUCTIONSlisttotuple(immutability)noqaexplanations_build_core_contextand_render_and_estimatehelpers (polish pass)src_pychangeCloses #13