
feat: implement personality injection and system prompt construction#105

Merged
Aureliolo merged 3 commits into main from feat/prompt-builder
Mar 5, 2026

Conversation

@Aureliolo
Owner

Summary

  • Implement build_system_prompt() — constructs contextually rich system prompts from agent identity, personality, skills, authority, and autonomy level
  • Add SystemPrompt frozen Pydantic model as the immutable result type
  • Add PromptTokenEstimator protocol + DefaultTokenEstimator (len//4 heuristic)
  • Add Jinja2 SandboxedEnvironment-based template rendering with custom template support
  • Implement progressive token-budget trimming (company → tools → task)
  • Add EngineError / PromptBuildError error hierarchy for the engine layer
  • Add AUTONOMY_INSTRUCTIONS mapping — seniority-level-specific autonomy text for all 8 levels
  • Add 7 new prompt event constants to observability.events
  • Add CLAUDE.md post-implementation workflow section
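The estimator protocol and default heuristic from the summary can be sketched in isolation. The names (PromptTokenEstimator, DefaultTokenEstimator, the len//4 rule) come from this PR; everything else here is a minimal stand-in, not the project's actual prompt.py:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PromptTokenEstimator(Protocol):
    """Anything with an estimate_tokens(text) -> int method qualifies."""

    def estimate_tokens(self, text: str) -> int: ...


class DefaultTokenEstimator:
    """Rough heuristic: roughly 4 characters per token for English prose."""

    def estimate_tokens(self, text: str) -> int:
        return len(text) // 4


est = DefaultTokenEstimator()
# @runtime_checkable lets callers duck-type without subclassing:
assert isinstance(est, PromptTokenEstimator)
print(est.estimate_tokens("You are a senior engineer."))  # prints 6 (26 chars // 4)
```

Because the protocol is structural, any object exposing a compatible estimate_tokens method (e.g. one backed by a real tokenizer) can be plugged in without touching the builder.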

Test Plan

  • 32 unit tests covering: prompt construction, personality injection, authority boundaries, task/tool/company context, seniority autonomy, token estimation, budget trimming, section tracking, versioning, metadata, model immutability, structured logging, error paths (invalid syntax, render failure, exception chaining), trimming priority order
  • prompt.py at 100% line coverage
  • 1704 total tests passing, 95.16% overall coverage
  • mypy strict clean, ruff clean

Review Coverage

Pre-reviewed by 8 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit). 16 findings addressed:

  1. Removed unused seniority_info parameter (dead code, flagged by 6 agents)
  2. Reused single SandboxedEnvironment instead of creating two per call
  3. Added Final + import-time completeness check on AUTONOMY_INSTRUCTIONS
  4. Added error path tests (invalid template syntax, render failures, exception chaining)
  5. Added top-level error wrapping for unpaired start/error log events
  6. Switched template context values from list to tuple (immutability)
  7. Added trimming priority order test
  8. Expanded metadata assertions to all 5 keys
  9. Improved docstrings (protocol, estimator, model)
  10. Fixed misleading comments and noqa explanations
  11. Extracted _build_core_context and _render_and_estimate helpers (polish pass)
  12. Added missing fixture docstring
  13. Updated pre-pr-review skill: resilience audit now triggers on any src_py change
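Finding 3 (the Final annotation plus import-time completeness check on AUTONOMY_INSTRUCTIONS) is a small but useful defensive pattern. A minimal sketch with a hypothetical three-level enum standing in for the project's eight-level SeniorityLevel:

```python
from enum import Enum
from typing import Final


class SeniorityLevel(Enum):  # stand-in; the real enum has 8 levels
    JUNIOR = "junior"
    SENIOR = "senior"
    PRINCIPAL = "principal"


AUTONOMY_INSTRUCTIONS: Final[dict[SeniorityLevel, str]] = {
    SeniorityLevel.JUNIOR: "Ask before making non-trivial decisions.",
    SeniorityLevel.SENIOR: "Decide within your authority; escalate exceptions.",
    SeniorityLevel.PRINCIPAL: "Act autonomously; report outcomes.",
}

# Import-time completeness check: a newly added enum member without a
# matching instruction fails fast at import, not at first lookup.
_missing = set(SeniorityLevel) - set(AUTONOMY_INSTRUCTIONS)
if _missing:
    raise RuntimeError(f"AUTONOMY_INSTRUCTIONS missing levels: {_missing}")
```

The check costs nothing at runtime after import, and it turns a latent KeyError in production into an immediate, loudly-located failure.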

Closes #13

Aureliolo and others added 2 commits March 5, 2026 18:14
…13)

Add build_system_prompt() that renders agent identity, personality, skills,
authority, and seniority into structured system prompts via Jinja2 templates.
Supports optional task/tool/company context, token-budget trimming, custom
templates, and pluggable token estimation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 8 agents, 16 findings addressed:

- Remove unused seniority_info parameter (dead code, 6 agents flagged)
- Reuse single SandboxedEnvironment instead of creating two per call
- Add Final annotation and import-time completeness check on AUTONOMY_INSTRUCTIONS
- Add error path tests (invalid template syntax, render failures)
- Add top-level error wrapping in build_system_prompt for unpaired logs
- Use tuples instead of lists for template context values (immutability)
- Add trimming priority order test
- Expand metadata assertions to cover all 5 keys
- Improve docstrings (PromptTokenEstimator, DefaultTokenEstimator, SystemPrompt)
- Fix misleading comments and noqa explanations
- Extract _build_core_context and _render_and_estimate helpers (polish pass)
- Add missing fixture docstring
- Update pre-pr-review skill: resilience audit now triggers on any src_py change

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 5, 2026 17:33
@coderabbitai

coderabbitai bot commented Mar 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cc87c8fe-8563-455c-bbb5-9de0b847c1d3

📥 Commits

Reviewing files that changed from the base of the PR and between fe6b56f and 07d8833.

📒 Files selected for processing (5)
  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/prompt_template.py
  • src/ai_company/observability/events.py
  • tests/unit/engine/conftest.py
  • tests/unit/engine/test_prompt.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14
Include type hints on all public functions, validated with mypy strict mode
Use Google-style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones — prioritize immutability
Use Pydantic v2 for data models with BaseModel, model_validator, and ConfigDict
Enforce 88 character line length (ruff configured)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files) in Python code
Every module with business logic must import logging with: from ai_company.observability import get_logger then logger = get_logger(__name__)
Always use variable name logger (not _logger, not log) for the logging instance
Pure data models, enums, and re-exports do not require logging

Files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/prompt_template.py
  • tests/unit/engine/conftest.py
  • src/ai_company/observability/events.py
  • tests/unit/engine/test_prompt.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Never use import logging, logging.getLogger(), or print() in application code — use the project's logger
Use event name constants from ai_company.observability.events rather than inline strings
Use structured logging format: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
All provider calls must go through BaseCompletionProvider which applies retry and rate limiting automatically

Files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/prompt_template.py
  • src/ai_company/observability/events.py
src/ai_company/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains

Files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/prompt_template.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Use asyncio_mode = 'auto' for async tests — no manual @pytest.mark.asyncio needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs

Files:

  • tests/unit/engine/conftest.py
  • tests/unit/engine/test_prompt.py
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state

Applied to files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/prompt_template.py
  • tests/unit/engine/test_prompt.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to **/*.py : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • src/ai_company/engine/prompt.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • src/ai_company/engine/prompt.py
  • tests/unit/engine/test_prompt.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to tests/**/*.py : Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`), never real vendor model IDs

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` rather than inline strings

Applied to files:

  • src/ai_company/observability/events.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/engine/test_prompt.py
🔇 Additional comments (25)
src/ai_company/engine/prompt_template.py (3)

1-13: LGTM!

Module docstring is clear, imports are minimal, and the version constant follows the Final[str] pattern correctly.


17-72: LGTM!

The AUTONOMY_INSTRUCTIONS mapping covers all 8 seniority levels with appropriate, distinct guidance for each. The import-time completeness check at lines 68-72 is a good defensive pattern that will catch any future enum additions that lack corresponding instructions.


76-159: LGTM!

The Jinja2 template is well-structured with conditional sections that gracefully handle missing optional context. Template variables align with the context builders in prompt.py (verified against _build_core_context and _build_template_context). The use of {% if %} guards ensures clean output when fields are empty or absent.

src/ai_company/observability/events.py (1)

131-140: LGTM!

The seven new prompt event constants follow the established domain.noun.verb naming convention and are consistent with existing event patterns. The # noqa: S105 comment appropriately clarifies that TOKEN_TRIMMED is an event name, not a credential.

tests/unit/engine/conftest.py (3)

30-34: LGTM!

Vendor-agnostic model config using "test-provider" and "test-model-001" follows testing guidelines.


36-64: LGTM!

Comprehensive agent fixture with rich personality, skills, and authority configuration. All required fields (model, hiring_date) are properly supplied.


67-131: LGTM!

The remaining fixtures (sample_role_with_description, sample_task_with_criteria, sample_tool_definitions, sample_company) are well-structured with clear docstrings and provide realistic test data for prompt construction scenarios.

tests/unit/engine/test_prompt.py (9)

1-32: LGTM!

Imports are well-organized with TYPE_CHECKING for type-only imports. The module imports all necessary fixtures and production code.


37-109: LGTM!

The first set of tests in TestBuildSystemPrompt thoroughly validates basic prompt construction, personality trait inclusion, and that different personalities produce different prompts. Good use of vendor-agnostic model IDs.


110-253: LGTM!

Tests for role description, custom templates, authority boundaries, context injection, tool availability, and task context are comprehensive. The section absence tests (lines 221-252) properly verify that optional sections are excluded when not provided.


258-324: LGTM!

TestSeniorityAutonomy validates level-specific language and confirms all levels produce unique instructions. The _make_agent helper keeps tests DRY.


329-411: LGTM!

Token estimation tests cover the DefaultTokenEstimator behavior, verify estimated_tokens is populated, validate trimming triggers, and confirm custom estimators are invoked. The CountingEstimator inner class is an effective test double.


416-489: LGTM!

Versioning, section tracking, immutability, and metadata tests are well-structured. The frozen model test correctly expects ValidationError on mutation attempts.
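The frozen-model behavior these tests rely on can be demonstrated with a minimal stand-in; the field names below are assumed for illustration, while the real SystemPrompt also carries version, sections, and metadata:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class SystemPromptSketch(BaseModel):
    """Illustrative frozen result model (subset of fields, names assumed)."""

    model_config = ConfigDict(frozen=True)
    content: str
    estimated_tokens: int


p = SystemPromptSketch(content="You are Ada...", estimated_tokens=4)
try:
    p.content = "mutated"  # Pydantic v2 rejects assignment on frozen models
except ValidationError as exc:
    print("frozen model rejects mutation:", exc.errors()[0]["type"])
```

Freezing the result type means downstream consumers can cache and share a built prompt without defensive copying.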


494-575: LGTM!

Logging and error handling tests properly verify structured log events and exception chaining. The tests confirm that PromptBuildError wraps underlying errors with preserved cause chains.
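The wrap-with-cause pattern these tests verify looks like this in isolation. The render stand-in is hypothetical; the real code wraps Jinja2 errors, but the `raise ... from` mechanics are identical:

```python
class EngineError(Exception):
    """Base class for engine-layer failures."""


class PromptBuildError(EngineError):
    """Raised when system prompt construction fails."""


def render(template: str) -> str:
    # Stand-in for a template engine call that can blow up.
    raise ValueError("unexpected end of template")


try:
    try:
        render("{% broken")
    except ValueError as exc:
        # `from exc` preserves the cause chain, so logs and tracebacks
        # show both the domain error and the underlying failure.
        raise PromptBuildError("failed to render prompt template") from exc
except PromptBuildError as err:
    print(type(err.__cause__).__name__)  # prints ValueError
```

Tests can then assert on both the wrapper type and `__cause__`, which is exactly what "preserved cause chains" means here.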


580-657: LGTM!

TestTrimmingPriority validates the documented trimming order (company → tools → task) using clever budget calculations between full and partial prompts. This ensures the priority is enforced correctly.


662-787: LGTM!

Edge case tests for default agents, zero-budget tasks, budget-exceeded warnings, and the catch-all exception wrapper are comprehensive. The monkeypatch approach for simulating unexpected errors is appropriate.

src/ai_company/engine/prompt.py (9)

1-49: LGTM!

Module structure is clean with appropriate imports, structured logging setup using get_logger(__name__), and a module-level SandboxedEnvironment with clear thread-safety documentation.


55-82: LGTM!

The SystemPrompt model is well-designed with frozen configuration, appropriate field constraints (ge=0 for tokens), and clear docstrings. The metadata field documents its read-only intent.


87-138: LGTM!

The PromptTokenEstimator protocol with @runtime_checkable enables duck typing while the DefaultTokenEstimator provides a reasonable heuristic. Section constants and _TRIMMABLE_SECTIONS clearly document the trimming priority.


143-238: Past review comment addressed: max_tokens validation is now present at lines 184-192.

The build_system_prompt function now correctly validates max_tokens <= 0 at the API boundary before any processing. However, the function still exceeds the 50-line guideline at ~95 lines. While the logic is well-organized with clear sections (validation → logging → orchestration → error handling → success logging), further extraction could improve maintainability.


244-274: LGTM!

The _resolve_template helper properly validates custom template syntax early and logs appropriate events for both success and failure paths.


276-362: LGTM!

Context builders _build_core_context and _build_template_context correctly assemble all template variables matching the DEFAULT_TEMPLATE references. The use of tuples for nested structures ensures immutability.


365-434: LGTM!

Helper functions _compute_sections, _render_template, and _build_metadata are focused and well-documented. Error handling in _render_template properly wraps Jinja2 errors.


437-519: Past review comment addressed: _trim_sections extracted as helper.

The trimming logic is now cleanly separated. The function logs both PROMPT_BUILD_TOKEN_TRIMMED (when sections are removed) and PROMPT_BUILD_BUDGET_EXCEEDED (when still over budget after trimming). The design choice to warn rather than raise on budget exceeded is confirmed by tests (TestBudgetExceeded.test_budget_exceeded_logs_warning).
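A minimal sketch of progressive budget trimming under the documented priority (company → tools → task). Section contents and the len//4 estimator are illustrative; the real _trim_sections in prompt.py differs in detail:

```python
# Dropped first → dropped last; identity-level sections are never trimmable.
TRIMMABLE = ("company", "tools", "task")


def estimate(text: str) -> int:
    return len(text) // 4  # same heuristic as the default estimator


def trim_to_budget(sections: dict[str, str], max_tokens: int) -> dict[str, str]:
    sections = dict(sections)  # work on a copy; never mutate the input
    for name in TRIMMABLE:
        if estimate("\n".join(sections.values())) <= max_tokens:
            break  # within budget, stop trimming
        sections.pop(name, None)
    return sections


prompt = {"identity": "x" * 40, "company": "y" * 40, "tools": "z" * 40, "task": "w" * 40}
kept = trim_to_budget(prompt, max_tokens=25)
print(sorted(kept))  # company and tools trimmed; identity and task fit the budget
```

If the result is still over budget after all trimmable sections are gone, the real implementation logs PROMPT_BUILD_BUDGET_EXCEEDED and returns anyway rather than raising, as the review note above confirms.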


522-628: LGTM!

The _render_with_trimming and _render_and_estimate functions complete the rendering pipeline. The flow is: initial render → check budget → trim if needed → recompute sections → return SystemPrompt. The final render after trimming ensures the returned content matches the trimmed context.


📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • System prompt builder with token estimation, custom templates, budget-aware trimming, observability events, and new engine-level error types.
  • Documentation
    • Broadened resilience audit guidance to all Python source and added Post-Implementation and Pre-PR workflow governance.
  • Tests
    • Extensive unit tests and fixtures covering prompt construction, token budgeting/trimming, template handling, event logging, and error scenarios.

Walkthrough

Adds a Jinja2-based system-prompt builder (with token estimation, budget-aware trimming, observability events, and error types), public engine re-exports, prompt template/constants, extensive unit tests and fixtures, and documentation/workflow updates broadening resilience-audit scope and post-implementation guidance.

Changes

Cohort / File(s) Summary
Documentation & Workflow
.claude/skills/pre-pr-review/SKILL.md, CLAUDE.md
Broaden resilience-audit scope to all Python source files; expand hard/soft rule set and resilience prompts; add/duplicate Post-Implementation workflow and explicit /pre-pr-review usage guidance.
Engine Public API
src/ai_company/engine/__init__.py
New public re-exports: EngineError, PromptBuildError, DefaultTokenEstimator, PromptTokenEstimator, SystemPrompt, build_system_prompt.
Engine Errors
src/ai_company/engine/errors.py
Adds engine-layer exceptions: EngineError and PromptBuildError for prompt construction failures.
Prompt Construction
src/ai_company/engine/prompt.py
Implements prompt builder: SystemPrompt model, build_system_prompt API, sandboxed Jinja2 rendering, token-estimator protocol and DefaultTokenEstimator, token-budget trimming loop, template resolution, metadata, and observability event emission.
Prompt Templates & Constants
src/ai_company/engine/prompt_template.py
Adds template version, DEFAULT_TEMPLATE, and AUTONOMY_INSTRUCTIONS per seniority; validates coverage for all SeniorityLevel values.
Observability Events
src/ai_company/observability/events.py
Adds prompt lifecycle event constants: PROMPT_BUILD_START, PROMPT_BUILD_SUCCESS, PROMPT_BUILD_TOKEN_TRIMMED, PROMPT_BUILD_ERROR, PROMPT_BUILD_BUDGET_EXCEEDED, PROMPT_CUSTOM_TEMPLATE_LOADED, PROMPT_CUSTOM_TEMPLATE_FAILED.
Tests & Fixtures
tests/unit/engine/conftest.py, tests/unit/engine/test_prompt.py
New fixtures for ModelConfig/Agent/Role/Task/Tool/Company and an extensive test suite covering prompt rendering, token estimation/trimming, custom templates, error handling, event logging, immutability, and edge cases.
Manifest / Minor
manifest_file, pyproject.toml
Test and package manifest additions reflecting new modules and tests.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Builder as build_system_prompt
    participant Template as Jinja2 Renderer
    participant Estimator as TokenEstimator
    participant Logger as Observability
    participant Error as ErrorHandler

    Caller->>Builder: call(agent, role, task, tools, company, max_tokens, custom_template)
    Builder->>Logger: PROMPT_BUILD_START

    alt custom_template provided
        Builder->>Template: validate custom template
        alt validation fails
            Template-->>Error: validation error
            Error->>Logger: PROMPT_CUSTOM_TEMPLATE_FAILED
            Error-->>Caller: raise PromptBuildError
        else
            Builder->>Logger: PROMPT_CUSTOM_TEMPLATE_LOADED
        end
    end

    Builder->>Template: render template with context
    alt render fails
        Template-->>Error: render error
        Error->>Logger: PROMPT_BUILD_ERROR
        Error-->>Caller: raise PromptBuildError
    else
        Template-->>Builder: rendered_content
    end

    Builder->>Estimator: estimate_tokens(rendered_content)
    Estimator-->>Builder: token_count

    alt max_tokens set and token_count > max_tokens
        loop trimming until within budget
            Builder->>Builder: trim sections (company → tools → task)
            Builder->>Template: re-render trimmed context
            Builder->>Estimator: re-estimate_tokens
            Builder->>Logger: PROMPT_BUILD_TOKEN_TRIMMED
        end
        Builder->>Logger: PROMPT_BUILD_BUDGET_EXCEEDED
    end

    Builder->>Builder: assemble SystemPrompt (content, version, tokens, sections, metadata)
    Builder->>Logger: PROMPT_BUILD_SUCCESS
    Builder-->>Caller: return SystemPrompt

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 96.67%, below the required threshold of 100.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Title check (✅ Passed): The title clearly summarizes the main feature, personality injection and system prompt construction, which aligns with the primary objective and changes in the PR.
  • Description check (✅ Passed): The description is detailed and directly related to the changeset, covering the implementation of build_system_prompt, token estimation, error classes, and comprehensive testing.
  • Linked Issues check (✅ Passed): All acceptance criteria from issue #13 are addressed: template system (Jinja2), personality trait injection, role/authority/company context, tool description, token budgeting with trimming, unit tests, token estimation utility, and versioning support.
  • Out of Scope Changes check (✅ Passed): Changes in CLAUDE.md and pre-pr-review skills documentation are supplementary workflow/documentation updates; all code changes directly support the core objective of personality injection and system prompt construction.


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive system for generating dynamic and context-aware system prompts for AI agents. It centralizes the logic for integrating agent identity, personality, skills, and various contextual elements into prompts, ensuring that agents receive tailored instructions. The new system also incorporates token budget management through estimation and progressive trimming, enhancing efficiency and control over prompt length. This foundational work significantly improves the flexibility and robustness of agent behavior definition.

Highlights

  • System Prompt Construction: Implemented build_system_prompt() to construct contextually rich system prompts from agent identity, personality, skills, authority, and autonomy levels.
  • Prompt Result Model: Introduced SystemPrompt as a frozen Pydantic model to represent the immutable result of system prompt construction.
  • Token Estimation and Trimming: Added a PromptTokenEstimator protocol and DefaultTokenEstimator (using a len//4 heuristic) for token estimation, along with progressive token-budget trimming for optional sections (company → tools → task).
  • Templating Engine: Integrated Jinja2 SandboxedEnvironment for flexible template rendering, supporting custom templates.
  • Error Handling: Established a new error hierarchy with EngineError and PromptBuildError for robust error management within the engine layer.
  • Autonomy Instructions: Defined AUTONOMY_INSTRUCTIONS mapping to provide seniority-level-specific autonomy text for all 8 levels.
  • Observability: Added 7 new prompt event constants to observability.events for better tracking and debugging.
  • Documentation and Review Process Updates: Updated CLAUDE.md with a post-implementation workflow section and modified the resilience-audit skill to trigger on any src_py change, expanding its custom prompt rules.
  • Pre-Review Findings Addressed: Addressed 16 findings from pre-PR review by 8 agents, including optimizing SandboxedEnvironment usage, adding error path tests, improving docstrings, and ensuring immutability of template context values.
Changelog
  • .claude/skills/pre-pr-review/SKILL.md
    • Updated the trigger condition for the resilience-audit skill to apply to any src_py change.
    • Expanded the custom prompt rules for resilience-audit to include new hard and soft rules for resilience violations across all code.
  • CLAUDE.md
    • Added a new "Post-Implementation (MANDATORY)" section detailing mandatory steps after finishing an issue implementation.
  • src/ai_company/engine/__init__.py
    • Exported new prompt construction components including EngineError, PromptBuildError, DefaultTokenEstimator, PromptTokenEstimator, SystemPrompt, and build_system_prompt.
  • src/ai_company/engine/errors.py
    • Created EngineError as the base exception for engine-layer errors.
    • Created PromptBuildError specifically for failures during system prompt construction.
  • src/ai_company/engine/prompt.py
    • Implemented the build_system_prompt function, which constructs system prompts based on agent identity and contextual information.
    • Defined the SystemPrompt Pydantic model to represent the immutable result of prompt construction, including content, version, estimated tokens, sections, and metadata.
    • Introduced PromptTokenEstimator protocol and DefaultTokenEstimator for estimating token counts.
    • Integrated Jinja2 SandboxedEnvironment for secure and flexible template rendering.
    • Implemented logic for progressive token-budget trimming of optional prompt sections.
    • Added internal helper functions for template resolution, context building, rendering, and metadata generation.
  • src/ai_company/engine/prompt_template.py
    • Defined PROMPT_TEMPLATE_VERSION for tracking template changes.
    • Created AUTONOMY_INSTRUCTIONS, a mapping of SeniorityLevel to specific autonomy guidelines.
    • Provided the DEFAULT_TEMPLATE string, a Jinja2 template used for generating system prompts.
  • src/ai_company/observability/events.py
    • Added new constants for prompt construction events: PROMPT_BUILD_START, PROMPT_BUILD_SUCCESS, PROMPT_BUILD_TOKEN_TRIMMED, PROMPT_BUILD_ERROR, PROMPT_BUILD_BUDGET_EXCEEDED, PROMPT_CUSTOM_TEMPLATE_LOADED, and PROMPT_CUSTOM_TEMPLATE_FAILED.
  • tests/unit/engine/conftest.py
    • Added Pytest fixtures for sample_model_config, sample_agent_with_personality, sample_role_with_description, sample_task_with_criteria, sample_tool_definitions, and sample_company to facilitate unit testing of prompt construction.
  • tests/unit/engine/test_prompt.py
    • Implemented comprehensive unit tests for the build_system_prompt function, covering various scenarios including personality injection, role descriptions, custom templates, authority boundaries, company context, tool availability, task context, token estimation, trimming logic, prompt versioning, section tracking, metadata, immutability, logging, and error handling.
Activity
  • Pre-reviewed by 8 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit).
  • Addressed 16 findings from the pre-PR review, including removing unused parameters, reusing a single SandboxedEnvironment instance, adding Final and import-time completeness checks, expanding error path tests, improving top-level error wrapping, switching template context values to tuple for immutability, adding trimming priority order tests, expanding metadata assertions, improving docstrings, fixing misleading comments, extracting helper functions, and updating the resilience-audit skill trigger.
  • Achieved 100% line coverage for prompt.py.
  • Ensured 1704 total tests are passing with 95.16% overall coverage.
  • Confirmed mypy strict clean and ruff clean.

@greptile-apps

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR implements build_system_prompt(), which translates an AgentIdentity (personality, skills, authority, seniority level) into a structured Jinja2-rendered system prompt. It adds the SystemPrompt frozen Pydantic result model, a PromptTokenEstimator protocol with DefaultTokenEstimator, and progressive token-budget trimming (company → tools → task). The implementation fits cleanly into the existing engine layer.
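The estimator abstraction mentioned above can be sketched roughly as follows. The len // 4 heuristic is stated in the PR summary; the method name `estimate` and exact signatures are assumptions for illustration, not the PR's actual code:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PromptTokenEstimator(Protocol):
    """Estimates how many tokens a rendered prompt will consume."""

    def estimate(self, text: str) -> int:
        """Return an estimated token count for text."""
        ...


class DefaultTokenEstimator:
    """Cheap character-count heuristic: roughly 4 characters per token."""

    def estimate(self, text: str) -> int:
        return len(text) // 4
```

Any object with a matching estimate method satisfies the protocol, which is what the custom-estimator test (the inline CountingEstimator) exercises.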

Key changes:

  • src/ai_company/engine/prompt.py — core build_system_prompt() implementation, template resolution, token trimming, and structured logging
  • src/ai_company/engine/prompt_template.py — default Jinja2 template with import-time completeness check
  • src/ai_company/engine/errors.py — EngineError → PromptBuildError hierarchy
  • src/ai_company/observability/events.py — six new prompt.* event constants
  • tests/unit/engine/test_prompt.py — 32 unit tests at 100% line coverage

Three issues identified:

  • PROMPT_BUILD_ERROR is logged without a preceding PROMPT_BUILD_START in the early validation path (max_tokens ≤ 0), creating orphaned log events that break log-consumer correlation patterns.
  • metadata field is a mutable dict, allowing in-place mutation despite the model being frozen and documented as an "immutable result type."
  • _trim_sections renders the prompt a final time after the trimming loop purely for logging, then _render_with_trimming renders again with identical inputs — one redundant Jinja2 render per trimming call.
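The second issue is easy to reproduce: freezing a model blocks attribute reassignment but not in-place mutation of a mutable field. A minimal stdlib sketch of the general behavior (the class and field names are illustrative stand-ins for the PR's SystemPrompt model; Pydantic's frozen=True has the same gap for dict fields):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenResult:
    """Illustrative stand-in for a frozen result model with a dict field."""

    metadata: dict[str, str] = field(default_factory=dict)


result = FrozenResult(metadata={"agent": "a-1"})
# result.metadata = {} would raise FrozenInstanceError, but this succeeds:
result.metadata["agent"] = "mutated"
```

Swapping the annotation to an immutable structure (e.g. a tuple of pairs, or a read-only mapping installed by a validator) closes the gap.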

Confidence Score: 3/5

  • Safe to merge with fixes — three polish issues affect observability, immutability guarantees, and token-trimming performance, not functional correctness.
  • The PR is functionally sound with 100% test coverage and mypy strict compliance. However, three distinct issues warrant attention: (1) orphaned log events when max_tokens ≤ 0 breaks log-consumer correlation, (2) mutable dict in a frozen model undermines immutability guarantees, and (3) redundant Jinja2 renders in the trimming path cause unnecessary computation. None are correctness bugs, but they represent quality gaps in observability, design consistency, and performance optimization that the author should address.
  • src/ai_company/engine/prompt.py — the max_tokens validation block (logging order), metadata field type, and _trim_sections / _render_with_trimming render hand-off.

Last reviewed commit: 07d8833

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-designed and robust implementation for constructing contextual system prompts for AI agents, featuring personality injection, token budget trimming, and custom template support. However, the system is vulnerable to prompt injection: untrusted data from sources such as task descriptions and agent names is rendered directly into Jinja2 templates without sanitization or clear demarcation, potentially allowing an attacker to manipulate agent behavior. There is also a minor suggestion to improve the readability of the generated tool list in the default prompt template.

Comment on lines +76 to +160
DEFAULT_TEMPLATE: Final[str] = """\
## Identity

You are **{{ agent_name }}**, a {{ agent_level }} {{ agent_role }} \
in the {{ agent_department }} department.
{% if role_description %}
**Role**: {{ role_description }}
{% endif %}

## Personality
{% if personality_description %}
{{ personality_description }}
{% endif %}
- **Communication style**: {{ communication_style }}
- **Risk tolerance**: {{ risk_tolerance }}
- **Creativity**: {{ creativity }}
{% if personality_traits %}
- **Traits**: {{ personality_traits | join(', ') }}
{% endif %}

## Skills
{% if primary_skills %}
- **Primary**: {{ primary_skills | join(', ') }}
{% endif %}
{% if secondary_skills %}
- **Secondary**: {{ secondary_skills | join(', ') }}
{% endif %}

## Authority
{% if can_approve %}
- **Can approve**: {{ can_approve | join(', ') }}
{% endif %}
{% if reports_to %}
- **Reports to**: {{ reports_to }}
{% endif %}
{% if can_delegate_to %}
- **Can delegate to**: {{ can_delegate_to | join(', ') }}
{% endif %}
{% if budget_limit > 0 %}
- **Budget limit**: ${{ "%.2f" | format(budget_limit) }} per task
{% endif %}

## Autonomy

{{ autonomy_instructions }}
{% if task %}

## Current Task

**{{ task.title }}**

{{ task.description }}
{% if task.acceptance_criteria %}

### Acceptance Criteria
{% for criterion in task.acceptance_criteria %}
- {{ criterion.description }}
{% endfor %}
{% endif %}
{% if task.budget_limit > 0 %}

**Task budget**: ${{ "%.2f" | format(task.budget_limit) }}
{% endif %}
{% if task.deadline %}
**Deadline**: {{ task.deadline }}
{% endif %}
{% endif %}
{% if tools %}

## Available Tools
{% for tool in tools %}
- **{{ tool.name }}**{% if tool.description %}: {{ tool.description }}{% endif %}

{% endfor %}
{% endif %}
{% if company %}

## Company Context

You work at **{{ company.name }}**.
{% if company_departments %}
**Departments**: {{ company_departments | join(', ') }}
{% endif %}
{% endif %}
"""
Contributor


security-high high

The system prompt is constructed by rendering several fields (e.g., agent_name, task.description, tool.description) directly into a Jinja2 template without any sanitization or escaping. This creates a high-severity prompt injection vulnerability, as untrusted input could allow an attacker to manipulate the agent's behavior. For example, a malicious task.description could override the agent's instructions.

Additionally, there is a minor formatting issue on line 148: a blank line adds an extra newline between each item in the 'Available Tools' list, making it look sparse. For a more standard, compact list format, this line should be removed.

Remediation for prompt injection: Sanitize all user-provided inputs before rendering them into the prompt. Use delimiters to clearly separate data from instructions, and explicitly instruct the LLM to treat the content within those delimiters as untrusted data (e.g., wrap task descriptions in tags like <task_description>...</task_description>).
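The suggested demarcation can be sketched as a small helper that wraps untrusted fields before they reach the template context. The tag names and the escaping approach here are assumptions for illustration, not the PR's code:

```python
def wrap_untrusted(tag: str, value: str) -> str:
    """Wrap untrusted text in XML-style delimiters, neutralizing any
    embedded closing tag so the boundary cannot be broken out of."""
    safe = value.replace(f"</{tag}>", f"</ {tag}>")
    return f"<{tag}>\n{safe}\n</{tag}>"


# Hypothetical call site, before rendering the Jinja2 template:
# context["task_description"] = wrap_untrusted("task_description", task.description)
```

The system prompt would then instruct the LLM to treat anything inside `<task_description>` tags as data, never as instructions.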


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/engine/prompt.py`:
- Around line 140-225: The build_system_prompt function is too large; refactor
it into smaller helpers to keep the entrypoint under 50 lines: extract input
validation and estimator/template resolution into a helper (e.g., new
_prepare_prompt_inputs(agent, role, task, available_tools, company,
custom_template, token_estimator) that returns estimator and template_str), move
the try/except orchestration and call to _render_with_trimming into a concise
runner (e.g., _execute_prompt_render(template_str, agent, role, task,
available_tools, company, max_tokens, estimator) that raises PromptBuildError on
failures), and isolate the success/error logging (use _log_prompt_start(agent,
...) and _log_prompt_result(result, ...) or similar). Update build_system_prompt
to call these helpers in sequence (logging start, calling
_prepare_prompt_inputs, then _execute_prompt_render, then logging success) so
build_system_prompt only orchestrates high-level steps while retaining existing
calls to _resolve_template, _render_with_trimming, and the same logger messages.
- Around line 422-511: The _render_with_trimming function is over the 50-line
guideline—extract three helpers to simplify it: 1) a helper that attempts to
"trim one section" given the current section name and mutable contexts (use
symbols _TRIMMABLE_SECTIONS, _SECTION_COMPANY, _SECTION_TOOLS, _SECTION_TASK and
the local variables company, available_tools, task) and returns whether it
trimmed; 2) a "re-render_and_estimate" wrapper around _render_and_estimate that
returns (content, estimated) for current
agent/role/task/available_tools/company/estimator to centralize repeated calls;
and 3) a "handle_post_trim_overflow" helper that logs PROMPT_BUILD_TOKEN_TRIMMED
and computes sections via _compute_sections and returns the final SystemPrompt
construction values (content, estimated, sections, metadata). Replace the
inlined loop and logging in _render_with_trimming with calls to these helpers
and keep SystemPrompt creation using SystemPrompt, PROMPT_TEMPLATE_VERSION, and
_build_metadata(agent).
- Around line 462-510: The function currently best-effort trims sections
(_TRIMMABLE_SECTIONS via _SECTION_COMPANY, _SECTION_TOOLS, _SECTION_TASK) using
_render_and_estimate but may still return a SystemPrompt that exceeds
max_tokens; change the logic after the trimming loop to enforce the limit: if
estimated > max_tokens then log PROMPT_BUILD_TOKEN_TRIMMED with details
(agent_id, max_tokens, estimated, trimmed_sections) and raise a clear exception
(e.g., ValueError) instead of returning success so callers cannot receive an
over-budget prompt; ensure the exception message includes agent.id, max_tokens
and estimated_tokens to aid debugging.
- Around line 181-190: The code accepts non-positive max_tokens and lets it
reach the trimming logic; add an explicit validation at the API boundary (in the
function that emits PROMPT_BUILD_START / accepts the max_tokens argument) to
reject values <= 0: check that max_tokens is an int and > 0 and raise a
ValueError with a clear message (e.g., "max_tokens must be a positive integer")
before any trimming or budget calculations that use max_tokens so invalid inputs
never reach trimming logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e8249802-6eb7-4101-b09d-e753f5ef9975

📥 Commits

Reviewing files that changed from the base of the PR and between c51068b and fe6b56f.

📒 Files selected for processing (10)
  • .claude/skills/pre-pr-review/SKILL.md
  • CLAUDE.md
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/prompt_template.py
  • src/ai_company/observability/events.py
  • tests/unit/engine/__init__.py
  • tests/unit/engine/conftest.py
  • tests/unit/engine/test_prompt.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (without parentheses) for exception handling — ruff enforces this on Python 3.14
Include type hints on all public functions, validated with mypy strict mode
Use Google-style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones — prioritize immutability
Use Pydantic v2 for data models with BaseModel, model_validator, and ConfigDict
Enforce 88 character line length (ruff configured)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files) in Python code
Every module with business logic must import logging with: from ai_company.observability import get_logger then logger = get_logger(__name__)
Always use variable name logger (not _logger, not log) for the logging instance
Pure data models, enums, and re-exports do not require logging

Files:

  • src/ai_company/engine/__init__.py
  • src/ai_company/observability/events.py
  • src/ai_company/engine/errors.py
  • tests/unit/engine/test_prompt.py
  • src/ai_company/engine/prompt_template.py
  • tests/unit/engine/conftest.py
  • src/ai_company/engine/prompt.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Never use import logging, logging.getLogger(), or print() in application code — use the project's logger
Use event name constants from ai_company.observability.events rather than inline strings
Use structured logging format: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
All provider calls must go through BaseCompletionProvider which applies retry and rate limiting automatically

Files:

  • src/ai_company/engine/__init__.py
  • src/ai_company/observability/events.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/prompt_template.py
  • src/ai_company/engine/prompt.py
src/ai_company/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains

Files:

  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/prompt_template.py
  • src/ai_company/engine/prompt.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Use asyncio_mode = 'auto' for async tests — no manual @pytest.mark.asyncio needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs

Files:

  • tests/unit/engine/test_prompt.py
  • tests/unit/engine/conftest.py
🧠 Learnings (27)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/prompts/**/*.yaml : YAML prompt templates should be loaded at runtime from `src/prompts/` directory
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` rather than inline strings

Applied to files:

  • src/ai_company/observability/events.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR level with context before raising

Applied to files:

  • src/ai_company/engine/errors.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/engine/test_prompt.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state

Applied to files:

  • tests/unit/engine/test_prompt.py
  • src/ai_company/engine/prompt.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • CLAUDE.md
  • .claude/skills/pre-pr-review/SKILL.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: For trivial/docs-only changes, use `/pre-pr-review quick` to skip agents but still run automated checks

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Always create a PR for issue work. When implementing changes for a GitHub issue, create a branch and open a pull request. Do not wait to be asked.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Pre-commit hooks enforce: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch (main), ruff check+format, gitleaks

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Never create a PR directly — always use `/pre-pr-review` command to create PRs with automated checks and review agents

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: No bypassing CI - never use `git push --no-verify` or modify test coverage thresholds to make tests pass. If tests fail, fix the actual issue. Pre-push hooks exist to catch problems before they reach CI.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Never defer work—do not suggest "this can be done later" or "consider for a future PR". Complete all requested changes fully.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: After a PR exists, use `/aurelio-review-pr` to handle external reviewer feedback

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Never defer work. Do not suggest 'this can be done later' or 'consider for a future PR'. Complete all requested changes fully.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Use branch naming format `<type>/<slug>` from main

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: After every push, you MUST check that CI passes. If CI fails, fix the issue immediately and push again until all checks are green. Never walk away from a failing CI pipeline.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • .claude/skills/pre-pr-review/SKILL.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-03-05T14:30:10.714Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T14:30:10.714Z
Learning: Applies to tests/**/*.py : Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`), never real vendor model IDs

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • tests/unit/engine/conftest.py
🪛 LanguageTool
.claude/skills/pre-pr-review/SKILL.md

[style] ~203-~203: Since ownership is already implied, this phrasing may be redundant.
Context: ...layer):** 1. Driver subclass implements its own retry/backoff logic instead of relying ...

(PRP_OWN)


[style] ~207-~207: This phrase is redundant. Consider using “outside”.
Context: .... asyncio.sleep used for retry delays outside of RetryHandler (MAJOR) **Hard rules (a...

(OUTSIDE_OF)


[style] ~212-~212: Consider using the typographical ellipsis character here instead.
Context: ...8. Manual retry/backoff patterns (e.g., for attempt in range(...), while retries > 0, time.sleep in...

(ELLIPSIS)

🔇 Additional comments (17)
CLAUDE.md (1)

114-118: LGTM!

The new "Post-Implementation" section provides clear workflow guidance that complements the existing "Pre-PR Review" section, ensuring developers know to branch/commit/push before using /pre-pr-review.

src/ai_company/engine/errors.py (1)

1-9: LGTM!

Clean, minimal error hierarchy with appropriate docstrings. The exception definitions follow the guideline that pure data models and enums don't require logging—the calling code in prompt.py is responsible for logging before raising these errors.

.claude/skills/pre-pr-review/SKILL.md (1)

196-218: LGTM!

The expanded resilience-audit scope from provider-only to "any code" is a valuable improvement. The new hard rules (6-8) for catching broad exception handling and manual retry patterns across the entire codebase will help maintain resilience consistency. The renumbered soft rules (9-12) provide good guidance for non-retryable error classification.

src/ai_company/engine/prompt_template.py (2)

68-72: LGTM!

Excellent defensive programming—the runtime validation ensures AUTONOMY_INSTRUCTIONS stays in sync with the SeniorityLevel enum. If a new level is added to the enum, this will fail fast at import time with a clear error message rather than silently producing incomplete prompts.


76-160: LGTM!

The Jinja2 template is well-structured with:

  • Clear section headers matching the sections tuple tracked in SystemPrompt
  • Conditional rendering with {% if %} for optional sections (task, tools, company)
  • Proper budget formatting with "%.2f" | format()
  • Immutable Final[str] declaration preventing accidental modification
src/ai_company/engine/__init__.py (1)

1-21: LGTM!

Clean public API surface with appropriate re-exports. The __all__ list is alphabetically sorted and includes the essential symbols for system prompt construction. The module correctly follows the guideline that re-export __init__.py files don't require logging.

tests/unit/engine/conftest.py (2)

30-33: LGTM!

Fixtures correctly use vendor-agnostic fake model IDs ("test-provider", "test-model-001") as required by the coding guidelines, ensuring tests aren't coupled to external providers.


36-131: LGTM!

Comprehensive fixtures with:

  • Proper docstrings explaining each fixture's purpose
  • Immutable tuples for collection fields (traits, primary, acceptance_criteria, departments)
  • Realistic data structures that exercise all prompt template sections
  • Appropriate scope (function-level by default)
src/ai_company/observability/events.py (1)

131-139: LGTM!

New event constants follow the established domain.noun.verb naming convention and cover the complete prompt construction lifecycle (start, success, token_trimmed, error) plus custom template handling (loaded, failed). The noqa: S105 comment on line 136 correctly suppresses a false positive for the "token" substring in "token_trimmed".

tests/unit/engine/test_prompt.py (8)

37-54: LGTM!

Well-structured test class with clear test method naming. The minimal agent test validates all essential properties of the returned SystemPrompt.


71-108: LGTM!

Excellent test demonstrating that different personality configurations produce distinct prompts. The inline agent creation uses vendor-agnostic model IDs ("test", "test-001") as required by coding guidelines.


256-322: LGTM!

Good coverage of seniority-based autonomy instructions:

  • Tests verify specific phrases for junior ("Follow instructions carefully", "seek approval")
  • Tests verify senior autonomy ("Take ownership")
  • Tests verify C-suite scope ("company-wide authority", "vision")
  • Meta-test confirms all 8 seniority levels have unique instruction text

383-404: LGTM!

Good test for custom token estimator injection. The inline CountingEstimator class verifies that the protocol is correctly used during prompt construction.


450-481: LGTM!

Immutability and metadata tests are well-designed:

  • Frozen model test validates that mutation raises ValidationError
  • Metadata test verifies all 5 expected keys with correct values from the agent

487-520: LGTM!

Logging tests use structlog.testing.capture_logs() correctly to verify that prompt.build.start, prompt.build.success, and prompt.build.token_trimmed events are emitted during prompt construction.


526-565: LGTM!

Error handling tests cover:

  • Invalid Jinja2 syntax raises PromptBuildError with descriptive message
  • Exception chaining preserves the original cause
  • Undefined filter at render time raises PromptBuildError

These tests ensure proper error propagation and debugging context.
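The chaining behavior these tests verify follows from the `raise ... from exc` pattern. A self-contained sketch (the exception names mirror the PR's hierarchy; the wrapper function and the stand-in SyntaxError are hypothetical, substituting for jinja2's TemplateSyntaxError):

```python
class EngineError(Exception):
    """Base class for engine-layer errors (mirrors the PR's hierarchy)."""


class PromptBuildError(EngineError):
    """Raised when system prompt construction fails."""


def build_from_template(template: str) -> str:
    """Hypothetical wrapper showing the exception-chaining pattern."""
    try:
        # Stand-in for jinja2 raising TemplateSyntaxError in from_string().
        raise SyntaxError(f"unexpected end of template: {template!r}")
    except SyntaxError as exc:
        # `from exc` preserves the original cause on __cause__ for debugging.
        raise PromptBuildError(f"invalid template: {exc}") from exc
```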


571-613: LGTM!

The trimming priority test cleverly calculates a budget between "without company" and "full" sizes to verify that company is trimmed first while tools and task are preserved. This validates the documented priority order: company → tools → task.

Comment on lines +462 to +510
    if max_tokens is not None and estimated > max_tokens:
        trimmed_sections: list[str] = []

        for section in _TRIMMABLE_SECTIONS:
            if estimated <= max_tokens:
                break

            if section == _SECTION_COMPANY and company is not None:
                company = None
            elif section == _SECTION_TOOLS and available_tools:
                available_tools = ()
            elif section == _SECTION_TASK and task is not None:
                task = None
            else:
                continue

            trimmed_sections.append(section)
            content, estimated = _render_and_estimate(
                template_str,
                agent,
                role,
                task,
                available_tools,
                company,
                estimator,
            )

        if trimmed_sections:
            logger.warning(
                PROMPT_BUILD_TOKEN_TRIMMED,
                agent_id=str(agent.id),
                max_tokens=max_tokens,
                estimated_tokens=estimated,
                trimmed_sections=trimmed_sections,
            )

    sections = _compute_sections(
        task=task,
        available_tools=available_tools,
        company=company,
    )

    return SystemPrompt(
        content=content,
        template_version=PROMPT_TEMPLATE_VERSION,
        estimated_tokens=estimated,
        sections=sections,
        metadata=_build_metadata(agent),
    )

⚠️ Potential issue | 🟠 Major

Do not return prompts that still exceed max_tokens.

At line 462, trimming is best-effort only; if the prompt is still over budget after all trimmable sections are removed, the function still returns a SystemPrompt as if it succeeded. This breaks token-limit enforcement and can cause downstream provider calls to fail.

🔧 Proposed fix
     if max_tokens is not None and estimated > max_tokens:
         trimmed_sections: list[str] = []

         for section in _TRIMMABLE_SECTIONS:
             if estimated <= max_tokens:
                 break
@@
         if trimmed_sections:
             logger.warning(
                 PROMPT_BUILD_TOKEN_TRIMMED,
                 agent_id=str(agent.id),
                 max_tokens=max_tokens,
                 estimated_tokens=estimated,
                 trimmed_sections=trimmed_sections,
             )
+        if estimated > max_tokens:
+            logger.error(
+                PROMPT_BUILD_ERROR,
+                agent_id=str(agent.id),
+                max_tokens=max_tokens,
+                estimated_tokens=estimated,
+                trimmed_sections=trimmed_sections,
+            )
+            msg = (
+                f"Prompt exceeds max_tokens={max_tokens} after trimming "
+                f"(estimated={estimated})"
+            )
+            raise PromptBuildError(msg)


Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Source fixes:
- Validate max_tokens > 0 at API boundary (raises PromptBuildError)
- Add PROMPT_BUILD_BUDGET_EXCEEDED warning when prompt still exceeds
  max_tokens after all trimmable sections removed
- Move PROMPT_CUSTOM_TEMPLATE_LOADED log after syntax validation
- Add read-only docstring warning to metadata field
- Add security comment explaining SandboxedEnvironment choice
- Add comment explaining intentional early-fail double-parse
- Extract _trim_sections helper from _render_with_trimming (now <50 lines)
- Remove blank line in tool list template (sparse output)

Test improvements:
- Add test for catch-all exception wrapping path
- Add test for all-sections-trimmed / budget-exceeded scenario
- Add test for tools-trimmed-before-task cascade level
- Add test for agent with default/empty personality
- Add test for task with zero budget and no deadline
- Add tests for max_tokens=0 and negative validation
- Add test for budget-exceeded warning log
- Assert company departments in prompt content
- Derive budget strings from fixture values instead of hardcoding
- Assert trimmed_sections content in trim log test

Minor:
- Fix misleading "prompt injection" wording in fixture docstring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aureliolo Aureliolo requested a review from Copilot March 5, 2026 18:40

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.



Comment on lines +79 to +82
You are **{{ agent_name }}**, a {{ agent_level }} {{ agent_role }} \
in the {{ agent_department }} department.
{% if role_description %}
**Role**: {{ role_description }}

Copilot AI Mar 5, 2026


The trailing backslash in the template is a literal character inside a triple-quoted string, so it will likely appear in the rendered prompt and/or produce unintended formatting. Remove the literal `\` and rely on normal wrapping, or use Jinja2 whitespace control (`{%- ... -%}`) if the intent is to avoid newlines.

Suggested change
-You are **{{ agent_name }}**, a {{ agent_level }} {{ agent_role }} \
-in the {{ agent_department }} department.
+You are **{{ agent_name }}**, a {{ agent_level }} {{ agent_role }} in the {{ agent_department }} department.
 {% if role_description %}
 **Role**: {{ role_description }}
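Whether the backslash actually survives depends on the kind of string literal holding the template; a quick check contrasting plain and raw triple-quoted strings:

```python
# In a plain triple-quoted string, backslash-newline is a line
# continuation: both characters are removed at parse time.
plain = """You are **{{ agent_name }}**, a {{ agent_level }} \
{{ agent_role }}."""
print("\\" in plain, "\n" in plain)  # False False

# In a raw string, the backslash and the newline both survive and
# would leak into the rendered prompt.
raw = r"""You are **{{ agent_name }}**, a {{ agent_level }} \
{{ agent_role }}."""
print("\\" in raw, "\n" in raw)  # True True
```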

Comment on lines +158 to +159
adds additional sections.


Copilot AI Mar 5, 2026


The trimming implementation removes the entire tools section by setting available_tools = (), not just 'tool descriptions'. Either update the docstring to match current behavior (company → tools → task), or implement a tools-only description trim (e.g., keep tool names but blank descriptions) to align with the documented contract.

Suggested change
-    adds additional sections.
+    sections are progressively trimmed in order: company context, tools
+    section, task details.

from ai_company.providers.models import ToolDefinition

logger = get_logger(__name__)


Copilot AI Mar 5, 2026


With the default Jinja2 Undefined, custom templates that reference missing variables typically render empty strings instead of failing, which can silently produce incomplete system prompts. Consider configuring the environment to raise on undefined variables (e.g., undefined=StrictUndefined) so template typos reliably surface as PromptBuildError.
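A minimal sketch of the suggested configuration, assuming Jinja2's documented `StrictUndefined` behavior:

```python
from jinja2 import StrictUndefined
from jinja2.sandbox import SandboxedEnvironment

template = "Hello {{ missing_name }}!"

# Default Undefined renders missing variables as empty strings.
lenient = SandboxedEnvironment()
out = lenient.from_string(template).render()

# StrictUndefined raises UndefinedError at render time, so template
# typos surface instead of silently producing incomplete prompts.
strict = SandboxedEnvironment(undefined=StrictUndefined)
err = ""
try:
    strict.from_string(template).render()
except Exception as exc:
    err = type(exc).__name__

print(out, err)  # Hello ! UndefinedError
```

The caught exception would then be wrapped in PromptBuildError by the render helper.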


def _resolve_template(custom_template: str | None) -> str:
"""Resolve the template string to use for rendering.


Copilot AI Mar 5, 2026


The custom-template loaded/failed events don’t include correlation fields (e.g., agent_id, agent_name), making them harder to tie back to a specific build attempt. Consider binding agent context at the start of build_system_prompt() (e.g., logger.bind(...)) or passing agent_id/agent_name into _resolve_template() and including them in these log calls.

Comment on lines +251 to +254
The template string to render.

Raises:
PromptBuildError: If custom template syntax is invalid.

Copilot AI Mar 5, 2026


The custom-template loaded/failed events don’t include correlation fields (e.g., agent_id, agent_name), making them harder to tie back to a specific build attempt. Consider binding agent context at the start of build_system_prompt() (e.g., logger.bind(...)) or passing agent_id/agent_name into _resolve_template() and including them in these log calls.


def _render_template(template_str: str, context: dict[str, Any]) -> str:
"""Render a Jinja2 template string with the given context.


Copilot AI Mar 5, 2026


Render failures are logged without agent context, and build_system_prompt() intentionally re-raises PromptBuildError without adding another log, so the only error event may lack agent_id/agent_name. To improve observability/debugging, include agent identifiers in this log entry (by binding logger context earlier or by passing identifiers into _render_template()).

Suggested change
# Include agent identifiers in the log when available in the context
metadata = context.get("metadata", {}) if isinstance(context, dict) else {}
agent_id = metadata.get("agent_id")
agent_name = metadata.get("name")
logger.exception(
    PROMPT_BUILD_ERROR,
    error=str(exc),
    agent_id=agent_id,
    agent_name=agent_name,
)

Comment on lines +79 to +81
metadata: dict[str, str] = Field(
description="Agent identity metadata (treat as read-only)",
)


SystemPrompt declares model_config = ConfigDict(frozen=True) and is documented as an "immutable result type." However, metadata: dict[str, str] is a mutable dict. Pydantic's frozen=True prevents field reassignment but does not prevent in-place mutation like prompt.metadata["agent_id"] = "hijacked".

For consistent immutability, use a read-only mapping type:

from collections.abc import Mapping

metadata: Mapping[str, str] = Field(
    description="Agent identity metadata (treat as read-only)",
)

Then update _build_metadata() to return dict (which is a Mapping), and the call at line 594 will work unchanged since Mapping is the interface.
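The same gap can be demonstrated with stdlib stand-ins — a frozen dataclass in place of the frozen Pydantic model, and `MappingProxyType` as one read-only option (the `SystemPromptLike` class is illustrative only):

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class SystemPromptLike:
    """Stdlib stand-in for the frozen Pydantic model."""

    metadata: dict[str, str]


p = SystemPromptLike(metadata={"agent_id": "a-1"})

# Field reassignment is blocked by frozen=True...
frozen_err = ""
try:
    p.metadata = {}  # type: ignore[misc]
except Exception as exc:
    frozen_err = type(exc).__name__

# ...but in-place mutation of the dict is not.
p.metadata["agent_id"] = "hijacked"

# A read-only mapping view closes that gap.
ro = MappingProxyType({"agent_id": "a-1"})
proxy_blocked = False
try:
    ro["agent_id"] = "hijacked"  # type: ignore[index]
except TypeError:
    proxy_blocked = True

print(frozen_err, p.metadata["agent_id"], proxy_blocked)
# FrozenInstanceError hijacked True
```

Annotating the field as `Mapping[str, str]` at least signals the read-only contract to type checkers, even if the runtime object stays a dict.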


Comment on lines +492 to +517
    _, estimated = _render_and_estimate(
        template_str,
        agent,
        role,
        task,
        available_tools,
        company,
        estimator,
    )

    if trimmed_sections:
        logger.warning(
            PROMPT_BUILD_TOKEN_TRIMMED,
            agent_id=str(agent.id),
            max_tokens=max_tokens,
            estimated_tokens=estimated,
            trimmed_sections=trimmed_sections,
        )

    if estimated > max_tokens:
        logger.warning(
            PROMPT_BUILD_BUDGET_EXCEEDED,
            agent_id=str(agent.id),
            max_tokens=max_tokens,
            estimated_tokens=estimated,
        )


_trim_sections calls _render_and_estimate at lines 492–500 solely to compute estimated for the warning logs. Then _render_with_trimming calls _render_and_estimate again at lines 573–580 with the exact same inputs to get the content for the returned SystemPrompt.

This causes a Jinja2 re-render on every trimming path. The first loop iteration (lines 469–477) also re-renders the same unchanged state that was already computed before trimming.

Return estimated as part of _trim_sections's return tuple to eliminate the redundant render:

def _trim_sections(...) -> tuple[Task | None, tuple[ToolDefinition, ...], Company | None, int]:
    # ... (loop as before)
    _, estimated = _render_and_estimate(...)
    # ... (logging as before)
    return task, available_tools, company, estimated  # Return estimated

Then in _render_with_trimming (lines 562–581):

if max_tokens is not None and estimated > max_tokens:
    task, available_tools, company, estimated = _trim_sections(...)
    # (no second _render_and_estimate call needed)

Comment on lines +184 to +202
    if max_tokens is not None and max_tokens <= 0:
        msg = f"max_tokens must be > 0, got {max_tokens}"
        logger.error(
            PROMPT_BUILD_ERROR,
            agent_id=str(agent.id),
            agent_name=agent.name,
            max_tokens=max_tokens,
        )
        raise PromptBuildError(msg)

    logger.info(
        PROMPT_BUILD_START,
        agent_id=str(agent.id),
        agent_name=agent.name,
        has_task=task is not None,
        tool_count=len(available_tools),
        has_company=company is not None,
        has_custom_template=custom_template is not None,
    )


PROMPT_BUILD_ERROR is logged at line 186–191 before PROMPT_BUILD_START is reached at line 194. If max_tokens <= 0, the exception at line 192 prevents PROMPT_BUILD_START from being emitted, creating an orphaned error event in the log stream.

Log-based monitoring and dashboards that correlate start → error pairs will silently miss this failure mode. Move the start event before the max_tokens validation check to ensure all error paths are paired with their start event.


@Aureliolo Aureliolo merged commit 934dd85 into main Mar 5, 2026
11 checks passed
@Aureliolo Aureliolo deleted the feat/prompt-builder branch March 5, 2026 19:00
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59),
[#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>