Remove the `disallow_untyped_defs = false` override for `tests.*` in pyproject.toml and fix all 225 mypy errors across 24 test files:

- Add type parameters to 65 ModelFactory subclasses in 7 conftest files
- Change 88 enum equality checks to use `.value` in 5 test_enums files
- Add `-> None` return annotations to ~140 test functions across 20 files
- Remove 6 stale `# type: ignore` comments
- Replace string literals with enum values for arg-type errors
- Add `isinstance` type narrowing for index/attr-defined errors
- Add `Callable[..., Path]` for fixtures with default parameters
- Add full type signatures to _StubDriver helper methods

Closes #87
Caution: Review failed. The pull request is closed.
ℹ️ Recent review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
📝 Walkthrough
Removed the tests.* mypy override and applied widespread static-typing updates across the test suite: added return and parameter annotations, parameterized many ModelFactory subclasses with concrete generics, replaced string enum comparisons with .value, and introduced Protocol/TYPE_CHECKING guards for several fixture factories.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the type safety and maintainability of the test suite by enabling strict Mypy checks. It addresses a large number of previously ignored type errors, improving code quality and reducing the likelihood of runtime issues. The comprehensive typing updates ensure that the test codebase adheres to higher standards, making it easier to understand, refactor, and extend in the future without compromising correctness.
Code Review
This pull request is an excellent and extensive refactoring to enforce strict mypy checks on the test suite. The changes across numerous files correctly address a wide range of typing issues, including adding generic parameters to ModelFactory subclasses, providing explicit type hints for test functions and fixtures, and fixing enum comparisons. The use of type: ignore for intentional type violations in tests is appropriate. Overall, this is a high-quality contribution that significantly improves the project's type safety and maintainability. I have no further comments as the changes are thorough and well-executed.
Pull request overview
This PR enforces strict mypy checking across the tests/ tree by removing the previous tests.* override and updating test code to satisfy mypy --strict, helping prevent untyped tests from accumulating going forward.
Changes:
- Removed the `disallow_untyped_defs = false` mypy override for `tests.*` in `pyproject.toml`.
- Added/adjusted type annotations throughout tests (return types, fixture args, factory generics, enum comparisons).
- Refined some test assertions and helper typing to resolve strict mypy errors.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/unit/test_smoke.py | Removes an unnecessary mypy ignore for pytest marker config typing. |
| tests/unit/templates/test_schema.py | Adds missing return types and stronger typing for template schema tests. |
| tests/unit/templates/test_renderer.py | Adds return type annotations and types fixture callables for renderer tests. |
| tests/unit/templates/test_presets.py | Adds explicit -> None return annotations across preset tests. |
| tests/unit/templates/test_loader.py | Adds return types and fixture callable typing for template loader tests. |
| tests/unit/templates/conftest.py | Updates fixture return types and typing imports for template tests. |
| tests/unit/providers/test_registry.py | Adds typed stub provider hook signatures and return types. |
| tests/unit/providers/test_protocol.py | Adjusts mypy ignore code for a partial-provider abstractness test. |
| tests/unit/providers/test_enums.py | Fixes enum comparisons to use .value to satisfy strict typing. |
| tests/unit/providers/drivers/test_mappers.py | Adds return types and narrows dict/list assertions for mapper helpers. |
| tests/unit/providers/drivers/test_litellm_driver.py | Adds richer typing to streaming helpers/tests and exception tests. |
| tests/unit/providers/conftest.py | Parameterizes ModelFactory subclasses with model type arguments. |
| tests/unit/observability/test_processors.py | Removes an unnecessary mypy ignore for mixed-key dict typing. |
| tests/unit/observability/test_enums.py | Fixes enum comparisons to use .value to satisfy strict typing. |
| tests/unit/observability/test_correlation.py | Adds/adjusts typing ignore for Any-return from structlog context access. |
| tests/unit/observability/conftest.py | Parameterizes ModelFactory subclasses with model type arguments. |
| tests/unit/core/test_enums.py | Fixes enum comparisons to use .value and avoids overlap issues. |
| tests/unit/core/conftest.py | Parameterizes ModelFactory subclasses with model type arguments. |
| tests/unit/config/test_utils.py | Adds explicit -> None return types throughout config utility tests. |
| tests/unit/config/test_schema.py | Adds explicit return types and resolves strict typing errors in config schema tests. |
| tests/unit/config/test_loader.py | Adds explicit return types and types fixtures/monkeypatch usage. |
| tests/unit/config/test_errors.py | Adds explicit -> None return types across config error tests. |
| tests/unit/config/test_defaults.py | Adds explicit -> None return types across defaults tests. |
| tests/unit/config/conftest.py | Parameterizes config ModelFactory subclasses and adjusts fixture callable typing. |
| tests/unit/communication/test_enums.py | Fixes enum comparisons to use .value to satisfy strict typing. |
| tests/unit/communication/test_config.py | Adds a mypy ignore for an intentional strict-type validation test case. |
| tests/unit/communication/conftest.py | Parameterizes ModelFactory subclasses and updates enum-typed fixture values. |
| tests/unit/budget/test_enums.py | Fixes enum comparisons to use .value and updates membership assertions. |
| tests/unit/budget/test_config.py | Adds mypy ignores for intentional strict-type validation test cases. |
| tests/unit/budget/conftest.py | Parameterizes ModelFactory subclasses with model type arguments. |
| pyproject.toml | Removes mypy override that relaxed typing rules for tests.*. |
Comments suppressed due to low confidence (1)
tests/unit/providers/test_protocol.py:251
- The mypy ignore was switched to `empty-body`, but this method also uses an ellipsis body. Instead of suppressing the error, prefer a minimal concrete body (e.g., raising `NotImplementedError`) so the test doesn't rely on ignores and still demonstrates the class remains abstract due to missing hooks.

    class _PartialProvider(BaseCompletionProvider):
        async def _do_complete(  # type: ignore[empty-body]
            self,
            messages: list[ChatMessage],
            model: str,
            **kwargs: object,
        ) -> CompletionResponse: ...
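
The suggestion above can be sketched with stand-in classes. This is a minimal illustration, not the project's code: `BaseProvider`, `PartialProvider`, the `_do_stream` hook and the `is_instantiable` helper are all hypothetical names, and the real `BaseCompletionProvider` hooks may differ. It shows that a subclass which overrides only one abstract hook with a body that raises `NotImplementedError` still cannot be instantiated.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Hypothetical stand-in for BaseCompletionProvider with two hooks."""

    @abstractmethod
    async def _do_complete(self, model: str) -> str:
        """Return a completion for ``model``."""

    @abstractmethod
    async def _do_stream(self, model: str) -> str:
        """Return a streamed completion for ``model`` (assumed hook name)."""


class PartialProvider(BaseProvider):
    """Overrides only one hook, so the class stays abstract."""

    async def _do_complete(self, model: str) -> str:
        # A concrete body instead of ``...``: there is no empty body left
        # for the type checker to flag, so no ignore comment is needed.
        raise NotImplementedError


def is_instantiable(cls: type) -> bool:
    """Report whether ``cls`` can be constructed without a TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here `is_instantiable(PartialProvider)` is false because `_do_stream` is still listed in `PartialProvider.__abstractmethods__`, which is the property the test wants to demonstrate.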
    def test_valid_minimal(
        self,
        make_template_dict: Callable[..., dict[str, Any]],
    ) -> None:
Callable is imported only under if TYPE_CHECKING, but it is referenced in runtime-evaluated annotations (Python 3.14 / PEP 649). If anything calls inspect.get_annotations() / typing.get_type_hints() on these tests, annotation evaluation will raise NameError. Import Callable at runtime (and add # noqa: TC003 if needed) instead of gating it behind TYPE_CHECKING.
     def test_default_variables_applied(
         self,
-        tmp_template_file: Callable[[str, str], Path],
-    ):
+        tmp_template_file: Callable[..., Path],
+    ) -> None:
Callable/Path are imported only under if TYPE_CHECKING, but they’re used in function annotations which are lazily evaluated at runtime in Python 3.14 (PEP 649). This can raise NameError if annotations are introspected. Import these names at runtime (optionally with # noqa: TC003) rather than only under TYPE_CHECKING.
     def test_user_template_overrides_builtin(
         self,
         tmp_path: Path,
-        tmp_template_file: Callable[[str, str], Path],
-    ):
+        tmp_template_file: Callable[..., Path],
+    ) -> None:
Callable is imported only under if TYPE_CHECKING, but it’s referenced in runtime-evaluated annotations (Python 3.14 / PEP 649). Import Callable at runtime (and silence ruff with # noqa: TC003 if necessary) to avoid NameError when annotations are inspected.
 @pytest.fixture
 def make_template_dict() -> Callable[..., dict[str, Any]]:
     """Factory fixture for building template kwargs dicts."""
     return _make_template_dict


 @pytest.fixture
-def tmp_template_file(tmp_path: Path) -> Callable[[str, str], Path]:
+def tmp_template_file(tmp_path: Path) -> Callable[..., Path]:
     """Factory fixture for writing a temporary template YAML file."""
Callable and Path are only imported under TYPE_CHECKING but are used in fixture annotations. With Python 3.14 lazy annotation evaluation (PEP 649), inspecting these annotations can raise NameError. Import them at runtime (using # noqa: TC003 where appropriate) instead of gating them behind TYPE_CHECKING.
    async def _do_complete(
        self,
        messages: list[ChatMessage],
        model: str,
        *,
        tools: list[ToolDefinition] | None = None,
        config: CompletionConfig | None = None,
    ) -> CompletionResponse:
ChatMessage/ToolDefinition/CompletionConfig/etc. are imported only under if TYPE_CHECKING, but they’re used in method annotations. Under Python 3.14 (PEP 649), those annotations are evaluated at runtime when accessed and will raise NameError unless the names exist in module globals. Import these types at runtime (add # noqa: TC003 / # noqa: TC001 as needed) rather than only under TYPE_CHECKING.
 async def _failing_stream() -> AsyncIterator[MagicMock]:
     yield make_stream_chunk(content="Hi")
-    raise _litellm.Timeout(
+    raise _litellm.Timeout(  # type: ignore[attr-defined]
         message="Stream timed out",
AsyncIterator is imported only under if TYPE_CHECKING, but it’s referenced in a runtime-evaluated return annotation (Python 3.14 / PEP 649). Import AsyncIterator at runtime (and add # noqa: TC003 if needed) to avoid NameError when annotations are inspected.
tests/unit/config/test_loader.py
Outdated
    def test_explicit_path(self, tmp_config_file: Callable[..., Path]) -> None:
        path = tmp_config_file(MINIMAL_VALID_YAML)
Callable is imported only under if TYPE_CHECKING, but it’s referenced in test method annotations. With Python 3.14 lazy annotations (PEP 649), inspecting annotations can raise NameError unless Callable exists at runtime. Import Callable at runtime (use # noqa: TC003 if you need to silence ruff).
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/unit/templates/test_schema.py (2)
25-287: ⚠️ Potential issue | 🟠 Major
Add a 30-second timeout marker for this module's tests.
I don't see an explicit timeout marker on the touched tests. Add a module-level timeout so all cases in this file are bounded.
Suggested change
 import pytest
 from pydantic import ValidationError

+pytestmark = pytest.mark.timeout(30)
+
 from ai_company.core.enums import CompanyType, SeniorityLevel
As per coding guidelines, `tests/**/*.py`: Set 30-second timeout per test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/templates/test_schema.py` around lines 25 - 287, Add a module-level pytest timeout marker by defining pytestmark = pytest.mark.timeout(30) near the top of the test module so every test in this file (including TestTemplateVariable, TestTemplateAgentConfig, TestTemplateDepartmentConfig, TestTemplateMetadata, and TestCompanyTemplate) gets a 30-second limit; ensure pytest is imported and place the pytestmark assignment at module scope (not inside a class or function).
25-287: 🛠️ Refactor suggestion | 🟠 Major
Public test classes/methods touched here are still missing Google-style docstrings.
Please add concise docstrings to the touched public test classes/functions (or explicitly scope out D-rules for tests if that is the intended policy).
Suggested pattern
 @pytest.mark.unit
 class TestTemplateVariable:
+    """Unit tests for TemplateVariable."""
+
     def test_valid_minimal(self) -> None:
+        """Creates a minimal variable and validates defaults."""
         v = TemplateVariable(name="my_var")
As per coding guidelines, `**/*.py`: Use Google style docstrings on all public classes and functions, enforced by ruff D rules.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/templates/test_schema.py` around lines 25 - 287, Add concise Google-style docstrings to the public test classes and top-level test functions in this file (e.g., TestTemplateAgentConfig, TestTemplateDepartmentConfig, TestTemplateMetadata, TestCompanyTemplate and the module-level test functions like test_valid_minimal/test_valid_full/test_blank_name_rejected for TemplateVariable) describing the purpose of each test class/function; alternatively, if tests are intentionally exempt, add an explicit D-rule exemption for these tests per project policy. Ensure each docstring follows Google style (one-line summary and optional brief elaboration) so ruff D rules pass.
tests/unit/config/conftest.py (1)
141-147: 🧹 Nitpick | 🔵 Trivial
Consider preserving precise Callable signature.
The change from `Callable[[str, str], Path]` to `Callable[..., Path]` loses parameter type information. Since `_create` has a well-defined signature (`content: str, name: str = "config.yaml"`), a more precise type would be better for callers.
♻️ Suggested fix for precise typing
 @pytest.fixture
-def tmp_config_file(tmp_path: Path) -> Callable[..., Path]:
+def tmp_config_file(tmp_path: Path) -> Callable[[str, str], Path]:
     def _create(content: str, name: str = "config.yaml") -> Path:
Alternatively, if the issue is the default argument, consider:
    from typing import Protocol

    class ConfigFileCreator(Protocol):
        def __call__(self, content: str, name: str = ...) -> Path: ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/config/conftest.py` around lines 141 - 147, The fixture tmp_config_file currently types the factory as Callable[..., Path], losing parameter info; update its return type to a precise Callable[[str, str], Path] (matching the inner _create(content: str, name: str = "config.yaml") -> Path signature) or introduce a Protocol like ConfigFileCreator with def __call__(self, content: str, name: str = ...) -> Path and use that instead; adjust the annotation on tmp_config_file to reference the chosen precise type and ensure typing imports (Callable or Protocol) are present so callers get exact parameter typing for _create.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/core/test_enums.py`:
- Around line 155-156: The test test_strenum_equality_with_string no longer
asserts StrEnum native equality because it compares SeniorityLevel.JUNIOR.value
to "junior"; restore the original intent by asserting SeniorityLevel.JUNIOR ==
"junior" so the member directly compares to a string, and if mypy complains add
a narrow type-ignore (e.g., on the assertion line) rather than changing the test
to compare .value.
In `@tests/unit/templates/conftest.py`:
- Line 123: Define a Protocol named TemplateFileFactory in conftest.py (e.g.,
class TemplateFileFactory(Protocol): def __call__(self, content: str, name: str
= "test_template.yaml") -> Path: ...) and import Protocol from typing; change
the tmp_template_file fixture return annotation from Callable[..., Path] to
TemplateFileFactory; then update the fixture parameter type annotations in
test_renderer.py, test_loader.py, and test_presets.py (and any other test
modules that consume the fixture) to use TemplateFileFactory instead of
Callable[..., Path] so mypy can enforce the stricter (content: str, name: str =
...) -> Path signature while keeping runtime behavior unchanged.
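
A minimal sketch of the Protocol approach described above, assuming a plain helper in place of the pytest fixture so that it runs without pytest. `make_template_file_factory` and `demo` are hypothetical names; only `TemplateFileFactory` and the `(content, name)` signature come from the review comment.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class TemplateFileFactory(Protocol):
    """Callable shape of the factory: one required, one optional argument."""

    def __call__(self, content: str, name: str = ...) -> Path: ...


def make_template_file_factory(base_dir: Path) -> TemplateFileFactory:
    """Build a factory that writes files under ``base_dir`` (assumed helper)."""

    def _create(content: str, name: str = "test_template.yaml") -> Path:
        path = base_dir / name
        path.write_text(content, encoding="utf-8")
        return path

    return _create


def demo() -> tuple[str, str]:
    """Call the factory with and without the optional ``name`` argument."""
    with tempfile.TemporaryDirectory() as tmp:
        factory = make_template_file_factory(Path(tmp))
        default_path = factory("a: 1\n")
        named_path = factory("b: 2\n", name="other.yaml")
        return default_path.name, named_path.name
```

Unlike `Callable[[str, str], Path]`, which describes a callable that takes two positional arguments, the Protocol lets a type checker accept both the one-argument call and the keyword call while still rejecting wrong argument types.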
---
Outside diff comments:
In `@tests/unit/config/conftest.py`:
- Around line 141-147: The fixture tmp_config_file currently types the factory
as Callable[..., Path], losing parameter info; update its return type to a
precise Callable[[str, str], Path] (matching the inner _create(content: str,
name: str = "config.yaml") -> Path signature) or introduce a Protocol like
ConfigFileCreator with def __call__(self, content: str, name: str = ...) -> Path
and use that instead; adjust the annotation on tmp_config_file to reference the
chosen precise type and ensure typing imports (Callable or Protocol) are present
so callers get exact parameter typing for _create.
In `@tests/unit/templates/test_schema.py`:
- Around line 25-287: Add a module-level pytest timeout marker by defining
pytestmark = pytest.mark.timeout(30) near the top of the test module so every
test in this file (including TestTemplateVariable, TestTemplateAgentConfig,
TestTemplateDepartmentConfig, TestTemplateMetadata, and TestCompanyTemplate)
gets a 30-second limit; ensure pytest is imported and place the pytestmark
assignment at module scope (not inside a class or function).
- Around line 25-287: Add concise Google-style docstrings to the public test
classes and top-level test functions in this file (e.g.,
TestTemplateAgentConfig, TestTemplateDepartmentConfig, TestTemplateMetadata,
TestCompanyTemplate and the module-level test functions like
test_valid_minimal/test_valid_full/test_blank_name_rejected for
TemplateVariable) describing the purpose of each test class/function;
alternatively, if tests are intentionally exempt, add an explicit D-rule
exemption for these tests per project policy. Ensure each docstring follows
Google style (one-line summary and optional brief elaboration) so ruff D rules
pass.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (31)
- pyproject.toml
- tests/unit/budget/conftest.py
- tests/unit/budget/test_config.py
- tests/unit/budget/test_enums.py
- tests/unit/communication/conftest.py
- tests/unit/communication/test_config.py
- tests/unit/communication/test_enums.py
- tests/unit/config/conftest.py
- tests/unit/config/test_defaults.py
- tests/unit/config/test_errors.py
- tests/unit/config/test_loader.py
- tests/unit/config/test_schema.py
- tests/unit/config/test_utils.py
- tests/unit/core/conftest.py
- tests/unit/core/test_enums.py
- tests/unit/observability/conftest.py
- tests/unit/observability/test_correlation.py
- tests/unit/observability/test_enums.py
- tests/unit/observability/test_processors.py
- tests/unit/providers/conftest.py
- tests/unit/providers/drivers/test_litellm_driver.py
- tests/unit/providers/drivers/test_mappers.py
- tests/unit/providers/test_enums.py
- tests/unit/providers/test_protocol.py
- tests/unit/providers/test_registry.py
- tests/unit/templates/conftest.py
- tests/unit/templates/test_loader.py
- tests/unit/templates/test_presets.py
- tests/unit/templates/test_renderer.py
- tests/unit/templates/test_schema.py
- tests/unit/test_smoke.py
💤 Files with no reviewable changes (1)
- pyproject.toml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do not use `from __future__ import annotations`; Python 3.14 has PEP 649
Use PEP 758 except syntax: `except A, B:` (no parentheses); ruff enforces this on Python 3.14
Add type hints to all public functions, enforced by mypy strict mode
Use Google style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects instead of mutating existing ones — enforce immutability
Use Pydantic v2 with `BaseModel`, `model_validator`, and `ConfigDict`
Keep line length to 88 characters, enforced by ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files
Files:
- tests/unit/communication/test_enums.py
- tests/unit/core/test_enums.py
- tests/unit/templates/conftest.py
- tests/unit/providers/test_protocol.py
- tests/unit/config/test_defaults.py
- tests/unit/observability/test_processors.py
- tests/unit/test_smoke.py
- tests/unit/providers/conftest.py
- tests/unit/templates/test_renderer.py
- tests/unit/providers/test_enums.py
- tests/unit/core/conftest.py
- tests/unit/communication/conftest.py
- tests/unit/budget/conftest.py
- tests/unit/config/test_utils.py
- tests/unit/observability/test_enums.py
- tests/unit/config/test_errors.py
- tests/unit/budget/test_enums.py
- tests/unit/budget/test_config.py
- tests/unit/communication/test_config.py
- tests/unit/observability/test_correlation.py
- tests/unit/config/conftest.py
- tests/unit/templates/test_schema.py
- tests/unit/providers/drivers/test_mappers.py
- tests/unit/config/test_loader.py
- tests/unit/config/test_schema.py
- tests/unit/providers/drivers/test_litellm_driver.py
- tests/unit/templates/test_presets.py
- tests/unit/observability/conftest.py
- tests/unit/templates/test_loader.py
- tests/unit/providers/test_registry.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Use `asyncio_mode = 'auto'` in pytest; no manual `@pytest.mark.asyncio` needed
Set 30-second timeout per test
Files:
- tests/unit/communication/test_enums.py
- tests/unit/core/test_enums.py
- tests/unit/templates/conftest.py
- tests/unit/providers/test_protocol.py
- tests/unit/config/test_defaults.py
- tests/unit/observability/test_processors.py
- tests/unit/test_smoke.py
- tests/unit/providers/conftest.py
- tests/unit/templates/test_renderer.py
- tests/unit/providers/test_enums.py
- tests/unit/core/conftest.py
- tests/unit/communication/conftest.py
- tests/unit/budget/conftest.py
- tests/unit/config/test_utils.py
- tests/unit/observability/test_enums.py
- tests/unit/config/test_errors.py
- tests/unit/budget/test_enums.py
- tests/unit/budget/test_config.py
- tests/unit/communication/test_config.py
- tests/unit/observability/test_correlation.py
- tests/unit/config/conftest.py
- tests/unit/templates/test_schema.py
- tests/unit/providers/drivers/test_mappers.py
- tests/unit/config/test_loader.py
- tests/unit/config/test_schema.py
- tests/unit/providers/drivers/test_litellm_driver.py
- tests/unit/templates/test_presets.py
- tests/unit/observability/conftest.py
- tests/unit/templates/test_loader.py
- tests/unit/providers/test_registry.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T10:09:25.209Z
Learning: Applies to **/*.py : Add type hints to all public functions, enforced by mypy strict mode
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Use type hints where appropriate. Use Pydantic models for data validation in `src/memory/story_state.py`, dataclasses in `src/settings.py`.
📚 Learning: 2026-03-01T10:09:25.209Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T10:09:25.209Z
Learning: Applies to tests/**/*.py : Use pytest markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`
Applied to files:
tests/unit/test_smoke.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
- tests/unit/templates/test_renderer.py
- tests/unit/config/test_schema.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Use type hints where appropriate. Use Pydantic models for data validation in `src/memory/story_state.py`, dataclasses in `src/settings.py`.
Applied to files:
tests/unit/config/test_loader.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
- tests/unit/config/test_schema.py
- tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.
Applied to files:
tests/unit/config/test_schema.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : In agent tests, mock Ollama API calls using `unittest.mock` and patch `agents.base.ollama.Client`
Applied to files:
tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Always mock Ollama API calls in tests - tests should not require a running Ollama instance; use `unittest.mock` for mocking (`patch`, `MagicMock`); mock the Ollama client with: `patch("agents.base.ollama.Client")`
Applied to files:
tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama API calls in tests to avoid requiring a running Ollama instance
Applied to files:
tests/unit/providers/drivers/test_litellm_driver.py
🧬 Code graph analysis (21)
tests/unit/communication/test_enums.py (1)
  src/ai_company/communication/enums.py (6)
    MessageType(6-19), MessagePriority(22-32), ChannelType(35-46), AttachmentType(49-60), CommunicationPattern(63-72), MessageBusBackend(75-84)
tests/unit/core/test_enums.py (1)
  src/ai_company/core/enums.py (10)
    AgentStatus(24-29), CostTier(57-69), CompanyType(72-82), TaskStatus(122-143), TaskType(146-154), Priority(157-163), Complexity(166-172), ArtifactType(175-180), ProjectStatus(183-190), SeniorityLevel(6-21)
tests/unit/providers/test_protocol.py (2)
  tests/unit/providers/test_registry.py (1)
    _do_complete(56-64)
  src/ai_company/providers/base.py (1)
    _do_complete(124-142)
tests/unit/config/test_defaults.py (2)
  src/ai_company/config/defaults.py (1)
    default_config_dict(6-28)
  src/ai_company/config/schema.py (1)
    RootConfig(204-328)
tests/unit/providers/conftest.py (1)
  src/ai_company/providers/models.py (8)
    TokenUsage(12-42), ToolDefinition(45-70), ToolCall(73-95), ToolResult(98-111), ChatMessage(114-186), CompletionConfig(189-230), CompletionResponse(233-282), StreamChunk(285-363)
tests/unit/templates/test_renderer.py (5)
  src/ai_company/templates/loader.py (1)
    load_template(143-181)
  src/ai_company/templates/renderer.py (1)
    render_template(40-86)
  src/ai_company/config/schema.py (1)
    RootConfig(204-328)
  tests/unit/templates/conftest.py (1)
    tmp_template_file(123-131)
  src/ai_company/templates/errors.py (1)
    TemplateRenderError(14-20)
tests/unit/providers/test_enums.py (1)
  src/ai_company/providers/enums.py (3)
    MessageRole(6-12), FinishReason(15-22), StreamEventType(25-32)
tests/unit/communication/conftest.py (2)
  src/ai_company/communication/enums.py (3)
    AttachmentType(49-60), MessagePriority(22-32), ChannelType(35-46)
  src/ai_company/communication/message.py (2)
    Attachment(19-30), Message(88-138)
tests/unit/config/test_utils.py (1)
  src/ai_company/config/utils.py (1)
    deep_merge(30-54)
tests/unit/observability/test_enums.py (1)
  src/ai_company/observability/enums.py (3)
    LogLevel(6-17), RotationStrategy(20-29), SinkType(32-41)
tests/unit/budget/test_enums.py (3)
  src/ai_company/budget/enums.py (1)
    BudgetAlertLevel(6-16)
  tests/unit/observability/test_enums.py (1)
    test_membership(30-32)
  tests/unit/providers/test_enums.py (1)
    test_membership(28-30)
tests/unit/budget/test_config.py (1)
  src/ai_company/budget/config.py (2)
    BudgetAlertConfig(13-60), AutoDowngradeConfig(63-127)
tests/unit/communication/test_config.py (1)
  src/ai_company/communication/config.py (1)
    LoopPreventionConfig(209-246)
tests/unit/config/conftest.py (1)
  src/ai_company/config/schema.py (6)
    ProviderModelConfig(17-49), ProviderConfig(52-95), RoutingRuleConfig(98-124), RoutingConfig(127-149), AgentConfig(152-201), RootConfig(204-328)
tests/unit/templates/test_schema.py (3)
  src/ai_company/templates/schema.py (2)
    TemplateVariable(12-74), TemplateAgentConfig(77-110)
  src/ai_company/core/enums.py (2)
    SeniorityLevel(6-21), CompanyType(72-82)
  tests/unit/templates/conftest.py (1)
    make_template_dict(117-119)
tests/unit/providers/drivers/test_mappers.py (3)
  src/ai_company/providers/models.py (2)
    ChatMessage(114-186), ToolDefinition(45-70)
  src/ai_company/providers/enums.py (2)
    MessageRole(6-12), FinishReason(15-22)
  src/ai_company/providers/drivers/mappers.py (4)
    messages_to_dicts(19-29), tools_to_dicts(62-72), map_finish_reason(103-123), extract_tool_calls(126-164)
tests/unit/config/test_schema.py (1)
  src/ai_company/config/schema.py (6)
    ProviderModelConfig(17-49), ProviderConfig(52-95), RoutingRuleConfig(98-124), RoutingConfig(127-149), AgentConfig(152-201), RootConfig(204-328)
tests/unit/providers/drivers/test_litellm_driver.py (2)
  src/ai_company/providers/models.py (1)
    StreamChunk(285-363)
  src/ai_company/providers/errors.py (4)
    ModelNotFoundError(107-110), ProviderError(22-69), RateLimitError(78-104), AuthenticationError(72-75)
tests/unit/templates/test_presets.py (1)
  src/ai_company/templates/presets.py (2)
    get_personality_preset(67-84), generate_auto_name(87-103)
tests/unit/observability/conftest.py (2)
  src/ai_company/observability/config.py (3)
    RotationConfig(21-47), SinkConfig(50-112), LogConfig(115-192)
  src/ai_company/observability/enums.py (1)
    RotationStrategy(20-29)
tests/unit/templates/test_loader.py (2)
  src/ai_company/templates/loader.py (3)
    list_builtin_templates(134-140), load_template(143-181), LoadedTemplate(65-76)
  tests/unit/templates/conftest.py (1)
    tmp_template_file(123-131)
🔇 Additional comments (48)
tests/unit/observability/test_processors.py (1)
99-103: LGTM! Good improvement: adding the explicit `dict[str | int, str]` annotation is cleaner than suppressing the assignment error entirely. The remaining `# type: ignore` comments on lines 101-102 are appropriately scoped with specific error codes (`arg-type`, `index`) and justified for this edge-case test, which intentionally passes non-conforming input to verify runtime behavior.
tests/unit/templates/test_schema.py (1)
25-287: Typing updates look consistent and strict-mypy friendly. The added `-> None` annotations and typed fixture callable usage are coherent and keep behavior unchanged while improving static safety.
Based on learnings: Applies to **/*.py : Add type hints to all public functions, enforced by mypy strict mode.
tests/unit/observability/test_correlation.py (1)
199-201: LGTM! The `# type: ignore[no-any-return]` is appropriate here. The `get_contextvars()` method returns `dict[str, Any]`, so the dictionary lookup yields `Any`, while the function declares `-> str`. For test code asserting known values, this suppression is reasonable.
tests/unit/test_smoke.py (1)
35-35: LGTM! Good improvement: replacing the type ignore comment with an explicit `list[str]` annotation is cleaner and provides better type safety.
tests/unit/providers/test_protocol.py (1)
244-254: LGTM! The `# type: ignore[empty-body]` annotation is appropriate for this test case. The method intentionally uses `...` as its body to verify that partial ABC implementations are correctly rejected at instantiation time.
tests/unit/communication/test_enums.py (1)
23-30: LGTM! Consistent use of `.value` for enum string comparisons throughout. This pattern is more explicit and satisfies strict mypy checking while maintaining test clarity.
Also applies to: 42-45, 60-62, 71-73, 82-85, 103-106
tests/unit/budget/test_config.py (1)
51-54: LGTM! The `# type: ignore[arg-type]` annotations are appropriate here. These tests intentionally pass float values to verify that Pydantic's strict integer validation rejects them at runtime. The type ignore correctly acknowledges the static type violation while preserving the runtime validation test.
Also applies to: 153-156
tests/unit/observability/test_enums.py (1)
24-28: LGTM! Consistent use of `.value` for enum string comparisons and the `[m.value for m in Enum]` pattern for membership checks. This aligns with the PR-wide approach to satisfy strict mypy checking.
Also applies to: 31-32, 49-50, 67-68
tests/unit/config/test_errors.py (1)
14-168: LGTM! All test methods now have explicit `-> None` return type annotations, satisfying strict mypy requirements.
tests/unit/budget/test_enums.py (1)
25-28: LGTM! Consistent use of `.value` for enum assertions and the `[m.value for m in BudgetAlertLevel]` pattern for membership checks, aligning with the PR-wide typing improvements.
Also applies to: 32-33
tests/unit/core/test_enums.py (2)
163-164: Membership check could remain simpler for StrEnum. For `StrEnum`, the direct membership check `"senior" in SeniorityLevel` works because StrEnum's `__contains__` accepts string values. The list comprehension approach works but is less idiomatic for testing StrEnum's designed behavior.
This is acceptable for strict typing compliance, but noting that the original test was valid StrEnum usage.
89-144: LGTM on value-based assertions. These tests now explicitly verify the `.value` attribute of each enum member, which is a valid approach for testing enum string values and satisfies strict mypy checks.
tests/unit/providers/test_enums.py (2)
22-30: LGTM on provider enum value tests. The `.value` comparisons are consistent with the PR-wide pattern for strict mypy compliance. Tests correctly verify enum string values.
49-78: LGTM on FinishReason and StreamEventType tests. Value-based assertions are consistent and correct.
tests/unit/communication/conftest.py (2)
30-91: LGTM on generic factory typing. The `ModelFactory[T]` parameterization improves type inference for factory outputs. This is a clean approach that aligns with polyfactory's generic support.
97-123: LGTM on enum-based attachment construction. Using `AttachmentType.ARTIFACT` instead of the string literal `"artifact"` properly aligns with the `Attachment` model's type annotation (`type: AttachmentType`) and provides better type safety.
tests/unit/communication/test_config.py (1)
433-438: LGTM on type ignore for intentional type violation. The `# type: ignore[arg-type]` is appropriate here since the test intentionally passes an invalid value (`False` for a `Literal[True]` field) to verify that Pydantic's runtime validation correctly rejects it. This is the correct pattern for testing validation of type-constrained fields.
tests/unit/providers/drivers/test_litellm_driver.py (6)
6-12: LGTM on TYPE_CHECKING pattern. Using `TYPE_CHECKING` to guard the `AsyncIterator` import is the correct approach for type-only imports that avoid runtime overhead.
72-80: LGTM on helper function typing. The `_collect_stream` function now has precise type hints: `chunks: list[MagicMock]` for input and `list[StreamChunk]` for output, improving clarity for test readers.
397-406: LGTM on None-safety improvements. Adding explicit `assert tc0 is not None` and `assert tc1 is not None` checks before accessing `.tool_call_delta` attributes is both type-safe and provides clearer test failure messages if the values are unexpectedly None.
527-532: Type ignores for litellm exceptions are reasonable. The `# type: ignore[attr-defined]` comments on litellm exception instantiation and header assignment are acceptable workarounds for litellm's dynamic exception classes that may not be fully typed in stubs.
633-639: LGTM on async generator typing. The `AsyncIterator[MagicMock]` return type annotation for `_failing_stream` is correct and helps mypy understand the generator's yield type.
791-797: LGTM on helper function return type. Adding a `-> dict[str, str]` return type to `_litellm_exc_kwargs` is a clean improvement.
tests/unit/observability/conftest.py (1)
19-47: LGTM on observability factory typing. The generic `ModelFactory[T]` parameterization and explicit default values improve type safety and provide sensible defaults for tests. The `SinkConfig` defaults and `LogConfig.sinks` initialization align with the model definitions.
tests/unit/providers/conftest.py (1)
24-105: LGTM on provider factory typing. The `ModelFactory[T]` generic parameterization is consistent across all factories. The explicit attribute defaults (like `finish_reason = FinishReason.STOP`, capability flags, etc.) provide well-typed test data that aligns with model requirements.
The `# noqa: RUF012` comments are appropriate for mutable class attributes in factory definitions.
tests/unit/config/conftest.py (1)
27-60: LGTM on config factory typing. The generic `ModelFactory[T]` parameterization and removal of the `# type: ignore[assignment]` comment indicate improved type inference. The factory definitions are consistent with the PR-wide pattern.
tests/unit/budget/conftest.py (1)
30-93: LGTM! Consistent generic typing for all budget-related factories. The `ModelFactory[T]` generic parameters are correctly applied to all 11 factory classes, improving type safety and enabling better IDE support. The pattern `class XFactory(ModelFactory[X]): __model__ = X` is the correct polyfactory idiom.
tests/unit/providers/test_registry.py (3)
3-20: LGTM! Well-structured TYPE_CHECKING imports. The `TYPE_CHECKING` guard correctly isolates imports used only for type annotations, avoiding runtime import overhead. This is the recommended pattern for typing-only dependencies.
56-80: LGTM! Complete type signatures for stub driver methods. The `_StubDriver` helper class now has full type signatures for all abstract method implementations, which satisfies mypy strict mode requirements and documents the expected interface clearly.
88-280: LGTM! Consistent `-> None` annotations and local variable type hints. All test methods properly declare `-> None` return types, and local variables are annotated where it aids type inference. The typing is consistent throughout the test classes.
tests/unit/core/conftest.py (1)
44-137: LGTM! Comprehensive generic typing for all core model factories. All 20 factory classes (Skill, Authority, SeniorityInfo, Role, CustomRole, PersonalityConfig, SkillSet, ModelConfig, MemoryConfig, ToolPermissions, AgentIdentity, Team, Department, CompanyConfig, HRRegistry, Company, ExpectedArtifact, Artifact, AcceptanceCriterion, Task, Project) are correctly parameterized with their model types.
tests/unit/config/test_utils.py (1)
10-68: LGTM! Consistent `-> None` return annotations. All 11 test methods in `TestDeepMerge` now have explicit `-> None` return type annotations, satisfying mypy strict mode requirements.
tests/unit/providers/drivers/test_mappers.py (3)
24-102: LGTM! Return type annotations for all test methods. All test methods in `TestMessagesToDicts` now have `-> None` return annotations.
58-68: LGTM! Type-safe assertion restructuring. The `isinstance` checks enable mypy to verify type safety when accessing nested dictionary fields. This is the correct approach when dealing with `dict[str, object]` return types where values need narrowing before access.
110-292: LGTM! Consistent typing throughout remaining test classes. `TestToolsToDicts`, `TestMapFinishReason`, and `TestExtractToolCalls` all have proper `-> None` annotations and appropriate `isinstance` type narrowing where needed.
tests/unit/config/test_loader.py (2)
4-9: LGTM! TYPE_CHECKING guard for Callable import. Correctly uses `TYPE_CHECKING` to guard the `Callable` import from `collections.abc`, as it's only needed for static type analysis.
45-654: LGTM! Comprehensive typing for all test methods. All test methods across all test classes have:
- Explicit `-> None` return annotations
- Properly typed fixture parameters (`tmp_path: Path`, `tmp_config_file: Callable[..., Path]`, `monkeypatch: pytest.MonkeyPatch`)
The typing is consistent and follows best practices for pytest tests.
tests/unit/config/test_schema.py (5)
30-74: LGTM! Return type annotations for TestProviderModelConfig. All test methods properly annotated with `-> None`.
81-129: LGTM! Return type annotations for TestProviderConfig. All test methods properly annotated with `-> None`.
136-200: LGTM! Return type annotations for TestRoutingRuleConfig and TestRoutingConfig. All test methods properly annotated with `-> None`.
207-247: LGTM! Return type annotations for TestAgentConfig. All test methods properly annotated with `-> None`.
254-417: LGTM! Return type annotations for TestRootConfig. All test methods properly annotated with `-> None`. The `# type: ignore[arg-type]` on line 326 is appropriate since that test intentionally passes dict literals to verify Pydantic's coercion behavior.
tests/unit/config/test_defaults.py (1)
11-41: LGTM! Consistent `-> None` return annotations. All 5 test methods in `TestDefaultConfigDict` now have explicit `-> None` return type annotations, completing the strict mypy compliance for this test module.
tests/unit/templates/test_renderer.py (2)
61-62: No additional comment for these fixture-annotation call sites. This is the same fixture-typing concern already raised at `tests/unit/templates/conftest.py` line 123 (root cause).
Also applies to: 70-71, 79-80, 88-89, 97-98, 106-107, 170-171
24-24: Good strict-typing cleanup on test signatures. The explicit `-> None` annotations are consistent and non-behavioral.
Also applies to: 31-31, 38-38, 47-47, 123-123, 129-129, 137-137, 151-151, 156-156
tests/unit/templates/test_presets.py (1)
14-14: Looks good. These signature annotations improve mypy strictness without affecting test behavior.
Also applies to: 20-20, 24-24, 28-28, 34-34, 38-38, 47-47, 52-52, 56-56, 61-61, 66-66, 71-71
tests/unit/templates/test_loader.py (2)
78-79: No additional comment for these fixture-annotation call sites. Covered by the root-cause typing recommendation in `tests/unit/templates/conftest.py` line 123.
Also applies to: 143-144, 152-153, 165-166, 173-174, 181-182
39-39: LGTM on these signature updates. The `-> None` annotations are consistent and maintain behavior.
Also applies to: 44-44, 49-49, 58-58, 63-63, 69-69, 99-99, 105-105, 109-109, 116-116, 120-120, 159-159, 193-193, 198-198
tests/unit/templates/conftest.py
Outdated
  @pytest.fixture
- def tmp_template_file(tmp_path: Path) -> Callable[[str, str], Path]:
+ def tmp_template_file(tmp_path: Path) -> Callable[..., Path]:
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
# First, let's check if the file exists and view the relevant code
cat -n tests/unit/templates/conftest.py | head -150 | tail -40
Repository: Aureliolo/ai-company
Length of output: 1418
🏁 Script executed:
# Also check the structure of the tests/unit/templates directory
ls -la tests/unit/templates/ 2>/dev/null || echo "Directory not found"
Repository: Aureliolo/ai-company
Length of output: 556
🏁 Script executed:
# Run the verification script from the review comment to check call site compatibility
cd tests/unit/templates && python - <<'PY'
import ast
from pathlib import Path

root = Path(".")
violations = []
for file in root.rglob("*.py"):
    try:
        tree = ast.parse(file.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "tmp_template_file":
                positional = len(node.args)
                kw = {k.arg for k in node.keywords if k.arg}
                valid = (
                    (positional in (1, 2) and not kw)
                    or (positional == 1 and kw <= {"name"})
                    or (positional == 0 and "content" in kw and kw <= {"content", "name"})
                )
                if not valid:
                    violations.append((str(file), node.lineno, positional, sorted(kw)))
    except Exception as e:
        print(f"Error parsing {file}: {e}")
if violations:
    print("Incompatible call sites:")
    for v in violations:
        print(v)
else:
    print("All call sites are compatible with __call__(content: str, name: str = ...).")
PY
Repository: Aureliolo/ai-company
Length of output: 139
🏁 Script executed:
# Check all usages of tmp_template_file in the test directory
rg -n 'tmp_template_file' tests/unit/templates --type py -B 2 -A 2
Repository: Aureliolo/ai-company
Length of output: 7812
Use a Protocol-based callable type to preserve strict type checking for `tmp_template_file`.
The current `Callable[..., Path]` annotation weakens mypy's ability to validate argument shapes at call sites. Since all actual calls are compatible with the stricter signature `(content: str, name: str = "test_template.yaml") -> Path`, a Protocol type strengthens type safety across the module.
This requires three changes:
- Define a `TemplateFileFactory` Protocol in conftest.py
- Change the fixture return type from `Callable[..., Path]` to `TemplateFileFactory` (line 123)
- Update fixture parameter declarations in test files (test_renderer.py, test_loader.py, and test_presets.py) to use `TemplateFileFactory` instead of `Callable[..., Path]`
All 14+ call sites are already compatible with this stricter contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/templates/conftest.py` at line 123, Define a Protocol named
TemplateFileFactory in conftest.py (e.g., class TemplateFileFactory(Protocol):
def __call__(self, content: str, name: str = "test_template.yaml") -> Path: ...)
and import Protocol from typing; change the tmp_template_file fixture return
annotation from Callable[..., Path] to TemplateFileFactory; then update the
fixture parameter type annotations in test_renderer.py, test_loader.py, and
test_presets.py (and any other test modules that consume the fixture) to use
TemplateFileFactory instead of Callable[..., Path] so mypy can enforce the
stricter (content: str, name: str = ...) -> Path signature while keeping runtime
behavior unchanged.
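For reference, the Protocol suggested above could look roughly like the sketch below. Only the `TemplateFileFactory` name and the `(content: str, name: str = "test_template.yaml") -> Path` signature come from the review; `make_template_file_factory` is a hypothetical stand-in for the fixture body, which is not shown in this thread.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class TemplateFileFactory(Protocol):
    """Callable shape of the fixture: (content, name=...) -> Path."""

    def __call__(self, content: str, name: str = "test_template.yaml") -> Path: ...


def make_template_file_factory(base_dir: Path) -> TemplateFileFactory:
    """Hypothetical stand-in for the real tmp_template_file fixture body."""

    def _create(content: str, name: str = "test_template.yaml") -> Path:
        # Write the template text under base_dir and hand back its path.
        path = base_dir / name
        path.write_text(content, encoding="utf-8")
        return path

    return _create


with tempfile.TemporaryDirectory() as tmp:
    factory = make_template_file_factory(Path(tmp))
    default_path = factory("name: demo\n")
    custom_path = factory("name: other\n", name="custom.yaml")
    default_name = default_path.name
    custom_name = custom_path.name
    default_text = default_path.read_text(encoding="utf-8")
```

Unlike `Callable[..., Path]`, this shape lets mypy reject a call such as `factory(123)` or `factory("x", nme="y.yaml")` at the call site, while runtime behavior stays the same.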
Restore direct StrEnum-to-string assertions that were changed to `.value` comparisons during the mypy strict enforcement. The original assertions test StrEnum's native string equality behavior, which is the defining feature being verified. Use `type: ignore[comparison-overlap]` where mypy flags the StrEnum == str comparison.
- test_strenum_equality_with_string: restore direct `enum == "string"`
- test_membership: restore `"string" in __members__.values()` pattern
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/core/test_enums.py`:
- Around line 163-164: The membership assertion in test_strenum_membership is
misleading because SeniorityLevel.__members__.values() yields enum members, not
string names; update the test to either assert "SENIOR" in
SeniorityLevel.__members__.keys() to check the attribute name, or assert
"senior" in {m.value for m in SeniorityLevel.__members__.values()} (or directly
compare to SeniorityLevel.SENIOR.value) so the assertion clearly matches the
intended string value; modify the test_strenum_membership accordingly.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
tests/unit/budget/test_enums.py
tests/unit/core/test_enums.py
tests/unit/observability/test_enums.py
tests/unit/providers/test_enums.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do not use `from __future__ import annotations`; Python 3.14 has PEP 649
Use PEP 758 except syntax: `except A, B:` (no parentheses); ruff enforces this on Python 3.14
Add type hints to all public functions, enforced by mypy strict mode
Use Google style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects instead of mutating existing ones; enforce immutability
Use Pydantic v2 with `BaseModel`, `model_validator`, and `ConfigDict`
Keep line length to 88 characters, enforced by ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files
Files:
tests/unit/providers/test_enums.py
tests/unit/observability/test_enums.py
tests/unit/core/test_enums.py
tests/unit/budget/test_enums.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Use `asyncio_mode = 'auto'` in pytest; no manual `@pytest.mark.asyncio` needed
Set 30-second timeout per test
Files:
tests/unit/providers/test_enums.py
tests/unit/observability/test_enums.py
tests/unit/core/test_enums.py
tests/unit/budget/test_enums.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T10:09:25.209Z
Learning: Applies to **/*.py : Add type hints to all public functions, enforced by mypy strict mode
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Use type hints where appropriate. Use Pydantic models for data validation in `src/memory/story_state.py`, dataclasses in `src/settings.py`.
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
🧬 Code graph analysis (4)
tests/unit/providers/test_enums.py (1)
src/ai_company/providers/enums.py (3)
MessageRole(6-12), FinishReason(15-22), StreamEventType(25-32)
tests/unit/observability/test_enums.py (1)
src/ai_company/observability/enums.py (3)
LogLevel(6-17), RotationStrategy(20-29), SinkType(32-41)
tests/unit/core/test_enums.py (1)
src/ai_company/core/enums.py (10)
AgentStatus(24-29), CostTier(57-69), CompanyType(72-82), TaskStatus(122-143), TaskType(146-154), Priority(157-163), Complexity(166-172), ArtifactType(175-180), ProjectStatus(183-190), SeniorityLevel(6-21)
tests/unit/budget/test_enums.py (1)
src/ai_company/budget/enums.py (1)
BudgetAlertLevel(6-16)
🔇 Additional comments (5)
tests/unit/budget/test_enums.py (1)
25-28: LGTM! Using `.value` for these assertions is appropriate since the test is explicitly named `test_values_are_strings` and is verifying the underlying string values of the enum members.
tests/unit/providers/test_enums.py (1)
23-26: LGTM! The `.value` assertions correctly test the underlying string values of the `StrEnum` members. The pattern is consistent across all three enum test classes.
Also applies to: 50-54, 74-78
tests/unit/observability/test_enums.py (1)
24-28: LGTM! The `.value` assertions appropriately test the underlying string values for the `LogLevel`, `RotationStrategy`, and `SinkType` enums.
Also applies to: 49-50, 67-68
tests/unit/core/test_enums.py (2)
90-144: LGTM! Using `.value` for these enum string value assertions is appropriate and consistent with the test method names (e.g., `test_agent_status_values`, `test_cost_tier_values`).
155-156: Good fix: it preserves the intended StrEnum behavior test. Using a direct `StrEnum == str` comparison with `# type: ignore[comparison-overlap]` correctly tests `StrEnum`'s native string equality behavior, which is the defining characteristic this test should verify. This addresses the previous review feedback appropriately.
  def test_strenum_membership(self) -> None:
-     assert "senior" in [m.value for m in SeniorityLevel]
+     assert "senior" in SeniorityLevel.__members__.values()
🧹 Nitpick | 🔵 Trivial
Membership check is semantically misleading.
`SeniorityLevel.__members__.values()` returns enum members (e.g., `SeniorityLevel.SENIOR`), not string names. This check works only because StrEnum members compare equal to their string values. For clarity, consider using `__members__` keys (the attribute names) or explicitly testing against values:
♻️ Suggested alternatives for clarity
Option 1: Check member names (attribute names)
  def test_strenum_membership(self) -> None:
-     assert "senior" in SeniorityLevel.__members__.values()
+     assert "SENIOR" in SeniorityLevel.__members__
Option 2: Explicitly check values
  def test_strenum_membership(self) -> None:
-     assert "senior" in SeniorityLevel.__members__.values()
+     assert "senior" in [m.value for m in SeniorityLevel]
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_strenum_membership(self) -> None: | |
| assert "senior" in [m.value for m in SeniorityLevel] | |
| assert "senior" in SeniorityLevel.__members__.values() | |
| def test_strenum_membership(self) -> None: | |
| assert "senior" in [m.value for m in SeniorityLevel] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/core/test_enums.py` around lines 163 - 164, The membership
assertion in test_strenum_membership is misleading because
SeniorityLevel.__members__.values() yields enum members, not string names;
update the test to either assert "SENIOR" in SeniorityLevel.__members__.keys()
to check the attribute name, or assert "senior" in {m.value for m in
SeniorityLevel.__members__.values()} (or directly compare to
SeniorityLevel.SENIOR.value) so the assertion clearly matches the intended
string value; modify the test_strenum_membership accordingly.
- Define `TemplateFileFactory` Protocol in templates/conftest.py with precise `(content: str, name: str = ...) -> Path` signature
- Define `ConfigFileFactory` Protocol in config/conftest.py with same precise signature
- Replace `Callable[..., Path]` with Protocol types across all consuming test files for stricter parameter type checking
- Add `pytestmark = pytest.mark.timeout(30)` to templates/test_schema.py for consistency with all other test modules
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
tests/unit/providers/drivers/test_mappers.py:62
`tc` is first used for a `ToolCall` instance, then reused for a `dict` extracted from `raw_tool_calls`. Reusing the same name for different types in the same test makes the assertions harder to follow; consider renaming the second variable to reflect that it is a raw/dict tool call.
    tc = raw_tool_calls[0]
    assert isinstance(tc, dict)
if TYPE_CHECKING:
    from collections.abc import Callable
    from pathlib import Path
    from .conftest import TemplateFileFactory
TemplateFileFactory is imported only under if TYPE_CHECKING but referenced by runtime annotations in this file. Given the repo’s Python 3.14 + PEP 649 setup, this can break if annotations are evaluated. Import it at runtime (and silence Ruff with # noqa: TC00x if necessary).
if TYPE_CHECKING:
    from .conftest import ConfigFileFactory
ConfigFileFactory is imported only under if TYPE_CHECKING but used in runtime annotations later in this file (e.g., tmp_config_file: ConfigFileFactory). Under Python 3.14 + PEP 649 lazy annotations, evaluating annotations can fail with NameError. Prefer importing it at runtime and suppressing Ruff with # noqa: TC00x as needed.
from typing import TYPE_CHECKING

import pytest

from ai_company.config.schema import ProviderConfig, ProviderModelConfig
from ai_company.providers.base import BaseCompletionProvider

if TYPE_CHECKING:
    from collections.abc import AsyncIterator

    from ai_company.providers.capabilities import ModelCapabilities
    from ai_company.providers.models import (
        ChatMessage,
        CompletionConfig,
        CompletionResponse,
        StreamChunk,
        ToolDefinition,
    )
Several types used in runtime annotations (AsyncIterator, ChatMessage, CompletionResponse, etc.) are imported only inside if TYPE_CHECKING. In this repo’s Python 3.14 + PEP 649 configuration, later annotation evaluation can fail when those names aren’t available at runtime. Prefer runtime imports here (and use # noqa: TC001/TC003 like in src/ai_company/providers/protocol.py).
- from typing import TYPE_CHECKING
- import pytest
- from ai_company.config.schema import ProviderConfig, ProviderModelConfig
- from ai_company.providers.base import BaseCompletionProvider
- if TYPE_CHECKING:
-     from collections.abc import AsyncIterator
-     from ai_company.providers.capabilities import ModelCapabilities
-     from ai_company.providers.models import (
-         ChatMessage,
-         CompletionConfig,
-         CompletionResponse,
-         StreamChunk,
-         ToolDefinition,
-     )
+ from collections.abc import AsyncIterator  # noqa: TC003
+ import pytest
+ from ai_company.config.schema import ProviderConfig, ProviderModelConfig
+ from ai_company.providers.base import BaseCompletionProvider
+ from ai_company.providers.capabilities import ModelCapabilities  # noqa: TC001
+ from ai_company.providers.models import (  # noqa: TC001
+     ChatMessage,
+     CompletionConfig,
+     CompletionResponse,
+     StreamChunk,
+     ToolDefinition,
+ )

if TYPE_CHECKING:
    from collections.abc import AsyncIterator
AsyncIterator is imported only under if TYPE_CHECKING, but it’s referenced in a runtime return annotation later in this file (_failing_stream() -> AsyncIterator[...]). With Python 3.14 + PEP 649 lazy annotations, evaluating annotations can fail if AsyncIterator isn’t in module globals. Import it at runtime and use # noqa: TC003 if needed.
- if TYPE_CHECKING:
-     from collections.abc import AsyncIterator
+ from collections.abc import AsyncIterator  # noqa: TC003

if TYPE_CHECKING:
    from collections.abc import Callable
    from pathlib import Path
Callable/Path are imported only under if TYPE_CHECKING, but they’re referenced in runtime annotations in this module (e.g., TemplateFileFactory, make_template_dict, tmp_template_file). With the repo’s Python 3.14 + PEP 649 lazy annotations setup, evaluating annotations (e.g., typing.get_type_hints) can raise NameError because those names aren’t in module globals. Prefer importing these at runtime and silencing Ruff with # noqa: TC00x (see tests/unit/providers/test_protocol.py:3 / src/ai_company/providers/base.py:10).
if TYPE_CHECKING:
    from pathlib import Path
Path is imported only under if TYPE_CHECKING, but it’s used in runtime annotations (ConfigFileFactory, tmp_config_file). In this repo (Python 3.14 + PEP 649, no __future__.annotations), annotations may be evaluated later and will fail if Path isn’t in module globals. Import Path at runtime and use # noqa: TC003 if needed to satisfy Ruff.
if TYPE_CHECKING:
    from collections.abc import Callable
    from .conftest import TemplateFileFactory
TemplateFileFactory is imported only under if TYPE_CHECKING but is used in runtime type annotations for test method parameters. With PEP 649 lazy annotations (and no from __future__ import annotations), evaluating annotations can raise NameError. Prefer importing TemplateFileFactory at runtime (and add # noqa: TC00x if Ruff flags it).
if TYPE_CHECKING:
    from collections.abc import Callable
Callable is imported only under if TYPE_CHECKING, but it’s used in runtime annotations (e.g., make_template_dict: Callable[..., ...]). With the repo’s Python 3.14 + PEP 649 approach, this can raise NameError if annotations are evaluated. Prefer a runtime import for Callable (with # noqa: TC003 if Ruff complains).
## Summary
- **mypy on tests/**: CI now type-checks `tests/` alongside `src/` (PR #89 enforced strict mypy locally but CI only ran on `src/`)
- **Secret scanning**: New `secret-scan.yml` workflow runs gitleaks on push/PR + weekly Monday 3am UTC (gitleaks is skipped in pre-commit CI)
- **Codecov integration**: Replaces coverage artifact uploads with Codecov for PR comments, trend tracking, and badges
- **Dependency review hardened**: Added license allow-list (permissive only) and PR comment summaries
- **Dependabot tuned**: Added commit-message prefixes (`chore`/`ci`), increased PR limit to 10, kept daily schedule
- **Auto-merge removed**: Deleted `dependabot-auto-merge.yml`; no auto-merging of anything
- **Security hardening**: Top-level `permissions: {}` deny-all, per-job `contents: read`, `persist-credentials: false` on all checkouts
- **Smarter concurrency**: Only cancels stale PR runs, not main branch pushes
- **Manual trigger**: Added `workflow_dispatch` for manual CI runs from GitHub UI
## Test plan
- [ ] CI runs successfully on this PR (lint, type-check, test jobs)
- [ ] Verify mypy catches type errors in `tests/` files
- [ ] Verify Codecov posts coverage comment on PR (requires `CODECOV_TOKEN` secret)
- [ ] Verify secret-scan workflow appears in Actions tab
- [ ] Verify dependency-review posts comment on PR
- [ ] Verify no auto-merge workflow exists
- [ ] Check Dependabot Settings page shows no errors after merge
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider 
resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * 
implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs 
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, 
contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, 
loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive 
trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry 
([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into 
observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### 
Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree 
management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
- Remove the `disallow_untyped_defs = false` mypy override for `tests.*` and fix all 225 mypy errors across 31 files so strict typing is enforced on tests going forward
- Add type parameters to 65 `ModelFactory` subclasses, fix 88 enum comparison-overlap checks, add `-> None` to ~140 test functions, and resolve all remaining arg-type/attr-defined/index/call-arg errors

Test plan

- `uv run mypy src/ tests/`: 0 errors (131 source files)
- `uv run pytest tests/ -m unit`: 1292 passed
- `uv run ruff check src/ tests/`: all checks passed

Closes #87
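As a rough illustration of three of the fix patterns listed in the summary (`.value` comparisons, `-> None` on tests, and `isinstance` narrowing), here is a minimal sketch. `TaskStatus`, `test_status_values`, and `first_item` are hypothetical names, not taken from the repository, and the str-backed enum is an assumption about how the project's enums are defined:

```python
from enum import Enum


class TaskStatus(str, Enum):
    """Hypothetical str-backed enum standing in for the project's enums."""

    PENDING = "pending"
    DONE = "done"


def test_status_values() -> None:
    # Comparing the member itself to a str literal is the kind of check
    # strict mypy may report as comparison-overlap; .value is str vs str.
    assert TaskStatus.PENDING.value == "pending"
    assert TaskStatus.DONE.value == "done"


def first_item(payload: object) -> str:
    """Narrow an untyped value with isinstance before indexing it."""
    if not isinstance(payload, list) or not payload:
        msg = "expected a non-empty list"
        raise TypeError(msg)
    item = payload[0]
    if not isinstance(item, str):
        msg = "expected a str item"
        raise TypeError(msg)
    return item
```

The explicit `-> None` is what satisfies `disallow_untyped_defs` for a test function with no parameters, and the `isinstance` checks give mypy enough information to accept the index operation and the `str` return type.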
🤖 Generated with Claude Code