test: fix Hypothesis fuzzing infra and speed up slow unit tests #1044

Merged
Aureliolo merged 4 commits into main from test/hypothesis-fuzzing-session on Apr 3, 2026

Conversation

@Aureliolo
Owner

Summary

Ran a dedicated Hypothesis fuzzing session (5 seeds x 10k examples + 500k extreme run across all 46 @given test files, ~3M+ total inputs). Found 2 real bugs and 3 infra issues, fixed all of them, then optimized 9 slow unit tests.

Fuzzing Fixes

  • BM25 tokenizer Unicode edge case: BM25Tokenizer.encode('\ufc5e') returned empty vector for Arabic ligature U+FC5E (Unicode category "L") that decomposes entirely into combining marks under NFKC normalization. Added @example + assume() to handle degenerate NFKC inputs.
  • Flaky setup property test: test_description_normalization_invariants hit HTTP 429 after ~100 rapid POST requests due to rate limiter. Bumped API test fixture rate limit to 1M req/min. Also refactored the test to call normalize_description() directly instead of full HTTP round-trip (13s -> 0.25s).
  • Fuzz profile incompatible with pytest-timeout: Added suppress_health_check=list(HealthCheck) to fuzz profile so Hypothesis doesn't abandon slow tests. Made wall-clock guardrail skip when fuzz profile is active.
  • Added extreme profile: 500k examples for overnight fuzzing sessions.
  • Updated CLAUDE.md: Fuzz command uses --timeout=0 and drops -k properties to cover all 46 @given files, not just the 12 *_properties.py files.
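
The Unicode edge case in the first bullet can be explored with the standard library alone. The sketch below is not the project's `BM25Tokenizer`; `describe_nfkc` and `has_indexable_chars` are hypothetical helpers, and the "letter or digit survives NFKC" rule is an assumption about what a tokenizer might treat as indexable text.

```python
import unicodedata


def describe_nfkc(text: str) -> list[tuple[str, str]]:
    """Return (code point, general category) pairs for the NFKC form of text."""
    normalized = unicodedata.normalize("NFKC", text)
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in normalized]


def has_indexable_chars(text: str) -> bool:
    """Hypothetical filter: does any letter or digit survive NFKC normalization?"""
    normalized = unicodedata.normalize("NFKC", text)
    return any(ch.isalnum() for ch in normalized)


ligature = "\ufc5e"
# The raw character is classified as a letter...
print(unicodedata.category(ligature))
# ...but its NFKC form contains no letters or digits.
print(describe_nfkc(ligature))
print(has_indexable_chars(ligature))
```

A property test that assumes "input contains a letter, so the output is non-empty" would need to check the normalized form, which is roughly what an `assume()` guard on degenerate NFKC inputs achieves.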

Test Performance Improvements

9 tests brought under 1s:

| Test | Before | After |
| --- | --- | --- |
| test_timeout_kills_process | 5.74s | 0.52s |
| test_description_normalization_invariants | ~13s | 0.25s |
| test_inputs_are_not_mutated | 2.30s | 0.58s |
| test_override_values_win_for_non_dict | 1.71s | 0.65s |
| test_identity_merge_with_empty | 1.48s | 0.56s |
| test_result_keys_are_union | 1.33s | 0.51s |
| test_format (stagnation) | 1.41s | 0.17s |
| test_determinism (stagnation) | 1.25s | 0.69s |
| test_run_loop_continues_on_probe_error | 1.22s | 0.21s |

Fixes: reduced property test strategy complexity (max_leaves 20->5), reduced max_examples overrides (100/200->50), shortened subprocess timeout command, bypassed health prober interval for test speed.

Test Plan

  • Full unit suite: 13,166 passed, 0 failed
  • Fuzz profile (10k examples, no seed): 13,132 passed, 0 failed
  • Extreme profile (500k examples): all property tests passed
  • mypy: 0 errors
  • ruff: 0 errors

Files Changed

  • CLAUDE.md -- fuzz command update
  • tests/conftest.py -- fuzz/extreme profiles, wall-clock guardrail
  • tests/unit/api/conftest.py -- rate limit override for property tests
  • tests/unit/api/controllers/test_setup.py -- refactored to test normalize_description directly
  • tests/unit/memory/test_sparse.py -- @example + assume() for Unicode edge case
  • tests/unit/config/test_utils_properties.py -- reduced strategy complexity
  • tests/unit/engine/stagnation/test_properties.py -- reduced max_examples
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py -- faster timeout command
  • tests/unit/providers/test_health_prober.py -- bypass interval for test speed

- Fix BM25 tokenizer test: add @example for Unicode ligature U+FC5E
  that decomposes to combining marks under NFKC, producing empty vector
- Fix flaky setup test: bump API test rate limit to 1M req/min so
  Hypothesis 10k+ examples don't hit 429
- Add suppress_health_check to fuzz profile so Hypothesis doesn't
  abandon slow tests before reaching max_examples
- Add extreme profile (500k examples) for overnight fuzzing sessions
- Make wall-clock guardrail skip when fuzz profile is active
- Update CLAUDE.md: fuzz command uses --timeout=0 and drops -k
  properties to cover all 46 files with @given decorators
- Refactor test_description_normalization_invariants to test
  normalize_description directly instead of full HTTP round-trip
  (13s -> 0.25s)
- Reduce subprocess timeout test: ping -n 3 -> python sleep,
  killed after 0.5s (5.74s -> 0.52s)
- Reduce deep_merge property test strategy complexity:
  max_leaves 20->5, max_size 8->5, max_examples 100->50 (2.3s -> 0.58s)
- Reduce stagnation fingerprint property tests: max_examples
  200->50 (1.4s -> 0.17s)
- Fix health prober test: bypass interval_seconds=1 to avoid
  real 1s sleep between probe cycles (1.22s -> 0.21s)
Copilot AI review requested due to automatic review settings April 3, 2026 19:21
@coderabbitai

coderabbitai bot commented Apr 3, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 17ac093b-afd9-4ce1-b777-1aa675ee6bd6

📥 Commits

Reviewing files that changed from the base of the PR and between a0b4991 and 7d20397.

📒 Files selected for processing (4)
  • CLAUDE.md
  • tests/conftest.py
  • tests/unit/providers/test_health_prober.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py

Walkthrough

Updated test harness and documentation to change fuzzing invocation to use HYPOTHESIS_PROFILE=fuzz with --timeout=0, register a new extreme Hypothesis profile, and set profile deadlines to None. Tests now suppress health checks and use a local Hypothesis database when fuzz profiles are active; a _FUZZ_PROFILE_ACTIVE flag disables the unit-test wall-clock failure guardrail under those profiles. Additional test adjustments include increasing test API rate limits, reducing property-test generation sizes, extracting a normalization function into unit tests, adding a Unicode edge-case example, tightening probe timing in a prober test, and changing subprocess timeout commands.
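
For orientation, the profile setup described in this walkthrough might look roughly like the sketch below. It uses Hypothesis's `settings.register_profile` and `settings.load_profile`; the example counts come from the PR description, while the `load_profile_from_env` helper and the `"ci"` default are assumptions. The local database wiring mentioned above is not modeled.

```python
from collections.abc import Mapping

from hypothesis import HealthCheck, settings

# Example counts follow the PR description; everything else is illustrative.
settings.register_profile("ci", max_examples=10, derandomize=True)
settings.register_profile("dev", max_examples=1000)
settings.register_profile(
    "fuzz",
    max_examples=10_000,
    deadline=None,  # no per-example deadline for long sessions
    suppress_health_check=list(HealthCheck),
)
settings.register_profile(
    "extreme",
    max_examples=500_000,
    deadline=None,
    suppress_health_check=list(HealthCheck),
)


def load_profile_from_env(env: Mapping[str, str]) -> str:
    """Hypothetical helper: load the profile named by HYPOTHESIS_PROFILE."""
    name = env.get("HYPOTHESIS_PROFILE", "ci")  # default is an assumption
    settings.load_profile(name)
    return name
```

In a real `conftest.py` the helper would typically be called once with `os.environ` at import time.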

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.00% which is insufficient. The required threshold is 40.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'test: fix Hypothesis fuzzing infra and speed up slow unit tests' directly summarizes the main changes: fixing fuzzing infrastructure and optimizing test performance. |
| Description check | ✅ Passed | The description comprehensively covers the pull request changes, including the fuzzing session results, bugs discovered, infrastructure fixes, test performance improvements, and a detailed test plan. |

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances the property-based testing infrastructure by introducing an 'extreme' Hypothesis profile for deep fuzzing and optimizing existing property tests to improve execution speed. Key changes include increasing API rate limits for tests, disabling wall-clock limits during fuzzing sessions, and refactoring specific tests to reduce overhead. Feedback highlights that the wall-clock limit bypass should be extended to include the newly added 'extreme' profile to prevent false failures during long-running sessions.

# Disabled for fuzz profile where 10k examples per test routinely
# exceed the limit.
_UNIT_TEST_WALL_CLOCK_LIMIT = 8.0 # seconds
_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") == "fuzz"
Contributor

medium

The _FUZZ_PROFILE_ACTIVE check should also include the newly added extreme profile. Since the extreme profile runs 500,000 examples per test, it will almost certainly exceed the 8-second wall-clock limit for unit tests, causing them to fail incorrectly when this profile is active.

Suggested change
-_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") == "fuzz"
+_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") in ("fuzz", "extreme")

@github-actions
Contributor

github-actions bot commented Apr 3, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 7d20397.
Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Contributor

Copilot AI left a comment

Pull request overview

Updates the Python test suite’s Hypothesis fuzzing infrastructure and reduces runtime of several slow unit/property tests, primarily by adjusting Hypothesis profiles/strategies and removing expensive HTTP round-trips in a property test.

Changes:

  • Added/updated Hypothesis profiles (fuzz, new extreme) and adjusted the unit-test wall-clock guardrail behavior during fuzzing.
  • Refactored test_description_normalization_invariants to test normalize_description() directly and raised API test fixture rate limits to avoid 429s during fuzz runs.
  • Reduced property-test workload/complexity and sped up a few slow unit tests (subprocess timeout test, health prober loop test).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| CLAUDE.md | Updates documented fuzzing command; needs to mention the new extreme profile to match tests/conftest.py. |
| tests/conftest.py | Adds extreme profile and suppresses health checks for fuzzing; adjusts wall-clock guardrail to skip under fuzz. |
| tests/unit/api/conftest.py | Raises API rate limits in test config to prevent 429s during high-volume Hypothesis runs. |
| tests/unit/api/controllers/test_setup.py | Property test now calls normalize_description() directly instead of doing HTTP POST per example. |
| tests/unit/memory/test_sparse.py | Adds explicit Unicode example and assumptions to handle degenerate NFKC tokenization cases. |
| tests/unit/config/test_utils_properties.py | Reduces strategy sizes and max_examples to speed up deep-merge/to-float property tests. |
| tests/unit/engine/stagnation/test_properties.py | Lowers max_examples for two fingerprint property tests. |
| tests/unit/tools/sandbox/test_subprocess_sandbox.py | Speeds up timeout test by shortening the long-running command. |
| tests/unit/providers/test_health_prober.py | Speeds up prober loop test by bypassing the interval sleep via private state mutation. |

Comment on lines 115 to +139
@@ -115,7 +132,11 @@ def pytest_runtest_teardown(item: pytest.Item) -> None:
     if start is None:
         return
     elapsed = time.monotonic() - start
-    if item.get_closest_marker("unit") and elapsed > _UNIT_TEST_WALL_CLOCK_LIMIT:
+    if (
+        not _FUZZ_PROFILE_ACTIVE
+        and item.get_closest_marker("unit")
+        and elapsed > _UNIT_TEST_WALL_CLOCK_LIMIT
+    ):

Copilot AI Apr 3, 2026

The wall-clock guardrail skip is keyed only off HYPOTHESIS_PROFILE == "fuzz", but this PR also adds an extreme profile intended for long-running runs. As written, running with HYPOTHESIS_PROFILE=extreme will still fail unit tests for exceeding _UNIT_TEST_WALL_CLOCK_LIMIT, which seems contrary to the intent of adding an overnight fuzz profile. Consider broadening the check to treat both fuzz and extreme (or any non-ci profile) as exempt, or derive the flag from the loaded Hypothesis profile rather than a single env-var value.
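
One way to express the broadened check is sketched below. The 8.0-second limit and the profile names come from this PR; `LONG_RUNNING_PROFILES`, `long_running_profile_active`, and `exceeds_wall_clock` are hypothetical names, and the environment is passed in as a mapping only so the logic can be exercised without touching `os.environ`.

```python
import os
from collections.abc import Mapping

UNIT_TEST_WALL_CLOCK_LIMIT = 8.0  # seconds, same value as the guardrail above
LONG_RUNNING_PROFILES = frozenset({"fuzz", "extreme"})  # hypothetical constant


def long_running_profile_active(env: Mapping[str, str] = os.environ) -> bool:
    """True when HYPOTHESIS_PROFILE names a profile meant for long sessions."""
    return env.get("HYPOTHESIS_PROFILE") in LONG_RUNNING_PROFILES


def exceeds_wall_clock(
    elapsed: float,
    is_unit: bool,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """True when a unit test should be failed for running too long."""
    return (
        not long_running_profile_active(env)
        and is_unit
        and elapsed > UNIT_TEST_WALL_CLOCK_LIMIT
    )
```

Keeping the profile names in one set means a future long-running profile needs a single edit rather than another flag.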

Copilot uses AI. Check for mistakes.
Comment on lines 308 to 313
         if os.name == "nt":
             result = await subprocess_sandbox.execute(
-                command="cmd",
-                args=("/c", "ping", "-n", "10", "127.0.0.1"),
+                command="python",
+                args=("-c", "import time; time.sleep(5)"),
                 timeout=0.5,
             )

Copilot AI Apr 3, 2026

On Windows this test now shells out to command="python", which can be missing on PATH depending on how the test environment is set up (e.g. only py launcher available). Since the test suite is already running under a specific interpreter, it’s more reliable to invoke that interpreter (e.g. via sys.executable) to avoid PATH-related flakes.
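
The `sys.executable` idea can be illustrated without the project's sandbox. The sketch below uses plain `asyncio` subprocess APIs rather than `subprocess_sandbox.execute`, whose signature is not shown here; `run_with_timeout` and `demo` are hypothetical names.

```python
import asyncio
import sys
import time


async def run_with_timeout(timeout: float) -> bool:
    """Return True if the child had to be killed after `timeout` seconds."""
    # sys.executable is the interpreter running this code, so no PATH lookup
    # for "python" is needed.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(5)"
    )
    try:
        await asyncio.wait_for(proc.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the child so nothing is left behind
        return True
    return False


def demo() -> tuple[bool, float]:
    """Run the timeout scenario once and report (timed_out, elapsed seconds)."""
    started = time.monotonic()
    timed_out = asyncio.run(run_with_timeout(0.5))
    return timed_out, time.monotonic() - started
```

The same command works on Windows and POSIX, which would also remove the need for an `os.name` branch in a test like this one.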

Comment on lines +345 to +347
# Bypass the validated minimum so wait_for times out
# immediately between probe cycles instead of sleeping 1s.
prober._interval = 0

Copilot AI Apr 3, 2026

This test reaches into ProviderHealthProber’s private state (prober._interval = 0) and intentionally violates the constructor’s validation (interval_seconds must be >= 1). That makes the test more brittle to refactors (e.g. renaming the attribute, adding invariants) and slightly changes behavior inside _probe_all (eligibility checks use _interval). Prefer avoiding private mutation here (e.g. keep interval_seconds=1 and mock/patch the wait/sleep between cycles so the loop advances without a real 1s delay).
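
A minimal sketch of the patching approach follows. `TinyProber` is a hypothetical stand-in, not `ProviderHealthProber`, and it pauses with `asyncio.sleep` for simplicity; the real loop reportedly waits via `wait_for`, so the patch target there would differ.

```python
import asyncio
import time
from unittest.mock import AsyncMock, patch


class TinyProber:
    """Hypothetical prober that validates its interval like the real one."""

    def __init__(self, interval_seconds: int) -> None:
        if interval_seconds < 1:
            raise ValueError("interval_seconds must be >= 1")
        self._interval = interval_seconds
        self.cycles = 0

    async def run(self, max_cycles: int) -> None:
        while self.cycles < max_cycles:
            self.cycles += 1  # stands in for one probe cycle
            await asyncio.sleep(self._interval)


def run_three_cycles_without_sleeping() -> tuple[int, float]:
    """Drive three cycles with a valid interval while the sleep is mocked."""
    prober = TinyProber(interval_seconds=1)  # validation stays intact
    started = time.monotonic()
    with patch("asyncio.sleep", new_callable=AsyncMock) as fake_sleep:
        asyncio.run(prober.run(max_cycles=3))
    elapsed = time.monotonic() - started
    assert fake_sleep.await_count == 3
    return prober.cycles, elapsed
```

The test keeps the constructor's invariant and never touches `_interval`, at the cost of coupling to which sleep primitive the loop calls.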

CLAUDE.md Outdated
Comment on lines +174 to +175
- **Property-based testing**: Python uses [Hypothesis](https://hypothesis.readthedocs.io/) (`@given` + `@settings`), React uses [fast-check](https://fast-check.dev/) (`fc.assert` + `fc.property`), Go uses native `testing.F` fuzz functions (`Fuzz*`). Hypothesis profiles configured in `tests/conftest.py`: `ci` (deterministic, `max_examples=10` + `derandomize=True` -- fixed seed per test, same inputs every run), `dev` (1000 examples), `fuzz` (10,000 examples, no deadline -- for dedicated fuzzing sessions). Controlled via `HYPOTHESIS_PROFILE` env var. `.hypothesis/` is gitignored. Failing examples are persisted to `~/.synthorg/hypothesis-examples/` (write-only shared DB, survives worktree deletion) via `_WriteOnlyDatabase` in `tests/conftest.py`.
-- **Hypothesis workflow**: CI runs 10 deterministic examples per property test (`derandomize=True` -- same inputs every run, no flakes). Random fuzzing runs locally: `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n 8 -k properties` (quick, 1000 examples) or `HYPOTHESIS_PROFILE=fuzz uv run python -m pytest tests/ -m unit -n 8 -k properties` (deep, 10,000 examples, no deadline). When Hypothesis finds a failure, it is a **real bug** -- the shrunk example is saved to `~/.synthorg/hypothesis-examples/` for analysis but is **not replayed** automatically (that would block all test runs). Do NOT just rerun and move on. Read the failing example from the output, fix the underlying bug, and add an explicit `@example(...)` decorator to the test so the case is permanently covered in CI.
+- **Hypothesis workflow**: CI runs 10 deterministic examples per property test (`derandomize=True` -- same inputs every run, no flakes). Random fuzzing runs locally: `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n 8 -k properties` (quick, 1000 examples) or `HYPOTHESIS_PROFILE=fuzz uv run python -m pytest tests/ -m unit -n 8 --timeout=0` (deep, 10,000 examples, no deadline, all `@given` tests -- `--timeout=0` disables the 30s per-test limit that would kill long-running property tests; `-k properties` is intentionally omitted to cover all 46 files with `@given`, not just the 12 `*_properties.py` files). When Hypothesis finds a failure, it is a **real bug** -- the shrunk example is saved to `~/.synthorg/hypothesis-examples/` for analysis but is **not replayed** automatically (that would block all test runs). Do NOT just rerun and move on. Read the failing example from the output, fix the underlying bug, and add an explicit `@example(...)` decorator to the test so the case is permanently covered in CI.

Copilot AI Apr 3, 2026

The testing docs still list Hypothesis profiles as only ci, dev, and fuzz, but tests/conftest.py now also registers an extreme profile. Updating this section to mention extreme (and when to use it) will keep CLAUDE.md consistent with the actual test configuration.

@codecov

codecov bot commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.67%. Comparing base (97c6c92) to head (7d20397).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1044      +/-   ##
==========================================
- Coverage   90.67%   90.67%   -0.01%     
==========================================
  Files         697      697              
  Lines       39593    39593              
  Branches     3960     3960              
==========================================
- Hits        35902    35901       -1     
- Misses       3000     3001       +1     
  Partials      691      691              

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/conftest.py (1)

118-139: ⚠️ Potential issue | 🟠 Major

Include extreme in the wall-clock guardrail bypass

Line 118 only treats HYPOTHESIS_PROFILE=="fuzz" as long-running. With the new extreme profile (Line 97), Line 135 still enforces the 8s unit-test limit, so extreme runs will fail prematurely.

Proposed fix
-_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") == "fuzz"
+_LONG_RUNNING_HYPOTHESIS_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") in {
+    "fuzz",
+    "extreme",
+}
@@
-    if (
-        not _FUZZ_PROFILE_ACTIVE
+    if (
+        not _LONG_RUNNING_HYPOTHESIS_PROFILE_ACTIVE
         and item.get_closest_marker("unit")
         and elapsed > _UNIT_TEST_WALL_CLOCK_LIMIT
     ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/conftest.py` around lines 118 - 139, The wall-clock bypass currently
checks only _FUZZ_PROFILE_ACTIVE, so tests run under the new "extreme"
Hypothesis profile still hit the 8s unit limit; update the guard in
pytest_runtest_teardown to treat the "extreme" profile as long-running as
well—either define a second flag (e.g., _EXTREME_PROFILE_ACTIVE) or replace
_FUZZ_PROFILE_ACTIVE with a combined check (e.g., profile in ("fuzz",
"extreme")) and use that combined flag in the condition that gates
_UNIT_TEST_WALL_CLOCK_LIMIT for pytest_runtest_teardown (referencing
_FUZZ_PROFILE_ACTIVE, pytest_runtest_teardown, and _UNIT_TEST_WALL_CLOCK_LIMIT).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/providers/test_health_prober.py`:
- Around line 345-347: The test mutates private validated state by setting
prober._interval = 0; instead, update the HealthProber implementation or the
test to avoid touching internals: add a constructor parameter (e.g.,
min_interval_override or allow_test_interval) to the HealthProber class so tests
can create an instance with interval < 1 without breaking validation, or alter
the test to use unittest.mock.patch.object to mock only the instance's
asyncio.wait_for/asyncio.sleep behavior (targeting the prober instance methods
used in HealthProber.run) and/or mock time.monotonic() for deterministic timing;
change references to prober._interval in the test to pass a small interval via
the new constructor arg or rely on the instance-scoped mock instead of directly
mutating _interval.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7add0104-fa73-4a76-93b6-26952326c251

📥 Commits

Reviewing files that changed from the base of the PR and between 97c6c92 and a0b4991.

📒 Files selected for processing (9)
  • CLAUDE.md
  • tests/conftest.py
  • tests/unit/api/conftest.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/memory/test_sparse.py
  • tests/unit/providers/test_health_prober.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Agent
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
  • GitHub Check: Dependency Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14
All public functions and classes must have type hints and Google-style docstrings. Type hints are required; mypy runs in strict mode.
Enforce immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement.
Use frozen Pydantic models for config/identity; use mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields. Use @computed_field for derived values instead of storing redundant fields. Use NotBlankStr for all identifier/name fields.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code. Prefer structured concurrency over bare create_task.
Functions must be under 50 lines, files under 800 lines. Handle errors explicitly, never silently swallow them. Validate at system boundaries (user input, external APIs, config files).
Every module with business logic must have: from synthorg.observability import get_logger then logger = get_logger(__name__). Never use import logging or print() in application code.
All provider calls go through BaseCompletionProvider which applies retry and rate limiting automatically. Never implement retry logic in driver subclasses or calling code. Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig.
Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow. Maintain 80% minimum coverage. Always run pytest with -n 8 for parallelism; never sequentially.

Files:

  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/api/conftest.py
  • tests/unit/memory/test_sparse.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/providers/test_health_prober.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conftest.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use asyncio_mode = "auto" (set in pyproject.toml). Do not use real vendor names (Anthropic, OpenAI, Claude, GPT) — use generic names like example-provider, test-provider, large/medium/small. Property-based testing uses Hypothesis with profiles: ci (10 examples, deterministic), dev (1000 examples), fuzz (10,000 examples). Control via HYPOTHESIS_PROFILE env var.
Never skip or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/api/conftest.py
  • tests/unit/memory/test_sparse.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/providers/test_health_prober.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conftest.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/api/conftest.py
  • tests/unit/memory/test_sparse.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/providers/test_health_prober.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conftest.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Always read the relevant docs/design/ page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why before proceeding.

Files:

  • CLAUDE.md
🧠 Learnings (33)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T18:36:23.190Z
Learning: Applies to tests/**/*.py : Use `asyncio_mode = "auto"` (set in pyproject.toml). Do not use real vendor names (Anthropic, OpenAI, Claude, GPT) — use generic names like `example-provider`, `test-provider`, `large`/`medium`/`small`. Property-based testing uses Hypothesis with profiles: `ci` (10 examples, deterministic), `dev` (1000 examples), `fuzz` (10,000 examples). Control via `HYPOTHESIS_PROFILE` env var.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing: `given` + `settings`; dev profile has 1000 examples (via `HYPOTHESIS_PROFILE=dev env var`), ci profile (default) has 50
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Run property tests with `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties`
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing in Python with `given` + `settings`. Control via `HYPOTHESIS_PROFILE` env var (dev: 1000 examples, ci: 200 examples).
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/api/controllers/test_setup.py
  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-03-31T16:58:29.857Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing: `given` + `settings`; dev profile has 1000 examples (via `HYPOTHESIS_PROFILE=dev env var`), ci profile (default) has 50

Applied to files:

  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/api/controllers/test_setup.py
  • tests/conftest.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)

Applied to files:

  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/api/controllers/test_setup.py
  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing in Python with `given` + `settings`. Control via `HYPOTHESIS_PROFILE` env var (dev: 1000 examples, ci: 200 examples).

Applied to files:

  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/api/controllers/test_setup.py
  • tests/conftest.py
📚 Learning: 2026-04-03T18:36:23.190Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T18:36:23.190Z
Learning: Applies to tests/**/*.py : Use `asyncio_mode = "auto"` (set in pyproject.toml). Do not use real vendor names (Anthropic, OpenAI, Claude, GPT) — use generic names like `example-provider`, `test-provider`, `large`/`medium`/`small`. Property-based testing uses Hypothesis with profiles: `ci` (10 examples, deterministic), `dev` (1000 examples), `fuzz` (10,000 examples). Control via `HYPOTHESIS_PROFILE` env var.

Applied to files:

  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/config/test_utils_properties.py
  • tests/unit/memory/test_sparse.py
  • tests/unit/api/controllers/test_setup.py
  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-03-31T16:58:29.857Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Run property tests with `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties`

Applied to files:

  • tests/unit/config/test_utils_properties.py
  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-03-16T19:13:36.562Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:36.562Z
Learning: Applies to src/synthorg/providers/**/*.py : RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`.

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-31T21:07:37.469Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.469Z
Learning: Applies to src/synthorg/providers/**/*.py : Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/api/**/*.py : REST API: Litestar framework, controllers with guards, channels for WebSocket, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint. RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation).

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/synthorg/api/**/*.py : Litestar API must include setup wizard, auth/, auto-wiring, and lifecycle management

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/api/**/*.py : Authentication uses JWT + API key. Approval gate integration for high-risk operations.

Applied to files:

  • tests/unit/api/conftest.py
📚 Learning: 2026-04-02T20:17:33.300Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T20:17:33.300Z
Learning: Applies to tests/**/*.py : 30-second global timeout per test (set in `pyproject.toml`); non-default overrides like `timeout(60)` are allowed

Applied to files:

  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conftest.py
📚 Learning: 2026-03-16T07:02:24.531Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:02:24.531Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conftest.py
📚 Learning: 2026-04-02T08:47:46.313Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T08:47:46.313Z
Learning: Run unit tests with `uv run python -m pytest tests/ -m unit -n auto`; integration tests with `-m integration -n auto`; e2e tests with `-m e2e -n auto`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-02T08:47:46.313Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T08:47:46.313Z
Learning: Run full pytest suite with coverage: `uv run python -m pytest tests/ -n auto --cov=synthorg --cov-fail-under=80`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-03T18:36:23.190Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T18:36:23.190Z
Learning: Applies to **/*.py : Mark tests with `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow`. Maintain 80% minimum coverage. Always run pytest with `-n 8` for parallelism; never sequentially.

Applied to files:

  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-03-31T14:31:11.894Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:31:11.894Z
Learning: Export OpenAPI schema with `uv run python scripts/export_openapi.py` (required before docs build)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.

Applied to files:

  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-04-02T08:47:46.313Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T08:47:46.313Z
Learning: Lint Python code with `uv run ruff check src/ tests/`; auto-fix with `--fix`; format with `uv run ruff format src/ tests/`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src,tests}/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small` as aliases

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src/synthorg/**/*.py,tests/**/*.py,web/src/**/*.{ts,tsx}} : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to {src,tests,web,cli,site}/**/*.{py,ts,tsx,go,astro} : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to tests/**/*.py : Flaky tests: NEVER skip, dismiss, or ignore flaky tests -- always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-03T18:36:23.190Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T18:36:23.190Z
Learning: Applies to tests/**/*.py : Never skip or ignore flaky tests. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic. For tasks that must block indefinitely, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.

Applied to files:

  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/conftest.py
🔇 Additional comments (8)
tests/unit/engine/stagnation/test_properties.py (1)

58-58: Targeted runtime reduction looks good.

Lowering max_examples here keeps the properties intact while reducing hot-path unit test cost.

Also applies to: 74-74

tests/unit/tools/sandbox/test_subprocess_sandbox.py (1)

310-311: Timeout test speedup is solid.

The shorter blocking commands keep the same timed_out/success assertions and materially cut test runtime.

Also applies to: 317-317
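
As a rough illustration of the pattern this comment describes (a short blocking command, launched through the current interpreter rather than a bare `python`), here is a minimal sketch. The helper name `run_with_timeout` and the specific durations are assumptions made for illustration, not the project's sandbox API.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout: float) -> bool:
    """Run `code` in a child interpreter; return True if it hit the timeout."""
    try:
        # sys.executable points at the interpreter running the tests, so the
        # child works even when no `python` binary is on PATH.
        subprocess.run([sys.executable, "-c", code], timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return True
    return False


# The child only needs to outlive the timeout, not sleep for a long time.
timed_out = run_with_timeout("import time; time.sleep(5)", timeout=0.3)
```

Keeping the timeout well below a second is what makes such a test cheap; the child's sleep just has to be comfortably longer than the timeout.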

tests/unit/config/test_utils_properties.py (1)

38-41: Good balance of coverage vs runtime.

Reducing recursive breadth and example counts here is a reasonable optimization for these property tests without weakening the core invariants being asserted.

Also applies to: 47-47, 55-55, 63-63, 69-69, 87-87, 100-100, 115-115, 122-122, 131-131

tests/unit/api/conftest.py (1)

18-18: Fixture update is appropriate for fuzz/property throughput.

Setting a high API test rate limit in root_config is a clean way to remove artificial 429 noise from high-volume property runs.

Also applies to: 217-224

CLAUDE.md (1)

41-41: Deep-fuzz command docs are now consistent and clearer.

The updated command and explanatory text correctly align on “all @given tests” plus disabled per-test timeout for long fuzz sessions.

Also applies to: 175-175

tests/unit/api/controllers/test_setup.py (1)

14-14: Great simplification of the normalization property test.

Switching this invariant check to the pure function keeps correctness intent while eliminating expensive API-path overhead already covered by explicit endpoint tests.

Also applies to: 18-18, 162-173

tests/unit/memory/test_sparse.py (1)

3-4: Excellent fuzz-regression hardening for Unicode normalization edge cases.

The explicit example plus degenerate-input guard captures the known NFKC corner case while keeping the non-empty-vector property strict for valid tokenizable inputs.

Also applies to: 7-7, 210-210, 218-227
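
To make the NFKC corner case concrete, the sketch below checks whether a string still contains anything tokenizable after normalization. The helper `has_tokenizable_chars` and its rule (at least one letter or number after NFKC) are assumptions for illustration; the real `BM25Tokenizer` may define tokenizable input differently. A predicate like this is the kind of condition a property test could pass to Hypothesis's `assume()`.

```python
import unicodedata


def has_tokenizable_chars(text: str) -> bool:
    """Hypothetical guard: True if NFKC-normalized text keeps a letter or digit."""
    normalized = unicodedata.normalize("NFKC", text)
    # Unicode general categories starting with "L" are letters, "N" are numbers.
    return any(unicodedata.category(ch)[0] in ("L", "N") for ch in normalized)


# U+FC5E is itself categorized as a letter, but its compatibility
# decomposition leaves no letter or digit behind.
ligature = "\ufc5e"
degenerate = not has_tokenizable_chars(ligature)
```

With a guard like this, the non-empty-vector property can stay strict for ordinary input while the degenerate ligature is pinned down separately as an explicit example.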

tests/conftest.py (1)

86-105: Long-running Hypothesis profile setup is well-structured.

The fuzz/extreme profile configuration and shared database wiring are clean and aligned with the intended deep-fuzz workflow.
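
The PR also makes the wall-clock guardrail skip itself while a long-running profile is active. A minimal sketch of that check follows, assuming the profile is read from the `HYPOTHESIS_PROFILE` environment variable with `ci` as the default; the function and constant names are hypothetical and not taken from `tests/conftest.py`.

```python
import os
from typing import Mapping, Optional

# Profiles that are expected to run far longer than a normal unit test.
LONG_RUNNING_PROFILES = frozenset({"fuzz", "extreme"})


def wall_clock_guardrail_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when a deep-fuzz profile is selected, True otherwise."""
    env = os.environ if environ is None else environ
    profile = env.get("HYPOTHESIS_PROFILE", "ci")
    return profile not in LONG_RUNNING_PROFILES
```

Taking the environment as a parameter keeps the check easy to test without mutating `os.environ`.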

Comment on lines +345 to +347
# Bypass the validated minimum so wait_for times out
# immediately between probe cycles instead of sleeping 1s.
prober._interval = 0

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify constructor validation and runtime _interval usage path.
rg -n -C3 '\binterval_seconds\b|self\._interval|wait_for\(' \
  src/synthorg/providers/health_prober.py \
  tests/unit/providers/test_health_prober.py

Repository: Aureliolo/synthorg

Length of output: 7442


Avoid mutating validated private state; use proper deterministic mocking instead.

Directly setting _interval = 0 bypasses constructor validation (which requires interval_seconds >= 1) and makes this test brittle to internal refactors. However, the proposed fix of patching synthorg.providers.health_prober.asyncio.wait_for at module level is problematic: it would also affect the test's own await asyncio.wait_for(done_event.wait(), timeout=10) call on line 366, causing it to fail unexpectedly.

Instead, consider:

  • Adding a constructor parameter to allow overriding the minimum interval for testing (e.g., allow_test_interval=True)
  • Or using unittest.mock.patch.object to mock only the instance's timeout behavior in the run loop
  • Or passing a very small interval (requires relaxing the >= 1 constraint for test scenarios)

Per coding guidelines, timing-sensitive tests should mock time.monotonic() and asyncio.sleep() for determinism, not mutate internals.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/providers/test_health_prober.py` around lines 345 - 347, The test
mutates private validated state by setting prober._interval = 0; instead, update
the HealthProber implementation or the test to avoid touching internals: add a
constructor parameter (e.g., min_interval_override or allow_test_interval) to
the HealthProber class so tests can create an instance with interval < 1 without
breaking validation, or alter the test to use unittest.mock.patch.object to mock
only the instance's asyncio.wait_for/asyncio.sleep behavior (targeting the
prober instance methods used in HealthProber.run) and/or mock time.monotonic()
for deterministic timing; change references to prober._interval in the test to
pass a small interval via the new constructor arg or rely on the instance-scoped
mock instead of directly mutating _interval.
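
One way to scope the override, sketched below, is `unittest.mock.patch.object` used as a context manager, so the validated value is restored when the block exits. `IntervalProber` is a hypothetical stand-in written for this example; it only mirrors the two details the review mentions (a constructor that rejects intervals below 1 and a loop that waits with `asyncio.wait_for`) and is not the project's `HealthProber`.

```python
import asyncio
from unittest.mock import patch


class IntervalProber:
    """Hypothetical stand-in for a prober with a validated interval."""

    def __init__(self, interval_seconds: int) -> None:
        if interval_seconds < 1:
            raise ValueError("interval_seconds must be >= 1")
        self._interval = interval_seconds
        self._stop = asyncio.Event()
        self.cycles = 0

    async def run(self, max_cycles: int) -> None:
        while self.cycles < max_cycles:
            self.cycles += 1
            try:
                # Wait for a stop signal, or time out and start the next cycle.
                await asyncio.wait_for(self._stop.wait(), timeout=self._interval)
                return
            except asyncio.TimeoutError:
                continue


async def _demo() -> tuple:
    prober = IntervalProber(interval_seconds=1)
    # The override lives only inside the with-block.
    with patch.object(prober, "_interval", 0):
        await prober.run(max_cycles=3)
    return prober.cycles, prober._interval


cycles, restored_interval = asyncio.run(_demo())
```

This still reaches into a private attribute, so it addresses the cleanup concern rather than the coupling concern; a constructor-level override, as suggested above, would avoid both.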

- Include 'extreme' profile in wall-clock guardrail bypass
- Use sys.executable instead of 'python' in subprocess timeout test
- Use patch.object for prober interval instead of direct mutation
- Document extreme profile in CLAUDE.md testing section
@Aureliolo Aureliolo merged commit 1111602 into main Apr 3, 2026
10 of 11 checks passed
@Aureliolo Aureliolo deleted the test/hypothesis-fuzzing-session branch April 3, 2026 19:33
Aureliolo added a commit that referenced this pull request Apr 4, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.6.1](v0.6.0...v0.6.1)
(2026-04-04)


### Features

* capability-aware prompt profiles for model tier adaptation
([#1047](#1047))
([67650c5](67650c5)),
closes [#805](#805)
* implement procedural memory auto-generation from agent failures
([#1048](#1048))
([55f5206](55f5206)),
closes [#420](#420)
* implement quality scoring Layers 2+3 -- LLM judge and human override
([#1057](#1057))
([4a8adfe](4a8adfe)),
closes [#230](#230)
* token-based personality trimming via
PromptProfile.max_personality_tokens
([#1059](#1059))
([75afd52](75afd52)),
closes [#1045](#1045)
* workflow execution lifecycle + editor improvements
([#1058](#1058))
([7b54262](7b54262)),
closes [#1029](#1029)
[#1042](#1042)


### Refactoring

* **web:** address complexity and logging issues in dashboard
([#1056](#1056))
([ada997b](ada997b)),
closes [#1055](#1055)


### Documentation

* comprehensive documentation refresh
([#1050](#1050))
([c7a4259](c7a4259))


### Tests

* fix Hypothesis fuzzing infra and speed up slow unit tests
([#1044](#1044))
([1111602](1111602))


### Maintenance

* add text=auto catch-all to .gitattributes
([#1051](#1051))
([fc65d72](fc65d72))
* bump defu from 6.1.4 to 6.1.6 in /site
([#1062](#1062))
([f0cc439](f0cc439))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>