test: fix Hypothesis fuzzing infra and speed up slow unit tests by Aureliolo · Pull Request #1044 · Aureliolo/synthorg

Aureliolo · 2026-04-03T19:21:39Z

Summary

Ran a dedicated Hypothesis fuzzing session (5 seeds x 10k examples + 500k extreme run across all 46 @given test files, ~3M+ total inputs). Found 2 real bugs and 3 infra issues, fixed all of them, then optimized 9 slow unit tests.

Fuzzing Fixes

BM25 tokenizer Unicode edge case: BM25Tokenizer.encode('\ufc5e') returned empty vector for Arabic ligature U+FC5E (Unicode category "L") that decomposes entirely into combining marks under NFKC normalization. Added @example + assume() to handle degenerate NFKC inputs.
Flaky setup property test: test_description_normalization_invariants hit HTTP 429 after ~100 rapid POST requests due to rate limiter. Bumped API test fixture rate limit to 1M req/min. Also refactored the test to call normalize_description() directly instead of full HTTP round-trip (13s -> 0.25s).
Fuzz profile incompatible with pytest-timeout: Added suppress_health_check=list(HealthCheck) to fuzz profile so Hypothesis doesn't abandon slow tests. Made wall-clock guardrail skip when fuzz profile is active.
Added extreme profile: 500k examples for overnight fuzzing sessions.
Updated CLAUDE.md: Fuzz command uses --timeout=0 and drops -k properties to cover all 46 @given files, not just the 12 *_properties.py files.

Test Performance Improvements

9 tests brought under 1s:

Test	Before	After
`test_timeout_kills_process`	5.74s	0.52s
`test_description_normalization_invariants`	~13s	0.25s
`test_inputs_are_not_mutated`	2.30s	0.58s
`test_override_values_win_for_non_dict`	1.71s	0.65s
`test_identity_merge_with_empty`	1.48s	0.56s
`test_result_keys_are_union`	1.33s	0.51s
`test_format` (stagnation)	1.41s	0.17s
`test_determinism` (stagnation)	1.25s	0.69s
`test_run_loop_continues_on_probe_error`	1.22s	0.21s

Fixes: reduced property test strategy complexity (max_leaves 20->5), reduced max_examples overrides (100/200->50), shortened subprocess timeout command, bypassed health prober interval for test speed.

Test Plan

Full unit suite: 13,166 passed, 0 failed
Fuzz profile (10k examples, no seed): 13,132 passed, 0 failed
Extreme profile (500k examples): all property tests passed
mypy: 0 errors
ruff: 0 errors

Files Changed

CLAUDE.md -- fuzz command update
tests/conftest.py -- fuzz/extreme profiles, wall-clock guardrail
tests/unit/api/conftest.py -- rate limit override for property tests
tests/unit/api/controllers/test_setup.py -- refactored to test normalize_description directly
tests/unit/memory/test_sparse.py -- @example + assume() for Unicode edge case
tests/unit/config/test_utils_properties.py -- reduced strategy complexity
tests/unit/engine/stagnation/test_properties.py -- reduced max_examples
tests/unit/tools/sandbox/test_subprocess_sandbox.py -- faster timeout command
tests/unit/providers/test_health_prober.py -- bypass interval for test speed

@example

- Fix BM25 tokenizer test: add @example for Unicode ligature U+FC5E that decomposes to combining marks under NFKC, producing empty vector - Fix flaky setup test: bump API test rate limit to 1M req/min so Hypothesis 10k+ examples don't hit 429 - Add suppress_health_check to fuzz profile so Hypothesis doesn't abandon slow tests before reaching max_examples - Add extreme profile (500k examples) for overnight fuzzing sessions - Make wall-clock guardrail skip when fuzz profile is active - Update CLAUDE.md: fuzz command uses --timeout=0 and drops -k properties to cover all 46 files with @given decorators

- Refactor test_description_normalization_invariants to test normalize_description directly instead of full HTTP round-trip (13s -> 0.25s) - Reduce subprocess timeout test: ping -n 3 -> python sleep, killed after 0.5s (5.74s -> 0.52s) - Reduce deep_merge property test strategy complexity: max_leaves 20->5, max_size 8->5, max_examples 100->50 (2.3s -> 0.58s) - Reduce stagnation fingerprint property tests: max_examples 200->50 (1.4s -> 0.17s) - Fix health prober test: bypass interval_seconds=1 to avoid real 1s sleep between probe cycles (1.22s -> 0.21s)

coderabbitai · 2026-04-03T19:21:55Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 17ac093b-afd9-4ce1-b777-1aa675ee6bd6

📥 Commits

Reviewing files that changed from the base of the PR and between a0b4991 and 7d20397.

📒 Files selected for processing (4)

CLAUDE.md
tests/conftest.py
tests/unit/providers/test_health_prober.py
tests/unit/tools/sandbox/test_subprocess_sandbox.py

Walkthrough

Updated test harness and documentation to change fuzzing invocation to use HYPOTHESIS_PROFILE=fuzz with --timeout=0, register a new extreme Hypothesis profile, and set profile deadlines to None. Tests now suppress health checks and use a local Hypothesis database when fuzz profiles are active; a _FUZZ_PROFILE_ACTIVE flag disables the unit-test wall-clock failure guardrail under those profiles. Additional test adjustments include increasing test API rate limits, reducing property-test generation sizes, extracting a normalization function into unit tests, adding a Unicode edge-case example, tightening probe timing in a prober test, and changing subprocess timeout commands.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.00% which is insufficient. The required threshold is 40.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'test: fix Hypothesis fuzzing infra and speed up slow unit tests' directly summarizes the main changes: fixing fuzzing infrastructure and optimizing test performance.
Description check	✅ Passed	The description comprehensively covers the pull request changes, including the fuzzing session results, bugs discovered, infrastructure fixes, test performance improvements, and a detailed test plan.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request enhances the property-based testing infrastructure by introducing an 'extreme' Hypothesis profile for deep fuzzing and optimizing existing property tests to improve execution speed. Key changes include increasing API rate limits for tests, disabling wall-clock limits during fuzzing sessions, and refactoring specific tests to reduce overhead. Feedback highlights that the wall-clock limit bypass should be extended to include the newly added 'extreme' profile to prevent false failures during long-running sessions.

gemini-code-assist · 2026-04-03T19:23:21Z

tests/conftest.py

+# Disabled for fuzz profile where 10k examples per test routinely
+# exceed the limit.
 _UNIT_TEST_WALL_CLOCK_LIMIT = 8.0  # seconds
+_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") == "fuzz"


The _FUZZ_PROFILE_ACTIVE check should also include the newly added extreme profile. Since the extreme profile runs 500,000 examples per test, it will almost certainly exceed the 8-second wall-clock limit for unit tests, causing them to fail incorrectly when this profile is active.

Suggested change

_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") == "fuzz"

_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") in ("fuzz", "extreme")

github-actions · 2026-04-03T19:23:55Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 7d20397.

Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copilot

Pull request overview

Updates the Python test suite’s Hypothesis fuzzing infrastructure and reduces runtime of several slow unit/property tests, primarily by adjusting Hypothesis profiles/strategies and removing expensive HTTP round-trips in a property test.

Changes:

Added/updated Hypothesis profiles (fuzz, new extreme) and adjusted the unit-test wall-clock guardrail behavior during fuzzing.
Refactored test_description_normalization_invariants to test normalize_description() directly and raised API test fixture rate limits to avoid 429s during fuzz runs.
Reduced property-test workload/complexity and sped up a few slow unit tests (subprocess timeout test, health prober loop test).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`CLAUDE.md`	Updates documented fuzzing command; needs to mention the new `extreme` profile to match `tests/conftest.py`.
`tests/conftest.py`	Adds `extreme` profile and suppresses health checks for fuzzing; adjusts wall-clock guardrail to skip under fuzz.
`tests/unit/api/conftest.py`	Raises API rate limits in test config to prevent 429s during high-volume Hypothesis runs.
`tests/unit/api/controllers/test_setup.py`	Property test now calls `normalize_description()` directly instead of doing HTTP POST per example.
`tests/unit/memory/test_sparse.py`	Adds explicit Unicode example and assumptions to handle degenerate NFKC tokenization cases.
`tests/unit/config/test_utils_properties.py`	Reduces strategy sizes and `max_examples` to speed up deep-merge/to-float property tests.
`tests/unit/engine/stagnation/test_properties.py`	Lowers `max_examples` for two fingerprint property tests.
`tests/unit/tools/sandbox/test_subprocess_sandbox.py`	Speeds up timeout test by shortening the long-running command.
`tests/unit/providers/test_health_prober.py`	Speeds up prober loop test by bypassing the interval sleep via private state mutation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-03T19:26:22Z

tests/conftest.py

@@ -115,7 +132,11 @@ def pytest_runtest_teardown(item: pytest.Item) -> None:
    if start is None:
        return
    elapsed = time.monotonic() - start
-    if item.get_closest_marker("unit") and elapsed > _UNIT_TEST_WALL_CLOCK_LIMIT:
+    if (
+        not _FUZZ_PROFILE_ACTIVE
+        and item.get_closest_marker("unit")
+        and elapsed > _UNIT_TEST_WALL_CLOCK_LIMIT
+    ):


The wall-clock guardrail skip is keyed only off HYPOTHESIS_PROFILE == "fuzz", but this PR also adds an extreme profile intended for long-running runs. As written, running with HYPOTHESIS_PROFILE=extreme will still fail unit tests for exceeding _UNIT_TEST_WALL_CLOCK_LIMIT, which seems contrary to the intent of adding an overnight fuzz profile. Consider broadening the check to treat both fuzz and extreme (or any non-ci profile) as exempt, or derive the flag from the loaded Hypothesis profile rather than a single env-var value.

Copilot · 2026-04-03T19:26:22Z

tests/unit/tools/sandbox/test_subprocess_sandbox.py

        if os.name == "nt":
            result = await subprocess_sandbox.execute(
-                command="cmd",
-                args=("/c", "ping", "-n", "10", "127.0.0.1"),
+                command="python",
+                args=("-c", "import time; time.sleep(5)"),
                timeout=0.5,
            )


On Windows this test now shells out to command="python", which can be missing on PATH depending on how the test environment is set up (e.g. only py launcher available). Since the test suite is already running under a specific interpreter, it’s more reliable to invoke that interpreter (e.g. via sys.executable) to avoid PATH-related flakes.

Copilot · 2026-04-03T19:26:22Z

tests/unit/providers/test_health_prober.py

+        # Bypass the validated minimum so wait_for times out
+        # immediately between probe cycles instead of sleeping 1s.
+        prober._interval = 0


This test reaches into ProviderHealthProber’s private state (prober._interval = 0) and intentionally violates the constructor’s validation (interval_seconds must be >= 1). That makes the test more brittle to refactors (e.g. renaming the attribute, adding invariants) and slightly changes behavior inside _probe_all (eligibility checks use _interval). Prefer avoiding private mutation here (e.g. keep interval_seconds=1 and mock/patch the wait/sleep between cycles so the loop advances without a real 1s delay).

Copilot · 2026-04-03T19:26:22Z

CLAUDE.md

 - **Property-based testing**: Python uses [Hypothesis](https://hypothesis.readthedocs.io/) (`@given` + `@settings`), React uses [fast-check](https://fast-check.dev/) (`fc.assert` + `fc.property`), Go uses native `testing.F` fuzz functions (`Fuzz*`). Hypothesis profiles configured in `tests/conftest.py`: `ci` (deterministic, `max_examples=10` + `derandomize=True` -- fixed seed per test, same inputs every run), `dev` (1000 examples), `fuzz` (10,000 examples, no deadline -- for dedicated fuzzing sessions). Controlled via `HYPOTHESIS_PROFILE` env var. `.hypothesis/` is gitignored. Failing examples are persisted to `~/.synthorg/hypothesis-examples/` (write-only shared DB, survives worktree deletion) via `_WriteOnlyDatabase` in `tests/conftest.py`.
- **Hypothesis workflow**: CI runs 10 deterministic examples per property test (`derandomize=True` -- same inputs every run, no flakes). Random fuzzing runs locally: `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n 8 -k properties` (quick, 1000 examples) or `HYPOTHESIS_PROFILE=fuzz uv run python -m pytest tests/ -m unit -n 8 -k properties` (deep, 10,000 examples, no deadline). When Hypothesis finds a failure, it is a **real bug** -- the shrunk example is saved to `~/.synthorg/hypothesis-examples/` for analysis but is **not replayed** automatically (that would block all test runs). Do NOT just rerun and move on. Read the failing example from the output, fix the underlying bug, and add an explicit `@example(...)` decorator to the test so the case is permanently covered in CI.
+- **Hypothesis workflow**: CI runs 10 deterministic examples per property test (`derandomize=True` -- same inputs every run, no flakes). Random fuzzing runs locally: `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n 8 -k properties` (quick, 1000 examples) or `HYPOTHESIS_PROFILE=fuzz uv run python -m pytest tests/ -m unit -n 8 --timeout=0` (deep, 10,000 examples, no deadline, all `@given` tests -- `--timeout=0` disables the 30s per-test limit that would kill long-running property tests; `-k properties` is intentionally omitted to cover all 46 files with `@given`, not just the 12 `*_properties.py` files). When Hypothesis finds a failure, it is a **real bug** -- the shrunk example is saved to `~/.synthorg/hypothesis-examples/` for analysis but is **not replayed** automatically (that would block all test runs). Do NOT just rerun and move on. Read the failing example from the output, fix the underlying bug, and add an explicit `@example(...)` decorator to the test so the case is permanently covered in CI.


The testing docs still list Hypothesis profiles as only ci, dev, and fuzz, but tests/conftest.py now also registers an extreme profile. Updating this section to mention extreme (and when to use it) will keep CLAUDE.md consistent with the actual test configuration.

codecov · 2026-04-03T19:27:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.67%. Comparing base (97c6c92) to head (7d20397).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1044      +/-   ##
==========================================
- Coverage   90.67%   90.67%   -0.01%     
==========================================
  Files         697      697              
  Lines       39593    39593              
  Branches     3960     3960              
==========================================
- Hits        35902    35901       -1     
- Misses       3000     3001       +1     
  Partials      691      691

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/conftest.py (1)

118-139: ⚠️ Potential issue | 🟠 Major

Include extreme in the wall-clock guardrail bypass

Line 118 only treats HYPOTHESIS_PROFILE=="fuzz" as long-running. With the new extreme profile (Line 97), Line 135 still enforces the 8s unit-test limit, so extreme runs will fail prematurely.

Proposed fix

-_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") == "fuzz"
+_LONG_RUNNING_HYPOTHESIS_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") in {
+    "fuzz",
+    "extreme",
+}
@@
-    if (
-        not _FUZZ_PROFILE_ACTIVE
+    if (
+        not _LONG_RUNNING_HYPOTHESIS_PROFILE_ACTIVE
         and item.get_closest_marker("unit")
         and elapsed > _UNIT_TEST_WALL_CLOCK_LIMIT
     ):

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/conftest.py` around lines 118 - 139, The wall-clock bypass currently
checks only _FUZZ_PROFILE_ACTIVE, so tests run under the new "extreme"
Hypothesis profile still hit the 8s unit limit; update the guard in
pytest_runtest_teardown to treat the "extreme" profile as long-running as
well—either define a second flag (e.g., _EXTREME_PROFILE_ACTIVE) or replace
_FUZZ_PROFILE_ACTIVE with a combined check (e.g., profile in ("fuzz",
"extreme")) and use that combined flag in the condition that gates
_UNIT_TEST_WALL_CLOCK_LIMIT for pytest_runtest_teardown (referencing
_FUZZ_PROFILE_ACTIVE, pytest_runtest_teardown, and _UNIT_TEST_WALL_CLOCK_LIMIT).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/providers/test_health_prober.py`:
- Around line 345-347: The test mutates private validated state by setting
prober._interval = 0; instead, update the HealthProber implementation or the
test to avoid touching internals: add a constructor parameter (e.g.,
min_interval_override or allow_test_interval) to the HealthProber class so tests
can create an instance with interval < 1 without breaking validation, or alter
the test to use unittest.mock.patch.object to mock only the instance's
asyncio.wait_for/asyncio.sleep behavior (targeting the prober instance methods
used in HealthProber.run) and/or mock time.monotonic() for deterministic timing;
change references to prober._interval in the test to pass a small interval via
the new constructor arg or rely on the instance-scoped mock instead of directly
mutating _interval.

---

Outside diff comments:
In `@tests/conftest.py`:
- Around line 118-139: The wall-clock bypass currently checks only
_FUZZ_PROFILE_ACTIVE, so tests run under the new "extreme" Hypothesis profile
still hit the 8s unit limit; update the guard in pytest_runtest_teardown to
treat the "extreme" profile as long-running as well—either define a second flag
(e.g., _EXTREME_PROFILE_ACTIVE) or replace _FUZZ_PROFILE_ACTIVE with a combined
check (e.g., profile in ("fuzz", "extreme")) and use that combined flag in the
condition that gates _UNIT_TEST_WALL_CLOCK_LIMIT for pytest_runtest_teardown
(referencing _FUZZ_PROFILE_ACTIVE, pytest_runtest_teardown, and
_UNIT_TEST_WALL_CLOCK_LIMIT).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7add0104-fa73-4a76-93b6-26952326c251

📥 Commits

Reviewing files that changed from the base of the PR and between 97c6c92 and a0b4991.

📒 Files selected for processing (9)

CLAUDE.md
tests/conftest.py
tests/unit/api/conftest.py
tests/unit/api/controllers/test_setup.py
tests/unit/config/test_utils_properties.py
tests/unit/engine/stagnation/test_properties.py
tests/unit/memory/test_sparse.py
tests/unit/providers/test_health_prober.py
tests/unit/tools/sandbox/test_subprocess_sandbox.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Agent
GitHub Check: Test (Python 3.14)
GitHub Check: Analyze (python)
GitHub Check: Dependency Review

🧰 Additional context used

📓 Path-based instructions (3)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14
All public functions and classes must have type hints and Google-style docstrings. Type hints are required; mypy runs in strict mode.
Enforce immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement.
Use frozen Pydantic models for config/identity; use mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields. Use @computed_field for derived values instead of storing redundant fields. Use NotBlankStr for all identifier/name fields.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code. Prefer structured concurrency over bare create_task.
Functions must be under 50 lines, files under 800 lines. Handle errors explicitly, never silently swallow them. Validate at system boundaries (user input, external APIs, config files).
Every module with business logic must have: from synthorg.observability import get_logger then logger = get_logger(__name__). Never use import logging or print() in application code.
All provider calls go through BaseCompletionProvider which applies retry and rate limiting automatically. Never implement retry logic in driver subclasses or calling code. Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig.
Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow. Maintain 80% minimum coverage. Always run pytest with -n 8 for parallelism; never sequentially.

Files:

tests/unit/engine/stagnation/test_properties.py
tests/unit/config/test_utils_properties.py
tests/unit/api/conftest.py
tests/unit/memory/test_sparse.py
tests/unit/api/controllers/test_setup.py
tests/unit/providers/test_health_prober.py
tests/unit/tools/sandbox/test_subprocess_sandbox.py
tests/conftest.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use asyncio_mode = "auto" (set in pyproject.toml). Do not use real vendor names (Anthropic, OpenAI, Claude, GPT) — use generic names like example-provider, test-provider, large/medium/small. Property-based testing uses Hypothesis with profiles: ci (10 examples, deterministic), dev (1000 examples), fuzz (10,000 examples). Control via HYPOTHESIS_PROFILE env var.
Never skip or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

tests/unit/engine/stagnation/test_properties.py
tests/unit/config/test_utils_properties.py
tests/unit/api/conftest.py
tests/unit/memory/test_sparse.py
tests/unit/api/controllers/test_setup.py
tests/unit/providers/test_health_prober.py
tests/unit/tools/sandbox/test_subprocess_sandbox.py
tests/conftest.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/engine/stagnation/test_properties.py
tests/unit/config/test_utils_properties.py
tests/unit/api/conftest.py
tests/unit/memory/test_sparse.py
tests/unit/api/controllers/test_setup.py
tests/unit/providers/test_health_prober.py
tests/unit/tools/sandbox/test_subprocess_sandbox.py
tests/conftest.py

**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Always read the relevant docs/design/ page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why before proceeding.

Files:

CLAUDE.md

🧠 Learnings (33)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T18:36:23.190Z
Learning: Applies to tests/**/*.py : Use `asyncio_mode = "auto"` (set in pyproject.toml). Do not use real vendor names (Anthropic, OpenAI, Claude, GPT) — use generic names like `example-provider`, `test-provider`, `large`/`medium`/`small`. Property-based testing uses Hypothesis with profiles: `ci` (10 examples, deterministic), `dev` (1000 examples), `fuzz` (10,000 examples). Control via `HYPOTHESIS_PROFILE` env var.

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing: `given` + `settings`; dev profile has 1000 examples (via `HYPOTHESIS_PROFILE=dev env var`), ci profile (default) has 50

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Run property tests with `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties`

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing in Python with `given` + `settings`. Control via `HYPOTHESIS_PROFILE` env var (dev: 1000 examples, ci: 200 examples).

📚 Learning: 2026-03-15T18:28:13.207Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

tests/unit/engine/stagnation/test_properties.py
tests/unit/config/test_utils_properties.py
tests/unit/api/controllers/test_setup.py
CLAUDE.md
tests/conftest.py

📚 Learning: 2026-03-31T16:58:29.857Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing: `given` + `settings`; dev profile has 1000 examples (via `HYPOTHESIS_PROFILE=dev env var`), ci profile (default) has 50

Applied to files:

tests/unit/engine/stagnation/test_properties.py
tests/unit/config/test_utils_properties.py
tests/unit/api/controllers/test_setup.py
tests/conftest.py

📚 Learning: 2026-03-15T18:42:17.990Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)

Applied to files:

tests/unit/engine/stagnation/test_properties.py
tests/unit/config/test_utils_properties.py
tests/unit/api/controllers/test_setup.py
CLAUDE.md
tests/conftest.py

📚 Learning: 2026-03-17T06:30:14.180Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing in Python with `given` + `settings`. Control via `HYPOTHESIS_PROFILE` env var (dev: 1000 examples, ci: 200 examples).

Applied to files:

tests/unit/engine/stagnation/test_properties.py
tests/unit/config/test_utils_properties.py
tests/unit/api/controllers/test_setup.py
tests/conftest.py

📚 Learning: 2026-04-03T18:36:23.190Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T18:36:23.190Z
Learning: Applies to tests/**/*.py : Use `asyncio_mode = "auto"` (set in pyproject.toml). Do not use real vendor names (Anthropic, OpenAI, Claude, GPT) — use generic names like `example-provider`, `test-provider`, `large`/`medium`/`small`. Property-based testing uses Hypothesis with profiles: `ci` (10 examples, deterministic), `dev` (1000 examples), `fuzz` (10,000 examples). Control via `HYPOTHESIS_PROFILE` env var.

Applied to files:

tests/unit/engine/stagnation/test_properties.py
tests/unit/config/test_utils_properties.py
tests/unit/memory/test_sparse.py
tests/unit/api/controllers/test_setup.py
CLAUDE.md
tests/conftest.py

📚 Learning: 2026-03-31T16:58:29.857Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Run property tests with `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties`

Applied to files:

tests/unit/config/test_utils_properties.py
CLAUDE.md
tests/conftest.py

📚 Learning: 2026-03-16T19:13:36.562Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:36.562Z
Learning: Applies to src/synthorg/providers/**/*.py : RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-20T11:18:48.128Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`.

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-31T21:07:37.469Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.469Z
Learning: Applies to src/synthorg/providers/**/*.py : Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-20T08:28:32.845Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-19T07:12:14.508Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-19T07:12:14.508Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/api/**/*.py : REST API: Litestar framework, controllers with guards, channels for WebSocket, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint. RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation).

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-26T15:18:16.848Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/synthorg/api/**/*.py : Litestar API must include setup wizard, auth/, auto-wiring, and lifecycle management

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-03-15T19:14:27.144Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/api/**/*.py : Authentication uses JWT + API key. Approval gate integration for high-risk operations.

Applied to files:

tests/unit/api/conftest.py

📚 Learning: 2026-04-02T20:17:33.300Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T20:17:33.300Z
Learning: Applies to tests/**/*.py : 30-second global timeout per test (set in `pyproject.toml`); non-default overrides like `timeout(60)` are allowed

Applied to files:

tests/unit/tools/sandbox/test_subprocess_sandbox.py
tests/conftest.py

📚 Learning: 2026-03-16T07:02:24.531Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:02:24.531Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

tests/unit/tools/sandbox/test_subprocess_sandbox.py
tests/conftest.py

📚 Learning: 2026-04-02T08:47:46.313Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T08:47:46.313Z
Learning: Run unit tests with `uv run python -m pytest tests/ -m unit -n auto`; integration tests with `-m integration -n auto`; e2e tests with `-m e2e -n auto`

Applied to files:

CLAUDE.md

📚 Learning: 2026-04-02T08:47:46.313Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T08:47:46.313Z
Learning: Run full pytest suite with coverage: `uv run python -m pytest tests/ -n auto --cov=synthorg --cov-fail-under=80`

Applied to files:

CLAUDE.md

📚 Learning: 2026-04-03T18:36:23.190Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T18:36:23.190Z
Learning: Applies to **/*.py : Mark tests with `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow`. Maintain 80% minimum coverage. Always run pytest with `-n 8` for parallelism; never sequentially.

Applied to files:

CLAUDE.md
tests/conftest.py

📚 Learning: 2026-03-31T14:31:11.894Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:31:11.894Z
Learning: Export OpenAPI schema with `uv run python scripts/export_openapi.py` (required before docs build)

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-17T22:08:13.456Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.

Applied to files:

CLAUDE.md
tests/conftest.py

📚 Learning: 2026-04-02T08:47:46.313Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T08:47:46.313Z
Learning: Lint Python code with `uv run ruff check src/ tests/`; auto-fix with `--fix`; format with `uv run ruff format src/ tests/`

Applied to files:

CLAUDE.md

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src,tests}/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small` as aliases

Applied to files:

CLAUDE.md

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to {src/synthorg/**/*.py,tests/**/*.py,web/src/**/*.{ts,tsx}} : NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small`

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T18:28:13.207Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to {src,tests,web,cli,site}/**/*.{py,ts,tsx,go,astro} : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-31T21:07:37.470Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-16T07:22:28.134Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

CLAUDE.md
tests/conftest.py

📚 Learning: 2026-04-01T15:36:39.993Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to tests/**/*.py : Flaky tests: NEVER skip, dismiss, or ignore flaky tests -- always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

CLAUDE.md

📚 Learning: 2026-04-03T18:36:23.190Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T18:36:23.190Z
Learning: Applies to tests/**/*.py : Never skip or ignore flaky tests. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic. For tasks that must block indefinitely, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.

Applied to files:

CLAUDE.md
tests/conftest.py

📚 Learning: 2026-03-20T21:44:04.528Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

tests/conftest.py

🔇 Additional comments (8)

tests/unit/engine/stagnation/test_properties.py (1)

58-58: Targeted runtime reduction looks good.

Lowering max_examples here keeps the properties intact while reducing hot-path unit test cost.

Also applies to: 74-74

tests/unit/tools/sandbox/test_subprocess_sandbox.py (1)

310-311: Timeout test speedup is solid.

The shorter blocking commands keep the same timed_out/success assertions and materially cut test runtime.

Also applies to: 317-317

tests/unit/config/test_utils_properties.py (1)

38-41: Good balance of coverage vs runtime.

Reducing recursive breadth and example counts here is a reasonable optimization for these property tests without weakening the core invariants being asserted.

Also applies to: 47-47, 55-55, 63-63, 69-69, 87-87, 100-100, 115-115, 122-122, 131-131

tests/unit/api/conftest.py (1)

18-18: Fixture update is appropriate for fuzz/property throughput.

Setting a high API test rate limit in root_config is a clean way to remove artificial 429 noise from high-volume property runs.

Also applies to: 217-224

CLAUDE.md (1)

41-41: Deep-fuzz command docs are now consistent and clearer.

The updated command and explanatory text correctly align on “all @given tests” plus disabled per-test timeout for long fuzz sessions.

Also applies to: 175-175

tests/unit/api/controllers/test_setup.py (1)

14-14: Great simplification of the normalization property test.

Switching this invariant check to the pure function keeps correctness intent while eliminating expensive API-path overhead already covered by explicit endpoint tests.

Also applies to: 18-18, 162-173

tests/unit/memory/test_sparse.py (1)

3-4: Excellent fuzz-regression hardening for Unicode normalization edge cases.

The explicit example plus degenerate-input guard captures the known NFKC corner case while keeping the non-empty-vector property strict for valid tokenizable inputs.

Also applies to: 7-7, 210-210, 218-227

tests/conftest.py (1)

86-105: Long-running Hypothesis profile setup is well-structured

The fuzz/extreme profile configuration and shared database wiring are clean and aligned with the intended deep-fuzz workflow.

coderabbitai · 2026-04-03T19:27:25Z

tests/unit/providers/test_health_prober.py

+        # Bypass the validated minimum so wait_for times out
+        # immediately between probe cycles instead of sleeping 1s.
+        prober._interval = 0


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Verify constructor validation and runtime _interval usage path. rg -n -C3 '\binterval_seconds\b|self\._interval|wait_for\(' \ src/synthorg/providers/health_prober.py \ tests/unit/providers/test_health_prober.py

Repository: Aureliolo/synthorg

Length of output: 7442

Avoid mutating validated private state; use proper deterministic mocking instead.

Directly setting _interval = 0 bypasses constructor validation (which requires interval_seconds >= 1) and makes this test brittle to internal refactors. However, the proposed fix of patching synthorg.providers.health_prober.asyncio.wait_for at module level is problematic: it would also affect the test's own await asyncio.wait_for(done_event.wait(), timeout=10) call on line 366, causing it to fail unexpectedly.

Instead, consider:

Adding a constructor parameter to allow overriding the minimum interval for testing (e.g., allow_test_interval=True)

Or using unittest.mock.patch.object to mock only the instance's timeout behavior in the run loop

Or passing a very small interval (requires relaxing the >= 1 constraint for test scenarios)

Per coding guidelines, timing-sensitive tests should mock time.monotonic() and asyncio.sleep() for determinism, not mutate internals.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/providers/test_health_prober.py` around lines 345 - 347, The test mutates private validated state by setting prober._interval = 0; instead, update the HealthProber implementation or the test to avoid touching internals: add a constructor parameter (e.g., min_interval_override or allow_test_interval) to the HealthProber class so tests can create an instance with interval < 1 without breaking validation, or alter the test to use unittest.mock.patch.object to mock only the instance's asyncio.wait_for/asyncio.sleep behavior (targeting the prober instance methods used in HealthProber.run) and/or mock time.monotonic() for deterministic timing; change references to prober._interval in the test to pass a small interval via the new constructor arg or rely on the instance-scoped mock instead of directly mutating _interval.

- Include 'extreme' profile in wall-clock guardrail bypass - Use sys.executable instead of 'python' in subprocess timeout test - Use patch.object for prober interval instead of direct mutation - Document extreme profile in CLAUDE.md testing section

🤖 I have created a release *beep* *boop* --- ## [0.6.1](v0.6.0...v0.6.1) (2026-04-04) ### Features * capability-aware prompt profiles for model tier adaptation ([#1047](#1047)) ([67650c5](67650c5)), closes [#805](#805) * implement procedural memory auto-generation from agent failures ([#1048](#1048)) ([55f5206](55f5206)), closes [#420](#420) * implement quality scoring Layers 2+3 -- LLM judge and human override ([#1057](#1057)) ([4a8adfe](4a8adfe)), closes [#230](#230) * token-based personality trimming via PromptProfile.max_personality_tokens ([#1059](#1059)) ([75afd52](75afd52)), closes [#1045](#1045) * workflow execution lifecycle + editor improvements ([#1058](#1058)) ([7b54262](7b54262)), closes [#1029](#1029) [#1042](#1042) ### Refactoring * **web:** address complexity and logging issues in dashboard ([#1056](#1056)) ([ada997b](ada997b)), closes [#1055](#1055) ### Documentation * comprehensive documentation refresh ([#1050](#1050)) ([c7a4259](c7a4259)) ### Tests * fix Hypothesis fuzzing infra and speed up slow unit tests ([#1044](#1044)) ([1111602](1111602)) ### Maintenance * add text=auto catch-all to .gitattributes ([#1051](#1051)) ([fc65d72](fc65d72)) * bump defu from 6.1.4 to 6.1.6 in /site ([#1062](#1062)) ([f0cc439](f0cc439)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Aureliolo added 3 commits April 3, 2026 20:54

fix: remove unused type-ignore comment in health prober test

a0b4991

Copilot AI review requested due to automatic review settings April 3, 2026 19:21

Aureliolo temporarily deployed to ci April 3, 2026 19:21 — with GitHub Actions Inactive

Copilot started reviewing on behalf of Aureliolo April 3, 2026 19:22 View session

gemini-code-assist bot reviewed Apr 3, 2026

View reviewed changes

Copilot AI reviewed Apr 3, 2026

View reviewed changes

coderabbitai bot requested changes Apr 3, 2026

View reviewed changes

fix: address PR review feedback

7d20397

- Include 'extreme' profile in wall-clock guardrail bypass - Use sys.executable instead of 'python' in subprocess timeout test - Use patch.object for prober interval instead of direct mutation - Document extreme profile in CLAUDE.md testing section

Aureliolo merged commit 1111602 into main Apr 3, 2026
10 of 11 checks passed

Aureliolo deleted the test/hypothesis-fuzzing-session branch April 3, 2026 19:33

Aureliolo temporarily deployed to ci April 3, 2026 19:33 — with GitHub Actions Inactive

Aureliolo mentioned this pull request Apr 3, 2026

chore(main): release 0.6.1 #1046

Merged

	_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") == "fuzz"
	_FUZZ_PROFILE_ACTIVE = os.environ.get("HYPOTHESIS_PROFILE") in ("fuzz", "extreme")

Conversation

Aureliolo commented Apr 3, 2026

Summary

Fuzzing Fixes

Test Performance Improvements

Test Plan

Files Changed

Uh oh!

coderabbitai bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Snapshot Warnings

Scanned Files

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Apr 3, 2026 •

edited

Loading

github-actions bot commented Apr 3, 2026 •

edited

Loading

codecov bot commented Apr 3, 2026 •

edited

Loading