perf(test): speed up test suite -- reduce Hypothesis examples and eliminate real sleeps by Aureliolo · Pull Request #557 · Aureliolo/synthorg

Aureliolo · 2026-03-18T21:14:17Z

Summary

Lower Hypothesis CI profile from 200 to 50 max_examples -- property tests find bugs in the first ~20 examples; 200 was overkill for CI feedback loops (only 3 tests in test_resolver.py use the global profile; 74/77 property tests have explicit per-test @settings)
Replace 8x asyncio.sleep(0.05) with asyncio.sleep(0) in 4 unit test files -- cooperative yielding preserves concurrency semantics without wasting wall-clock time
Replace 1x fixed sleep with bounded condition poll in test_task_engine_coverage.py -- polls _in_flight state up to 200 yields instead of a fixed 50ms delay
Update CLAUDE.md to reflect new CI profile (50 examples)

Test plan

All 9,432 tests pass (9 skipped -- symlinks/Docker/real LLM)
93.81% coverage (>80% threshold)
mypy strict: no issues in 1,123 source files
ruff lint + format: clean
Concurrency tests verified correct by async-concurrency-reviewer agent (traced full execution paths through production code)
Pre-push hooks pass (mypy + full unit suite)

Review coverage

Pre-reviewed by 3 agents (docs-consistency, test-quality, async-concurrency), 1 finding addressed (descriptive assertion message on poll loop).

🤖 Generated with Claude Code

…minate real sleeps Lower CI Hypothesis profile from 200 to 50 examples (property tests find bugs in the first ~20; 200 was overkill for CI feedback loops). Replace 8x asyncio.sleep(0.05) with asyncio.sleep(0) in unit tests -- cooperative yielding preserves concurrency semantics without wasting wall-clock time. Replace 1x fixed sleep with condition poll in test_task_engine_coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Pre-reviewed by 3 agents, 1 finding addressed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-18T21:14:29Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fbdc1eaf-c536-45b3-93d0-462d813de5cb

📥 Commits

Reviewing files that changed from the base of the PR and between 0e52c47 and a16db4d.

📒 Files selected for processing (6)

CLAUDE.md
tests/conftest.py
tests/unit/communication/test_bus_memory.py
tests/unit/communication/test_dispatcher.py
tests/unit/engine/test_parallel.py
tests/unit/engine/test_task_engine_coverage.py

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Test (Python 3.14)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) — PEP 758 except syntax enforced by ruff for Python 3.14.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).

Files:

tests/unit/communication/test_bus_memory.py
tests/unit/communication/test_dispatcher.py
tests/unit/engine/test_task_engine_coverage.py
tests/conftest.py
tests/unit/engine/test_parallel.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async testing: asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. instead of real vendor names.
Property-based testing in Python uses Hypothesis (@given + @settings). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

tests/unit/communication/test_bus_memory.py
tests/unit/communication/test_dispatcher.py
tests/unit/engine/test_task_engine_coverage.py
tests/conftest.py
tests/unit/engine/test_parallel.py

🧠 Learnings (10)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

CLAUDE.md
tests/conftest.py

📚 Learning: 2026-03-18T20:21:08.353Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to tests/**/*.py : Property-based testing in Python uses Hypothesis (given + settings). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

CLAUDE.md
tests/conftest.py

📚 Learning: 2026-03-18T20:21:08.353Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Parallelism: pytest-xdist via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-18T20:21:08.353Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to {src,tests,web,cli,site}/**/*.{py,ts,tsx,go,astro} : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Tests must use test-provider, test-small-001, etc. for vendor-agnostic test data.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-18T20:21:08.353Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-18T20:21:08.353Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to tests/**/*.py : Tests must use test-provider, test-small-001, etc. instead of real vendor names.

Applied to files:

CLAUDE.md

🔇 Additional comments (10)

CLAUDE.md (1)

228-228: LGTM!

Documentation correctly updated to reflect the new Hypothesis CI profile setting (50 examples). This aligns with the tests/conftest.py changes.

tests/conftest.py (1)

10-14: LGTM!

Reducing max_examples from 200 to 50 for the CI profile is a reasonable trade-off for faster test runs. As the PR notes, most property tests specify per-test settings, and bugs are typically found within the first ~20 examples. The dev profile remains at 1000 for thorough local testing.
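For reference, a profile setup of the kind discussed here might look like the sketch below. The profile names and example counts come from the PR description and the quoted CLAUDE.md guidance; the actual contents of tests/conftest.py are not shown in this conversation, so the layout is an assumption.

```python
# Sketch of a conftest.py profile setup. The real file is not shown in this
# PR conversation, so the structure here is an assumption.
import os

from hypothesis import settings

# CI profile: lowered from 200 to 50 examples by this PR.
settings.register_profile("ci", max_examples=50)
# Dev profile: unchanged at 1000 examples for thorough local runs.
settings.register_profile("dev", max_examples=1000)

# HYPOTHESIS_PROFILE selects the profile; "ci" is the default.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "ci"))
```

Tests that carry their own @settings(max_examples=...) keep that value regardless of the loaded profile, which is consistent with the PR description's note that only three tests rely on the global profile.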

tests/unit/communication/test_dispatcher.py (1)

305-333: LGTM!

Using asyncio.sleep(0) instead of non-zero sleeps correctly yields control to the event loop, preserving the concurrency verification semantics. Both handlers still interleave (start before either ends) because gather schedules them concurrently and sleep(0) provides yield points for task switching.
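The interleaving described in this comment can be reproduced with a minimal sketch. The handler and event names below are hypothetical and are not taken from test_dispatcher.py.

```python
import asyncio


async def handler(name: str, events: list[str]) -> None:
    events.append(f"{name}:start")
    await asyncio.sleep(0)  # yield once so the other handler can start
    events.append(f"{name}:end")


async def run_handlers() -> list[str]:
    events: list[str] = []
    # gather wraps both coroutines in tasks and runs them concurrently.
    await asyncio.gather(handler("a", events), handler("b", events))
    return events


events = asyncio.run(run_handlers())
starts = [events.index("a:start"), events.index("b:start")]
ends = [events.index("a:end"), events.index("b:end")]
# Both handlers started before either finished, so they interleaved.
assert max(starts) < min(ends)
```

No wall-clock delay is needed for this check, because the assertion depends only on the order in which the event loop resumes the two tasks.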

tests/unit/engine/test_task_engine_coverage.py (1)

53-64: LGTM!

Replacing the fixed sleep with a bounded poll loop is the right approach for deterministic testing. The loop correctly:

Yields cooperatively with asyncio.sleep(0)

Has a bounded iteration count (200) to prevent hangs

Provides a clear assertion message on failure

The pattern aligns with the coding guideline to make timing-sensitive tests deterministic.
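A bounded poll with these three properties could be sketched as a small helper. FakeEngine, process_one, and wait_until_in_flight are hypothetical stand-ins; the real test polls the engine's _in_flight state inline, and its exact code is not shown here.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in for the task engine under test."""

    def __init__(self) -> None:
        self.in_flight = None

    async def process_one(self) -> None:
        # Take a few event-loop turns before the state becomes visible.
        for _ in range(3):
            await asyncio.sleep(0)
        self.in_flight = "task-1"


async def wait_until_in_flight(engine: FakeEngine, max_yields: int = 200) -> int:
    """Yield until engine.in_flight is set; return the number of yields used."""
    for yields in range(max_yields):
        if engine.in_flight is not None:
            return yields
        await asyncio.sleep(0)  # cooperative yield, no wall-clock delay
    raise AssertionError(f"engine did not set in_flight within {max_yields} yields")


async def main() -> tuple:
    engine = FakeEngine()
    worker = asyncio.create_task(engine.process_one())
    yields_used = await wait_until_in_flight(engine)
    await worker
    return engine.in_flight, yields_used


in_flight, yields_used = asyncio.run(main())
```

The bound turns a would-be hang into a prompt failure with a readable message, while a passing run costs only as many loop turns as the engine actually needs.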

tests/unit/communication/test_bus_memory.py (4)

281-289: LGTM!

Using asyncio.sleep(0) correctly yields control, allowing the receiver() task to start and block on bus.receive() before unsubscriber() proceeds. The TaskGroup ensures both tasks are scheduled, and the yield point enables proper interleaving.

304-315: LGTM!

Same valid pattern as above — the zero-duration sleep ensures all three receiver tasks can start and block before the unsubscribe action.

397-406: LGTM!

The publisher yields with asyncio.sleep(0) to let the receiver start and block on bus.receive() before publishing the message. Correct concurrency pattern.
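The receive-then-publish ordering in these bus tests can be illustrated with asyncio.Queue standing in for the message bus. This is an assumption for illustration only; the project's bus API is not shown in this conversation.

```python
import asyncio


async def demo() -> tuple[str, list[str]]:
    queue: asyncio.Queue[str] = asyncio.Queue()  # stand-in for the bus
    order: list[str] = []

    async def receiver() -> str:
        order.append("receiver waiting")
        message = await queue.get()  # blocks until something is published
        order.append("received")
        return message

    async def publisher() -> None:
        await asyncio.sleep(0)  # let the receiver start and block first
        order.append("publishing")
        await queue.put("hello")

    message, _ = await asyncio.gather(receiver(), publisher())
    return message, order


message, order = asyncio.run(demo())
```

A single sleep(0) gives the other ready tasks one turn on the event loop, so it is enough here because the receiver reaches its blocking call in that one turn. Code that needs several turns to reach its blocking point would call for a condition poll instead.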

675-683: LGTM!

The stop_after_delay() helper correctly yields to allow the receive call to start before stopping the bus, ensuring the test validates that receive() returns None on shutdown.

tests/unit/engine/test_parallel.py (2)

248-258: LGTM!

The asyncio.sleep(0) is sufficient here because the test verifies semaphore-enforced concurrency limits, not timing-dependent behavior. The yield point still allows task interleaving while the max_concurrency=2 semaphore controls the actual concurrency bound.

651-685: LGTM!

Same pattern as the concurrency limit test — the zero-duration sleep provides a yield point while the semaphore enforces the max_concurrency limit. The test correctly verifies that in_progress from progress callbacks never exceeds the configured limit.
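The semaphore-bounded pattern in these two tests can be sketched as follows. The worker and counter names are hypothetical; only the max_concurrency=2 limit comes from the review comment above.

```python
import asyncio


async def measure_peak(max_concurrency: int = 2, workers: int = 6) -> int:
    semaphore = asyncio.Semaphore(max_concurrency)
    running = 0
    peak = 0

    async def worker() -> None:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0)  # yield point; other workers may run here
            running -= 1

    await asyncio.gather(*(worker() for _ in range(workers)))
    return peak


peak = asyncio.run(measure_peak())
# The semaphore, not the sleep duration, bounds how many workers overlap.
assert peak <= 2
```

Because the limit is enforced by the semaphore, shortening the sleep to zero changes how long the test takes but not what it verifies.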

📝 Walkthrough

Summary by CodeRabbit

Documentation
- Updated testing guidance to prohibit skipping flaky tests; recommends deterministic testing through mocking instead.
Tests
- Refactored test timing by removing artificial delays to improve execution speed while maintaining async scheduling semantics.
Chores
- Reduced Hypothesis CI test examples from 200 to 50 for faster continuous integration runs.

Walkthrough

These changes optimize test execution by reducing Hypothesis test case counts in CI, replacing intentional sleep delays with zero-duration sleeps or polling mechanisms, and updating testing documentation regarding flaky test handling.

Changes

Cohort / File(s)	Summary
Testing Configuration & Documentation `CLAUDE.md`, `tests/conftest.py`	Updated Hypothesis CI profile to reduce max_examples from 200 to 50; development profile remains at 1000. Added guidance that flaky tests must not be skipped and recommends deterministic testing via mocking.
Test Sleep Optimizations `tests/unit/communication/test_bus_memory.py`, `tests/unit/communication/test_dispatcher.py`, `tests/unit/engine/test_parallel.py`	Replaced asyncio.sleep calls with zero-duration sleeps (sleep(0)) in concurrency and scheduling tests, accelerating execution while preserving async mechanics and assertion logic.
Test Polling Enhancement `tests/unit/engine/test_task_engine_coverage.py`	Replaced fixed asyncio.sleep(0.05) with polling loop that yields up to 200 times until target condition is met, improving test robustness and providing descriptive assertion messaging.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.84% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and specifically describes the main changes: reducing Hypothesis examples and eliminating real sleeps in the test suite.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, detailing all modifications, rationale, and validation results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


github-actions · 2026-03-18T21:14:30Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

gemini-code-assist · 2026-03-18T21:14:38Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the efficiency of the test suite by optimizing property-based testing and asynchronous test execution. It reduces the number of examples run for Hypothesis in CI, and replaces fixed time delays with cooperative yielding or condition polling in several unit tests. These changes aim to provide faster feedback cycles in CI and enhance the determinism of timing-sensitive tests, without compromising test coverage or reliability.

Highlights

Test Performance: Reduced the number of examples for the Hypothesis CI profile from 200 to 50, significantly speeding up property-based tests in continuous integration.
Asynchronous Test Efficiency: Replaced fixed asyncio.sleep(0.05) calls with asyncio.sleep(0) in multiple unit tests, allowing cooperative yielding without introducing artificial wall-clock delays and improving test execution speed.
Deterministic Timing: Updated test_task_engine_coverage.py to use a bounded condition poll instead of a fixed sleep, enhancing test determinism and reducing potential flakiness in timing-sensitive scenarios.
Documentation Update: Reflected the new Hypothesis CI profile (50 examples) in the CLAUDE.md documentation to keep it current with the testing configuration.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request focuses on improving the performance of the test suite. The changes include reducing the number of examples for Hypothesis property-based tests in the CI profile, and replacing fixed-time asyncio.sleep() calls with asyncio.sleep(0) or a more robust polling mechanism. These are excellent changes that will speed up the CI feedback loop and make the tests more reliable. I have one minor suggestion to improve the maintainability of the new polling loop by avoiding a magic number.

gemini-code-assist · 2026-03-18T21:17:52Z

tests/unit/engine/test_task_engine_coverage.py

        )
-        await asyncio.sleep(0.05)
+        # Wait for the engine to enter _process_one and hit slow_save
+        for _ in range(200):


The polling limit 200 is a magic number. It's also hardcoded in the assertion message on line 62. To improve readability and maintainability, consider defining it as a constant at the start of the test method (e.g., MAX_POLL_YIELDS = 200) and using it in both the loop and in an f-string for the assertion message. This will ensure they stay in sync.
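A minimal sketch of this suggestion, using the proposed MAX_POLL_YIELDS name, might look like the following. The state dictionary and slow_worker are hypothetical stand-ins for the engine's _in_flight attribute and the slow_save path.

```python
import asyncio

MAX_POLL_YIELDS = 200  # single source for the loop bound and the message


async def poll_demo() -> str:
    state = {"in_flight": None}  # hypothetical stand-in for engine._in_flight

    async def slow_worker() -> None:
        await asyncio.sleep(0)
        state["in_flight"] = "task-1"

    worker = asyncio.create_task(slow_worker())
    for _ in range(MAX_POLL_YIELDS):
        if state["in_flight"] is not None:
            break
        await asyncio.sleep(0)
    else:
        # Runs only if the loop was exhausted without hitting break.
        raise AssertionError(f"not in flight after {MAX_POLL_YIELDS} yields")
    await worker
    return state["in_flight"]


result = asyncio.run(poll_demo())
```

With the limit defined once, changing the bound updates the assertion message automatically.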

codecov · 2026-03-18T21:20:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.45%. Comparing base (0e52c47) to head (a16db4d).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #557   +/-   ##
=======================================
  Coverage   92.45%   92.45%           
=======================================
  Files         544      544           
  Lines       26783    26783           
  Branches     2554     2554           
=======================================
  Hits        24762    24762           
  Misses       1615     1615           
  Partials      406      406


🤖 I have created a release *beep* *boop*

---

## [0.3.5](v0.3.4...v0.3.5) (2026-03-18)

### Features

* **api:** auto-wire backend services at startup ([#555](#555)) ([0e52c47](0e52c47))

### Bug Fixes

* **api:** resolve WebSocket 403 rejection ([#549](#549)) ([#556](#556)) ([60453d2](60453d2))
* **cli:** verify SLSA provenance via GitHub attestation API ([#548](#548)) ([91d4f79](91d4f79)), closes [#532](#532)

### Performance

* **test:** speed up test suite -- reduce Hypothesis examples and eliminate real sleeps ([#557](#557)) ([d5f3a41](d5f3a41))

### Refactoring

* replace _ErrorResponseSpec NamedTuple with TypedDict ([#554](#554)) ([71cc6e1](71cc6e1))

### Maintenance

* **docker:** suppress pydantic v1 warning on Python 3.14 ([#552](#552)) ([cbe1f05](cbe1f05)), closes [#551](#551)

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Aureliolo and others added 2 commits March 18, 2026 21:49

perf(test): add descriptive assertion message to poll loop

a16db4d

Pre-reviewed by 3 agents, 1 finding addressed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Aureliolo temporarily deployed to ci March 18, 2026 21:14 — with GitHub Actions Inactive

coderabbitai bot approved these changes Mar 18, 2026

gemini-code-assist bot reviewed Mar 18, 2026

Aureliolo merged commit d5f3a41 into main Mar 18, 2026
25 checks passed

Aureliolo deleted the perf/test-suite-speedup branch March 18, 2026 21:20

Aureliolo mentioned this pull request Mar 18, 2026

chore(main): release 0.3.5 #553

Merged

Aureliolo merged 2 commits into main from perf/test-suite-speedup