perf(test): speed up test suite -- reduce Hypothesis examples and eliminate real sleeps#557

Merged
Aureliolo merged 2 commits into main from perf/test-suite-speedup
Mar 18, 2026

Conversation

@Aureliolo
Owner

Summary

  • Lower the Hypothesis CI profile's max_examples from 200 to 50 -- property tests typically find bugs within the first ~20 examples, so 200 was overkill for CI feedback loops. Only 3 tests (in test_resolver.py) use the global profile; the other 74 of 77 property tests set explicit per-test @settings
  • Replace 8x asyncio.sleep(0.05) with asyncio.sleep(0) in 4 unit test files -- cooperative yielding preserves concurrency semantics without wasting wall-clock time
  • Replace 1x fixed sleep with a bounded condition poll in test_task_engine_coverage.py -- it polls the _in_flight state for up to 200 yields instead of waiting a fixed 50ms
  • Update CLAUDE.md to reflect new CI profile (50 examples)
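
The cooperative-yield point in the second bullet can be sketched as follows. This is a minimal illustration, not code from the test files: the names `demo` and `background` are hypothetical. It shows that `asyncio.sleep(0)` suspends the caller for one event-loop pass, which is enough for an already-scheduled task to run.

```python
import asyncio


async def demo() -> list[str]:
    """Record the order in which the caller and a background task run."""
    events: list[str] = []

    async def background() -> None:
        events.append("background ran")

    # The task is scheduled here but does not start until we yield.
    task = asyncio.create_task(background())
    events.append("before yield")
    # Zero-duration sleep: no wall-clock wait, just a yield to the loop.
    await asyncio.sleep(0)
    events.append("after yield")
    await task
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The background task's entry lands between the two entries written by the caller, which is the ordering a short real sleep would otherwise be used to obtain.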

Test plan

  • All 9,432 tests pass (9 skipped -- symlinks/Docker/real LLM)
  • 93.81% coverage (>80% threshold)
  • mypy strict: no issues in 1,123 source files
  • ruff lint + format: clean
  • Concurrency tests verified correct by async-concurrency-reviewer agent (traced full execution paths through production code)
  • Pre-push hooks pass (mypy + full unit suite)

Review coverage

Pre-reviewed by 3 agents (docs-consistency, test-quality, async-concurrency), 1 finding addressed (descriptive assertion message on poll loop).

🤖 Generated with Claude Code

Aureliolo and others added 2 commits March 18, 2026 21:49
…minate real sleeps

Lower CI Hypothesis profile from 200 to 50 examples (property tests find
bugs in the first ~20; 200 was overkill for CI feedback loops). Replace
8x asyncio.sleep(0.05) with asyncio.sleep(0) in unit tests -- cooperative
yielding preserves concurrency semantics without wasting wall-clock time.
Replace 1x fixed sleep with condition poll in test_task_engine_coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-reviewed by 3 agents, 1 finding addressed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fbdc1eaf-c536-45b3-93d0-462d813de5cb

📥 Commits

Reviewing files that changed from the base of the PR and between 0e52c47 and a16db4d.

📒 Files selected for processing (6)
  • CLAUDE.md
  • tests/conftest.py
  • tests/unit/communication/test_bus_memory.py
  • tests/unit/communication/test_dispatcher.py
  • tests/unit/engine/test_parallel.py
  • tests/unit/engine/test_task_engine_coverage.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) — PEP 758 except syntax enforced by ruff for Python 3.14.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Line length: 88 characters (ruff).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).

Files:

  • tests/unit/communication/test_bus_memory.py
  • tests/unit/communication/test_dispatcher.py
  • tests/unit/engine/test_task_engine_coverage.py
  • tests/conftest.py
  • tests/unit/engine/test_parallel.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async testing: asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. instead of real vendor names.
Property-based testing in Python uses Hypothesis (@given + @settings). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/communication/test_bus_memory.py
  • tests/unit/communication/test_dispatcher.py
  • tests/unit/engine/test_task_engine_coverage.py
  • tests/conftest.py
  • tests/unit/engine/test_parallel.py
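
One way to follow the timing guideline above (mock `time.monotonic()` and `asyncio.sleep()`), sketched under assumptions: `wait_until_deadline` is a hypothetical function under test, and the fake clock that advances one second per call is an illustrative choice. Only `unittest.mock.patch` and `AsyncMock` from the standard library are used.

```python
import asyncio
import time
from unittest.mock import AsyncMock, patch


async def wait_until_deadline(timeout: float, interval: float = 0.05) -> int:
    """Hypothetical code under test: sleep in steps until the timeout elapses."""
    deadline = time.monotonic() + timeout
    ticks = 0
    while time.monotonic() < deadline:
        await asyncio.sleep(interval)
        ticks += 1
    return ticks


async def run_deterministically() -> tuple[int, int]:
    """Run the function with a fake clock and a fake sleep."""
    fake_sleep = AsyncMock()
    # Each call to time.monotonic() returns the next value in the list.
    with patch("time.monotonic", side_effect=[0.0, 1.0, 2.0, 3.0, 4.0]):
        with patch("asyncio.sleep", fake_sleep):
            ticks = await wait_until_deadline(timeout=3.0)
    # The fake sleep records how often it was awaited without waiting.
    return ticks, fake_sleep.await_count


if __name__ == "__main__":
    print(asyncio.run(run_deterministically()))
```

Because both the clock and the sleep are replaced, the outcome depends only on the scripted clock values and not on machine speed or scheduler jitter.
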
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to tests/**/*.py : Property-based testing in Python uses Hypothesis (given + settings). Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • CLAUDE.md
  • tests/conftest.py
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Parallelism: pytest-xdist via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to {src,tests,web,cli,site}/**/*.{py,ts,tsx,go,astro} : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names may only appear in: (1) Operations design page provider list (docs/design/operations.md), (2) .claude/ skill/agent files, (3) third-party import paths/module names.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Tests must use test-provider, test-small-001, etc. for vendor-agnostic test data.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to src/synthorg/**/*.py : Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T20:21:08.353Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T20:21:08.353Z
Learning: Applies to tests/**/*.py : Tests must use test-provider, test-small-001, etc. instead of real vendor names.

Applied to files:

  • CLAUDE.md
🔇 Additional comments (10)
CLAUDE.md (1)

228-228: LGTM!

Documentation correctly updated to reflect the new Hypothesis CI profile setting (50 examples). This aligns with the tests/conftest.py changes.

tests/conftest.py (1)

10-14: LGTM!

Reducing max_examples from 200 to 50 for the CI profile is a reasonable trade-off for faster test runs. As the PR notes, most property tests specify per-test settings, and bugs are typically found within the first ~20 examples. The dev profile remains at 1000 for thorough local testing.
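
For orientation, a profile setup of this shape might look like the fragment below. The profile names, example counts, and the `HYPOTHESIS_PROFILE` variable come from the PR text and the CLAUDE.md learnings; the actual contents of tests/conftest.py are not shown here, so the layout is an assumption.

```python
import os

from hypothesis import settings

# Example counts follow the PR: 50 for CI (the default), 1000 for dev.
settings.register_profile("ci", max_examples=50)
settings.register_profile("dev", max_examples=1000)

# Assumed selection logic: fall back to "ci" when the variable is unset.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "ci"))
```

A test decorated with its own `@settings(max_examples=...)` keeps that explicit value regardless of which profile is loaded, which is why the PR notes that only the tests relying on the global profile are affected.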

tests/unit/communication/test_dispatcher.py (1)

305-333: LGTM!

Using asyncio.sleep(0) instead of non-zero sleeps correctly yields control to the event loop, preserving the concurrency verification semantics. Both handlers still interleave (start before either ends) because gather schedules them concurrently and sleep(0) provides yield points for task switching.
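
The interleaving described here can be reproduced in isolation. The sketch below is not the dispatcher test itself; `handler` and `run_handlers` are hypothetical names, and the event list stands in for whatever the real test records.

```python
import asyncio


async def handler(name: str, events: list[str]) -> None:
    """Record start and end, yielding once in between."""
    events.append(f"{name}:start")
    await asyncio.sleep(0)  # yield point that lets the other handler start
    events.append(f"{name}:end")


async def run_handlers() -> list[str]:
    events: list[str] = []
    # gather wraps both coroutines in tasks, so they run concurrently.
    await asyncio.gather(handler("a", events), handler("b", events))
    return events


if __name__ == "__main__":
    print(asyncio.run(run_handlers()))
```

Both `start` entries appear before either `end` entry, so an assertion on overlap still holds with a zero-duration sleep.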

tests/unit/engine/test_task_engine_coverage.py (1)

53-64: LGTM!

Replacing the fixed sleep with a bounded poll loop is the right approach for deterministic testing. The loop correctly:

  • Yields cooperatively with asyncio.sleep(0)
  • Has a bounded iteration count (200) to prevent hangs
  • Provides a clear assertion message on failure

The pattern aligns with the coding guideline to make timing-sensitive tests deterministic.
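
A bounded poll of this kind can be sketched as follows. `FakeEngine` is a hypothetical stand-in, not the project's task engine, and its `in_flight` flag only mimics the `_in_flight` state mentioned in the PR; the bound of 200 yields is taken from the PR description.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in that becomes 'in flight' after a few yields."""

    def __init__(self) -> None:
        self.in_flight = False

    async def process(self) -> None:
        # Simulate some scheduling steps before the state is observable.
        for _ in range(3):
            await asyncio.sleep(0)
        self.in_flight = True


async def wait_for_in_flight(engine: FakeEngine) -> int:
    """Yield until the engine is in flight; fail after 200 yields."""
    for yields in range(200):
        if engine.in_flight:
            return yields
        await asyncio.sleep(0)
    raise AssertionError("engine did not become in-flight within 200 yields")


async def run_poll() -> int:
    engine = FakeEngine()
    task = asyncio.create_task(engine.process())
    yields = await wait_for_in_flight(engine)
    await task
    return yields


if __name__ == "__main__":
    print(asyncio.run(run_poll()))
```

The loop returns as soon as the condition holds, so the common case costs only a handful of event-loop passes, while the bound turns a hang into a readable failure.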

tests/unit/communication/test_bus_memory.py (4)

281-289: LGTM!

Using asyncio.sleep(0) correctly yields control, allowing the receiver() task to start and block on bus.receive() before unsubscriber() proceeds. The TaskGroup ensures both tasks are scheduled, and the yield point enables proper interleaving.


304-315: LGTM!

Same valid pattern as above — the zero-duration sleep ensures all three receiver tasks can start and block before the unsubscribe action.


397-406: LGTM!

The publisher yields with asyncio.sleep(0) to let the receiver start and block on bus.receive() before publishing the message. Correct concurrency pattern.


675-683: LGTM!

The stop_after_delay() helper correctly yields to allow the receive call to start before stopping the bus, ensuring the test validates that receive() returns None on shutdown.
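
The receive-then-stop ordering can be illustrated with a toy bus. `ToyBus` below is a hypothetical queue-backed class, not the project's in-memory bus, and using `None` as the shutdown sentinel is an assumption made to mirror the behaviour described in the comment.

```python
import asyncio


class ToyBus:
    """Hypothetical in-memory bus backed by an asyncio.Queue."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def receive(self):
        # Blocks until something is queued; None signals shutdown.
        return await self._queue.get()

    async def stop(self) -> None:
        await self._queue.put(None)


async def run_shutdown() -> tuple:
    bus = ToyBus()
    events: list[str] = []

    async def receiver():
        events.append("receive:waiting")
        message = await bus.receive()
        events.append("receive:returned")
        return message

    async def stop_after_yield() -> None:
        await asyncio.sleep(0)  # let the receiver start and block first
        events.append("stop")
        await bus.stop()

    message, _ = await asyncio.gather(receiver(), stop_after_yield())
    return message, events


if __name__ == "__main__":
    print(asyncio.run(run_shutdown()))
```

The single yield is what guarantees the receiver is already blocked when `stop()` runs; no wall-clock delay is involved.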

tests/unit/engine/test_parallel.py (2)

248-258: LGTM!

The asyncio.sleep(0) is sufficient here because the test verifies semaphore-enforced concurrency limits, not timing-dependent behavior. The yield point still allows task interleaving while the max_concurrency=2 semaphore controls the actual concurrency bound.


651-685: LGTM!

Same pattern as the concurrency limit test — the zero-duration sleep provides a yield point while the semaphore enforces the max_concurrency limit. The test correctly verifies that in_progress from progress callbacks never exceeds the configured limit.
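
The reasoning that the semaphore, not the sleep duration, enforces the bound can be checked with a small sketch. `run_with_limit` is a hypothetical helper, not the executor under test, and the peak counter stands in for the `in_progress` value reported by the progress callbacks.

```python
import asyncio


async def run_with_limit(n_tasks: int, max_concurrency: int) -> int:
    """Run n_tasks workers and return the peak number running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    running = 0
    peak = 0

    async def worker() -> None:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0)  # yield while still holding the slot
            running -= 1

    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(run_with_limit(n_tasks=6, max_concurrency=2)))
```

With six workers and a limit of two, the peak reaches the limit but never exceeds it, even though no worker waits for real time.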


📝 Walkthrough

Summary by CodeRabbit

  • Documentation

    • Updated testing guidance to prohibit skipping flaky tests; recommends deterministic testing through mocking instead.
  • Tests

    • Refactored test timing by removing artificial delays to improve execution speed while maintaining async scheduling semantics.
  • Chores

    • Reduced Hypothesis CI test examples from 200 to 50 for faster continuous integration runs.

Walkthrough

These changes optimize test execution by reducing Hypothesis test case counts in CI, replacing intentional sleep delays with zero-duration sleeps or polling mechanisms, and updating testing documentation regarding flaky test handling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Testing Configuration & Documentation (CLAUDE.md, tests/conftest.py) | Updated Hypothesis CI profile to reduce max_examples from 200 to 50; development profile remains at 1000. Added guidance that flaky tests must not be skipped and recommends deterministic testing via mocking. |
| Test Sleep Optimizations (tests/unit/communication/test_bus_memory.py, tests/unit/communication/test_dispatcher.py, tests/unit/engine/test_parallel.py) | Replaced asyncio.sleep calls with zero-duration sleeps (sleep(0)) in concurrency and scheduling tests, accelerating execution while preserving async mechanics and assertion logic. |
| Test Polling Enhancement (tests/unit/engine/test_task_engine_coverage.py) | Replaced fixed asyncio.sleep(0.05) with polling loop that yields up to 200 times until target condition is met, improving test robustness and providing descriptive assertion messaging. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.84% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically describes the main changes: reducing Hypothesis examples and eliminating real sleeps in the test suite. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing all modifications, rationale, and validation results. |


@github-actions
Contributor

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the efficiency of the test suite by optimizing property-based testing and asynchronous test execution. It reduces the number of examples run for Hypothesis in CI, and replaces fixed time delays with cooperative yielding or condition polling in several unit tests. These changes aim to provide faster feedback cycles in CI and enhance the determinism of timing-sensitive tests, without compromising test coverage or reliability.

Highlights

  • Test Performance: Reduced the number of examples for the Hypothesis CI profile from 200 to 50, significantly speeding up property-based tests in continuous integration.
  • Asynchronous Test Efficiency: Replaced fixed asyncio.sleep(0.05) calls with asyncio.sleep(0) in multiple unit tests, allowing cooperative yielding without introducing artificial wall-clock delays and improving test execution speed.
  • Deterministic Timing: Updated test_task_engine_coverage.py to use a bounded condition poll instead of a fixed sleep, enhancing test determinism and reducing potential flakiness in timing-sensitive scenarios.
  • Documentation Update: Reflected the new Hypothesis CI profile (50 examples) in the CLAUDE.md documentation to keep it current with the testing configuration.
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request focuses on improving the performance of the test suite. The changes include reducing the number of examples for Hypothesis property-based tests in the CI profile, and replacing fixed-time asyncio.sleep() calls with asyncio.sleep(0) or a more robust polling mechanism. These are excellent changes that will speed up the CI feedback loop and make the tests more reliable. I have one minor suggestion to improve the maintainability of the new polling loop by avoiding a magic number.

 )
-await asyncio.sleep(0.05)
+# Wait for the engine to enter _process_one and hit slow_save
+for _ in range(200):

medium

The polling limit 200 is a magic number. It's also hardcoded in the assertion message on line 62. To improve readability and maintainability, consider defining it as a constant at the start of the test method (e.g., MAX_POLL_YIELDS = 200) and using it in both the loop and in an f-string for the assertion message. This will ensure they stay in sync.
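
The suggestion could look roughly like this. It is a sketch only: `MAX_POLL_YIELDS` is the name proposed in the comment, while `poll_until` and its predicate argument are hypothetical and not part of the test file.

```python
import asyncio
from collections.abc import Callable

# Single source of truth for both the loop bound and the failure message.
MAX_POLL_YIELDS = 200


async def poll_until(predicate: Callable[[], bool]) -> None:
    """Yield to the event loop until predicate() is true, up to the bound."""
    for _ in range(MAX_POLL_YIELDS):
        if predicate():
            return
        await asyncio.sleep(0)
    raise AssertionError(f"condition not met within {MAX_POLL_YIELDS} yields")


if __name__ == "__main__":
    asyncio.run(poll_until(lambda: True))
```

Changing the constant then updates the loop and the message together, which is the synchronisation the comment asks for.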

@codecov

codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.45%. Comparing base (0e52c47) to head (a16db4d).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #557   +/-   ##
=======================================
  Coverage   92.45%   92.45%           
=======================================
  Files         544      544           
  Lines       26783    26783           
  Branches     2554     2554           
=======================================
  Hits        24762    24762           
  Misses       1615     1615           
  Partials      406      406           


@Aureliolo Aureliolo merged commit d5f3a41 into main Mar 18, 2026
25 checks passed
@Aureliolo Aureliolo deleted the perf/test-suite-speedup branch March 18, 2026 21:20
Aureliolo added a commit that referenced this pull request Mar 18, 2026
🤖 I have created a release *beep* *boop*
---


## [0.3.5](v0.3.4...v0.3.5) (2026-03-18)

### Features

* **api:** auto-wire backend services at startup ([#555](#555)) ([0e52c47](0e52c47))

### Bug Fixes

* **api:** resolve WebSocket 403 rejection ([#549](#549)) ([#556](#556)) ([60453d2](60453d2))
* **cli:** verify SLSA provenance via GitHub attestation API ([#548](#548)) ([91d4f79](91d4f79)), closes [#532](#532)

### Performance

* **test:** speed up test suite -- reduce Hypothesis examples and eliminate real sleeps ([#557](#557)) ([d5f3a41](d5f3a41))

### Refactoring

* replace _ErrorResponseSpec NamedTuple with TypedDict ([#554](#554)) ([71cc6e1](71cc6e1))

### Maintenance

* **docker:** suppress pydantic v1 warning on Python 3.14 ([#552](#552)) ([cbe1f05](cbe1f05)), closes [#551](#551)

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).