Skip to content

ci(coverage): enable full-ecosystem coverage build#987

Merged
kcenon merged 2 commits into
developfrom
ci/issue-953-coverage-full-build
Apr 17, 2026
Merged

ci(coverage): enable full-ecosystem coverage build#987
kcenon merged 2 commits into
developfrom
ci/issue-953-coverage-full-build

Conversation

@kcenon

@kcenon kcenon commented Apr 16, 2026

Copy link
Copy Markdown
Owner

Summary

Addresses Step 1 of epic #953 — "Fix the measurement (prerequisite)".

The coverage workflow currently disables BUILD_WITH_LOGGER_SYSTEM, BUILD_WITH_THREAD_SYSTEM, and BUILD_WITH_CONTAINER_SYSTEM, so large sections of protocol code that depend on those integrations are compiled out. The resulting 62.4% line coverage recorded on the epic is a structural lower bound, not a real measurement.

What changed

  • Checkout thread_system, logger_system, and container_system alongside common_system.
  • Replace the single Build common_system dependency step with a consolidated Build ecosystem dependencies step that builds all four in tier order. The pattern mirrors ci.yml's sanitizer job (L269-410), minus sanitizer-specific compiler flags.
  • Flip BUILD_WITH_LOGGER_SYSTEM, BUILD_WITH_THREAD_SYSTEM, BUILD_WITH_CONTAINER_SYSTEM from OFF to ON in the coverage CMake configuration.
  • Extend push and pull_request triggers to cover the develop branch so post-merge coverage runs on the integration branch without waiting for a release cut.

Deviation from epic body

Epic #953 asks for BUILD_MESSAGING_BRIDGE=ON as well. This PR keeps it OFF because enabling it compiles src/integration/messaging_bridge.cpp, which depends on messaging_system headers that no CI workflow in this repo currently checks out. Every other workflow (ci.yml, sanitizers.yml, performance-regression.yml) consistently keeps this flag OFF for the same reason. Enabling it would require first adding messaging_system checkout and install to every dependent workflow — a separate scope.

I'll add a follow-up comment to #953 documenting this.

Test plan

  • CI runs successfully on this PR against develop (coverage workflow will trigger because of the new trigger addition).
  • Resulting line/branch coverage percentages are recorded in the workflow step summary.
  • After merge, post a new-baseline comment on Expand unit test coverage from 40% to 80% #953 citing the measured line/branch coverage on develop.

Out of scope

  • Expanding tests to actually reach 80% — that's downstream narrow follow-ups per epic's Step 2.
  • Enabling BUILD_MESSAGING_BRIDGE — separate dependency work as noted above.

Relates to #953

The coverage workflow previously disabled BUILD_WITH_LOGGER_SYSTEM,
BUILD_WITH_THREAD_SYSTEM and BUILD_WITH_CONTAINER_SYSTEM, which
compiled out large sections of code that depend on these integrations.
The resulting coverage numbers were a structural lower bound rather
than a real measurement.

Mirror the ecosystem build pattern already used by ci.yml (sanitizer
job) to check out and build common_system, thread_system,
logger_system and container_system in tier order, then flip the three
BUILD_WITH_* flags to ON.

BUILD_MESSAGING_BRIDGE is kept OFF because it requires messaging_system
headers that are not checked out in CI, matching every other workflow
in this repo.

Also extend push/pull_request triggers to cover the develop branch so
post-merge coverage measurements happen on the integration branch.

Relates to #953 (Step 1: fix the measurement).
@kcenon

kcenon commented Apr 17, 2026

Copy link
Copy Markdown
Owner Author

CI Failure Analysis — Coverage Analysis hang

Symptom

Coverage Analysis check has been pending for ~1 hour with the Run tests step never completing. All other 8 checks pass.

Root cause

.github/workflows/coverage.yml has no timeout-minutes at any level (job or step), and the test step invokes:

ctest -C Debug --output-on-failure --verbose || true

with no per-test timeout. When a network test hangs on socket I/O (accept loop, connect wait), the job runs until GitHub's default 6-hour runner cap. The || true only masks the exit code — it cannot rescue a process that never returns.

Comparable jobs in this repo already set timeouts: ci.yml build jobs use 60 min, sanitizers.yml and fuzzing.yml use 30 min. Coverage was the only workflow left unbounded.

Fix plan

  1. Add timeout-minutes: 60 at the coverage job level (matches ci.yml build jobs; coverage-instrumented builds + 4 ecosystem checkouts justify the higher cap vs 30 min sanitizer).
  2. Add --timeout 300 to the ctest invocation so individual tests convert a hang into a visible TIMEOUT failure after 5 min instead of blocking the job.
  3. Keep || true — coverage reporting intent is to run regardless of test pass/fail.

Next

Cancel the stuck run, apply the fix on this branch, push, and re-verify.

Coverage Analysis had no timeout at any level, letting a hung network
test in ctest run until the 6-hour runner cap. Added job-level
timeout-minutes: 60 (matches ci.yml build jobs) and ctest --timeout 300
to convert per-test hangs into visible TIMEOUT failures.
@github-actions

Copy link
Copy Markdown
Contributor

Coverage Report

Metric Value
Line Coverage 63.7%
Branch Coverage 30.4%
Target 80% lines / 70% branches
Coverage Details

Full HTML report is available as a build artifact.

@kcenon kcenon merged commit a9d3f48 into develop Apr 17, 2026
8 checks passed
@kcenon kcenon deleted the ci/issue-953-coverage-full-build branch April 17, 2026 01:39
This was referenced Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant