Skip to content

docs(coverage): diagnose http2_client 0pp coverage delta after Round 1+2+3 (#1110)#1111

Merged
kcenon merged 1 commit into
developfrom
test/issue-1110-coverage-delta-diagnosis
May 8, 2026
Merged

docs(coverage): diagnose http2_client 0pp coverage delta after Round 1+2+3 (#1110)#1111
kcenon merged 1 commit into
developfrom
test/issue-1110-coverage-delta-diagnosis

Conversation

@kcenon

@kcenon kcenon commented May 7, 2026

Copy link
Copy Markdown
Owner

What

Adds a one-page diagnostic note at docs/coverage/http2_client_round-4-diagnosis.md that root-causes why 13 substrate-driven TEST_F merged in PR #1108 (Round 2, 6 tests) and PR #1109 (Round 3, 7 tests) produced exactly 0pp coverage delta on src/protocols/http2/http2_client.cpp (still 18.75% line / 9.91% branch, 108/576 LH/LF, 97/979 BRH/BRF — byte-identical pre/post each round).

Change Type

  • Documentation

Affected Components

  • docs/coverage/http2_client_round-4-diagnosis.md (new file, 195 lines)

Why

Issue #1110 explicitly required a diagnosis before any further "add N more TEST_F" round, because Rounds 1–3 each failed in the same way and a fourth round would predictably fail too without a structural fix. The diagnosis is the gate that unblocks #1106 and ultimately epic #953.

Related Issues

Where

File Change
docs/coverage/http2_client_round-4-diagnosis.md Add diagnostic note (new directory docs/coverage/)

No src/ changes. No test changes. No CMake changes.

How

Diagnosis (one line)

The 7 Round-3 dispatcher TEST_F did execute under coverage instrumentation but all 7 FAILED at the SETTINGS-handshake wait_for gate (5.1–5.4 s each, > the 3-second budget), never reaching the dispatcher branches they were designed to cover.

Evidence

Direct grep of run 25464342500 Run tests step stdout:

ctest # TEST_F Outcome
5886 ServerPingFrameDrivesHandlePingAndKeepsConnectionAlive FAILED 5.26 s
5887 ServerPingAckFrameIsAbsorbedSilently FAILED 5.30 s
5888 ServerGoawayFrameFlipsConnectionStateToDisconnected FAILED 5.35 s
5889 ServerWindowUpdateOnConnectionStreamExpandsWindow FAILED 5.38 s
5890 ServerWindowUpdateOnUnknownStreamIsSilentlyIgnored FAILED 5.34 s
5891 ServerRstStreamOnUnknownStreamIsSilentlyIgnored FAILED 5.28 s
5892 ServerUnknownFrameTypeIsHandledWithoutCrashing FAILED 5.12 s

The baseline fixture sentinel (ConnectCompletesSettingsExchangeWithMockPeer) and ~30 other Http2ClientHermeticTransportTest.* cases also FAILED in the same coverage run. This is fixture-wide, not test-specific.

Why ctest reported PASS to the workflow

Run tests is invoked permissively (no --output-on-failure strict gate) and the Coverage Analysis job is marked PASS as long as lcov capture succeeds. Per-test failures are silently absorbed.

Hypotheses ruled out

  1. CMake NETWORK_COVERAGE_BUILD over-scoping — refuted. Macro guards only lines 1257–1328 (two flaky tests). Dispatcher tests at 1361–1580 are outside the guard.
  2. Hit-line overlap with baseline — does not apply (baseline also fails under coverage).
  3. lcov filter drops second production SF — refuted by SF-record cross-check already in test(http2): expand http2_client.cpp error-branch coverage using Phase 2 substrate (round 2) #1106 Round-3 post-merge comment.

Why instrumentation breaks the handshake

-fprofile-arcs -ftest-coverage instruments every basic block; cumulative cost across OpenSSL TLS handshake + ASIO async I/O + mock_h2_server_peer framing pushes end-to-end SETTINGS exchange past the 3-second wait_for budget defined in tests/support/hermetic_transport_fixture.h. Wall-clock evidence: 5.1–5.4 s under coverage vs 100–300 ms in Debug/Release matrix.

Recommended fix (out of scope for this PR)

Round 5 sub-issue should land Option B — a coverage-only wait_for timeout multiplier (5×, so 3 s → 15 s when NETWORK_COVERAGE_BUILD is defined). Pro: dispatcher tests actually complete and contribute coverage. Con: lengthens coverage workflow runtime.

Round 5 acceptance criteria must require a measured >=20pp line / >=10pp branch delta, not merely 'tests pass under coverage', to break the 0pp pattern definitively.

Test Plan for Reviewers

  1. cat docs/coverage/http2_client_round-4-diagnosis.md — verify the diagnosis matches the evidence cited.
  2. Optional: gh run view 25464342500 --repo kcenon/network_system --log | grep -E 'ServerPing|ServerGoaway|ServerWindow|ServerRst|ServerUnknown' | grep FAILED — independently reproduce the per-test failure list.

Breaking Changes

None. Documentation-only PR.

Rollback

git revert is sufficient. No source code or build-system change to reverse.

…1+2+3

Closes #1110

Documents the root cause for why 13 substrate-driven TEST_F across
PR #1108 and PR #1109 produced exactly 0pp coverage delta on
src/protocols/http2/http2_client.cpp.

Diagnosis: the 7 Round-3 dispatcher TEST_F do execute under coverage
instrumentation (CMake macro is correctly scoped to only the two
original flaky tests), but all 7 fail at the SETTINGS-handshake
wait_for(predicate, 3s) gate because gcov instrumentation pushes the
end-to-end TLS+ASIO+frame-mock handshake past the 3-second budget on
the GitHub Actions ubuntu-latest runner. The same coverage-only
timing trap that motivated the NETWORK_COVERAGE_BUILD guard for the
two Round-2 positive-path tests applies to the entire
Http2ClientHermeticTransport fixture.

The Coverage Analysis workflow reports PASS despite the per-test
failures because ctest is invoked without --output-on-failure-strict
and the workflow only fails on lcov capture failure, not per-test
failure.

Hypotheses 1 (over-scoped macro), 2 (overlap with baseline), and 3
(lcov filter drops production SF) are all ruled out with evidence.
The actual root cause is upstream: handshake timing under
instrumentation.

Recommends Round 5 with Option B (coverage-only wait_for timeout
multiplier) as smallest change to break the 0pp pattern. Round 5
acceptance criteria should require a measured >=20pp line / +10pp
branch delta, not merely 'tests pass under coverage'.
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Coverage Report

Metric Value
Line Coverage 69.2%
Branch Coverage 34.5%
Target 80% lines / 70% branches
Coverage Details

Full HTML report is available as a build artifact.

@kcenon kcenon merged commit 8171e00 into develop May 8, 2026
9 checks passed
@kcenon kcenon deleted the test/issue-1110-coverage-delta-diagnosis branch May 8, 2026 01:58
kcenon added a commit that referenced this pull request May 8, 2026
…ixture (#1113)

Scale every hermetic-fixture wait_for budget by NETWORK_COVERAGE_TIMEOUT_MULTIPLIER
so Http2ClientHermeticTransportTest dispatcher cases can complete the SETTINGS
handshake under -fprofile-arcs -ftest-coverage instrumentation. PR #1111
diagnosis confirmed all 7 Round-3 dispatcher TEST_F (and ~30 other fixture-based
tests) failed at wait_for(settings_exchanged(), 3s): 5.1-5.4s observed under
coverage vs 100-300ms in Debug/Release, producing 0pp delta on
src/protocols/http2/http2_client.cpp for three rounds (PR #1108, PR #1109).

Changes:
- tests/support/hermetic_transport_fixture.h: define
  NETWORK_COVERAGE_TIMEOUT_MULTIPLIER (default 1), apply to wait_for default
  timeout argument.
- tests/support/hermetic_transport_fixture.cpp: scale make_loopback_tcp_pair
  accept-handler deadline by the multiplier.
- tests/support/CMakeLists.txt: define
  NETWORK_COVERAGE_TIMEOUT_MULTIPLIER=5 on network_test_support PUBLIC when
  ENABLE_COVERAGE is ON, so all consuming test executables inherit it at the
  point of header inclusion.

No src/ changes. No new TEST_F. Multiplier defaults to 1 outside coverage
builds; Debug/Release matrix observable behaviour is unchanged.

Closes #1112
kcenon added a commit that referenced this pull request May 21, 2026
* test(quic): expand connection.cpp branch coverage

Drives the private dispatch helpers of
src/protocols/quic/connection.cpp through a new test-only friend
class tests::support::quic_connection_test_access (gated by
NETWORK_ENABLE_TEST_INJECTION) so the std::visit arms of
handle_frame, plus process_frames / build_packet /
generate_ack_frame / handle_loss_detection_result /
generate_probe_packets / queue_frames_for_retransmission /
update_state, are reachable under -fprofile-arcs -ftest-coverage
without waiting on the TLS handshake (which exceeds wait_for(3s)
per PR #1111 diagnosis).

Coverage areas added:
- handle_frame std::visit arms: padding, ping, ack (with seeded
  sent_packets to exercise the erase loop), crypto, max_data,
  max_stream_data, max_streams (bidi + uni), connection_close
  (transport + application), handshake_done (client connected +
  server no-op), reset_stream / stop_sending unknown-stream arms,
  new_connection_id, retire_connection_id error propagation,
  data_blocked / stream_data_blocked / streams_blocked /
  path_challenge / path_response / new_token catch-all arm.
- process_frames empty + malformed payload dispatch.
- build_packet no-keys early-return for all encryption levels.
- generate_ack_frame nullopt vs populated arms (largest_received
  zero with ack_needed true exercises the OR-branch).
- get_pn_space switch arms for both const and non-const overloads
  including the initial / zero_rtt aliasing.
- update_state no-op when handshake not complete.
- close / close_application re-entry guards (already closing,
  draining, closed).
- enter_closing / enter_draining no-op arms.
- on_timeout drain-timeout, idle-timeout, no-timeout.
- next_timeout closed (nullopt) + draining (drain_deadline) arms.
- handle_loss_detection_result: none, pto_expired (asserts PING
  frame appended), packet_lost (asserts pending_frames grew),
  ecn::congestion_signal, ecn::ecn_failure.
- generate_probe_packets encryption-level cascade with and without
  pending crypto data.
- queue_frames_for_retransmission per-variant arms: padding / ack
  (not requeued), crypto (per level routed to matching queue),
  stream (skipped), ping / new_token / handshake_done (requeued),
  flow-control frames (requeued), reset_stream / stop_sending
  (requeued), new_connection_id / retire_connection_id (requeued),
  path_challenge / path_response (skipped), connection_close
  (skipped).
- start_handshake error guards (server-side rejection, non-idle
  state rejection) and init_server_handshake client-side rejection.

src/internal/protocols/quic/connection.h change is additive only:
a forward declaration of the probe and a friend declaration, both
gated by #if defined(NETWORK_ENABLE_TEST_INJECTION). Production
builds with BUILD_TESTS=OFF compile byte-identical.

Pattern mirrors PR #1116 (http2_client_test_access) and PR #1135
(quic_socket_test_access).

Closes #1145

* test(quic): extend connection branch coverage with derived keys

Round 7 follow-up to the initial PR commit. Local coverage delta
on the first round measured 62.6% -> 72.9% line / 34.4% -> 40.1%
branch from the run-26200588035 artifact, short of the 80%/70%
gate. The remaining gap is concentrated in build_packet's
post-keys body, which is unreachable until initial keys are
derived. Adding:

- ConnectionBuildPacketInitialKeysTest: derives initial secrets
  via the public crypto().derive_initial_secrets() API (does not
  require init_client/init_server) so build_packet can be driven
  past the keys-result.is_err() early-return. Covers empty payload
  short-circuit, pending CRYPTO assembly, ACK frame assembly,
  CONNECTION_CLOSE assembly, pending_frames emission, generate_packets
  pendng-crypto path, generate_packets close_sent path, and
  generate_packets after closed.

- generate_packets state-guard arms (idle, connected without data).

- next_timeout idle-deadline path with no loss-detector timeout.

- receive_packet draining and closed state arms (bytes/packets
  counters increment but no processing).

- on_timeout no-timeout idle arm.

All additions are header-include-only test cases routed through
the existing quic_connection_test_access probe; no further
src/ change required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant