Skip to content

fix: count transport-layer failures in per-host statistics#2203

Merged
mre merged 1 commit into
lycheeverse:masterfrom
afalhambra-hivemq:feat/host-stats-record-network-errors
Jun 2, 2026
Merged

fix: count transport-layer failures in per-host statistics#2203
mre merged 1 commit into
lycheeverse:masterfrom
afalhambra-hivemq:feat/host-stats-record-network-errors

Conversation

@afalhambra-hivemq

Copy link
Copy Markdown
Contributor

Fixes #2202.

Problem

The per-host statistics table reports an inaccurate success rate for hosts whose links fail at the transport layer (DNS lookup, TCP connect, TLS handshake, connection reset, etc.). Those failures returned early from Host::perform_request and never reached update_stats, so they were invisible to total_requests and to every error bucket. In a real run the top-of-report ResponseStats flagged two kubernetes.io errors while the per-host table reported kubernetes.io | 74 requests | 100.0% success for the same host.

Full root-cause analysis with code references in #2202.

Change

Add a network_errors counter on HostStats and a corresponding record_network_error method, then call it from the Err arm of Host::perform_request. The failed request now contributes to total_requests (so the success-rate denominator is correct) and to a dedicated network_errors bucket (so the failure mode is visible in the output).

Surfaces touched:

  • HostStats field + recording method + updated error_rate.
  • Serialize impl: adds network_errors to the JSON output.
  • Markdown formatter: new Network Errors column on the per-host table.
  • Detailed text formatter: new line shown when network_errors > 0 (mirrors the existing pattern for client_errors / server_errors).
  • Compact formatter: intentionally unchanged. Its % success column already reflects the corrected math, and adding a column to its rigid one-line layout would be intrusive. Happy to add it if you'd prefer consistency across all three formatters.

Before (current):

| Host          | Requests | Success Rate | Median Time | Cache Hit Rate |
| kubernetes.io | 74       | 100.0%       | 109ms       | 41.2%          |

After:

| Host          | Requests | Success Rate | Network Errors | Median Time | Cache Hit Rate |
| kubernetes.io | 76       | 97.4%        | 2              | 109ms       | 41.2%          |

Tests

  • New unit test test_record_network_error_affects_totals_and_success_rate in lychee-lib/src/ratelimit/host/stats.rs asserts that a network error increments total_requests and network_errors, leaves the HTTP-status buckets untouched, and lowers the success rate accordingly.
  • Existing JSON snapshot in lychee-bin/src/formatters/stats/json.rs updated to include the new field.

Verification

  • cargo test --workspace --all-targets: 534 passed, 0 failed.
  • make lint (workspace clippy): clean.
  • cargo clippy --workspace --no-deps -- -D warnings: clean.

@afalhambra-hivemq afalhambra-hivemq force-pushed the feat/host-stats-record-network-errors branch from 19b09ad to ad02a69 Compare May 20, 2026 15:04
The per-host `HostStats` aggregator only updated counters when an HTTP
response was received. Requests that failed before reaching the server
(DNS, TCP, TLS, connection reset, etc.) returned early from
`perform_request` and never reached `update_stats`, so they were
invisible to `total_requests` and the error buckets. A host whose only
failures were network errors reported 100% success even when the
top-of-report `ResponseStats` recorded errors against it.

Add a `network_errors` counter on `HostStats` and a `record_network_error`
method. Call it from the `Err` arm of `perform_request` so the failure
contributes to `total_requests` and to the success-rate denominator.
Surface the new counter in the markdown table (new column), in the
detailed text output (line shown when non-zero), in the JSON serialization,
and in `error_rate`. The compact output is unchanged; its success-rate
column already reflects the corrected math.

Closes lycheeverse#2202

@thomas-zahner thomas-zahner left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afalhambra-hivemq Awesome 🚀

@mre any additional thoughts?

@afalhambra-hivemq

Copy link
Copy Markdown
Contributor Author

@afalhambra-hivemq Awesome 🚀

@mre any additional thoughts?

Thanks @thomas-zahner!

@mre - do you mind having a look at it? Thanks as well

Comment thread lychee-bin/src/formatters/host_stats/markdown.rs

@Souradip121 Souradip121 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mre, @afalhambra-hivemq what do you think of this??

@mre

mre commented Jun 2, 2026

Copy link
Copy Markdown
Member

Looks good to me. Thanks for the contribution @afalhambra-hivemq. ⭐

@mre mre merged commit be136c3 into lycheeverse:master Jun 2, 2026
7 checks passed
@mre mre mentioned this pull request Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Per-host statistics report 100% success rate when requests fail before receiving an HTTP response

4 participants