
Per-host statistics report 100% success rate when requests fail before receiving an HTTP response #2202

@afalhambra-hivemq


Summary

The per-host statistics table (introduced in #1929, shipped in v0.23.0 and unchanged through v0.24.2 / current master) under-counts requests for hosts whose links fail at the transport layer (DNS lookup, TCP connect, TLS handshake, "Connection failed"). Those failures never reach HostStats::record_response, so they are missing from total_requests, from the per-bucket error counters, and therefore from the derived success_rate. As a result, a host whose links genuinely failed can still be reported as 100% successful.

Reproduction

A lychee run on our docs site failed with the following errors:

### Errors in build/site/.../configuration.html
* [ERROR] <https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/> (at 1040:127) | Connection failed. Check network connectivity and firewall settings

### Errors in build/site/.../introduction.html
* [ERROR] <https://kubernetes.io/docs/concepts/extend-kubernetes/operator/> (at 397:17) | Connection failed. Check network connectivity and firewall settings

The same report's Per-host Statistics section reports:

| Host          | Requests | Success Rate | Median Time | Cache Hit Rate |
| ------------- | -------- | ------------ | ----------- | -------------- |
| kubernetes.io | 74       | 100.0%       | 109ms       | 41.2%          |

74 requests, 100% success, even though two other requests to the same host failed. The two failures are absent from Requests (the real number should be 76) and absent from any error bucket, so the success-rate denominator never sees them.
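To make the arithmetic concrete, here is a minimal std-only Rust model of the counting problem. HostCounters, record_response and success_rate are hypothetical simplifications, not lychee's actual HostStats. The sketch assumes the success rate is successful responses divided by recorded requests, and treats all 74 recorded kubernetes.io requests as successful.

```rust
/// Hypothetical, simplified per-host counters (not lychee's real `HostStats`).
#[derive(Default)]
struct HostCounters {
    total_requests: u64,
    successful: u64,
}

impl HostCounters {
    /// Only reached when an HTTP status code came back.
    fn record_response(&mut self, success: bool) {
        self.total_requests += 1;
        if success {
            self.successful += 1;
        }
    }

    /// Percentage of *recorded* requests that succeeded; `None` if nothing was recorded.
    fn success_rate(&self) -> Option<f64> {
        if self.total_requests == 0 {
            return None;
        }
        Some(self.successful as f64 / self.total_requests as f64 * 100.0)
    }
}

/// Replays the kubernetes.io row: 74 requests got a response, 2 failed with
/// "Connection failed" before any status code existed.
fn kubernetes_io_counters() -> HostCounters {
    let mut stats = HostCounters::default();
    for _ in 0..74 {
        stats.record_response(true);
    }
    // Nothing is recorded for the 2 transport-level failures, so they vanish
    // from both the numerator and the denominator.
    stats
}

fn main() {
    let stats = kubernetes_io_counters();
    println!(
        "requests = {}, success rate = {:.1}%",
        stats.total_requests,
        stats.success_rate().unwrap_or(0.0)
    );
}
```

This prints `requests = 74, success rate = 100.0%`, matching the report: the two failed requests cannot lower a ratio they never enter.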

Root cause

lychee-lib/src/ratelimit/host/host.rs::perform_request (lines 175-197 in v0.24.2):

```rust
async fn perform_request(...) -> Result<CacheableResponse> {
    let start_time = Instant::now();
    let response = match self.client.execute(request).await {
        Ok(response) => response,
        Err(e) => {
            return Err(ErrorKind::NetworkRequest(e));  // <-- early return: update_stats below is never reached
        }
    };
    self.update_stats(response.status(), start_time.elapsed());
    ...
}
```
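The control flow can be reproduced in isolation. This std-only sketch uses a hypothetical stand-in Host whose client outcome is passed in directly as a `Result<u16, &'static str>` (a status code or a transport error); the method names only mirror the excerpt above and are not lychee's API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a host with per-host counters.
#[derive(Default)]
struct Host {
    recorded: u64,
}

impl Host {
    fn update_stats(&mut self, _status: u16, _elapsed: Duration) {
        self.recorded += 1;
    }

    /// Same shape as `perform_request`: `outcome` plays the role of
    /// `self.client.execute(request).await`.
    fn perform_request(&mut self, outcome: Result<u16, &'static str>) -> Result<u16, &'static str> {
        let start_time = Instant::now();
        let status = match outcome {
            Ok(status) => status,
            Err(e) => return Err(e), // early return: update_stats is skipped
        };
        self.update_stats(status, start_time.elapsed());
        Ok(status)
    }
}

/// 74 requests that get HTTP 200 plus 2 that fail at the transport layer.
/// Returns the host and the number of errors the caller actually saw.
fn run_requests() -> (Host, u64) {
    let mut host = Host::default();
    let mut failures = 0;
    for _ in 0..74 {
        let _ = host.perform_request(Ok(200));
    }
    for _ in 0..2 {
        if host.perform_request(Err("Connection failed")).is_err() {
            failures += 1;
        }
    }
    (host, failures)
}

fn main() {
    let (host, failures) = run_requests();
    // The caller saw 2 errors, but the stats only know about 74 requests.
    println!("recorded = {}, failed = {}", host.recorded, failures);
}
```

The caller observes both errors (they end up in the `[ERROR]` list), while the per-host counters see only the 74 requests that produced a status code.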

Suggested fix

Add a method on HostStats to record outcomes that didn't produce a status code, and call it from the Err arm of perform_request. For example:

```rust
// lychee-lib/src/ratelimit/host/stats.rs
use std::time::Duration; // if not already in scope in this module

impl HostStats {
    /// Record a request that failed before receiving an HTTP response
    /// (DNS, TCP, TLS, connection reset, etc.)
    pub fn record_network_error(&mut self, request_time: Duration) {
        self.total_requests += 1;
        // Could also introduce a dedicated `network_errors` counter so the
        // failure mode shows up in the per-host serialization.
        self.request_times.push(request_time);
    }
}
```

```rust
// lychee-lib/src/ratelimit/host/host.rs::perform_request
Err(e) => {
    self.stats
        .lock()
        .unwrap()
        .record_network_error(start_time.elapsed());
    return Err(ErrorKind::NetworkRequest(e));
}
```
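A self-contained sketch of the suggested fix follows. The HostStats here is a hypothetical simplification that includes the optional network_errors counter, and it assumes the success rate is derived as successful requests divided by total_requests. If lychee instead derives the rate from the error buckets (for example 1 - errors / total), incrementing total_requests alone would leave the rate at 100%, which is one more reason to count network errors explicitly.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Simplified, hypothetical counters; not lychee's real `HostStats`.
#[derive(Default)]
struct HostStats {
    total_requests: u64,
    successful_requests: u64,
    network_errors: u64,
    request_times: Vec<Duration>,
}

impl HostStats {
    fn record_response(&mut self, status: u16, request_time: Duration) {
        self.total_requests += 1;
        if (200..300).contains(&status) {
            self.successful_requests += 1;
        }
        self.request_times.push(request_time);
    }

    /// Counterpart to the suggested `record_network_error`, with the optional
    /// dedicated `network_errors` counter.
    fn record_network_error(&mut self, request_time: Duration) {
        self.total_requests += 1;
        self.network_errors += 1;
        self.request_times.push(request_time);
    }

    /// Assumed definition: successful / total, as a percentage.
    fn success_rate(&self) -> Option<f64> {
        (self.total_requests > 0)
            .then(|| self.successful_requests as f64 / self.total_requests as f64 * 100.0)
    }
}

/// Hypothetical host; `outcome` stands in for the client call.
struct Host {
    stats: Mutex<HostStats>,
}

impl Host {
    fn perform_request(&self, outcome: Result<u16, &'static str>) -> Result<u16, &'static str> {
        let start_time = Instant::now();
        let status = match outcome {
            Ok(status) => status,
            Err(e) => {
                // Record the failure before propagating it.
                self.stats.lock().unwrap().record_network_error(start_time.elapsed());
                return Err(e);
            }
        };
        self.stats.lock().unwrap().record_response(status, start_time.elapsed());
        Ok(status)
    }
}

/// Replays the kubernetes.io case: 74 responses plus 2 transport failures.
fn kubernetes_io_after_fix() -> HostStats {
    let host = Host { stats: Mutex::new(HostStats::default()) };
    for _ in 0..74 {
        let _ = host.perform_request(Ok(200));
    }
    for _ in 0..2 {
        let _ = host.perform_request(Err("Connection failed"));
    }
    host.stats.into_inner().unwrap()
}

fn main() {
    let stats = kubernetes_io_after_fix();
    println!(
        "requests = {}, network errors = {}, success rate = {:.1}%",
        stats.total_requests,
        stats.network_errors,
        stats.success_rate().unwrap()
    );
}
```

Running it prints `requests = 76, network errors = 2, success rate = 97.4%`.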

With that, our kubernetes.io row would correctly report 74 / 76 ≈ 97.4% instead of 100%, and the per-host table would once again be trustworthy for spotting suspicious hosts. A separate network_errors counter (exposed in the markdown formatter) would make the failure mode explicit. For example:

Before (current):

| Host          | Requests | Success Rate | Median Time | Cache Hit Rate |
| ------------- | -------- | ------------ | ----------- | -------------- |
| kubernetes.io | 74       | 100.0%       | 109ms       | 41.2%          |

After (with network_errors column):

| Host          | Requests | Success Rate | Network Errors | Median Time | Cache Hit Rate |
| ------------- | -------- | ------------ | -------------- | ----------- | -------------- |
| kubernetes.io | 76       | 97.4%        | 2              | 109ms       | 41.2%          |
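A hypothetical sketch of how a markdown formatter could emit the extra column. HostRow, header and format_row are illustrative names, not lychee's formatter API, and the success rate again assumes successful / requests (with 0.0 shown for a host with no requests).

```rust
use std::time::Duration;

/// Illustrative row data only; not lychee's formatter types.
struct HostRow<'a> {
    host: &'a str,
    requests: u64,
    successful: u64,
    network_errors: u64,
    median_time: Duration,
    cache_hit_rate: f64, // already a percentage
}

/// Header plus the separator row that markdown tables require.
fn header() -> String {
    "| Host | Requests | Success Rate | Network Errors | Median Time | Cache Hit Rate |\n\
     |---|---|---|---|---|---|"
        .to_string()
}

fn format_row(row: &HostRow) -> String {
    let success_rate = if row.requests == 0 {
        0.0
    } else {
        row.successful as f64 / row.requests as f64 * 100.0
    };
    format!(
        "| {} | {} | {:.1}% | {} | {}ms | {:.1}% |",
        row.host,
        row.requests,
        success_rate,
        row.network_errors,
        row.median_time.as_millis(),
        row.cache_hit_rate
    )
}

/// The kubernetes.io numbers from the "After" table.
fn kubernetes_io_row() -> String {
    format_row(&HostRow {
        host: "kubernetes.io",
        requests: 76,
        successful: 74,
        network_errors: 2,
        median_time: Duration::from_millis(109),
        cache_hit_rate: 41.2,
    })
}

fn main() {
    println!("{}\n{}", header(), kubernetes_io_row());
}
```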
