Skip to content

Fix failed requests count reporting#7944

Merged
generall merged 3 commits intodevfrom
telemetry-grpc-status
Feb 11, 2026
Merged

Fix failed requests count reporting#7944
generall merged 3 commits intodevfrom
telemetry-grpc-status

Conversation

@generall
Copy link
Member

@generall generall commented Jan 20, 2026

Fix for #7927

This PR does:

  • inproduces grpc response code into API response according to https://grpc.io/docs/guides/status-codes/
  • removes {rest,grpc}_responses_fail_total metric, as it doesn't really make sense considering that we now report explicit status code

warn: this PR contain breaking changes for /metric and /telemetry endpoints.
But I would argue, that this is justified assuming we released it with minor version

@generall generall requested a review from timvisee January 20, 2026 00:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/common/metrics.rs (1)

814-826: Stale docstring references removed fail_total.

The comment on line 815 states "only the total and fail_total counters will be added", but fail_total has been removed from this PR. The docstring should be updated to reflect the current behavior.

📝 Suggested fix
     /// Add metrics for the provided statistics.
-    /// If `add_timings` is `false`, only the total and fail_total counters will be added.
+    /// If `add_timings` is `false`, only the total counter will be added.
     pub fn add(
🤖 Fix all issues with AI agents
In `@src/tonic/tonic_telemetry.rs`:
- Around line 57-76: The code currently reads GRPC_STATUS_HEADER from
response.headers() which misses the gRPC status sent in HTTP/2 trailers; update
the telemetry capture so that you consume or wrap the response body and poll for
trailers (using the response body’s poll_trailers / HttpBody::trailers
mechanism) and extract GRPC_STATUS_HEADER from the trailers before calling
telemetry_data.lock().add_response(method_name, instant, status_code); if you
cannot change response flow, at minimum document the limitation and keep the
existing fallback (DEFAULT_SUCCESS_GRPC_STATUS_CODE /
DEFAULT_FAILURE_GRPC_STATUS_CODE) but move add_response to occur after trailer
inspection so real grpc-status (from trailers) overrides the fallback when
present.
🧹 Nitpick comments (2)
src/common/telemetry_ops/requests_telemetry.rs (2)

80-94: Parameter order differs from ActixWorkerTelemetryCollector::add_response.

The signature is (method, instant, status_code) here, while ActixWorkerTelemetryCollector::add_response (line 110-114) uses (method, status_code, instant). This inconsistency is minor but could cause confusion for future maintainers.

♻️ Consider aligning parameter order for consistency
     pub fn add_response(
         &mut self,
         method: String,
-        instant: std::time::Instant,
         status_code: GrpcStatusCode,
+        instant: std::time::Instant,
     ) {

This would require updating the call site in tonic_telemetry.rs as well.


138-148: Misleading variable names in merge() method.

The variables success_map and is_success are remnants from a previous implementation (likely when tracking success/failure as a boolean). They should be renamed to reflect the current status-code-based structure for clarity.

♻️ Suggested rename for clarity
     pub fn merge(&mut self, other: &GrpcTelemetry) {
-        for (method, success_map) in &other.responses {
+        for (method, status_map) in &other.responses {
             let entry = self.responses.entry(method.clone()).or_default();
-            for (is_success, statistics) in success_map {
-                let status_entry = entry.entry(*is_success).or_default();
+            for (status_code, statistics) in status_map {
+                let status_entry = entry.entry(*status_code).or_default();
                 *status_entry = status_entry.clone() + statistics.clone();
             }
         }
     }

Comment on lines +57 to 76

// For gRPC, HTTP status is usually 200, check grpc-status header instead
// grpc-status: 0 = OK, non-zero = error
let status_code = response
.headers()
.get(GRPC_STATUS_HEADER)
.and_then(|v| v.to_str().ok())
.and_then(|v| v.parse::<i32>().ok())
.unwrap_or_else(|| {
if response.status().is_success() {
DEFAULT_SUCCESS_GRPC_STATUS_CODE
} else {
DEFAULT_FAILURE_GRPC_STATUS_CODE
}
});

telemetry_data
.lock()
.add_response(method_name, instant, status_code);
Ok(response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

gRPC specification grpc-status header trailers HTTP/2

💡 Result:

Summary (gRPC over HTTP/2 — grpc-status in trailers)

  • gRPC carries its final RPC status in HTTP/2 trailer headers (grpc-status) and an optional description (grpc-message); these are sent in the trailing HEADERS frame after the response DATA. Trailers-only (trailer HEADERS with END_STREAM) is permitted for immediate errors. [1][2]

  • The spec requires Status to be sent in Trailers even when the status is OK (grpc-status = "0"). End-of-stream is indicated by END_STREAM on the HEADERS frame that carries the trailers. [1][3]

  • Value formats:

    • grpc-status: decimal ASCII integer (no leading zeros).
    • grpc-message: UTF‑8 text percent-encoded per the spec.
    • grpc-status-details-bin: binary details (base64) allowed only when status is not OK. [1][3]
  • Rationale: trailers let the server report a final outcome after streaming or after sending response messages; headers are sent at the start and cannot be changed after body messages begin. Some deployments may be broken (non-200 HTTP status, missing trailers); clients must synthesize status when needed. [1][2]

Sources

  • gRPC PROTOCOL-HTTP2 (gRPC protocol specification / PROTOCOL-HTTP2). [1]
  • gRPC metadata guide (headers vs. trailers). [2]
  • IETF draft / gRPC protocol specification (matches PROTOCOL-HTTP2 details). [3]

🌐 Web query:

tonic rust grpc-status header response trailers

💡 Result:

Short answer: Tonic follows the gRPC over HTTP/2 spec — the canonical grpc-status (and grpc-message / grpc-status-details-bin) is sent in HTTP/2 trailers, not initial response headers. In Tonic you read trailing metadata with Streaming::trailers() (server-side you set trailers via Response/MetadataMap; client-side Tonic synthesizes Status from trailers). If you observe grpc-status in response headers, that means the server sent a Trailers-Only response or an intermediary (or grpc-web/HTTP1 fallback) converted/placed trailers into headers. Check trailers, not initial metadata, for the final gRPC status. [1][2][3]

Sources:

  • gRPC HTTP/2 protocol (grpc-status must be sent in Trailers). [1]
  • Tonic Streaming API — trailers() to fetch trailing metadata. [2]
  • Tonic client/server handling (status-from-headers for Trailers-Only / trailer behavior; intermediaries/grpc-web discussion). [3]

🏁 Script executed:

cat -n src/tonic/tonic_telemetry.rs | head -80

Repository: qdrant/qdrant

Length of output: 3209


🏁 Script executed:

# Check what Response type is imported
rg "use.*Response|struct Response|impl.*Response" src/tonic/tonic_telemetry.rs -B2 -A2

Repository: qdrant/qdrant

Length of output: 39


🌐 Web query:

hyper http::Response trailers HTTP/2

💡 Result:

Short answer: hyper supports response trailers only on HTTP/2. To send them you must use a Body implementation that implements hyper::body::HttpBody::poll_trailers (or use h2::SendStream directly). The built-in Body types (e.g. Body::from) don’t provide trailers; you create a custom streaming Body (or use frameworks/examples that do so) and return trailers from poll_trailers when the body is finished. See hyper’s HttpBody docs and the h2 SendStream API; an example approach is shown for axum/hyper in the “Stream a Body With Trailers” writeup. [1][2][3]

References

  • hyper::body::HttpBody (poll_trailers) — shows trailers support is part of the Body trait. [1]
  • h2 SendStream::send_trailers — how trailers are sent at the HTTP/2 stream layer. [2]
  • Example: streaming response with trailers (axum/hyper blog post). [3]

Sources
[1] hyper::body::HttpBody docs.
[2] h2 SendStream docs (send_trailers).
[3] “Stream a Body With Trailers in axum 0.6” (example using hyper/http2).


🌐 Web query:

rust http crate Response trailers middleware

💡 Result:

Short answer: you can't set HTTP trailers on an http::Response itself — trailers are produced by the response Body (http-body / hyper). To add trailers in middleware you must wrap/replace the response body with a Body implementation that implements poll_trailers (and add a "Trailer" response header listing the trailer names). Trailers are currently practical only over HTTP/2 (hyper notes trailers are mostly used with HTTP/2). Example: axum/hyper guides show a custom StreamBody implementing poll_trailers plus setting the Trailers header. [1][2][3][4]

Key pointers and links:

  • Body-level trailers: hyper/http-body trait has poll_trailers — implement/compose a Body that returns a HeaderMap in poll_trailers. [2][4]
  • Middleware approach: in tower/axum middleware you can map/replace the Response (ServiceBuilder.map_response or a custom tower::Service) and swap the Body for your custom Body that carries trailers; also add the "Trailer" response header. See axum middleware docs. [4]
  • Worked example: “Stream a Body With Trailers” (axum example) — shows custom StreamBody with set_trailers, setting the Trailers header, and curl verification. [1]
  • HTTP semantics: the Trailer response header names trailers; browser support is limited and trailers are primarily useful over HTTP/2. [3]

If you want, I can:

  • produce a minimal middleware example (tower/axum + hyper) that wraps a response body and emits trailers, tailored to your stack and versions; or
  • convert the axum example into a small reusable tower Layer. Which would you prefer?

🏁 Script executed:

rg "type BoxBody|BoxBody\>" src/tonic/tonic_telemetry.rs -B2 -A2

Repository: qdrant/qdrant

Length of output: 165


🌐 Web query:

tonic BoxBody hyper middleware trailers

💡 Result:

Short answer

  • tonic::body::BoxBody is a type‑erased http-body (UnsyncBoxBody). It itself doesn’t give you a direct high‑level “set trailers” API; trailers are part of the underlying http-body::HttpBody trait via poll_trailers, and sending them requires a response body that implements trailers. [1][2][3]

How to approach it (summary)

  1. Use a body type that implements http-body::HttpBody::poll_trailers (e.g. a custom streaming body or the http-body utilities used by axum/tonic) instead of trying to mutate a boxed response after the fact. [3][1]
  2. Ensure you’re using HTTP/2 (trailers over hyper are currently HTTP/2 behavior in practice). [3]
  3. If you need middleware, have the middleware produce or wrap the Response with a body type that supports trailers (or use an Either/boxed wrapper pattern that preserves the inner body type). tower / tower-http middleware works with the http + http-body abstractions and can pass through such bodies. [4][2]

Concrete pointers / reading

  • tonic BoxBody type alias / docs (type-erased UnsyncBoxBody). [1]
  • hyper/http-body: trailers are exposed via HttpBody::poll_trailers; you must implement a Body that returns trailers. [2]
  • Practical example / pattern: “stream a Body with trailers” guides (axum/hyper examples show building custom body types and sending trailers; note HTTP/2 requirement). [3]
  • tower-http / middleware works on the http + http-body abstractions so middleware should return Responses whose Body implements poll_trailers. [4]

If you want, I can:

  • show a minimal Rust example (custom Body impl or StreamBody) that returns trailers and integrate it into a tonic server middleware, or
  • inspect your existing middleware code and show how to adapt it to return a trailer-capable body.

The code checks response.headers() for grpc-status, but the gRPC specification requires this header to be sent in HTTP/2 trailers, not initial headers.

According to the gRPC over HTTP/2 specification, the final grpc-status (and grpc-message) must be sent in the trailing HEADERS frame, which arrives after the response body. The current code using response.headers() will not find the status code in standard gRPC responses.

The fallback to HTTP status is pragmatic, but incomplete—a gRPC request can succeed with HTTP 200 and grpc-status=0, or fail with HTTP 200 and a non-zero grpc-status. In such cases, the telemetry will misclassify the outcome.

At the middleware layer before response body consumption, accessing trailers from an http::Response<BoxBody> would require wrapping or consuming the entire response body (via http-body::HttpBody::poll_trailers). Consider whether the current fallback is acceptable for your telemetry needs, or refactor to extract gRPC status from trailers if accurate status tracking is required.

🤖 Prompt for AI Agents
In `@src/tonic/tonic_telemetry.rs` around lines 57 - 76, The code currently reads
GRPC_STATUS_HEADER from response.headers() which misses the gRPC status sent in
HTTP/2 trailers; update the telemetry capture so that you consume or wrap the
response body and poll for trailers (using the response body’s poll_trailers /
HttpBody::trailers mechanism) and extract GRPC_STATUS_HEADER from the trailers
before calling telemetry_data.lock().add_response(method_name, instant,
status_code); if you cannot change response flow, at minimum document the
limitation and keep the existing fallback (DEFAULT_SUCCESS_GRPC_STATUS_CODE /
DEFAULT_FAILURE_GRPC_STATUS_CODE) but move add_response to occur after trailer
inspection so real grpc-status (from trailers) overrides the fallback when
present.

coderabbitai[bot]

This comment was marked as resolved.

@timvisee timvisee added do not merge Pull requests blocked on an external event release:1.17.0 labels Jan 20, 2026
@generall generall removed the do not merge Pull requests blocked on an external event label Feb 11, 2026
@generall generall merged commit 9735bf7 into dev Feb 11, 2026
15 checks passed
@generall generall deleted the telemetry-grpc-status branch February 11, 2026 17:17
timvisee pushed a commit that referenced this pull request Feb 13, 2026
* add grpc status into grpc meterics similar to rest

* rm *_responses_fail_total metric

* upd openapi defs for grpc
@timvisee timvisee mentioned this pull request Feb 17, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants