Skip to content

refactor(middleware): add middleware to record metrics for request count and duration#7803

Merged
Gnanasundari24 merged 3 commits intomainfrom
request-response-metrics-middleware
Apr 21, 2025
Merged

refactor(middleware): add middleware to record metrics for request count and duration#7803
Gnanasundari24 merged 3 commits intomainfrom
request-response-metrics-middleware

Conversation

@SanchithHegde
Copy link
Member

Type of Change

  • Bugfix
  • New feature
  • Enhancement
  • Refactoring
  • Dependency updates
  • Documentation
  • CI/CD

Description

This PR moves the metrics that record incoming request count, status code and duration to the middleware, so that all requests that the server receives would be accounted for. As part of this change, the following two metrics are recorded as part of the middleware rather than within server_wrap():

  • REQUESTS_RECEIVED: Records the number of requests that the server received. Has request path and HTTP method as attributes.
  • REQUEST_TIME: Records the time it took to respond to the request. Also helps us obtain a count of the number of requests to which the server responded successfully. Has request path, HTTP method and HTTP status code as attributes.

Additionally, this PR removes the REQUEST_STATUS metric, since it becomes redundant with the REQUEST_TIME metric helping us obtain a count of requests that the server responded to, along with tracking the status code. This PR also removes some duplicate code from a couple of places.

Motivation and Context

This change is being as part of a bigger change to improve the metrics emitted by the application. This change would help account for all the requests that the server received, record the complete time taken by the server when responding to requests, and possibly help with monitoring the server better.

How did you test it?

Locally, by running the OpenTelemetry Collector, Prometheus and Grafana with our Docker Compose setup.

  1. Using REQUESTS_RECEIVED to count the number of requests received by the server:

    Screenshot of graph displaying number of requests received

    The graph can be obtained by running a query like:

    sum by(method, path) (increase(REQUESTS_RECEIVED_total[$__rate_interval]))
    

    Notice that the metric has method and path as attributes.

  2. Using REQUEST_TIME to count the number of requests that the server responded to:

    Screenshot of graph displaying number of requests the server responded to

    The graph can be obtained by running a query like:

    sum by(method, path, status_code) (increase(REQUEST_TIME_count[$__rate_interval]))
    

    Notice that the metric has method, path and status_code as attributes. We are able to use REQUEST_TIME to count requests (even though it is a histogram metric) because REQUEST_TIME_count is automatically generated which functions like a counter.

    Ideally, the number recorded by REQUESTS_RECEIVED and REQUEST_TIME should be the same but in case of application restarts mid-request, the number recorded by REQUEST_TIME may be slightly lesser.

  3. Using REQUEST_TIME to record the request duration:

    The 99th, 90th and 50th percentile values can be plotted:

    Screenshot of percentiles for request duration

    The graph can be obtained by running a query like:

    histogram_quantile(0.99, sum by(le, method, path, status_code) (rate(REQUEST_TIME_bucket{path=~"/payments.*"}[$__rate_interval])))
    

    This would provide us the 99th percentile values. The 90th and 50th percentile values can be obtained by replacing 0.99 in the above query with 0.9 and 0.5, respectively.

    The same data can be plotted as a heatmap:

    Screenshot of heatmap of request duration

    The heatmap can be obtained by running a query like:

    sum by(le) (rate(REQUEST_TIME_bucket{path=~"/payments.*"}[$__rate_interval]))
    

Checklist

  • I formatted the code cargo +nightly fmt --all
  • I addressed lints thrown by cargo clippy
  • I reviewed the submitted code
  • I added unit tests for my changes where possible

@SanchithHegde SanchithHegde added A-framework Area: Framework S-waiting-on-review Status: This PR has been implemented and needs to be reviewed C-refactor Category: Refactor labels Apr 12, 2025
@SanchithHegde SanchithHegde added this to the April 2025 Release milestone Apr 12, 2025
@SanchithHegde SanchithHegde self-assigned this Apr 12, 2025
@SanchithHegde SanchithHegde requested review from a team as code owners April 12, 2025 12:04
@semanticdiff-com
Copy link

semanticdiff-com bot commented Apr 12, 2025

Review changes with  SemanticDiff

Changed Files
File Status
  crates/router/src/compatibility/wrap.rs  85% smaller
  crates/router/src/services/api.rs  68% smaller
  crates/router/src/lib.rs  16% smaller
  crates/router/src/middleware.rs  3% smaller
  crates/analytics/src/metrics/request.rs  0% smaller
  crates/router/src/routes/metrics.rs  0% smaller
  crates/router/src/routes/metrics/request.rs  0% smaller
  crates/router/src/routes/metrics/utils.rs  0% smaller

Comment on lines -26 to -35
#[inline]
pub async fn time_future<F, R>(future: F) -> (R, time::Duration)
where
F: futures::Future<Output = R>,
{
let start = time::Instant::now();
let result = future.await;
let time_spent = start.elapsed();
(result, time_spent)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted this in favor of the same function defined in the common_utils crate:

/// Record the time taken by the future to execute
#[inline]
pub async fn time_future<F, R>(future: F) -> (R, time::Duration)
where
F: futures::Future<Output = R>,
{
let start = time::Instant::now();
let result = future.await;
let time_spent = start.elapsed();
(result, time_spent)
}

Comment on lines -3 to -12
#[inline]
pub async fn time_future<F, R>(future: F) -> (R, time::Duration)
where
F: futures::Future<Output = R>,
{
let start = time::Instant::now();
let result = future.await;
let time_spent = start.elapsed();
(result, time_spent)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted this in favor of the same function defined in the common_utils crate:

/// Record the time taken by the future to execute
#[inline]
pub async fn time_future<F, R>(future: F) -> (R, time::Duration)
where
F: futures::Future<Output = R>,
{
let start = time::Instant::now();
let result = future.await;
let time_spent = start.elapsed();
(result, time_spent)
}

Comment on lines -4 to -34
pub async fn record_request_time_metric<F, R>(
future: F,
flow: &impl router_env::types::FlowMetric,
) -> R
where
F: futures::Future<Output = R>,
{
let key = "request_type";
super::REQUESTS_RECEIVED.add(1, router_env::metric_attributes!((key, flow.to_string())));
let (result, time) = metric_utils::time_future(future).await;
super::REQUEST_TIME.record(
time.as_secs_f64(),
router_env::metric_attributes!((key, flow.to_string())),
);
result
}

pub fn status_code_metrics(
status_code: String,
flow: String,
merchant_id: common_utils::id_type::MerchantId,
) {
super::REQUEST_STATUS.add(
1,
router_env::metric_attributes!(
("status_code", status_code),
("flow", flow),
("merchant_id", merchant_id.clone()),
),
)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed these two since these were only being called within server_wrap().

Copy link
Contributor

@su-shivanshmathur su-shivanshmathur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@SanchithHegde SanchithHegde removed the S-waiting-on-review Status: This PR has been implemented and needs to be reviewed label Apr 17, 2025
@Gnanasundari24 Gnanasundari24 added this pull request to the merge queue Apr 21, 2025
Merged via the queue into main with commit 54a1719 Apr 21, 2025
16 of 20 checks passed
@Gnanasundari24 Gnanasundari24 deleted the request-response-metrics-middleware branch April 21, 2025 08:19
pixincreate added a commit that referenced this pull request Apr 21, 2025
…acilitapay-pix-pmt

* 'main' of github.com:juspay/hyperswitch: (21 commits)
  refactor(required_fields): move pm required fields to pm crate (#7539)
  fix(connector): [noon] address `next_action_url` being `null` for cards in 3ds payment (#7832)
  refactor(middleware): add middleware to record metrics for request count and duration (#7803)
  chore(version): 2025.04.18.0
  chore(postman): update Postman collection files
  fix(connector): [globalpay] handle edge case where currency comes as empty upon payment decline (#7812)
  refactor(cypress-v2): change `Authorization` and `payment_methods_enabled` for v2 cypress tests (#7805)
  fix(connector): [Cybersource] send type selection indicator for co-batch cards (#7828)
  feat(payment_method): add logic for setup_future_usage downgrade and add filter based on zero mandate config (#7775)
  refactor(accounts): move dashboard_metadata table to accounts_schema and point v2 to v1 dashboard_metadata (#7793)
  chore(analytics): opensearch client creation based on config (#7810)
  ci(postman): update assertion error message for nmi collection (#7765)
  feat: add primary key not null query to generic filter function (#7785)
  chore(version): 2025.04.17.0
  chore: change payment method files ownership to `hyperswitch-payment-methods` (#7808)
  feat(vsaas): modify api key auth to support vsaas cases (#7593)
  ci(cypress): verify mandate id to be `null` if payment id not `succeeded` (#7749)
  feat(connector): [chargebee] consumes required fields to support transaction monitoring (#7774)
  ci(configs): remove vault private key from configs (#7825)
  chore(version): 2025.04.16.0
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-framework Area: Framework C-refactor Category: Refactor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants