Prometheus metrics violate counter monotonicity convention: _count and _sum reset on every scrape due to ResettingSample #2173

#### **System information**

  Bor client version: v2.7.0 (also affects latest main)

  OS & Version: Linux (Kubernetes)

  Environment: Polygon Mainnet

  Type of node: Sentry

  #### **Overview of the problem**

  All Prometheus histogram metrics backed by `ResettingSample` export broken `_count` and `_sum` values. Instead of increasing monotonically, both reset to per-interval values on every scrape, which violates the Prometheus counter type spec and breaks `rate()`, `increase()`, and average latency calculations.

https://prometheus.io/docs/concepts/metric_types/#counter

> A counter is a cumulative metric that represents a single [monotonically increasing counter](https://en.wikipedia.org/wiki/Monotonic_function) whose value can only increase or be reset to zero on restart.


  **Root cause**: `ResettingSample.Snapshot()` in `metrics/resetting_sample.go` calls `Clear()`, which resets count and sum to 0. The Prometheus collector in `metrics/prometheus/collector.go` then emits these values as `counter` type, but they contain only the delta since the last scrape, not cumulative totals.
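
  The snapshot-and-clear behavior can be illustrated with a minimal, self-contained sketch. The `resettingSample` type below is a hypothetical stand-in written for this issue, not the actual code in `metrics/resetting_sample.go`; it only models the described effect that taking a snapshot zeroes count and sum.

```go
package main

import "fmt"

// resettingSample is a hypothetical stand-in for a sample whose
// Snapshot() clears its state, as described in the root cause.
type resettingSample struct {
	count int64
	sum   int64
}

// Update records one observation.
func (s *resettingSample) Update(v int64) {
	s.count++
	s.sum += v
}

// Snapshot returns the values accumulated since the previous snapshot
// and then resets both to zero.
func (s *resettingSample) Snapshot() (count, sum int64) {
	count, sum = s.count, s.sum
	s.count, s.sum = 0, 0
	return count, sum
}

func main() {
	s := &resettingSample{}

	// 100 observations before the first scrape.
	for i := 0; i < 100; i++ {
		s.Update(50000)
	}
	c1, s1 := s.Snapshot()

	// 50 more observations before the second scrape.
	for i := 0; i < 50; i++ {
		s.Update(50000)
	}
	c2, s2 := s.Snapshot()

	fmt.Println(c1, s1) // first scrape: totals since startup
	fmt.Println(c2, s2) // second scrape: only the delta, not 150 and 7500000
}
```

  A cumulative counter would report 150 and 7500000 on the second scrape; the sketch reports 50 and 2500000 because the first snapshot cleared the state.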

  **Affected files and metrics**:

  | File | Affected metrics |
  |---|---|
  | `rpc/metrics.go` | All `rpc_duration_*_count`, `rpc_duration_*_sum` |
  | `p2p/tracker/tracker.go` | P2P tracking metrics |
  | `eth/protocols/eth/handler.go` | eth protocol metrics |
  | `eth/protocols/snap/handler.go` | snap sync metrics |
  | `eth/protocols/wit/handler.go` | witness protocol metrics |

  **Expected**: `_count` and `_sum` should be monotonically increasing as required by the Prometheus counter type spec.

  **Actual**: Both reset to interval-only values on every scrape, causing:
  - `rate()` returns incorrect results or no data
  - `increase()` is unreliable
  - Average latency calculation (`rate(_sum) / rate(_count)`) is broken
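
  To show why the results are wrong rather than merely noisy, here is a simplified model of counter-reset handling: any decrease between two samples is treated as a reset to zero, so the post-reset value counts as the growth for that step. This is a sketch only; the real `increase()` also extrapolates to the window boundaries, which is omitted here, and the sample values are made up for illustration.

```go
package main

import "fmt"

// increaseWithResets sums the per-step growth of a counter series.
// A decrease is treated as a reset to zero, so the new value itself
// is counted as the growth for that step. Extrapolation is omitted.
func increaseWithResets(samples []float64) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		if samples[i] < samples[i-1] {
			total += samples[i]
		} else {
			total += samples[i] - samples[i-1]
		}
	}
	return total
}

func main() {
	// Hypothetical traffic: 100, 50, 80, 80 requests per scrape interval.
	deltas := []float64{100, 50, 80, 80}         // what is exported today
	cumulative := []float64{100, 150, 230, 310}  // what a counter should export

	fmt.Println(increaseWithResets(cumulative)) // 210
	fmt.Println(increaseWithResets(deltas))     // 80
}
```

  With delta values, a step is only counted correctly when the value happens to drop (50 after 100). When it rises (80 after 50) only the difference is counted, and when it stays flat (80 after 80) nothing is counted, so 210 real requests are reported as 80.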

  #### **Reproduction Steps**

  1. Enable telemetry with `metrics = true` and `prometheus-addr = "0.0.0.0:7071"`
  2. Send RPC requests (e.g. `eth_blockNumber`)
  3. Scrape `/debug/metrics/prometheus` twice, 1 minute apart
  4. Observe that the `rpc_duration_eth_blockNumber_success_count` and `_sum` values decrease on the second scrape
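
  The comparison in steps 3 and 4 can be automated with a small helper. The sketch below assumes the two scrape bodies have already been fetched from `/debug/metrics/prometheus` and are available as strings; `parseSamples` and `decreased` are hypothetical names, and the parser handles only plain `name value` lines (no timestamps, no label values containing spaces).

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseSamples reads "name value" lines from Prometheus text output,
// skipping comments, blank lines, and lines that do not parse.
func parseSamples(body string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		out[fields[0]] = v
	}
	return out
}

// decreased returns the sorted names of _count and _sum series whose
// value dropped between the first and the second scrape.
func decreased(first, second map[string]float64) []string {
	var names []string
	for name, v1 := range first {
		if !strings.HasSuffix(name, "_count") && !strings.HasSuffix(name, "_sum") {
			continue
		}
		if v2, ok := second[name]; ok && v2 < v1 {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Scrape bodies taken from the example output in this issue.
	t1 := "rpc_duration_eth_blockNumber_success_count 100\nrpc_duration_eth_blockNumber_success_sum 5000000\n"
	t2 := "rpc_duration_eth_blockNumber_success_count 50\nrpc_duration_eth_blockNumber_success_sum 2500000\n"

	for _, name := range decreased(parseSamples(t1), parseSamples(t2)) {
		fmt.Println("decreased:", name)
	}
}
```

  On a node that has not restarted between the two scrapes, any name reported by `decreased` points at a series that is not behaving like a counter.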

  #### **Logs / Traces / Output / Error Messages**

  Scrape at T1 (100 requests since startup)

  rpc_duration_eth_blockNumber_success_count 100
  rpc_duration_eth_blockNumber_success_sum 5000000

  Scrape at T2 (50 requests since T1)

  rpc_duration_eth_blockNumber_success_count 50     ← decreased
  rpc_duration_eth_blockNumber_success_sum 2500000   ← decreased

  **Code path**:

  1. `rpc/metrics.go:46` — creates histogram with `ResettingSample`
  2. `metrics/resetting_sample.go:22` — `Snapshot()` calls `Clear()`
  3. `metrics/sample.go:165` — `Clear()` resets `count = 0`, `sum = 0`
  4. `metrics/prometheus/collector.go:98-107` — emits snapshot values as Prometheus `counter`
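
  One possible direction for a fix, sketched under assumptions: keep running totals on the exporting side and add each snapshot's delta to them before emitting `_count` and `_sum`. The `cumulativeTotals` type below is hypothetical and not part of Bor; it is not a proposal for the exact change in `collector.go`, only an illustration of the accumulate-then-emit idea.

```go
package main

import (
	"fmt"
	"sync"
)

// cumulativeTotals is a hypothetical helper that keeps running totals
// per metric name, so exported _count and _sum values never decrease
// while the process is alive.
type cumulativeTotals struct {
	mu    sync.Mutex
	count map[string]int64
	sum   map[string]int64
}

func newCumulativeTotals() *cumulativeTotals {
	return &cumulativeTotals{
		count: make(map[string]int64),
		sum:   make(map[string]int64),
	}
}

// Add folds one snapshot's delta into the running totals and returns
// the cumulative count and sum to export.
func (c *cumulativeTotals) Add(name string, deltaCount, deltaSum int64) (int64, int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.count[name] += deltaCount
	c.sum[name] += deltaSum
	return c.count[name], c.sum[name]
}

func main() {
	totals := newCumulativeTotals()
	name := "rpc_duration_eth_blockNumber_success"

	// Deltas as returned by two consecutive resetting snapshots.
	fmt.Println(totals.Add(name, 100, 5000000)) // 100 5000000
	fmt.Println(totals.Add(name, 50, 2500000))  // 150 7500000
}
```

  An alternative would be to track cumulative count and sum inside the sample itself so that clearing only affects the percentile reservoir; which layer should own the totals is a design decision for the maintainers.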

