fix(prometheus.remote_write): Fix sent_batch_duration_seconds measuring before the request was sent [backport]#5698

Merged
kgeckhart merged 1 commit into release/v1.14 from backport/pr-5697-to-v1.14
Mar 2, 2026

@grafana-alloybot
Contributor

Backport of #5697

This PR backports #5697 to release/v1.14.

Original PR Author

@kgeckhart

Description

prometheus_remote_storage_sent_batch_duration_seconds was measuring before the HTTP request was sent rather than after, causing the metric to reflect encoding/serialization time rather than the actual send duration.

Applies the fix from prometheus/prometheus#18214 via a fork replace directive pointing to https://github.com/grafana/prometheus/tree/fix-sent-batch-duration-v0.309.1.

Remove the replace directive when upstream PR #18214 is merged and Prometheus is upgraded.
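
For readers unfamiliar with fork replace directives, the go.mod entry might look roughly like the fragment below. The module paths and pseudo-version are the ones reported in the dependency review on this PR; the comment wording is an assumption, and the actual entry in the repository may differ.

```text
// Temporary fork carrying the fix from prometheus/prometheus#18214.
// Remove once the upstream PR is merged and Prometheus is upgraded.
replace github.com/prometheus/prometheus => github.com/grafana/prometheus v1.8.2-0.20260302171028-8cf60eef5463
```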


This backport was created automatically.

…ng before the request was sent (#5697)

(cherry picked from commit 10cfb6c)
@grafana-alloybot grafana-alloybot bot requested a review from a team as a code owner March 2, 2026 17:50
@github-actions
Contributor

github-actions bot commented Mar 2, 2026

🔍 Dependency Review

github.com/prometheus/prometheus v0.309.1 -> github.com/grafana/prometheus v1.8.2-0.20260302171028-8cf60eef5463 — ✅ Safe
  • What changed

  • Scope and impact

    • The change is localized to Prometheus remote-write internals (storage/remote). It does not modify exported APIs that consumers (such as the OpenTelemetry Collector’s prometheusreceiver) import.
    • No breaking changes to package structure, public types, or function signatures were introduced between upstream v0.309.1 and this forked commit.
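    • Behavioral change is limited to the semantics of the sent_batch_duration_seconds metric:
      • Previously the duration was observed before the HTTP request was sent, so it reflected encoding/serialization time only.
      • Now the duration is observed after the request is sent, so it reflects the actual send duration.
      • If you consume/alert on this metric, expect values to change (most likely increase, since the send itself is now measured); otherwise, no code changes are required.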
  • Evidence

  • Code changes required in this repository

    • None. This is a compatible, forked patch-level change with no public API changes.
    • If you have dashboards/alerts tied to sent_batch_duration_seconds, you may want to recalibrate thresholds due to the corrected measurement scope.
  • Illustrative patch from the fork (conceptual)

    • Per the PR description, the fix records the histogram observation after the HTTP request returns rather than before it is sent (names are hypothetical; this is not the actual upstream diff):
        start := time.Now()
        ...
      - // observed too early: the request has not been sent yet
      - sentBatchDuration.Observe(time.Since(start).Seconds())
        resp, err := httpClient.Do(req)
      + // observed after the request returns, so the send is included
      + sentBatchDuration.Observe(time.Since(start).Seconds())
    • No function signatures or imports that downstream users rely on are altered.
  • Recommended validation

    • Run existing tests for components depending on github.com/prometheus/prometheus (e.g., prometheusreceiver).
    • If you monitor sent_batch_duration_seconds, verify that observed durations change as expected now that the send itself is measured.

Notes

  • This is a temporary forked replacement to pick up a specific bug fix. The comments in go.mod/builder-config.yaml indicate it should be removed once the upstream PR is merged and the project upgrades Prometheus accordingly.

@kgeckhart kgeckhart enabled auto-merge (squash) March 2, 2026 18:34
@kgeckhart kgeckhart merged commit 150aecb into release/v1.14 Mar 2, 2026
42 checks passed
@kgeckhart kgeckhart deleted the backport/pr-5697-to-v1.14 branch March 2, 2026 18:34
blewis12 pushed a commit that referenced this pull request Mar 9, 2026
🤖 I have created a release *beep* *boop*
---


## [1.14.0](v1.13.0...v1.14.0)
(2026-03-06)


### ⚠ BREAKING CHANGES

* **loki.secretfilter:** Some config options are removed entirely:
    - `partial_mask` (replaced with `redact_percent`)
    - `allowlist` (now controlled with custom gitleaks config)
    - `enable_entropy` 
    - `include_generic` (now controlled with custom gitleaks config)
    - `types` (now controlled with custom gitleaks config)
* **otelcol.receiver.prometheus:** `otelcol.receiver.prometheus` no
longer sets start times of OTLP metrics. Grafana Cloud and Mimir do not
currently use OTLP metric start times. If you do want your metrics to
have them, you can use `otelcol.processor.metric_start_time` with
`strategy` set to `true_reset_point` to get the same behaviour.

### Features 🌟

* Add automatic reconnection to database_observability components
([#5444](#5444))
([553f967](553f967))
* Add limited type checking for validate command
([#5076](#5076))
([045fb76](045fb76))
* **database_observability.mysql:** Collect client info for query
samples ([#5552](#5552))
([257a699](257a699))
* **database_observability.postgres:** Add exclude databases/users for
`logs` collector ([#5569](#5569))
([5dddd9b](5dddd9b))
* **database_observability.postgres:** Add logs collector
([#5445](#5445))
([46d79d4](46d79d4))
* **database_observability.postgres:** Allow excluding queries ran by
specific users ([#5544](#5544))
([2d0ca15](2d0ca15))
* Deprecate prometheus.write.queue
([#5509](#5509))
([ee0f227](ee0f227))
* Introduce SeriesRefMappingStore
([#5522](#5522))
([33ee297](33ee297))
* **local.file_match, loki.source.file:** Match multiple files using
doublestar `{...}` expressions
([#5470](#5470))
([284e48f](284e48f))
* **loki.process:** Add debug metrics for CRI stage to track truncation
of lines and partial line flushing
([#5399](#5399))
([a1728f6](a1728f6))
* **mixin:** Add OTel Engine Overview dashboard
([#5573](#5573))
([df52116](df52116))
* **mixin:** Add zipped dashboards as a release artifact
([#5603](#5603))
([4f7fe85](4f7fe85))
* **otel:** Add receivers used in the otel k8s helm chart presets
([#5466](#5466))
([100f6ea](100f6ea))
* **otelcol.receiver.prometheus:** Remove requirement to run Alloy with
`--stability.level=experimental` in order to translate Prometheus native
histograms into OTLP exponential histograms.
([#5308](#5308))
([237e985](237e985))
* **otelcol:** Expose missing tail_sampling drop and bytes_limiting
([6021154](6021154))
* **prometheus.exporter.postgres:** Update to version `0.19.0` and
expose new collectors settings
([#4640](#4640))
([aa01e45](aa01e45))
* **prometheus.exporter.postgres:** Update to version 0.19.1
([#5659](#5659))
([9f4e88f](9f4e88f))
* Update github exporter with github app authentication
([#5377](#5377))
([ca741a6](ca741a6))
* Update grafana cadvisor fork to v0.54.1
([#5447](#5447))
([2a3aba0](2a3aba0))
* Upgrade prometheus to version 0.309.1
([#5479](#5479))
([633944b](633944b))


### Bug Fixes 🐛

* Add /FORCEREGISTRY flag to windows installer
([#5517](#5517))
([6b22d4e](6b22d4e))
* Add missing otelcol alias to make OTel Engine work with OTel Collector
helm chart ([#5473](#5473))
([90478cd](90478cd))
* **controller:** Prevent duplicate loaders from being created
([#5446](#5446))
([31d5eea](31d5eea))
* **database_observability.mysql:** Skip wait events with `NULL`
timer_wait ([#5478](#5478))
([48750e5](48750e5))
* **database_observability.postgres:** Correctly handle table name
casing when parsing postgres queries
([#5440](#5440))
([7cca2b9](7cca2b9))
* **deps:** Update module github.com/go-git/go-git/v5 to v5.16.5
[SECURITY] ([#5485](#5485))
([71a1b8b](71a1b8b))
* Ensure Valid/Clear States in Alloy Engine Extension
([#5551](#5551))
([99ad024](99ad024))
* Expose missing `otelcol.processor.tail_sampling` options
([#5606](#5606))
([6021154](6021154))
* **loki.process:** Registration of stage.metric when used inside
stage.match ([#5460](#5460))
([81caf72](81caf72))
* **loki.source.docker:** Parse timestamp correctly when log line only
contains newline ([#5489](#5489))
([162011d](162011d))
* **loki.source.file:** Close file if we cannot find encoding
([#5528](#5528))
([56bcb26](56bcb26))
* **mixin:** Support OTel exporter batching
([#5618](#5618))
([f2b7cb8](f2b7cb8))
* **prometheus.echo:** Return zero for SeriesRef
([#5622](#5622))
([31a8680](31a8680))
* **prometheus.exporter.cloudwatch:** Respect debug flag
([#5469](#5469))
([44ade00](44ade00))
* **prometheus.receive_http:** Bump prometheus patch for bugfix
([#5505](#5505))
([b7a1d05](b7a1d05))
* **prometheus.remote_write:** Fix sent_batch_duration_seconds measuring
before the request was sent [backport]
([#5698](#5698))
([150aecb](150aecb))
* Use read-write mutex locks to prevent concurrent tagsCache map reads
and writes ([#5534](#5534))
([8efed2e](8efed2e))


### Performance

* **loki.secretfilter:** Change secretfilter implementation to use
Gitleaks ([#5503](#5503))
([08e265c](08e265c))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: grafana-alloybot[bot] <167359181+grafana-alloybot[bot]@users.noreply.github.com>