Remove replace directive for golang.org/x/exp by ptodev · Pull Request #5972 · grafana/agent

ptodev · 2023-12-13T18:28:17Z

Removing a replace directive for golang.org/x/exp form the Agent's go.mod file.

This is necessary because on a separate branch I am upgrading Agent to a new OpenTelemetry version, and it requires a new version of github.com/grafana/loki/pkg/push which needs the latest golang.org/x/exp.

The reason why golang.org/x/exp has been problematic is because the return type of SortFunc changed from bool (in the old version) to int (in the new version). Apparently some packages like to use golang.org/x/exp because it's more performant.

Fixes #5921

ptodev · 2023-12-13T18:31:56Z

The replace directive for prometheus/prometheus can be removed, because v2.48.0 contains the two PRs we need:
prometheus/prometheus#12677
prometheus/prometheus#12729

mattdurham

Looks like we got rid of a lot of cruft! Awesome

ptodev · 2023-12-13T18:44:25Z

@mattdurham unfortunately the Linux build doesn't work because Pyroscope's eBPF module has a replace directive for the exp module 🙀 I'll try to change their go.mod.

ptodev · 2023-12-14T11:55:25Z

I raised a PR for Pyroscope.

ptodev · 2023-12-18T16:42:41Z

I had to upgrade our Prometheus dependency, and this became a much bigger PR than expected.

ptodev · 2023-12-18T17:01:34Z

docs/sources/flow/reference/components/prometheus.scrape.md

-`enable_http2` | `bool` | Whether HTTP2 is supported for requests. | `true` | no
+`honor_labels`                | `bool`     | Indicator whether the scraped metrics should remain unmodified. | `false` | no
+`honor_timestamps`            | `bool`     | Indicator whether the scraped timestamps should be respected. | `true` | no
+`track_timestamps_staleness`  | `bool`     | Indicator whether to track the staleness of the scraped timestamps. | `false` | no


@bboreham Regarding #5921 - for now I intend to default this to false for a few reasons:

IIUC, track_timestamps_staleness = true only makes sense if out of order ingestion is allowed in the back end database.

Apparently there are implications to querying and alerting.

Please let me know if you disagree. I'm open to changing this to default to true prior to merging. If it causes a backwards-incompatible change for the users, we'll have to mention it explicitly in our changelog and write docs with steps on what users need to change to get the behaviour they need (e.g. how to change their alerts or dashboards).

We usually list backwards incompatible changes here:
https://grafana.com/docs/agent/latest/flow/release-notes/

IIUC, track_timestamps_staleness = true only makes sense if out of order ingestion is allowed in the back end database.

Not as far as I know. This is to fix the long-standing irritation where cAdvisor metrics linger on for 5 minutes after the pod has gone.

The blog you cite relates to the last change in staleness handling 5 years ago; I don't think it is relevant here.

@bboreham I think the assumption behind setting track_timestamps_staleness = true is that if the scraper didn't get any samples then that must be because there aren't any. But is this a good assumption to make in the general case? If a sample is exposed via explicit timestamps, and if there is no new value to report, then what sample should be exposed for its series the next time a scrape happens? Is the convention to just report the same value with a new timestamp?

I think it makes sense to default track_timestamps_staleness to true if:

The convention is that the absence of a sample is considered enough evidence to decide that the series is stale.

There is no need to enable out of order ingestion. This is so that we prevent a situation where a timestamp arrives late (e.g. because it takes a long time to generate) but it can't be ingested in the TSDB because there is already a staleness marker with a more recent timestamp.

This explicit timestamp feature seems like a way to "push" metrics.... so I'm not sure what assumptions are ok to make. I suspect that Agents in "push" systems like OTel don't just declare a series as stale if no samples were pushed in a certain time.

Hi @bboreham, would you mind getting back to us on the comment above please?

If no sample is supplied for a timestamp, PromQL (at query time) will use the preceding value up to 5 minutes old.
This creates a long-standing issue with cAdvisor (i.e. Kubelet container metrics).

It's got nothing to do with "push".

There is no expectation that explicit timestamps come out of order. This never worked historically in Prometheus, and there is no reason to suppose people started sending them.

But is this a good assumption to make in the general case?

Yes, it is the standard behaviour when exporters do not supply the timestamp. Which is the vast majority.

By the way, I don't think we should change this default in the middle of an 800-line PR doing other things.
I can make a separate PR to fix it.

Ok, thank you, I'll take the track_timestamps_staleness parameter out of this PR. We can introduce it in a different PR. I don't want to add it now and change its default value later, because prometheus.scrape is a stable component and its defaults aren't meant to change often.

bboreham · 2023-12-18T18:46:52Z

IIUC, track_timestamps_staleness = true only makes sense if out of order ingestion is allowed in the back end database.

Not as far as I know. This is to fix the long-standing irritation where cAdvisor metrics linger on for 5 minutes after the pod has gone.

The blog you cite relates to the last change in staleness handling 5 years ago; I don't think it is relevant here.

wildum

looks good, just a few nits, thanks for taking care of this :)

component/discovery/aws/lightsail.go

component/discovery/ovhcloud/ovhcloud.go

wildum · 2023-12-19T09:25:41Z

docs/sources/flow/reference/components/discovery.ovhcloud.md

+  - `SERVICE`: The OVHcloud service of the targets to retrieve.
+  - `PROMETHEUS_REMOTE_WRITE_URL`: The URL of the Prometheus remote_write-compatible server to send metrics to.
+  - `USERNAME`: The username to use for authentication to the remote_write API.
+  - `PASSWORD`: The password to use for authentication to the remote_write API.


nit: for the example I would suggest to also set refresh_interval and endpoint + I would directly set some real looking data instead of the placeholders (the placeholders are already used in the ##usage part)

Yes, I agree. I also don't like how here we repeat the definitions of the arguments. I did it this way because it's consistent with other discovery components, but I agree we should change this for all discovery components at a later point.

docs/sources/flow/reference/components/prometheus.scrape.md

tpaschalis

LGTM, this was a chunky one! Approving this to unblock you so we can move ahead once CI is green, and the discussion with Bryan has settled.

CHANGELOG.md

go.mod

docs/sources/flow/reference/components/discovery.ovhcloud.md

component/discovery/aws/lightsail.go

mattdurham

lgtm

Fix missing defaults. Add an "unsupported" converter diagnostic for keep_dropped_targets. Add HTTP client options to AWS Lightsail SD.

Add discovery.ovhcloud to "targets" compatible components.

Co-authored-by: William Dumont <william.dumont@grafana.com>

ptodev · 2024-01-04T17:17:59Z

@tpaschalis @mattdurham @wildum I am re-requesting a review, because I rebased the branch and as discussed I removed track_timestamps_staleness from prometheus.scrape. The track_timestamps_staleness argument was added to Prometheus recently. Not having it would mean that Flow doesn't have feature parity with Static mode anymore. However, I think this is ok for two reasons:

track_timestamps_staleness is a new argument which is probably not widely used yet.
There is already a precedent for not supporting some arguments in Flow. For example, no_proxy in the HTTP client config.

If you agree that it's ok to not support this argument for now, please feel free to approve the PR again.

tpaschalis

LGTM! Let's follow up with discussion for the new argument in a new PR.

* Remove replace directive for golang.org/x/exp * Update pyroscope/ebpf from 0.4.0 to 0.4.1 * Fill in missing docs about HTTP client options. Fix missing defaults. Add an "unsupported" converter diagnostic for keep_dropped_targets. Add HTTP client options to AWS Lightsail SD. * Add discovery.ovhcloud * Add a converter for discovery.ovhcloud * Update cloudwatch_exporter docs * Fix converter tests * Mention Prometheus update in the changelog. --------- Co-authored-by: William Dumont <william.dumont@grafana.com>

ptodev marked this pull request as ready for review December 13, 2023 18:28

ptodev assigned mattdurham Dec 13, 2023

mattdurham approved these changes Dec 13, 2023

View reviewed changes

ptodev force-pushed the ptodev/remove-exp-replace branch 2 times, most recently from fc61ae3 to ea5d929 Compare December 18, 2023 16:38

ptodev requested a review from clayton-cornell as a code owner December 18, 2023 16:38

ptodev commented Dec 18, 2023

View reviewed changes

ptodev requested a review from mattdurham December 18, 2023 17:02

ptodev force-pushed the ptodev/remove-exp-replace branch from ed0925d to d94e489 Compare December 19, 2023 09:33

wildum reviewed Dec 19, 2023

View reviewed changes

tpaschalis approved these changes Dec 19, 2023

View reviewed changes

CHANGELOG.md Show resolved Hide resolved

go.mod Show resolved Hide resolved

docs/sources/flow/reference/components/discovery.ovhcloud.md Outdated Show resolved Hide resolved

tpaschalis reviewed Dec 19, 2023

View reviewed changes

component/discovery/aws/lightsail.go Show resolved Hide resolved

ptodev requested a review from wildum December 19, 2023 13:36

wildum approved these changes Dec 19, 2023

View reviewed changes

clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Dec 19, 2023

mattdurham approved these changes Jan 4, 2024

View reviewed changes

ptodev added 9 commits January 4, 2024 18:04

Remove replace directive for golang.org/x/exp

c595b70

Update pyroscope/ebpf from 0.4.0 to 0.4.1

a0f5a33

Fill in missing docs about HTTP client options.

8cdd579

Fix missing defaults. Add an "unsupported" converter diagnostic for keep_dropped_targets. Add HTTP client options to AWS Lightsail SD.

Add discovery.ovhcloud

ce9c1d7

Add a converter for discovery.ovhcloud

3568054

Add track_timestamps_staleness

71072c2

Add changelog entry

14484ee

Update cloudwatch_exporter docs

3c7d318

Add discovery.ovhcloud to all.go

8a9e482

Add discovery.ovhcloud to "targets" compatible components.

ptodev and others added 8 commits January 4, 2024 18:06

Fix converter tests

087b17d

Fix tests

20b9626

Update docs/sources/flow/reference/components/prometheus.scrape.md

5c5bf54

Co-authored-by: William Dumont <william.dumont@grafana.com>

Suggestions from code review

90e81e0

Fix tests

9450686

Mention Prometheus update in the changelog.

81b77e9

Sort table alphabetically by the Name column.

80f7abb

Remove track_timestamps_staleness from Flow

b1a9911

ptodev force-pushed the ptodev/remove-exp-replace branch from 35ad5f9 to b1a9911 Compare January 4, 2024 16:49

ptodev requested review from mattdurham, tpaschalis and wildum January 4, 2024 17:26

tpaschalis approved these changes Jan 5, 2024

View reviewed changes

ptodev merged commit 404423b into main Jan 5, 2024

ptodev deleted the ptodev/remove-exp-replace branch January 5, 2024 10:53

ptodev mentioned this pull request Jan 5, 2024

Add track_timestamps_staleness to prometheus.scrape #5921

Closed

bboreham mentioned this pull request Feb 6, 2024

Add track_timestamps_staleness #6317

Merged

3 tasks

github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 21, 2024

github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024

Conversation

ptodev commented Dec 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ptodev commented Dec 13, 2023

Uh oh!

mattdurham left a comment

Choose a reason for hiding this comment

Uh oh!

ptodev commented Dec 13, 2023

Uh oh!

ptodev commented Dec 14, 2023

Uh oh!

ptodev commented Dec 18, 2023

Uh oh!

ptodev Dec 18, 2023

Choose a reason for hiding this comment

Uh oh!

ptodev Dec 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ptodev Jan 4, 2024

Choose a reason for hiding this comment

Uh oh!

bboreham Jan 4, 2024

Choose a reason for hiding this comment

Uh oh!

bboreham Jan 4, 2024

Choose a reason for hiding this comment

Uh oh!

ptodev Jan 4, 2024

Choose a reason for hiding this comment

Uh oh!

bboreham commented Dec 18, 2023

Uh oh!

wildum left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wildum Dec 19, 2023

Choose a reason for hiding this comment

Uh oh!

ptodev Dec 19, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tpaschalis left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mattdurham left a comment

Choose a reason for hiding this comment

Uh oh!

ptodev commented Jan 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tpaschalis left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

ptodev commented Dec 13, 2023 •

edited

Loading

ptodev Dec 18, 2023 •

edited

Loading

tpaschalis left a comment •

edited

Loading

ptodev commented Jan 4, 2024 •

edited

Loading