Remove replace directive for golang.org/x/exp#5972
mattdurham
left a comment
Looks like we got rid of a lot of cruft! Awesome
@mattdurham unfortunately the Linux build doesn't work because Pyroscope's eBPF module has a replace directive for the `golang.org/x/exp` package.

I raised a PR for Pyroscope.
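For context, a `replace` directive in go.mod forces every importer of a module onto one specific version, which is why a stale one in a dependency can block an upgrade. A sketch of what such a line looks like (the pinned pseudo-version here is made up for illustration):

```
// go.mod (illustrative)
replace golang.org/x/exp => golang.org/x/exp v0.0.0-20220827204233-334a2380cb91
```

It can be removed with `go mod edit -dropreplace=golang.org/x/exp` followed by `go mod tidy`.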
Force-pushed from fc61ae3 to ea5d929.
I had to upgrade our Prometheus dependency, and this became a much bigger PR than expected.
| Name | Type | Description | Default | Required |
| ---- | ---- | ----------- | ------- | -------- |
| `enable_http2` | `bool` | Whether HTTP2 is supported for requests. | `true` | no |
| `honor_labels` | `bool` | Indicator whether the scraped metrics should remain unmodified. | `false` | no |
| `honor_timestamps` | `bool` | Indicator whether the scraped timestamps should be respected. | `true` | no |
| `track_timestamps_staleness` | `bool` | Indicator whether to track the staleness of the scraped timestamps. | `false` | no |
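A minimal sketch of how these arguments might appear in a `prometheus.scrape` block (the target address and the `forward_to` destination are placeholders):

```river
prometheus.scrape "example" {
  targets    = [{"__address__" = "demo.example.com:9090"}]
  forward_to = [prometheus.remote_write.default.receiver]

  enable_http2               = true
  honor_labels               = false
  honor_timestamps           = true
  track_timestamps_staleness = false
}
```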
@bboreham Regarding #5921 - for now I intend to default this to `false` for a few reasons:

- IIUC, `track_timestamps_staleness = true` only makes sense if out-of-order ingestion is allowed in the backend database.
- Apparently there are implications for querying and alerting.

Please let me know if you disagree. I'm open to changing this to default to `true` prior to merging. If it causes a backwards-incompatible change for users, we'll have to mention it explicitly in our changelog and write docs with steps on what users need to change to get the behaviour they need (e.g. how to change their alerts or dashboards).
We usually list backwards incompatible changes here:
https://grafana.com/docs/agent/latest/flow/release-notes/
> IIUC, `track_timestamps_staleness = true` only makes sense if out-of-order ingestion is allowed in the backend database.
Not as far as I know. This is to fix the long-standing irritation where cAdvisor metrics linger on for 5 minutes after the pod has gone.
The blog you cite relates to the last change in staleness handling 5 years ago; I don't think it is relevant here.
@bboreham I think the assumption behind setting `track_timestamps_staleness = true` is that if the scraper didn't get any samples, that must be because there aren't any. But is this a good assumption to make in the general case? If a sample is exposed via explicit timestamps, and there is no new value to report, then what sample should be exposed for its series the next time a scrape happens? Is the convention to just report the same value with a new timestamp?

I think it makes sense to default `track_timestamps_staleness` to `true` if:

- The convention is that the absence of a sample is considered enough evidence to decide that the series is stale.
- There is no need to enable out-of-order ingestion. This is so that we prevent a situation where a sample arrives late (e.g. because it takes a long time to generate) but can't be ingested into the TSDB because there is already a staleness marker with a more recent timestamp.

This explicit timestamp feature seems like a way to "push" metrics, so I'm not sure what assumptions are OK to make. I suspect that agents in "push" systems like OTel don't just declare a series stale if no samples were pushed within a certain time.
Hi @bboreham, would you mind getting back to us on the comment above please?
If no sample is supplied for a timestamp, PromQL (at query time) will use the preceding value up to 5 minutes old.
This creates a long-standing issue with cAdvisor (i.e. Kubelet container metrics).
It's got nothing to do with "push".
There is no expectation that explicit timestamps come out of order. This never worked historically in Prometheus, and there is no reason to suppose people started sending them.
> But is this a good assumption to make in the general case?
Yes, it is the standard behaviour when exporters do not supply the timestamp. Which is the vast majority.
By the way, I don't think we should change this default in the middle of an 800-line PR doing other things.
I can make a separate PR to fix it.
Ok, thank you, I'll take the `track_timestamps_staleness` parameter out of this PR. We can introduce it in a different PR. I don't want to add it now and change its default value later, because `prometheus.scrape` is a stable component and its defaults aren't meant to change often.
Force-pushed from ed0925d to d94e489.
wildum
left a comment
looks good, just a few nits, thanks for taking care of this :)
- `SERVICE`: The OVHcloud service of the targets to retrieve.
- `PROMETHEUS_REMOTE_WRITE_URL`: The URL of the Prometheus remote_write-compatible server to send metrics to.
- `USERNAME`: The username to use for authentication to the remote_write API.
- `PASSWORD`: The password to use for authentication to the remote_write API.
nit: for the example I would suggest also setting `refresh_interval` and `endpoint`, and directly using some realistic-looking data instead of the placeholders (the placeholders are already used in the Usage section)
Yes, I agree. I also don't like how here we repeat the definitions of the arguments. I did it this way because it's consistent with other discovery components, but I agree we should change this for all discovery components at a later point.
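A sketch of what the reviewer's suggestion might look like. The attribute names are assumed from the equivalent Prometheus OVHcloud SD configuration; the key values and service are made-up placeholders, not real credentials:

```river
discovery.ovhcloud "example" {
  endpoint           = "ovh-eu"
  application_key    = "REDACTED_APP_KEY"
  application_secret = "REDACTED_APP_SECRET"
  consumer_key       = "REDACTED_CONSUMER_KEY"
  service            = "dedicated_server"
  refresh_interval   = "60s"
}
```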
Fix missing defaults. Add an "unsupported" converter diagnostic for keep_dropped_targets. Add HTTP client options to AWS Lightsail SD.
Add discovery.ovhcloud to "targets" compatible components.
Co-authored-by: William Dumont <william.dumont@grafana.com>
Force-pushed from 35ad5f9 to b1a9911.
@tpaschalis @mattdurham @wildum I am re-requesting a review, because I rebased the branch and, as discussed, I removed the `track_timestamps_staleness` argument.

If you agree that it's OK to not support this argument for now, please feel free to approve the PR again.
tpaschalis
left a comment
LGTM! Let's follow up with discussion for the new argument in a new PR.
* Remove replace directive for golang.org/x/exp
* Update pyroscope/ebpf from 0.4.0 to 0.4.1
* Fill in missing docs about HTTP client options. Fix missing defaults. Add an "unsupported" converter diagnostic for keep_dropped_targets. Add HTTP client options to AWS Lightsail SD.
* Add discovery.ovhcloud
* Add a converter for discovery.ovhcloud
* Update cloudwatch_exporter docs
* Fix converter tests
* Mention Prometheus update in the changelog.

Co-authored-by: William Dumont <william.dumont@grafana.com>
Removing a replace directive for `golang.org/x/exp` from the Agent's go.mod file.

This is necessary because on a separate branch I am upgrading the Agent to a new OpenTelemetry version, and it requires a new version of `github.com/grafana/loki/pkg/push`, which needs the latest `golang.org/x/exp`.

The reason why `golang.org/x/exp` has been problematic is that the return type of `SortFunc` changed from `bool` (in the old version) to `int` (in the new version). Apparently some packages like to use `golang.org/x/exp` because it's more performant.

Fixes #5921