Conversation
I wish we had done that on PRW 2.0 too (Samples -> Floats) 😱 (https://prometheus.io/docs/specs/prw/remote_write_spec_2_0/#protobuf-message). Should we still do this while we can? (: Also related: #17036 (comment)
krajorama
left a comment
Looks like a good solution.
Please reuse promql/promqltest/testdata/native_histograms.test from #16823.
Regarding TODOs: I've commented about the memory usage in the appropriate function.
Regarding staleness: since you ensure the commit order, we should be able to just use the memSeries last(Float)HistogramValue on commit, as in my PR. The caveat is that the staleness marker is always a float, so it would create a new batch for histograms all the time. So I think you need to check in two places whether to convert the staleness NaN to a histogram NaN: 1. when adding the staleness marker to the batch (an optimization so as not to start a new batch every time); 2. on commit (if the series was in an older batch, and also as a failsafe in case 1. went wrong).
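A minimal sketch of the decision being discussed, assuming simplified stand-ins for the real appender state (`convertStaleness`, `sampleType`, and the per-series type tracking are hypothetical names, not the actual headAppender code; the stale-NaN bit pattern matches Prometheus's `value.StaleNaN`):

```go
package main

import (
	"fmt"
	"math"
)

// staleNaNBits is the specific NaN bit pattern Prometheus uses as a
// staleness marker (value.StaleNaN).
const staleNaNBits = 0x7ff0000000000002

var staleNaN = math.Float64frombits(staleNaNBits)

func isStaleNaN(v float64) bool {
	return math.Float64bits(v) == staleNaNBits
}

// sampleType is a simplified stand-in for the per-series sample-type
// tracking mentioned in the discussion.
type sampleType int

const (
	typeFloat sampleType = iota
	typeHistogram
)

// convertStaleness decides whether a float staleness marker appended to
// a series should be turned into a histogram staleness marker. The same
// check would run in both places from the comment: when adding to a
// batch (so the marker does not force a new batch) and again on commit
// (as a failsafe when the series was in an older batch).
func convertStaleness(v float64, lastType sampleType) bool {
	return isStaleNaN(v) && lastType == typeHistogram
}

func main() {
	fmt.Println(convertStaleness(staleNaN, typeHistogram)) // true: convert to histogram NaN
	fmt.Println(convertStaleness(staleNaN, typeFloat))     // false: keep float marker
	fmt.Println(convertStaleness(1.5, typeHistogram))      // false: not a staleness marker
}
```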
@bwplotka WRT "I wished we did that on PRW 2.0 too (Samples -> Floats)": Note that the naming changes here are just internal variable names. I have so far not changed anything in protocols, JSON serialization, etc. That's a whole other story. (We can discuss it, of course, but the changes here are just continuing internal name changes and should not be taken as a strong reason to change protocols.)
@krajorama test cases have been added now (I indeed forgot to add that file). And 🎉 the tests are passing.
WRT staleness: I think I have understood @krajorama's point. But I need more time to work on it. Probably next week… In the meantime, I'll rebase this on top of main and run PromBench so that we see if this goes horribly wrong.
/prombench main
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: After the successful deployment (check status here), the benchmarking results can be viewed at:
/prombench cancel
Benchmark cancel is in progress.
I don't see any significant differences in CPU or memory usage patterns.
@krajorama I hope I have implemented the staleness handling as you envisioned – and more importantly in the generally correct way. Tests are passing, at least. 🦆
krajorama
left a comment
Almost there, I've commented on the staleness handling.
To prove those cases, you could (and probably should) add tests with two parallel commits.
beorn7
left a comment
As explained in the comment, I'm not sure if the detailed check for the previous sample is worth it, as it is only an optimization (isn't it?!?), and we are dealing with an exceedingly rare case here.
I have added code comments to explain that.
I'm not sure yet how to create a test that tests concurrent appends in a meaningful way. Let's first find out if my approach of not converting in case of concurrent appends will work.
This exposes the omission of float histograms from the rollback. Signed-off-by: beorn7 <beorn@grafana.com>
Fixes #15177

The basic idea here is to divide the samples to be committed into (sub-)batches whenever we detect that the same series receives a sample of a type different from the previous one. We then commit those batches one after another, and we log them to the WAL one after another, so that we hit both birds with the same stone. The cost of the stone is that we have to track the sample type of each series in a map. Given the amount of things we already track in the appender, I hope that it won't make a dent. Note that this even addresses the NHCB special case in the WAL.

This does a few other things that I could not resist picking up along the way:
- It adds more zeropool.Pools and uses the existing ones more consistently. My understanding is that this was merely an oversight. Maybe the additional pool usage will compensate for the increased memory demand of the map.
- Create the synthetic zero sample for histograms a bit more carefully. So far, we created a sample that always went into its own chunk. Now we create a sample that is compatible enough with the following sample to go into the same chunk. This changed the test results quite a bit. But IMHO it makes much more sense now.
- Continuing past efforts, I changed more namings of `Samples` into `Floats` to keep things consistent and less confusing. (Histogram samples are also samples.) I still avoided changing names in other packages.
- I added a few shortcuts `h := a.head`, saving many characters.

TODOs:
- Address @krajorama's TODOs about commit order and staleness handling.

Signed-off-by: beorn7 <beorn@grafana.com>
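The batching idea from the commit message can be sketched as follows. This is a simplified illustration, not the actual headAppender code: `sample`, `sampleKind`, and `splitIntoBatches` are hypothetical names, and the real implementation tracks more state.

```go
package main

import "fmt"

// sampleKind is a stand-in for the sample types an appender sees
// (float, histogram, float histogram, ...).
type sampleKind int

const (
	kindFloat sampleKind = iota
	kindHistogram
)

type sample struct {
	seriesRef uint64
	kind      sampleKind
	t         int64
}

// splitIntoBatches cuts the append-ordered samples into sub-batches so
// that within one batch no series switches its sample type. The cost is
// the per-series type map, as noted in the commit message.
func splitIntoBatches(samples []sample) [][]sample {
	var (
		batches  [][]sample
		current  []sample
		lastKind = map[uint64]sampleKind{} // sample type per series in the current batch
	)
	for _, s := range samples {
		if k, ok := lastKind[s.seriesRef]; ok && k != s.kind {
			// This series changed type: close the current batch
			// and start tracking types afresh.
			batches = append(batches, current)
			current = nil
			lastKind = map[uint64]sampleKind{}
		}
		lastKind[s.seriesRef] = s.kind
		current = append(current, s)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	in := []sample{
		{1, kindFloat, 100},
		{2, kindHistogram, 100},
		{1, kindHistogram, 200}, // series 1 switches type -> new batch
		{1, kindHistogram, 300},
	}
	fmt.Println(len(splitIntoBatches(in))) // 2
}
```

Committing and WAL-logging such batches strictly in order is what preserves the original append order across type changes.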
Regression test for:
- #14172
- #15177

Test cases are by @krajorama, taken from commit b48bc9d. Signed-off-by: beorn7 <beorn@grafana.com>
With the fixed commit order, we can now handle the conversion of float staleness markers to histogram staleness markers in a more direct way. Signed-off-by: beorn7 <beorn@grafana.com>
Signed-off-by: beorn7 <beorn@grafana.com>
The plan for the agent appender: Let's first get this merged, and then fix it separately (if needed at all).
Summary of our thoughts about the appender in agent mode (discussed with @krajorama):
For now, we have concluded that we don't need to change anything for agent mode. Note for posterity: I could see the theoretical possibility of triggering the problem by mixing float histograms and integer histograms, as some exposition formats allow mixing both. But that's really an intersection of a number of circumstances where every single one is already highly unusual:
There is an optimization in prometheus/prometheus#17071 that ensures that the injected zero histogram sample has the same schema as the next sample. It turns out that if there are at least three samples, a normal sample (schema>0), a zero sample (schema=0), and a normal sample (schema>0), then the histogram rate function will find the lowest schema and normalize all samples to it, meaning that the normal samples will be downscaled to schema 0. In dashboards, it will look as if we lost the resolution. See https://github.com/prometheus/prometheus/blob/9e4d23ddafcdc00021cd8630e78bb819e84ccac9/promql/functions.go#L344 So the optimization is actually needed so as not to lose resolution on read-back. Alternatively, we could use per-sample CT as metadata instead, but that would be a big rewrite of TSDB and PromQL. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
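The downscaling effect described above can be illustrated with a toy version of the normalization step: rate over a histogram range reduces every sample to the lowest (coarsest) schema present in the range. `minSchema` is a hypothetical helper for illustration, not the real promql code linked above.

```go
package main

import "fmt"

// minSchema mirrors the normalization idea in promql's histogram rate:
// all histograms in the range window are reduced to the lowest schema
// (i.e. the coarsest resolution) found among them.
func minSchema(schemas []int32) int32 {
	m := schemas[0]
	for _, s := range schemas[1:] {
		if s < m {
			m = s
		}
	}
	return m
}

func main() {
	// Without the optimization: normal (schema 3), injected zero
	// sample (schema 0), normal (schema 3) -> everything downscaled.
	fmt.Println(minSchema([]int32{3, 0, 3})) // 0: resolution lost on read-back

	// With the optimization, the zero sample inherits schema 3.
	fmt.Println(minSchema([]int32{3, 3, 3})) // 3: resolution preserved
}
```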
The original implementation in #9705 for native histograms included a technical debt #15177 where samples were committed ordered by type, not by their append order. This was fixed in #17071, but this docstring was not updated. I also took the liberty to mention that we do not order by timestamp either, so it is possible to append out-of-order samples. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Fixes #15177
The basic idea here is to divide the samples to be committed into (sub-)batches whenever we detect that the same series receives a sample of a type different from the previous one. We then commit those batches one after another, and we log them to the WAL one after another, so that we hit both birds with the same stone. The cost of the stone is that we have to track the sample type of each series in a map. Given the amount of things we already track in the appender, I hope that it won't make a dent. Note that this even addresses the NHCB special case in the WAL.
This does a few other things that I could not resist picking up along the way:
It adds more zeropool.Pools and uses the existing ones more consistently. My understanding is that this was merely an oversight. Maybe the additional pool usage will compensate for the increased memory demand of the map.
Create the synthetic zero sample for histograms a bit more carefully. So far, we created a sample that always went into its own chunk. Now we create a sample that is compatible enough with the following sample to go into the same chunk. This changed the test results quite a bit. But IMHO it makes much more sense now.
Continuing past efforts, I changed more namings of `Samples` into `Floats` to keep things consistent and less confusing. (Histogram samples are also samples.) I still avoided changing names in other packages.
I added a few shortcuts `h := a.head`, saving many characters.
TODOs:
Which issue(s) does the PR fix:
Fixes #15177
Does this PR introduce a user-facing change?