Conversation
a1cd1e6 to
45933cc
Compare
|
/prombench main --bench.version=bench/cross-feature/ct-per-sample |
|
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: Custom benchmark version: After the successful deployment (check status here), the benchmarking results can be viewed at: Available Commands:
|
FunctionalityOk, it seems working for majority of incoming cumulatives. Not for native histograms so this is why there are still ones without CTs: EfficiencyIn terms of perf (for samples), we see some overhead: A lot more allocs/s Slight CPU increase Up to 115ms vs 80ms latency spikes on query_range: Slight increase in RSS: Weird stuff: I see odd notification/live long API call, not present on main... pprof |
|
/prombench cancel |
|
Benchmark cancel is in progress. |
|
No regression in terms of memory on query side. Majority of allocs comes from proto parsing still. We might want to change benchmark scenario to include proto parsing for both... |
|
RW package changes look fine 👍 Re: the allocations issue, you're suggesting that the min size needs to account for the additional integer? |
|
It feels so, yea. I want to repeat the experiment with both versions doing protobuf though for a fair experiment. |
|
Next steps:
|
Fixes #14218 and #14220 Rebased version of #15254 with improvements. This change does the following: - Change appender interface to be CT aware (optional CT) - Add created-timestamp-per-sample feature flag - Add new sample record used only if CT is appended with the sample. - Remote Write awareness of CT. Signed-off-by: Ridwan Sharif <ridwanmsharif@google.com> Signed-off-by: bwplotka <bwplotka@gmail.com> # Conflicts: # cmd/prometheus/main.go # scrape/helpers_test.go # storage/remote/write_handler_test.go
d16b65b to
a736e37
Compare
2b310a0 to
ddda640
Compare
Signed-off-by: bwplotka <bwplotka@gmail.com>
|
Retrying benchmark after optimizations in watcher. Custom scenario have one update too -- proto parsing is now enabled on both versions of Prometheus. /prombench restart main --bench.version=bench/cross-feature/ct-per-sample |
|
/prombench start main --bench.version=bench/cross-feature/ct-per-sample |
|
Incorrect Available Commands:
Advanced Flags for
Examples:
|
|
/prombench main --bench.version=bench/cross-feature/ct-per-sample |
|
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: Custom benchmark version: After the successful deployment (check status here), the benchmarking results can be viewed at: Available Commands:
|
|
Thanks to decoder/watcher fixes I see significant improvement in RSS and allocs/s vs main WITHOUT CTs in WAL 😱 CPU etc looks on par Query engine timing is worse - not sure why: CPU wise it looks like GC is more intensive, querying is actually using more CPU on main 🤔 https://pprof.me/33106dee65aec562860070b28ed2d895 Looks like simply more objects are allocated https://pprof.me/c23cf4fda6e7be3ca34d292a30fc5f13 |
|
/prombench cancel |
|
Benchmark cancel is in progress. |
|
Adding a few PRs first to split this change into smaller parts. |
|
TODO:
The main non-trivial decision is about that WAL format. In this PR I propose CT next to every sample value and timestamp (all is diff-encoded) and it was showing not that big overhead, but we have to convince others. The alternative is to perhaps design something similar to native histograms records, so create a record per different CT. |














Fixes #14218 and #14220
Rebased version of @ridwanmsharif #15254 with improvements.
This change does the following: