remote write: reduce remote write egress bytes with new proto format#11999

Closed
cstyan wants to merge 7 commits into main from callum-remote-proto-2

cstyan (Member) commented Feb 20, 2023

Design doc here

This PR is the result of conversations with multiple people over the last year about the remote write format and reuse of similar formats in other projects. I can't remember all the names but at the very least, thanks to: @rfratto @bboreham @cyriltovena @csmarchbanks

The short version is that this PR introduces a new, slightly different version of the remote write proto format that includes a table in each request, similar to TSDB's symbol table. Rather than repeating many label name/value strings throughout the request, we store each string once and reference it via the table. While working on this I also came across an alternative library for snappy encoding, which compresses the request even more but whose output can still be decompressed by the default Go snappy library.

The commits are sort of in order, no guarantees that each builds successfully.

The following benchmarks cover only the building of the final compressed proto requests. They do not include any extra memory or CPU required to process and cache with the new implementation before building the requests, or the decoding on the other end:

The current format but replacing the snappy library

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                     │ current.txt │            new-comp.txt             │
                     │   sec/op    │   sec/op     vs base                │
BuildWriteRequest-32   2.797m ± 1%   4.080m ± 1%  +45.86% (p=0.000 n=10)

                     │    current.txt    │               new-comp.txt               │
                     │ compressedSize/op │ compressedSize/op  vs base               │
BuildWriteRequest-32         271.1k ± 0%         255.4k ± 0%  -5.79% (p=0.000 n=10)

                     │ current.txt  │            new-comp.txt             │
                     │     B/op     │     B/op      vs base               │
BuildWriteRequest-32   4.117Mi ± 0%   3.797Mi ± 0%  -7.78% (p=0.000 n=10)

                     │ current.txt │          new-comp.txt          │
                     │  allocs/op  │ allocs/op   vs base            │
BuildWriteRequest-32    3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

The compressed requests are about 6% smaller, but building them takes about 46% longer.
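The `compressedSize/op` column in these tables is not a built-in Go benchmark metric. One way to produce such a column is `b.ReportMetric`; the sketch below shows that pattern. The `buildRequest` and `benchmarkBuild` functions are hypothetical stand-ins, and `compress/flate` is used only because snappy is not in the standard library, so this is not the benchmark from this PR.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"strings"
	"testing"
)

// buildRequest is a hypothetical stand-in for serializing and compressing a
// write request. It uses compress/flate, not snappy.
func buildRequest(payload []byte, buf *bytes.Buffer) (int, error) {
	buf.Reset()
	w, err := flate.NewWriter(buf, flate.BestSpeed)
	if err != nil {
		return 0, err
	}
	if _, err := w.Write(payload); err != nil {
		return 0, err
	}
	if err := w.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// benchmarkBuild reports the average compressed size as a custom per-op metric.
func benchmarkBuild(b *testing.B) {
	payload := []byte(strings.Repeat(`{__name__="up",job="prometheus",instance="localhost:9090"}`, 1000))
	var buf bytes.Buffer
	var total int
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		n, err := buildRequest(payload, &buf)
		if err != nil {
			b.Fatal(err)
		}
		total += n
	}
	b.ReportMetric(float64(total)/float64(b.N), "compressedSize/op")
}

func main() {
	// testing.Benchmark lets the sketch run outside of `go test`.
	res := testing.Benchmark(benchmarkBuild)
	fmt.Println(res.String(), res.Extra["compressedSize/op"])
}
```

In a real `_test.go` file the function would be named `BenchmarkXxx` and run with `go test -bench`, and the saved outputs could then be compared with benchstat.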

The new format with the standard snappy library vs the current format and current snappy library

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                     │ current.txt │           new-format.txt            │
                     │   sec/op    │   sec/op     vs base                │
BuildWriteRequest-32   2.797m ± 1%   1.972m ± 1%  -29.51% (p=0.000 n=10)

                     │    current.txt    │              new-format.txt              │
                     │ compressedSize/op │ compressedSize/op  vs base               │
BuildWriteRequest-32         271.1k ± 0%         262.5k ± 0%  -3.16% (p=0.000 n=10)

                     │ current.txt  │            new-format.txt            │
                     │     B/op     │     B/op      vs base                │
BuildWriteRequest-32   4.117Mi ± 0%   1.289Mi ± 0%  -68.69% (p=0.000 n=10)

                     │ current.txt │         new-format.txt         │
                     │  allocs/op  │ allocs/op   vs base            │
BuildWriteRequest-32    3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

The new format with the alternate snappy library vs the current format and current snappy library

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                     │ current.txt │       new-format-new-comp.txt       │
                     │   sec/op    │   sec/op     vs base                │
BuildWriteRequest-32   2.797m ± 1%   2.512m ± 1%  -10.19% (p=0.000 n=10)

                     │    current.txt    │          new-format-new-comp.txt          │
                     │ compressedSize/op │ compressedSize/op  vs base                │
BuildWriteRequest-32         271.1k ± 0%         238.2k ± 0%  -12.12% (p=0.000 n=10)

                     │ current.txt  │       new-format-new-comp.txt        │
                     │     B/op     │     B/op      vs base                │
BuildWriteRequest-32   4.117Mi ± 0%   1.188Mi ± 0%  -71.16% (p=0.000 n=10)

                     │ current.txt │    new-format-new-comp.txt     │
                     │  allocs/op  │ allocs/op   vs base            │
BuildWriteRequest-32    3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

And finally the new format vs the new format with the alternate snappy library

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                     │ new-format.txt │       new-format-new-comp.txt       │
                     │     sec/op     │   sec/op     vs base                │
BuildWriteRequest-32      1.972m ± 1%   2.512m ± 1%  +27.40% (p=0.000 n=10)

                     │  new-format.txt   │         new-format-new-comp.txt          │
                     │ compressedSize/op │ compressedSize/op  vs base               │
BuildWriteRequest-32         262.5k ± 0%         238.2k ± 0%  -9.25% (p=0.000 n=10)

                     │ new-format.txt │       new-format-new-comp.txt       │
                     │      B/op      │     B/op      vs base               │
BuildWriteRequest-32     1.289Mi ± 0%   1.188Mi ± 0%  -7.88% (p=0.000 n=10)

                     │ new-format.txt │    new-format-new-comp.txt     │
                     │   allocs/op    │ allocs/op   vs base            │
BuildWriteRequest-32       3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

These benchmarks are for relatively small synthetic datasets. While "Prometheus on my laptop" is not a production workload, it's a bit more realistic. The following screenshot compares the rate of compressed bytes sent by two Prometheus instances, one using the new format and the other using the current format. Both instances were running the same config, scraping themselves. Over the course of the graph, the only thing that changes is the remote write config for the number of samples that can be batched into a remote write request:
[Screenshot: 2023-02-20_15-53]
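One plausible reason batch size matters for a table-based format is that a larger batch gives each stored string more chances to be reused. The sketch below illustrates that under stated assumptions: `labelBytes` and `syntheticBatch` are hypothetical helpers, the series are synthetic, and the count ignores protobuf framing, the bytes used by the references, and compression. It does not attempt to reproduce the numbers in the screenshot.

```go
package main

import "fmt"

// labelBytes compares the string bytes needed when every series repeats its
// label strings against storing each distinct string once per request.
// Each series is a flat list of alternating label names and values.
func labelBytes(batch [][]string) (repeated, interned int) {
	seen := make(map[string]struct{})
	for _, series := range batch {
		for _, s := range series {
			repeated += len(s)
			if _, ok := seen[s]; !ok {
				seen[s] = struct{}{}
				interned += len(s)
			}
		}
	}
	return repeated, interned
}

// syntheticBatch builds n series that share job and instance but differ in name.
func syntheticBatch(n int) [][]string {
	batch := make([][]string, 0, n)
	for i := 0; i < n; i++ {
		batch = append(batch, []string{
			"__name__", fmt.Sprintf("metric_%d", i),
			"job", "prometheus",
			"instance", "localhost:9090",
		})
	}
	return batch
}

func main() {
	for _, n := range []int{1, 10, 100, 1000} {
		r, in := labelBytes(syntheticBatch(n))
		fmt.Printf("batch=%d repeated=%dB interned=%dB\n", n, r, in)
	}
}
```

With a single series the two counts are equal, since nothing is reused; the gap only opens up as more series share the same label strings within one request.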

Signed-off-by: Callum Styan <callumstyan@gmail.com>
write request format

cstyan (Member, Author) commented Feb 21, 2023

To extend this, we could allow the option of using alternative encoding/compression types. For example, with zstd we can achieve a further reduction of about 27% in the final compressed size, at the cost of roughly double the time needed to produce those compressed bytes:

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                            │ snappy.txt  │               zstd.txt               │
                            │   sec/op    │   sec/op     vs base                 │
BuildReducedWriteRequest-32   2.459m ± 1%   5.098m ± 2%  +107.29% (p=0.000 n=10)

                            │    snappy.txt     │                 zstd.txt                  │
                            │ compressedSize/op │ compressedSize/op  vs base                │
BuildReducedWriteRequest-32         238.2k ± 0%         174.1k ± 0%  -26.94% (p=0.000 n=10)

                            │  snappy.txt  │              zstd.txt               │
                            │     B/op     │     B/op      vs base               │
BuildReducedWriteRequest-32   1.188Mi ± 0%   1.188Mi ± 0%  -0.00% (p=0.000 n=10)

                            │ snappy.txt │            zstd.txt            │
                            │ allocs/op  │ allocs/op   vs base            │
BuildReducedWriteRequest-32   3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
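As a rough illustration of what optional compression types could look like in code, the sketch below puts compression behind a small interface and selects an implementation by name. The `Compressor` interface, `byName`, and the type names are hypothetical. The standard library's gzip and deflate stand in for snappy and zstd, which are not in the standard library, so nothing here reflects how this PR or Prometheus wires compression.

```go
package main

import (
	"bytes"
	"compress/flate"
	"compress/gzip"
	"fmt"
	"io"
)

// Compressor is a hypothetical interface for swappable request compression.
type Compressor interface {
	Name() string
	Compress(src []byte) ([]byte, error)
	Decompress(src []byte) ([]byte, error)
}

type gzipCompressor struct{}

func (gzipCompressor) Name() string { return "gzip" }

func (gzipCompressor) Compress(src []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(src); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (gzipCompressor) Decompress(src []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(src))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

type flateCompressor struct{}

func (flateCompressor) Name() string { return "deflate" }

func (flateCompressor) Compress(src []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(src); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (flateCompressor) Decompress(src []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(src))
	defer r.Close()
	return io.ReadAll(r)
}

// byName picks a compressor from a configuration value.
func byName(name string) (Compressor, error) {
	switch name {
	case "gzip":
		return gzipCompressor{}, nil
	case "deflate":
		return flateCompressor{}, nil
	default:
		return nil, fmt.Errorf("unsupported compression %q", name)
	}
}

func main() {
	payload := bytes.Repeat([]byte(`job="prometheus",instance="localhost:9090";`), 500)
	for _, name := range []string{"gzip", "deflate"} {
		c, err := byName(name)
		if err != nil {
			panic(err)
		}
		out, err := c.Compress(payload)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d -> %d bytes\n", c.Name(), len(payload), len(out))
	}
}
```

Keeping the choice behind an interface would let a sender trade CPU time for egress bytes per endpoint, which is the trade-off the zstd numbers above show. The receiver would also need a way to learn which compression was used, which this sketch does not cover.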

gouthamve (Member) commented

OTLP has done some experiments in this space and saw a lot of benefit in compression and speed from an Arrow-based implementation.

See https://github.com/open-telemetry/oteps/blob/main/text/0156-columnar-encoding.md and https://github.com/f5/otel-arrow-adapter/blob/main/docs/benchmarks.md

I am not sure we should adopt it, but it might be something to evaluate.

Further, my naive benchmark using avalanche data showed that OTLP (non-Arrow) compresses more efficiently. See: https://github.com/gouthamve/otlp-prw-compare

gouthamve (Member) commented

The OTLP Arrow implementation has been accepted by the project on July 6th. Might be worth exploring more: https://docs.google.com/document/d/1-23Sf7-xZK3OL5Ogv2pK0NP9YotlSa0PKU9bvvtQwp8/edit#bookmark=id.fmr6zitz9egb

cstyan (Member, Author) commented Oct 23, 2023

Closing in favour of using a feature branch for remote write 1.1. This piece of work will be updated in a new PR.
