remote write: reduce remote write egress bytes with new proto format by cstyan · Pull Request #11999 · prometheus/prometheus

cstyan · 2023-02-20T23:55:55Z

Design doc here

This PR is the result of conversations with multiple people over the last year about the remote write format and reuse of similar formats in other projects. I can't remember all the names but at the very least, thanks to: @rfratto @bboreham @cyriltovena @csmarchbanks

The short version is that this PR introduces a new, slightly different version of the remote write proto format that includes a table, similar to TSDB's symbol table, in each request. Rather than repeating many label name/value strings throughout the request, we store each string once and reference it via the table. While working on this I also came across an alternative library for snappy encoding, which compresses the request even more while its output can still be decompressed by the default Go snappy library.

The commits are sort of in order, no guarantees that each builds successfully.

The following benchmarks are of the building of the end compressed proto requests only, not including any excess memory or CPU required to process and cache with the new implementation prior to building the requests, or the decoding on the other end:

The current format but replacing the snappy library

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                     │ current.txt │            new-comp.txt             │
                     │   sec/op    │   sec/op     vs base                │
BuildWriteRequest-32   2.797m ± 1%   4.080m ± 1%  +45.86% (p=0.000 n=10)

                     │    current.txt    │               new-comp.txt               │
                     │ compressedSize/op │ compressedSize/op  vs base               │
BuildWriteRequest-32         271.1k ± 0%         255.4k ± 0%  -5.79% (p=0.000 n=10)

                     │ current.txt  │            new-comp.txt             │
                     │     B/op     │     B/op      vs base               │
BuildWriteRequest-32   4.117Mi ± 0%   3.797Mi ± 0%  -7.78% (p=0.000 n=10)

                     │ current.txt │          new-comp.txt          │
                     │  allocs/op  │ allocs/op   vs base            │
BuildWriteRequest-32    3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

The resulting bytes are smaller, but building the request takes significantly longer.

The new format with the standard snappy library vs the current format and current snappy library

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                     │ current.txt │           new-format.txt            │
                     │   sec/op    │   sec/op     vs base                │
BuildWriteRequest-32   2.797m ± 1%   1.972m ± 1%  -29.51% (p=0.000 n=10)

                     │    current.txt    │              new-format.txt              │
                     │ compressedSize/op │ compressedSize/op  vs base               │
BuildWriteRequest-32         271.1k ± 0%         262.5k ± 0%  -3.16% (p=0.000 n=10)

                     │ current.txt  │            new-format.txt            │
                     │     B/op     │     B/op      vs base                │
BuildWriteRequest-32   4.117Mi ± 0%   1.289Mi ± 0%  -68.69% (p=0.000 n=10)

                     │ current.txt │         new-format.txt         │
                     │  allocs/op  │ allocs/op   vs base            │
BuildWriteRequest-32    3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

The new format with the alternate snappy library vs the current format and current snappy library

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                     │ current.txt │       new-format-new-comp.txt       │
                     │   sec/op    │   sec/op     vs base                │
BuildWriteRequest-32   2.797m ± 1%   2.512m ± 1%  -10.19% (p=0.000 n=10)

                     │    current.txt    │          new-format-new-comp.txt          │
                     │ compressedSize/op │ compressedSize/op  vs base                │
BuildWriteRequest-32         271.1k ± 0%         238.2k ± 0%  -12.12% (p=0.000 n=10)

                     │ current.txt  │       new-format-new-comp.txt        │
                     │     B/op     │     B/op      vs base                │
BuildWriteRequest-32   4.117Mi ± 0%   1.188Mi ± 0%  -71.16% (p=0.000 n=10)

                     │ current.txt │    new-format-new-comp.txt     │
                     │  allocs/op  │ allocs/op   vs base            │
BuildWriteRequest-32    3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

And finally the new format vs the new format with the alternate snappy library

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                     │ new-format.txt │       new-format-new-comp.txt       │
                     │     sec/op     │   sec/op     vs base                │
BuildWriteRequest-32      1.972m ± 1%   2.512m ± 1%  +27.40% (p=0.000 n=10)

                     │  new-format.txt   │         new-format-new-comp.txt          │
                     │ compressedSize/op │ compressedSize/op  vs base               │
BuildWriteRequest-32         262.5k ± 0%         238.2k ± 0%  -9.25% (p=0.000 n=10)

                     │ new-format.txt │       new-format-new-comp.txt       │
                     │      B/op      │     B/op      vs base               │
BuildWriteRequest-32     1.289Mi ± 0%   1.188Mi ± 0%  -7.88% (p=0.000 n=10)

                     │ new-format.txt │    new-format-new-comp.txt     │
                     │   allocs/op    │ allocs/op   vs base            │
BuildWriteRequest-32       3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

These benchmarks use relatively small synthetic datasets. While "Prometheus on my laptop" is not a production workload, it's a bit more realistic. The following screenshot compares the rate of compressed bytes sent by two Prometheus instances, one using the new format and the other using the current format. Both instances were running the same config, scraping themselves. Over the course of the graph, the only thing that changes is the remote write config for the number of samples that can be batched into a remote write request:


cstyan · 2023-02-21T17:01:56Z

To extend this, we could allow the option of using alternative encoding/compression types. For example, with zstd we can achieve a further reduction of roughly 27% in compressed size, at the cost of roughly double the time to produce those compressed bytes:

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: AMD Ryzen 9 5950X 16-Core Processor
                            │ snappy.txt  │               zstd.txt               │
                            │   sec/op    │   sec/op     vs base                 │
BuildReducedWriteRequest-32   2.459m ± 1%   5.098m ± 2%  +107.29% (p=0.000 n=10)

                            │    snappy.txt     │                 zstd.txt                  │
                            │ compressedSize/op │ compressedSize/op  vs base                │
BuildReducedWriteRequest-32         238.2k ± 0%         174.1k ± 0%  -26.94% (p=0.000 n=10)

                            │  snappy.txt  │              zstd.txt               │
                            │     B/op     │     B/op      vs base               │
BuildReducedWriteRequest-32   1.188Mi ± 0%   1.188Mi ± 0%  -0.00% (p=0.000 n=10)

                            │ snappy.txt │            zstd.txt            │
                            │ allocs/op  │ allocs/op   vs base            │
BuildReducedWriteRequest-32   3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

gouthamve · 2023-07-26T13:12:57Z

OTLP has done some experiments in this space, and they saw a lot of benefit in compression and speed from trying an Arrow-based implementation.

See https://github.com/open-telemetry/oteps/blob/main/text/0156-columnar-encoding.md and https://github.com/f5/otel-arrow-adapter/blob/main/docs/benchmarks.md

I am not sure we should adopt it, but it might be something to evaluate.

Further, my naive benchmark using avalanche data showed that OTLP (non-Arrow) is more efficient on compression. See: https://github.com/gouthamve/otlp-prw-compare

gouthamve · 2023-07-26T14:14:30Z

The OTLP Arrow implementation has been accepted by the project on July 6th. Might be worth exploring more: https://docs.google.com/document/d/1-23Sf7-xZK3OL5Ogv2pK0NP9YotlSa0PKU9bvvtQwp8/edit#bookmark=id.fmr6zitz9egb

cstyan · 2023-10-23T22:50:04Z

Closing in favour of using a feature branch for remote write 1.1. This piece of work will be updated in a new PR.

cstyan added 7 commits February 20, 2023 12:29

replace snappy encoding library

3da45b9

Signed-off-by: Callum Styan <callumstyan@gmail.com>

add new proto types

54c72b8

Signed-off-by: Callum Styan <callumstyan@gmail.com>

add decode function for new write request proto

d1d4552

Signed-off-by: Callum Styan <callumstyan@gmail.com>

add lookup table struct that is used to build the symbol table in new write request format

80ddf09

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Implement code paths for new proto format

30bf3b9

Signed-off-by: Callum Styan <callumstyan@gmail.com>

update example server to include handler for new format

8673d35

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Add new test client

16924ee

Signed-off-by: Callum Styan <callumstyan@gmail.com>

bboreham mentioned this pull request Jul 26, 2023

Labels: reduce memory by de-duplicating strings in multiple SymbolTables #12304

Merged

cstyan closed this Oct 23, 2023

npazosmendez mentioned this pull request Oct 30, 2023

remote write 2.0: new proto format with string interning #13052

Merged

cstyan mentioned this pull request Nov 8, 2023

[meta] Remote write 2.0 #13105

Closed

24 tasks

cstyan wants to merge 7 commits into main from callum-remote-proto-2
