Skip to content

feat: ProtobufParse.formatOpenMetricsFloat: improve float formatting …#15167

Merged
machine424 merged 1 commit intoprometheus:mainfrom
machine424:impor
Oct 18, 2024
Merged

feat: ProtobufParse.formatOpenMetricsFloat: improve float formatting …#15167
machine424 merged 1 commit intoprometheus:mainfrom
machine424:impor

Conversation

@machine424
Copy link
Member

@machine424 machine424 commented Oct 15, 2024

…by using strconv.AppendFloat instead of fmt.Sprint

                                                     │    v0.txt    │               v1.txt               │
                                                    │    sec/op    │    sec/op     vs base              │
Parse/data=promtestdata.txt/parser=promtext-2          126.7µ ± 11%   129.5µ ±  7%       ~ (p=0.310 n=6)
Parse/data=promtestdata.txt/parser=xpfmt-2             693.5µ ± 24%   744.2µ ± 18%  +7.30% (p=0.041 n=6)
Parse/data=promtestdata.nometa.txt/parser=promtext-2   113.1µ ±  4%   110.3µ ±  5%       ~ (p=0.310 n=6)
Parse/data=promtestdata.nometa.txt/parser=xpfmt-2      508.7µ ±  3%   498.1µ ±  4%       ~ (p=0.240 n=6)
Parse/data=createTestProtoBuf()/parser=promproto-2     37.61µ ± 20%   35.54µ ±  3%  -5.48% (p=0.004 n=6)
Parse/data=omtestdata.txt/parser=omtext-2              30.03µ ±  3%   29.91µ ±  2%       ~ (p=0.937 n=6)
Parse/data=promtestdata.txt/parser=omtext-2            2.413m ± 10%   2.392m ±  4%       ~ (p=0.394 n=6)
geomean                                                202.1µ         201.4µ        -0.34%

                                                    │    v0.txt     │               v1.txt                │
                                                    │      B/s      │      B/s       vs base              │
Parse/data=promtestdata.txt/parser=promtext-2          251.1Mi ± 10%   245.8Mi ±  7%       ~ (p=0.310 n=6)
Parse/data=promtestdata.txt/parser=xpfmt-2             45.89Mi ± 19%   42.77Mi ± 15%  -6.80% (p=0.041 n=6)
Parse/data=promtestdata.nometa.txt/parser=promtext-2   213.1Mi ±  3%   218.5Mi ±  5%       ~ (p=0.310 n=6)
Parse/data=promtestdata.nometa.txt/parser=xpfmt-2      47.36Mi ±  3%   48.38Mi ±  4%       ~ (p=0.240 n=6)
Parse/data=createTestProtoBuf()/parser=promproto-2     92.33Mi ± 17%   97.69Mi ±  3%  +5.80% (p=0.004 n=6)
Parse/data=omtestdata.txt/parser=omtext-2              142.2Mi ±  3%   142.8Mi ±  2%       ~ (p=0.937 n=6)
Parse/data=promtestdata.txt/parser=omtext-2            13.18Mi ± 10%   13.30Mi ±  4%       ~ (p=0.416 n=6)
geomean                                                79.53Mi         79.81Mi        +0.34%

                                                    │    v0.txt    │                v1.txt                │
                                                    │     B/op     │     B/op      vs base                │
Parse/data=promtestdata.txt/parser=promtext-2          57.07Ki ± 0%   57.07Ki ± 0%       ~ (p=1.000 n=6)
Parse/data=promtestdata.txt/parser=xpfmt-2             536.9Ki ± 0%   536.9Ki ± 0%       ~ (p=0.117 n=6)
Parse/data=promtestdata.nometa.txt/parser=promtext-2   57.07Ki ± 0%   57.07Ki ± 0%       ~ (p=1.000 n=6) ¹
Parse/data=promtestdata.nometa.txt/parser=xpfmt-2      381.1Ki ± 0%   381.1Ki ± 0%       ~ (p=1.000 n=6)
Parse/data=createTestProtoBuf()/parser=promproto-2     38.26Ki ± 0%   37.86Ki ± 0%  -1.06% (p=0.002 n=6)
Parse/data=omtestdata.txt/parser=omtext-2              10.55Ki ± 0%   10.55Ki ± 0%       ~ (p=1.000 n=6) ¹
Parse/data=promtestdata.txt/parser=omtext-2            108.4Ki ± 0%   108.4Ki ± 0%       ~ (p=0.922 n=6)
geomean                                                83.85Ki        83.72Ki       -0.15%
¹ all samples are equal

                                                    │   v0.txt    │               v1.txt                │
                                                    │  allocs/op  │  allocs/op   vs base                │
Parse/data=promtestdata.txt/parser=promtext-2           828.0 ± 0%    828.0 ± 0%       ~ (p=1.000 n=6) ¹
Parse/data=promtestdata.txt/parser=xpfmt-2             12.35k ± 0%   12.35k ± 0%       ~ (p=0.182 n=6)
Parse/data=promtestdata.nometa.txt/parser=promtext-2    828.0 ± 0%    828.0 ± 0%       ~ (p=1.000 n=6) ¹
Parse/data=promtestdata.nometa.txt/parser=xpfmt-2      10.01k ± 0%   10.01k ± 0%       ~ (p=1.000 n=6) ¹
Parse/data=createTestProtoBuf()/parser=promproto-2      863.0 ± 0%    810.0 ± 0%  -6.14% (p=0.002 n=6)
Parse/data=omtestdata.txt/parser=omtext-2               211.0 ± 0%    211.0 ± 0%       ~ (p=1.000 n=6) ¹
Parse/data=promtestdata.txt/parser=omtext-2            2.282k ± 0%   2.282k ± 0%       ~ (p=1.000 n=6) ¹
geomean                                                1.663k        1.648k       -0.90%
¹ all samples are equal

For BenchmarkParse use case (use "a lot" of summaries)

v1.txt being the PR.

Still cannot justify the Parse/data=promtestdata.txt/parser=xpfmt-2 693.5µ ± 24% 744.2µ ± 18% +7.30% (p=0.041 n=6) as formatOpenMetricsFloat is only used by promproto


Ran with sync.Pool

                                                   │   v0.txt    │               v1.txt               │
                                                   │   sec/op    │   sec/op     vs base               │
Parse/data=createTestProtoBuf()/parser=promproto-2   45.57µ ± 1%   42.98µ ± 3%  -5.67% (p=0.002 n=10)

                                                   │    v0.txt    │               v1.txt                │
                                                   │     B/s      │     B/s       vs base               │
Parse/data=createTestProtoBuf()/parser=promproto-2   76.20Mi ± 1%   80.79Mi ± 3%  +6.01% (p=0.002 n=10)

                                                   │    v0.txt    │               v1.txt                │
                                                   │     B/op     │     B/op      vs base               │
Parse/data=createTestProtoBuf()/parser=promproto-2   38.26Ki ± 0%   37.81Ki ± 0%  -1.18% (p=0.000 n=10)

                                                   │   v0.txt   │              v1.txt               │
                                                   │ allocs/op  │ allocs/op   vs base               │
Parse/data=createTestProtoBuf()/parser=promproto-2   863.0 ± 0%   809.0 ± 0%  -6.26% (p=0.000 n=10)

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach. I have some nits about doc comments and maybe a useful idea.

return "-Inf"
}
s := fmt.Sprint(f)
s := string(strconv.AppendFloat(b[:0], f, 'g', -1, 64))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that this is really measurably better in performance.

Now that we are acting on a bytes buffer anyway, how about folding the operation down in line 641 into this? I.e. appending the ".0" string is much more efficient if using a mutable bytes buffer rather than string concatenation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, yes it should be beneficial (even though it doesn't change anything in BenchmarkParse case).

@beorn7
Copy link
Member

beorn7 commented Oct 16, 2024

I have quickly dived into the fmt code. It already uses a sync.Pool for it's working buffers, so I would think the returns from this PR are diminishing the more this code path is used, in particular in use cases where many parsers are instantiated (in which case the change here could even be a net negative).

However, with my idea in the comments above (i.e. also append .0 to the buffer and only then do the string conversion), we have a very substantial gain (avoiding a whole string allocation for good) for the cases where a .0 actually has to be appended (which is plausible to happen a lot, e.g. with a histogram with many buckets with integer boundaries

@beorn7
Copy link
Member

beorn7 commented Oct 16, 2024

Real life example (from Prometheus's own metrics):

# HELP prometheus_http_request_duration_seconds Histogram of latencies for HTTP requests.
# TYPE prometheus_http_request_duration_seconds histogram
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.1"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.2"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.4"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="1"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="3"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="8"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="20"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="60"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="120"} 1423
prometheus_http_request_duration_seconds_bucket{handler="/",le="+Inf"} 1423
prometheus_http_request_duration_seconds_sum{handler="/"} 0.05321879099999997
prometheus_http_request_duration_seconds_count{handler="/"} 1423

@machine424
Copy link
Member Author

I have quickly dived into the fmt code. It already uses a sync.Pool for it's working buffers, so I would think the returns from this PR are diminishing the more this code path is used, in particular in use cases where many parsers are instantiated (in which case the change here could even be a net negative).

BenchmarkParse does instantiate a new parser on each iteration (imitating what happens in scrape.append()), and still in BenchmarkParse use case, having a per parser buffer looks more optimal that fmt's (I didn't into its pooling code).

I had tried a global sync.Pool, seen no difference, but I wasn't aware a new parser was created on every scrape/iteration. Avoiding a requisite 24bytes per scrape will always be better, I'll go back to sync.Pool.

@machine424 machine424 requested a review from beorn7 October 17, 2024 13:34
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a question about maybe using defer.

How do the benchmark look like with the current state?
Let's maybe add the case where .0 has to be appended to the benchmarks to see how well it works?

*bp = strconv.AppendFloat((*bp)[:0], f, 'g', -1, 64)
if bytes.ContainsAny(*bp, "e.") {
s := string(*bp)
floatFormatBufPool.Put(bp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put the Put in a defer statement? I have vague memories that the performance overhead of that should be much smaller these days…

Being able to write return string (*bp) is totally worth it, IMHO. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to test the version without defer but didn't restore.

@machine424
Copy link
Member Author

machine424 commented Oct 18, 2024

How do the benchmark look like with the current state?

I added it to the description.

Let's maybe add the case where .0 has to be appended to the benchmarks to see how well it works?

Saw this coming, I realized I was asking for trouble when I pointed out that the tests didn’t cover that case :) Added one.

Comparing it with the "alter in string" version

                                                   │   v0.txt    │            v1.txt             │
                                                   │   sec/op    │   sec/op     vs base          │
Parse/data=createTestProtoBuf()/parser=promproto-2   33.85µ ± 4%   34.39µ ± 3%  ~ (p=0.529 n=10)

                                                   │    v0.txt    │             v1.txt             │
                                                   │     B/s      │     B/s       vs base          │
Parse/data=createTestProtoBuf()/parser=promproto-2   108.3Mi ± 4%   106.6Mi ± 3%  ~ (p=0.529 n=10)

                                                   │    v0.txt    │               v1.txt                │
                                                   │     B/op     │     B/op      vs base               │
Parse/data=createTestProtoBuf()/parser=promproto-2   40.15Ki ± 0%   40.14Ki ± 0%  -0.04% (p=0.000 n=10)

                                                   │   v0.txt   │              v1.txt               │
                                                   │ allocs/op  │ allocs/op   vs base               │
Parse/data=createTestProtoBuf()/parser=promproto-2   876.0 ± 0%   870.0 ± 0%  -0.68% (p=0.000 n=10)

@machine424 machine424 requested a review from beorn7 October 18, 2024 08:23
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Let's go with it.

@beorn7
Copy link
Member

beorn7 commented Oct 18, 2024

Just make sure to get the DCO right when squashing…

…by using strconv.AppendFloat instead of fmt.Sprint

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
@machine424 machine424 enabled auto-merge (rebase) October 18, 2024 13:24
@machine424 machine424 disabled auto-merge October 18, 2024 13:24
@machine424 machine424 enabled auto-merge October 18, 2024 13:33
@machine424 machine424 merged commit 5505c83 into prometheus:main Oct 18, 2024
julienduchesne pushed a commit to julienduchesne/prometheus that referenced this pull request Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants