Skip to content

[BUGFIX] Scraping: Naive fixes and optimzations for CreatedTimestamp function#14965

Merged
bwplotka merged 15 commits intoprometheus:mainfrom
Maniktherana:ct-om-optimization
Oct 4, 2024
Merged

[BUGFIX] Scraping: Naive fixes and optimzations for CreatedTimestamp function#14965
bwplotka merged 15 commits intoprometheus:mainfrom
Maniktherana:ct-om-optimization

Conversation

@Maniktherana
Copy link
Contributor

@Maniktherana Maniktherana commented Sep 23, 2024

addresses #14808 and #14823
ran this command:

go test -benchmem -run=^$ -bench ^BenchmarkParse$ -count=50 github.com/prometheus/prometheus/model/textparse 

Benchstat result of running this test on main (old.txt) and this PR (new.txt):

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/textparse
cpu: Apple M1
                                                    │   old.txt   │           new.txt           │
                                                    │   sec/op    │   sec/op     vs base        │
Parse/openmetrics-skip-ct/parse-ct/omtestdata.txt-8   350.7n ± 1%   106.3n ± 2%  -69.70% (n=50)

                                                    │   old.txt    │                new.txt                │
                                                    │     B/s      │     B/s       vs base                 │
Parse/openmetrics-skip-ct/parse-ct/omtestdata.txt-8   27.20Mi ± 2%   89.76Mi ± 3%  +230.03% (p=0.000 n=50)

                                                    │   old.txt   │          new.txt           │
                                                    │    B/op     │    B/op     vs base        │
Parse/openmetrics-skip-ct/parse-ct/omtestdata.txt-8   668.00 ± 0%   60.00 ± 0%  -91.02% (n=50)

                                                    │  old.txt   │          new.txt           │
                                                    │ allocs/op  │ allocs/op   vs base        │
Parse/openmetrics-skip-ct/parse-ct/omtestdata.txt-8   2.000 ± 0%   1.000 ± 0%  -50.00% (n=50)

Note: results are updated with each commit

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana Maniktherana changed the title feat: naive optimzations for CreatedTimestamp function feat: (WIP) naive optimzations for CreatedTimestamp function Sep 23, 2024
Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana Maniktherana changed the title feat: (WIP) naive optimzations for CreatedTimestamp function feat: naive fixes and optimzations for CreatedTimestamp function Sep 24, 2024
Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana
Copy link
Contributor Author

this PR depends on #14933

Maniktherana and others added 7 commits September 25, 2024 23:47
Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <Manikrana54@gmail.com>
Signed-off-by: Manik Rana <Manikrana54@gmail.com>
Signed-off-by: Manik Rana <manikrana54@gmail.com>
bwplotka added a commit that referenced this pull request Oct 3, 2024
Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka
Copy link
Member

bwplotka commented Oct 3, 2024

Replicated the benchmark run with the new proposed benchmark #15083

Command I used:

 export bench=testv2 && go test ./model/textparse/... \
                 -run '^$' -bench '^BenchmarkParse/data=(promtestdata.txt|omtestdata.txt)/parser=(omtext|promtext)' \
                 -benchtime 2s -count 6 -cpu 2 -benchmem -timeout 999m \
         | tee ${bench}.txt

I did that for:

  • testv1: state from main (commit).
  • testv1-noct: state from main with CreatedTimestamp commented out.
  • testv2: state of om text parser from this PR (commit)
benchstat -row=".name /data /parser" -filter /parser:omtext testv1.txt testv1-noct.txt testv2.txt 
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/textparse
cpu: Apple M1 Pro
                              │  testv1.txt  │          testv1-noct.txt           │             testv2.txt              │
                              │    sec/op    │   sec/op     vs base               │    sec/op     vs base               │
Parse omtestdata.txt omtext     144.11µ ± 2%   20.41µ ± 1%  -85.83% (p=0.002 n=6)    41.28µ ± 3%  -71.36% (p=0.002 n=6)
Parse promtestdata.txt omtext   5137.7µ ± 1%   152.3µ ± 1%  -97.04% (p=0.002 n=6)   3639.6µ ± 4%  -29.16% (p=0.002 n=6)
geomean                          860.5µ        55.75µ       -93.52%                  387.6µ       -54.96%

                              │  testv1.txt  │             testv1-noct.txt             │              testv2.txt               │
                              │     B/s      │      B/s        vs base                 │      B/s       vs base                │
Parse omtestdata.txt omtext     29.63Mi ± 2%    209.15Mi ± 1%   +605.97% (p=0.002 n=6)   103.45Mi ± 3%  +249.17% (p=0.002 n=6)
Parse promtestdata.txt omtext   6.194Mi ± 1%   208.964Mi ± 1%  +3273.60% (p=0.002 n=6)    8.740Mi ± 3%   +41.11% (p=0.002 n=6)
geomean                         13.55Mi          209.1Mi       +1443.26%                  30.07Mi       +121.97%

                              │   testv1.txt   │           testv1-noct.txt           │              testv2.txt              │
                              │      B/op      │     B/op      vs base               │     B/op       vs base               │
Parse omtestdata.txt omtext      274.31Ki ± 0%   14.59Ki ± 0%  -94.68% (p=0.002 n=6)    27.08Ki ± 0%  -90.13% (p=0.002 n=6)
Parse promtestdata.txt omtext   17634.6Ki ± 0%   113.0Ki ± 0%  -99.36% (p=0.002 n=6)   2158.5Ki ± 0%  -87.76% (p=0.002 n=6)
geomean                           2.148Mi        40.61Ki       -98.15%                  241.8Ki       -89.01%

                              │  testv1.txt  │          testv1-noct.txt           │             testv2.txt              │
                              │  allocs/op   │  allocs/op   vs base               │  allocs/op    vs base               │
Parse omtestdata.txt omtext      1278.0 ± 0%    216.0 ± 0%  -83.10% (p=0.002 n=6)     446.0 ± 0%  -65.10% (p=0.002 n=6)
Parse promtestdata.txt omtext   32.577k ± 0%   1.649k ± 0%  -94.94% (p=0.002 n=6)   30.826k ± 0%   -5.37% (p=0.002 n=6)
geomean                          6.452k         596.8       -90.75%                  3.708k       -42.53%

~50% improvement for all metrics, so it's a great iteration, I think we could start with that, will do readability review later on. If we could improve and NOT change tests too much (except deep copy testing), it would be amazing.

@bwplotka
Copy link
Member

bwplotka commented Oct 3, 2024

We still allocate a lot 113.0Ki ± 0% (no-ct) vs 2158.5Ki ± 0%

Capture pprof profiles: mem and cpu

You can see that even with optimizations CT parsing takes 70% of CPU for parsing:

image

.. mostly due to use of Metric (odd, considering we use Metric in benchmark for each series already - can we cache this?) and Next.

This is similar for mem

image

cc @bboreham

let me comment in the code for some suggestions

return nil
}

var peekedLset labels.Labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easy optimization - move that to start of this function or even parser, then here do peekedLset =peekedLset[:0].

This allows to reuse internal array we use for label bytes (hopefully) (:

Signed-off-by: Manik Rana <manikrana54@gmail.com>
@Maniktherana Maniktherana marked this pull request as ready for review October 3, 2024 10:39
Comment on lines +359 to +367
func findBaseMetricName(name string) string {
suffixes := []string{"_created", "_count", "_sum", "_bucket", "_total", "_gcount", "_gsum", "_info"}
for _, suffix := range suffixes {
if strings.HasSuffix(name, suffix) {
return strings.TrimSuffix(name, suffix)
}
}
return name
}
Copy link
Member

@ArthurSens ArthurSens Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we can do that without checking the appropriate types.

E.g. _total is only a suffix for counters, but it is still possible to have other metrics with this suffix and it will be part of the metric name. As far as I understood there is nothing in OM that prevents that from happening 🤔

Copy link
Contributor Author

@Maniktherana Maniktherana Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true
This function is used in one place atm (beginning of created timestamp function). otherwise we check with hashsets like before.

There's a weird case where if we compare a metric line that had no extra labels with another of created line, it would generate the same hash (because we don't include __name__ when comparing hashsets)

Copy link
Member

@bwplotka bwplotka Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel there is a way to remove this logic (the whole function).

We use currName to figure family name essentially, but isn't this parsed somewhere from metadata? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to find a way too to get rid of this function.

But even if we find the currName won't it have suffixes attached to it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the time yes, but unfortunately all metadata fields are optional in OM, including type. But bartek has a good point, we don't need to do any of this if metadata was set, and just call this function if metadata isn't present.

This brings another problem though, if we call this function it means we don't know the type and therefore we don't know which suffixes we were supposed to remove 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This brings another problem though, if we call this function it means we don't know the type and therefore we don't know which suffixes we were supposed to remove 😬

we already check for that using typeRequiresCT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it is still possible to have other metrics with this suffix and it will be part of the metric name.

but might still leave room for this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will never be perfect, but it would be nice to have exact test case for where we "know" we will be wrong. Ideally "wrong" means failure, not corrupted CT (:

Copy link
Contributor Author

@Maniktherana Maniktherana Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay to better illustrate why we have this function I've updated tests
if you replace line 280 with

if currName == p.visitedName && currFamilyLsetHash == p.ctHashSet && p.visitedName != "" && p.ctHashSet > 0 && p.ct > 0 {

and debug it with TestOpenMetricsParse

and focus on currFamilyLsetHash vs p.ctHashSet you'll notice we'll have the same hash in this case and it would incorrectly early return CT

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we synced and let's merge this first. It's not ideal as I have a feeling it introduces some regressions (around the things we discussed with thing_total gauges after thing counter). It's a small thing though AND we did optimize things 2x (still lot's to go). With benchmark and testing refactors (#15095), might be better to merge and iterate

@bwplotka bwplotka merged commit 47aeca9 into prometheus:main Oct 4, 2024
@bboreham bboreham changed the title feat: naive fixes and optimzations for CreatedTimestamp function [BUGFIX] Scraping: Naive fixes and optimzations for CreatedTimestamp function Oct 8, 2024
@bboreham
Copy link
Member

bboreham commented Oct 8, 2024

Small nit: "feat" in the headline denotes a feature, something a user might read and think "that's new!".
(and we have historically written "[FEATURE]" for those things).
Fixing a bug is [BUGFIX] and optimisations are [PERF].

@beorn7 beorn7 mentioned this pull request Oct 8, 2024
@bboreham
Copy link
Member

bboreham commented Oct 8, 2024

This looks like it should be backported to release-2.55 branch.

squat pushed a commit to squat/prometheus that referenced this pull request Nov 8, 2024
…rometheus#14965)

* enhance: wip ct parse optimizations

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: further work on optimization

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: further improvements and remove unused code

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: improve optimizations and fix some CT parse errors

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: check for LsetHash along with name

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: cleanup and documentation

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* enhance: improve comments and add cleaner functions

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: improve comments and add cleaner functions

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: rename to resetCTParseValues

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: post-merge fixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: add all possible reserved suffixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* test: separate CT values for each metric

Signed-off-by: Manik Rana <manikrana54@gmail.com>

---------

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <Manikrana54@gmail.com>
squat pushed a commit to squat/prometheus that referenced this pull request Nov 8, 2024
…rometheus#14965)

* enhance: wip ct parse optimizations

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: further work on optimization

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: further improvements and remove unused code

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: improve optimizations and fix some CT parse errors

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: check for LsetHash along with name

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: cleanup and documentation

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* enhance: improve comments and add cleaner functions

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: improve comments and add cleaner functions

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: rename to resetCTParseValues

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: post-merge fixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: add all possible reserved suffixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* test: separate CT values for each metric

Signed-off-by: Manik Rana <manikrana54@gmail.com>

---------

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <Manikrana54@gmail.com>
squat pushed a commit to squat/prometheus that referenced this pull request Nov 8, 2024
…rometheus#14965)

* enhance: wip ct parse optimizations

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: further work on optimization

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: further improvements and remove unused code

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: improve optimizations and fix some CT parse errors

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: check for LsetHash along with name

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: cleanup and documentation

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* enhance: improve comments and add cleaner functions

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* feat: improve comments and add cleaner functions

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* chore: rename to resetCTParseValues

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: post-merge fixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* fix: add all possible reserved suffixes

Signed-off-by: Manik Rana <manikrana54@gmail.com>

* test: separate CT values for each metric

Signed-off-by: Manik Rana <manikrana54@gmail.com>

---------

Signed-off-by: Manik Rana <manikrana54@gmail.com>
Signed-off-by: Manik Rana <Manikrana54@gmail.com>
julienduchesne pushed a commit to julienduchesne/prometheus that referenced this pull request Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants