
enhance promtool tsdb analyze command#12869

Merged
beorn7 merged 8 commits intoprometheus:mainfrom
fatsheep9146:promtool-tsdb-analyze-optimiz
Oct 14, 2023

Conversation

@fatsheep9146
Contributor

related #12146

This is first commit to try to enhance the tsdb analyze for native histgram. Toke the advise from @beorn7

So maybe have separate histograms for float chunks and histogram chunks, and each of them should have a histogram for sample count, chunk size in bytes, and for histogram number of buckets in the chunk (maybe even "represented buckets" vs. "populated buckets").

Originally posted by @beorn7 in #12146 (comment)

I first tried to

  • separate the histograms for float chunks and histogram chunks, and display the sample count and total bytes for each
  • display the result according to absolute values, since the maximum chunk number is not constant

Here is the current format:

Compaction analysis:
Fullness: Amount of samples in float chunks
chunk sample num min: 101
chunk sample num max: 141
    100: ########
    110:
    120: ################
    130:
    140: ################
    150:

Size of float chunks
chunk size min: 121
chunk size max: 1183
      0: ######################################
   1000: ##
   2000:

Fullness: Amount of samples in histogram chunks
chunk sample num min: 101
chunk sample num max: 141
    100: ###########
    110:
    120: #######################
    130:
    140: #######################
    150:

Size of histogram chunks
chunk size min: 2362
chunk size max: 3257
   2300: #####
   2400: #####
   2500:
   2600: ############
   2700: ##########
   2800:
   2900:
   3000:
   3100: ##################
   3200: #####
   3300:

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
@fatsheep9146 fatsheep9146 requested a review from dgl as a code owner September 20, 2023 05:12
@beorn7
Member

beorn7 commented Sep 20, 2023

I haven't looked at the code yet, just the example output above.

Generally looks very good.

Some ideas about the detailed representation:

I think it would avoid a lot of guessing by the reader if you specified the exact range for each bucket, maybe in the mathematical notation for open and closed intervals, so instead of 100: ####, use (100, 200]: #### (assuming that 100 means a bucket with an exclusive lower boundary of 100 and an inclusive upper boundary of 200). Alternatively, because we are dealing with integers here, you could use [101, 200]: ####. Maybe even easier to read for most people.

Then there should be an absolute number of counts for the bucket, not just the ASCII art, something like [101, 200]: 1234 ####.

Average would also be cool, maybe collapsing the lines like

chunk size min: 121
chunk size max: 1183

to something like

bytes per chunk (min/max/avg): 121/1183/544

And finally, it would be good to also get "buckets per chunk" for histogram chunks.

But these are just display layer details. The general direction is right, I would say. Happy to do a detailed code review once you let me know that the code is ready.
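The suggested bucket-line format (a closed integer interval, the absolute count, then the ASCII bar) could be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code; `bucketLine` and its parameters are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// bucketLine renders one histogram bucket in the suggested format:
// a closed integer interval, the absolute count, then an ASCII bar
// scaled so that the fullest bucket gets barWidth '#' characters.
func bucketLine(lo, hi, count, maxCount, barWidth int) string {
	bar := strings.Repeat("#", count*barWidth/maxCount)
	return fmt.Sprintf("[%d, %d]: %d %s", lo, hi, count, bar)
}

func main() {
	fmt.Println(bucketLine(101, 110, 14, 28, 16))
	fmt.Println(bucketLine(121, 130, 28, 28, 16))
}
```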

@fatsheep9146
Contributor Author

Thanks! I will modify as you suggest.

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
@fatsheep9146
Contributor Author

The PR is ready for review. @beorn7 @dgl
The display format is now as follows:

Compaction analysis:
Fullness: Amount of samples in float chunks
sample count per float chunk (min/max/avg): 101/141/125
[101, 110]: 14 ########
[111, 120]: 0
[121, 130]: 28 ################
[131, 140]: 0
[141, 150]: 28 ################
[151, 160]: 0


Size of float chunks
bytes per float chunk (min/max/avg): 121/1183/283
[1, 1000]: 66 ######################################
[1001, 2000]: 4 ##
[2001, 3000]: 0


Fullness: Amount of samples in histogram chunks
sample count per histogram chunk (min/max/avg): 101/141/125
[101, 110]: 20 ###########
[111, 120]: 0
[121, 130]: 40 #######################
[131, 140]: 0
[141, 150]: 40 #######################
[151, 160]: 0


Size of histogram chunks
bytes per histogram chunk (min/max/avg): 2362/3257/2833
[2301, 2400]: 10 #####
[2401, 2500]: 10 #####
[2501, 2600]: 0
[2601, 2700]: 22 ############
[2701, 2800]: 18 ##########
[2801, 2900]: 0
[2901, 3000]: 0
[3001, 3100]: 0
[3101, 3200]: 31 ##################
[3201, 3300]: 9 #####
[3301, 3400]: 0


Amount of buckets of histogram chunks
buckets per histogram chunk (min/max/avg): 1717/2397/2125
[1701, 1800]: 10 #####
[1801, 1900]: 10 #####
[1901, 2000]: 0
[2001, 2100]: 40 #######################
[2101, 2200]: 0
[2201, 2300]: 0
[2301, 2400]: 40 #######################
[2401, 2500]: 0

Member

@dgl dgl left a comment


Mostly reviewing from the promtool side, a sanity check for histograms would be good.

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Member

@beorn7 beorn7 left a comment


Very very nice.

I only have some comments about the presentation. Note that we have to make a call about min/max/avg vs min/avg/max.

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
@fatsheep9146
Contributor Author

@beorn7 All comments are resolved; please review again.

@fatsheep9146 fatsheep9146 requested review from beorn7 and dgl October 3, 2023 14:55
Member

@beorn7 beorn7 left a comment


My apologies for the review delay.

Almost perfect, but I think there is a tiny bug in the formatting.

}
avg := sum / len(datas)
fmt.Printf("%s (min/avg/max): %d/%d/%d\n", dataType, datas[0], avg, datas[len(datas)-1])
maxLen := strconv.Itoa(len(fmt.Sprintf("%d", end)))
Member


Does this have to be end + step rather than just end? Not sure, but I see the following if only the last upper bound has a greater length:

bytes per histogram chunk (min/avg/max): 491/818/862
[401, 500]:    2 
[501, 600]:    1 
[601, 700]:    0 
[701, 800]:  222 
[801, 900]: 2162 #
[901, 1000]:    0 

Contributor Author


Yes, you are right. Besides that, I also split maxLen into maxLeftLen and maxRightLen, because with a single maxLen the output in your case looks like the following, with extra whitespace on the left:

bytes per histogram chunk (min/avg/max): 491/818/862
[ 401,  500]:    2 
[ 501,  600]:    1 
[ 601,  700]:    0 
[ 701,  800]:  222 
[ 801,  900]: 2162 #
[ 901, 1000]:    0 
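The split-width idea described above could be sketched as follows: pad the lower and upper interval bounds independently, using the widths of the largest lower and upper bounds that will be printed. `intervalLabel`, `maxLo`, and `maxHi` are hypothetical names for illustration, not the identifiers used in the PR:

```go
package main

import (
	"fmt"
	"strconv"
)

// intervalLabel pads the lower and upper bounds of a bucket label with
// separate widths, so only the column that actually needs an extra digit
// widens (e.g. a 4-digit upper bound does not widen the lower bounds).
func intervalLabel(lo, hi, maxLo, maxHi int) string {
	leftW := len(strconv.Itoa(maxLo))
	rightW := len(strconv.Itoa(maxHi))
	return fmt.Sprintf("[%*d, %*d]:", leftW, lo, rightW, hi)
}

func main() {
	// The right column widens for the 4-digit upper bound 1000, while
	// the left column stays 3 characters wide, avoiding the extra
	// leading whitespace shown in the output above.
	fmt.Println(intervalLabel(401, 500, 901, 1000))
	fmt.Println(intervalLabel(901, 1000, 901, 1000))
}
```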

@beorn7

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much.

@beorn7 beorn7 merged commit 1a6edff into prometheus:main Oct 14, 2023
Sheikh-Abubaker pushed a commit to Sheikh-Abubaker/prometheus that referenced this pull request Oct 15, 2023
Improve promtool tsdb analyze

- Make it more suitable for variable size float chunks.
- Add support for histogram chunks.

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Sheikh-Abubaker <sheikhabubaker761@gmail.com>
LeviHarrison pushed a commit to LeviHarrison/prometheus that referenced this pull request Oct 15, 2023
Improve promtool tsdb analyze

- Make it more suitable for variable size float chunks.
- Add support for histogram chunks.

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>