
enhance promtool tsdb analyze command#12869

Merged
beorn7 merged 8 commits intoprometheus:mainfrom
fatsheep9146:promtool-tsdb-analyze-optimiz
Oct 14, 2023

Conversation

@fatsheep9146
Contributor

related #12146

This is first commit to try to enhance the tsdb analyze for native histgram. Toke the advise from @beorn7

So maybe have separate histograms for float chunks and histogram chunks, and each of them should have a histogram for sample count, chunk size in bytes, and for histogram number of buckets in the chunk (maybe even "represented buckets" vs. "populated buckets").

Originally posted by @beorn7 in #12146 (comment)

I first tried to

  • separate the histograms for float chunks and histogram chunks, and display the sample count and total bytes for each
  • display the result according to absolute values, since the maximum chunk number is not constant

Here is the current format:

Compaction analysis:
Fullness: Amount of samples in float chunks
chunk sample num min: 101
chunk sample num max: 141
    100: ########
    110:
    120: ################
    130:
    140: ################
    150:

Size of float chunks
chunk size min: 121
chunk size max: 1183
      0: ######################################
   1000: ##
   2000:

Fullness: Amount of samples in histogram chunks
chunk sample num min: 101
chunk sample num max: 141
    100: ###########
    110:
    120: #######################
    130:
    140: #######################
    150:

Size of histogram chunks
chunk size min: 2362
chunk size max: 3257
   2300: #####
   2400: #####
   2500:
   2600: ############
   2700: ##########
   2800:
   2900:
   3000:
   3100: ##################
   3200: #####
   3300:

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
@fatsheep9146 fatsheep9146 requested a review from dgl as a code owner September 20, 2023 05:12
@beorn7
Member

beorn7 commented Sep 20, 2023

I haven't looked at the code yet, just the example output above.

Generally looks very good.

Some ideas about the detailed representation:

I think it would avoid a lot of guessing by the reader if you specified the exact range for each bucket, maybe in the mathematical notation for open and closed intervals, so instead of 100: ####, use (100, 200]: #### (assuming that 100 means a bucket with an exclusive lower boundary of 100 and an inclusive upper boundary of 200). Alternatively, because we are dealing with integers here, you could use [101, 200]: ####. Maybe even easier to read for most people.

Then there should be an absolute number of counts for the bucket, not just the ASCII art, something like [101, 200]: 1234 ####.

Average would also be cool, maybe collapsing the lines like

chunk size min: 121
chunk size max: 1183

to something like

bytes per chunk (min/max/avg): 121/1183/544

And finally, it would be good to also get "buckets per chunk" for histogram chunks.

But these are just display layer details. The general direction is right, I would say. Happy to do a detailed code review once you let me know that the code is ready.
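The suggested bucket-line format (a closed integer interval, the absolute count, then the ASCII bar) could be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code; `bucketLine` and its parameters are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// bucketLine renders one histogram bucket in the suggested format:
// a closed integer interval, the absolute count, then an ASCII bar
// scaled so that the fullest bucket gets barWidth '#' characters.
func bucketLine(lo, hi, count, maxCount, barWidth int) string {
	bar := strings.Repeat("#", count*barWidth/maxCount)
	return fmt.Sprintf("[%d, %d]: %d %s", lo, hi, count, bar)
}

func main() {
	fmt.Println(bucketLine(101, 110, 14, 28, 16))
	fmt.Println(bucketLine(121, 130, 28, 28, 16))
}
```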

@fatsheep9146
Contributor Author

Thanks! I will modify as you suggest.

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
@fatsheep9146
Contributor Author

The PR is ready for review. @beorn7 @dgl
The display format is now as follows:

Compaction analysis:
Fullness: Amount of samples in float chunks
sample count per float chunk (min/max/avg): 101/141/125
[101, 110]: 14 ########
[111, 120]: 0
[121, 130]: 28 ################
[131, 140]: 0
[141, 150]: 28 ################
[151, 160]: 0


Size of float chunks
bytes per float chunk (min/max/avg): 121/1183/283
[1, 1000]: 66 ######################################
[1001, 2000]: 4 ##
[2001, 3000]: 0


Fullness: Amount of samples in histogram chunks
sample count per histogram chunk (min/max/avg): 101/141/125
[101, 110]: 20 ###########
[111, 120]: 0
[121, 130]: 40 #######################
[131, 140]: 0
[141, 150]: 40 #######################
[151, 160]: 0


Size of histogram chunks
bytes per histogram chunk (min/max/avg): 2362/3257/2833
[2301, 2400]: 10 #####
[2401, 2500]: 10 #####
[2501, 2600]: 0
[2601, 2700]: 22 ############
[2701, 2800]: 18 ##########
[2801, 2900]: 0
[2901, 3000]: 0
[3001, 3100]: 0
[3101, 3200]: 31 ##################
[3201, 3300]: 9 #####
[3301, 3400]: 0


Amount of buckets of histogram chunks
buckets per histogram chunk (min/max/avg): 1717/2397/2125
[1701, 1800]: 10 #####
[1801, 1900]: 10 #####
[1901, 2000]: 0
[2001, 2100]: 40 #######################
[2101, 2200]: 0
[2201, 2300]: 0
[2301, 2400]: 40 #######################
[2401, 2500]: 0

Member

@dgl dgl left a comment


Mostly reviewing from the promtool side, a sanity check for histograms would be good.

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Member

@beorn7 beorn7 left a comment


Very very nice.

I only have some comments about the presentation. Note that we have to make a call about min/max/avg vs min/avg/max.

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
@fatsheep9146
Contributor Author

@beorn7 All comments are resolved; please review again.

@fatsheep9146 fatsheep9146 requested review from beorn7 and dgl October 3, 2023 14:55
Member

@beorn7 beorn7 left a comment


My apologies for the review delay.

Almost perfect, but I think there is a tiny bug in the formatting.

}
avg := sum / len(datas)
fmt.Printf("%s (min/avg/max): %d/%d/%d\n", dataType, datas[0], avg, datas[len(datas)-1])
maxLen := strconv.Itoa(len(fmt.Sprintf("%d", end)))
Member


Does this have to be end + step rather than just end? Not sure, but I see the following if only the last upper bound has a greater length:

bytes per histogram chunk (min/avg/max): 491/818/862
[401, 500]:    2 
[501, 600]:    1 
[601, 700]:    0 
[701, 800]:  222 
[801, 900]: 2162 #
[901, 1000]:    0 

Contributor Author


Yes, you are right. Besides that, I also split maxLen into maxLeftLen and maxRightLen, because with a single maxLen the output in your case looks like the following, with extra whitespace on the left:

bytes per histogram chunk (min/avg/max): 491/818/862
[ 401,  500]:    2 
[ 501,  600]:    1 
[ 601,  700]:    0 
[ 701,  800]:  222 
[ 801,  900]: 2162 #
[ 901, 1000]:    0 
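The split-width idea described above could be sketched as follows: pad the lower and upper interval bounds independently, using the widths of the largest lower and upper bounds that will be printed. `intervalLabel`, `maxLo`, and `maxHi` are hypothetical names for illustration, not the identifiers used in the PR:

```go
package main

import (
	"fmt"
	"strconv"
)

// intervalLabel pads the lower and upper bounds of a bucket label with
// separate widths, so only the column that actually needs an extra digit
// widens (e.g. a 4-digit upper bound does not widen the lower bounds).
func intervalLabel(lo, hi, maxLo, maxHi int) string {
	leftW := len(strconv.Itoa(maxLo))
	rightW := len(strconv.Itoa(maxHi))
	return fmt.Sprintf("[%*d, %*d]:", leftW, lo, rightW, hi)
}

func main() {
	// The right column widens for the 4-digit upper bound 1000, while
	// the left column stays 3 characters wide, avoiding the extra
	// leading whitespace shown in the output above.
	fmt.Println(intervalLabel(401, 500, 901, 1000))
	fmt.Println(intervalLabel(901, 1000, 901, 1000))
}
```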

@beorn7

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much.

@beorn7 beorn7 merged commit 1a6edff into prometheus:main Oct 14, 2023
Sheikh-Abubaker pushed a commit to Sheikh-Abubaker/prometheus that referenced this pull request Oct 15, 2023
Improve promtool tsdb analyze

- Make it more suitable for variable size float chunks.
- Add support for histogram chunks.

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Sheikh-Abubaker <sheikhabubaker761@gmail.com>
LeviHarrison pushed a commit to LeviHarrison/prometheus that referenced this pull request Oct 15, 2023
Improve promtool tsdb analyze

- Make it more suitable for variable size float chunks.
- Add support for histogram chunks.

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>