statistics: merge the partition-level histograms to a global-level histogram#22603
statistics: merge the partition-level histograms to a global-level histogram#22603qw4990 merged 12 commits intopingcap:masterfrom
Conversation
|
/label status/WIP |
|
/unlabel status/WIP |
|
/label sig/planner |
statistics/histogram.go
Outdated
|
|
||
| // mergeBucketNDV merges bucket NDV from tow bucket `b` & `left`. | ||
| // Before merging, you need to make sure that when using (upper, lower) as the comparison key, `b` is greater than `left` | ||
| func (b *bucket4Merging) mergeBucketNDV(sc *stmtctx.StatementContext, left *bucket4Merging) error { |
There was a problem hiding this comment.
How about changing this method to a statistics function like mergeBucketNDV(sc, left, right) (ndv, err), which is more easy-to-use and test?
Co-authored-by: Yuanjia Zhang <qw4990@163.com>
statistics/histogram.go
Outdated
| // if the buckets have the same upper, we merge them into the same new buckets. | ||
| for ; i > 0; i-- { | ||
| res, err := buckets[i-1].upper.CompareDatum(sc, buckets[i].upper) | ||
| if err == nil || res != 0 { |
There was a problem hiding this comment.
Do we need to handle this error?
| { | ||
| lower: 6, | ||
| upper: 9, | ||
| count: 5, |
There was a problem hiding this comment.
It seems that the meaning of the count in h1Buckets and h2Buckets is different from the meaning of the count in expHist. The former is the cumulative value. Is this expected?
statistics/histogram.go
Outdated
| } | ||
| } | ||
|
|
||
| // mergeBucketNDV merges bucket NDV from tow bucket `b` & `left`. |
There was a problem hiding this comment.
What does tow bucket mean?
statistics/histogram.go
Outdated
| return nil, err | ||
| } | ||
| // __right__| | ||
| // -------left----| |
There was a problem hiding this comment.
Is the meaning of the dotted line(---) the same as the solid line(___)?
statistics/histogram.go
Outdated
| // repeat = sum of buckets[i] (buckets[i].upper == global bucket.upper && i in [l...r)) | ||
| // ndv = merge bucket ndv from r-1 to l by mergeBucketNDV | ||
| // Notice: lower is not calculated here. | ||
| func mergePartitionBuckets(sc *stmtctx.StatementContext, buckets []*bucket4Merging) *bucket4Merging { |
There was a problem hiding this comment.
Should we return the error message?
statistics/histogram.go
Outdated
| // Notice: If expBucketNumber == 0, we will let expBucketNumber = max(hists.Len()) | ||
| func MergePartitionHist2GlobalHist(sc *stmtctx.StatementContext, hists []*Histogram, expBucketNumber int64) (*Histogram, error) { | ||
| var totCount, totNull, bucketNumber, totColSize int64 | ||
| // minValue is used to calc the bucket lower. |
There was a problem hiding this comment.
Move to the place where minValue is defined.
statistics/histogram.go
Outdated
| if !needBucketNumber { | ||
| continue | ||
| } | ||
| if int64(hist.Len()) > expBucketNumber { | ||
| expBucketNumber = int64(hist.Len()) | ||
| } |
There was a problem hiding this comment.
We can merge here to
if needBucketNumber && int64(hist.Len()) > expBucketNumber {
expBucketNumber = int64(hist.Len())
}
statistics/histogram.go
Outdated
| continue | ||
| } | ||
| res, err := hist.GetLower(0).CompareDatum(sc, minValue) | ||
| if err != nil && res < 0 { |
There was a problem hiding this comment.
Do we need to handle this error?
| totCount += hist.Buckets[hist.Len()-1].Count | ||
| if minValue == nil { | ||
| minValue = hist.GetLower(0).Clone() | ||
| continue |
There was a problem hiding this comment.
If the continue occurs, the expBucketNumber can not be updated.
| } | ||
| globalBuckets = append(globalBuckets, merged) | ||
| r = i | ||
| bucketCount++ |
There was a problem hiding this comment.
Should we set sum = 0 here?
There was a problem hiding this comment.
We don't need to set sum = 0, because we use sum >= totCount*bucketCount/expBucketNumber as the judgment condition.
|
I addressed all your comments. @Reminiscent PTAL |
| func mergePartitionBuckets(sc *stmtctx.StatementContext, buckets []*bucket4Merging) (*bucket4Merging, error) { | ||
| if len(buckets) == 0 { | ||
| return nil | ||
| return nil, errors.Errorf("not enough buckets to merge") |
There was a problem hiding this comment.
Should we return the error here? Or check whether bucket4Merging is nil in the calling function.
|
|
||
| // Merge histogram | ||
| err = errors.Errorf("TODO: The merge function of the histogram structure has not been implemented yet") | ||
| globalStats.Hg[i], err = statistics.MergePartitionHist2GlobalHist(sc, allHg[i], 0) |
There was a problem hiding this comment.
Can we explicitly set the number of histogram buckets (the third parameter of the MergePartitionHist2GlobalHist function)?
|
/merge |
|
/run-all-tests |
|
@rebelice merge failed. |
|
Please update handler.go to let our stats-collection mechanism can use your algorithm. @rebelice |
|
/run-e2e-test |
What problem does this PR solve?
sub-PR for the PR #22472
Problem Summary:
We merge the partition-level histograms to a global-level histogram.
design doc: link
Related changes
pingcap/docs/pingcap/docs-cn:Check List
Tests
Side effects
Release note