planner, statistics: build the global statistics for the partition table by Reminiscent · Pull Request #22472 · pingcap/tidb

Reminiscent · 2021-01-21T09:14:19Z

What problem does this PR solve?

Issue Number: related: #18551

Problem Summary:
Build the global statistics for the partition tables when we execute the analyze statement.

What is changed and how it works?

Proposal (in Chinese)

What's Changed (In the Dynamic-Only mode):
In the original implementation, we build analyze tasks in a "collect multiple partitions and save as one table" mode. See PR#19846, PR#19899 and PR#20271 for more details.

In this PR, we first build analyze tasks in a "collect a partition, save a partition" mode, and then merge the partition-level stats that belong to the same partition table to get the global-level stats.

The changed AnalyzeTableID struct is used for further explanation. See this comment for more details.

How it Works (In the Dynamic-Only mode):

When we build the analyze tasks, we build one task per partition and record the ID of the table to which the partition belongs.
If some tasks relate to a partition table, we record the partition table ID and the index ID.
We use the partition table ID to load the corresponding partition-level stats from storage, and merge them to build the global-level stats.
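
The three steps above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `analyzeTask`, `partitionStats`, `groupByTable` and `mergeCount` are hypothetical names, storage is modelled as a plain map, and only the row count is merged (merging the histogram, CMSketch and TopN is more involved).

```go
package main

import "fmt"

// analyzeTask is a hypothetical stand-in for one partition-level analyze task.
type analyzeTask struct {
	PartitionID int64 // partition whose stats are collected and saved
	TableID     int64 // partition table that the partition belongs to
}

// partitionStats is a hypothetical, heavily simplified partition-level record.
type partitionStats struct {
	Count int64 // row count of the partition
}

// groupByTable records, for every partition table, the partitions analyzed for it.
func groupByTable(tasks []analyzeTask) map[int64][]int64 {
	groups := make(map[int64][]int64)
	for _, t := range tasks {
		groups[t.TableID] = append(groups[t.TableID], t.PartitionID)
	}
	return groups
}

// mergeCount loads the partition-level stats from storage (a map in this
// sketch) and sums their row counts into a count-only global-level stat.
func mergeCount(storage map[int64]partitionStats, partitionIDs []int64) (int64, error) {
	var total int64
	for _, id := range partitionIDs {
		s, ok := storage[id]
		if !ok {
			return 0, fmt.Errorf("no stats stored for partition %d", id)
		}
		total += s.Count
	}
	return total, nil
}

func main() {
	tasks := []analyzeTask{{2, 1}, {3, 1}, {4, 1}}
	storage := map[int64]partitionStats{2: {100}, 3: {250}, 4: {50}}
	for tableID, parts := range groupByTable(tasks) {
		total, err := mergeCount(storage, parts)
		if err != nil {
			fmt.Println("merge failed:", err)
			continue
		}
		fmt.Printf("table %d: global row count %d\n", tableID, total)
	}
}
```

Failing the merge when any partition has no stored stats is a design choice of this sketch, so that a partial global count is never reported.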

Related changes

PR to update pingcap/docs/pingcap/docs-cn:
Need to cherry-pick to the release branch

Check List

Tests

Unit test
Integration test

Side effects

Performance regression
- Consumes more CPU
- Consumes more MEM
Breaking backward compatibility

Release note

Build the global statistics for the partition table


sre-bot · 2021-01-21T09:14:32Z

Please follow PR Title Format:

pkg [, pkg2, pkg3]: what's changed

Or if the count of mainly changed packages are more than 3, use

*: what's changed

Reminiscent · 2021-01-21T09:54:53Z

executor/builder.go

-				} else {
-					e.tasks = append(e.tasks, b.buildAnalyzeIndexPushdown(task, v.Opts, autoAnalyze))
-				}
+				e.tasks = append(e.tasks, b.buildAnalyzeIndexPushdown(task, v.Opts, autoAnalyze))


TODO: Need more investigation on when to use buildAnalyzeFastIndex.

Reminiscent · 2021-01-21T10:07:40Z

planner/core/common_plans.go

 type AnalyzeTableID struct {
-	PersistID  int64
-	CollectIDs []int64
+	PersistID int64
+	// FatherID just used in the partition table.
+	// It represents the ID of the table to which the partition belongs.
+	FatherID int64
 }


Version1 (the 'static-only' mode): No global-level stats; each partition has its own stats.
Version2: Only global-level stats, no partition-level stats.
Version3 (this PR): Both partition-level and global-level stats exist, and the global-level stats are merged from the partition-level stats.

Here is an example to make the meaning of these IDs clearer.

Suppose we have table t (ID = 1), and t has three partitions: p1 (ID = 2), p2 (ID = 3) and p3 (ID = 4).

In version1, the analyze request has three tasks: AnalyzeTableID{PersistID = 2, CollectIDs = {2}}, AnalyzeTableID{PersistID = 3, CollectIDs = {3}}, AnalyzeTableID{PersistID = 4, CollectIDs = {4}}.

In version2, the analyze request has only one task: AnalyzeTableID{PersistID = 1, CollectIDs = {2, 3, 4}}.

In version3, the analyze request has three tasks: AnalyzeTableID{PersistID = 2, FatherID = 1}, AnalyzeTableID{PersistID = 3, FatherID = 1}, AnalyzeTableID{PersistID = 4, FatherID = 1}.
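
For readers who prefer code, the same example can be written as a small Go sketch. The two struct shapes are copied from the diff above (the old one is renamed `collectTableID` here so both can coexist); the helpers `version1IDs`, `version2IDs` and `version3IDs` are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// collectTableID mirrors the struct shape before this PR (versions 1 and 2).
type collectTableID struct {
	PersistID  int64
	CollectIDs []int64
}

// AnalyzeTableID mirrors the struct shape after this PR (version 3).
type AnalyzeTableID struct {
	PersistID int64
	// FatherID is the ID of the table to which the partition belongs.
	FatherID int64
}

// version1IDs: one task per partition, collected and persisted per partition.
func version1IDs(partitionIDs []int64) []collectTableID {
	ids := make([]collectTableID, 0, len(partitionIDs))
	for _, pid := range partitionIDs {
		ids = append(ids, collectTableID{PersistID: pid, CollectIDs: []int64{pid}})
	}
	return ids
}

// version2IDs: a single task that collects all partitions and persists as the table.
func version2IDs(tableID int64, partitionIDs []int64) []collectTableID {
	collect := append([]int64(nil), partitionIDs...)
	return []collectTableID{{PersistID: tableID, CollectIDs: collect}}
}

// version3IDs: one task per partition, each remembering its parent table.
func version3IDs(tableID int64, partitionIDs []int64) []AnalyzeTableID {
	ids := make([]AnalyzeTableID, 0, len(partitionIDs))
	for _, pid := range partitionIDs {
		ids = append(ids, AnalyzeTableID{PersistID: pid, FatherID: tableID})
	}
	return ids
}

func main() {
	partitions := []int64{2, 3, 4} // p1, p2, p3 of table t (ID = 1)
	fmt.Printf("version1: %+v\n", version1IDs(partitions))
	fmt.Printf("version2: %+v\n", version2IDs(1, partitions))
	fmt.Printf("version3: %+v\n", version3IDs(1, partitions))
}
```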

Reminiscent · 2021-01-25T03:31:09Z

executor/analyze_test.go

+		"IndexReader_8 2.80 root  index:IndexRangeScan_7",
+		"└─IndexRangeScan_7 2.80 cop[tikv] table:t3, partition:p1, index:k(v) range:[3,3], keep order:false",


TODO: Need more investigation on whether it is possible to use partition-level stats when the where condition touches only a single partition.

Reminiscent · 2021-01-25T06:35:18Z

The test fails for now because the merge functions for the other statistics, such as the histogram and TopN, are not finished yet. We need to wait until they are all completed.

ti-srebot · 2021-01-25T07:22:56Z

These labels are not found sig/ planner.

qw4990 · 2021-01-26T08:53:08Z

statistics/table.go

+func (t *Table) getColumnStatsInfo(colID int64) (*Histogram, *CMSketch, *TopN) {
+	colStatsInfo := t.Columns[colID]
+	return colStatsInfo.Histogram.Copy(), colStatsInfo.CMSketch.Copy(), colStatsInfo.TopN.Copy()
+}
+
+func (t *Table) getIndexStatsInfo(idxID int64) (*Histogram, *CMSketch, *TopN) {
+	idxStatsInfo := t.Indices[idxID]
+	return idxStatsInfo.Histogram.Copy(), idxStatsInfo.CMSketch.Copy(), idxStatsInfo.TopN.Copy()
+}
+
+// GetStatsInfo returns their statistics according to the ID of the column or index, including histogram, CMSketch and TopN.
+func (t *Table) GetStatsInfo(ID int64, isIndex int) (*Histogram, *CMSketch, *TopN) {
+	if isIndex == 0 {
+		return t.getColumnStatsInfo(ID)
+	}
+	return t.getIndexStatsInfo(ID)
+}
+


getColumnStatsInfo and getIndexStatsInfo are simple and not used by other functions directly, so how about merging these 3 functions into 1?
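
A rough sketch of what the merged function could look like is below. It is not the code that was eventually merged: `Histogram`, `CMSketch`, `TopN` and `statsInfo` are tiny stand-ins for the real statistics types, and returning nils for an unknown ID is an assumption of this sketch (the three-function version above does not check for a missing ID).

```go
package main

import "fmt"

// Stand-in types; the real statistics types carry far more state.
type Histogram struct{ NDV int64 }
type CMSketch struct{ Count uint64 }
type TopN struct{ Values []string }

func (h *Histogram) Copy() *Histogram { c := *h; return &c }
func (c *CMSketch) Copy() *CMSketch   { d := *c; return &d }
func (t *TopN) Copy() *TopN {
	return &TopN{Values: append([]string(nil), t.Values...)}
}

// statsInfo is a stand-in for a per-column or per-index stats entry.
type statsInfo struct {
	Histogram *Histogram
	CMSketch  *CMSketch
	TopN      *TopN
}

// Table is a stand-in holding column and index stats keyed by ID.
type Table struct {
	Columns map[int64]*statsInfo
	Indices map[int64]*statsInfo
}

// GetStatsInfo returns copies of the histogram, CMSketch and TopN for the
// column (isIndex == 0) or index (isIndex != 0) with the given ID.
func (t *Table) GetStatsInfo(id int64, isIndex int) (*Histogram, *CMSketch, *TopN) {
	source := t.Columns
	if isIndex != 0 {
		source = t.Indices
	}
	info, ok := source[id]
	if !ok {
		// Assumption of this sketch: an unknown ID yields nils instead of a panic.
		return nil, nil, nil
	}
	return info.Histogram.Copy(), info.CMSketch.Copy(), info.TopN.Copy()
}

func main() {
	tbl := &Table{
		Columns: map[int64]*statsInfo{
			1: {&Histogram{NDV: 10}, &CMSketch{Count: 5}, &TopN{Values: []string{"a"}}},
		},
		Indices: map[int64]*statsInfo{},
	}
	hg, cms, topN := tbl.GetStatsInfo(1, 0)
	fmt.Println(hg.NDV, cms.Count, topN.Values)
}
```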

qw4990 · 2021-01-26T10:03:41Z

statistics/handle/handle.go

+			return
+		}
+		tableInfo := partitionTable.Meta()
+		partitionStats, err := h.tableStatsFromStorage(tableInfo, partitionID, false, nil)


Should we load it from cache first?

qw4990 · 2021-01-26T10:08:18Z

statistics/handle/handle.go

+type GlobalStats struct {
+	num   int
+	count int64
+	hg    []*statistics.Histogram
+	cms   []*statistics.CMSketch
+	topN  []*statistics.TopN
+}


How about exposing this struct's fields and removing its methods, which may make it clearer?

qw4990 · 2021-01-26T11:14:33Z

executor/analyze.go

+						succ = false
+						break
+					}
+					err = statsHandle.SaveStatsToStorage(info.tableID, globalStatsCount, info.isIndex, hg, cms, topN, info.statsVersion, 1)


Should we update the cache after saving?

Reminiscent · 2021-01-27T03:31:48Z

To make it easier to review, we will split this PR into multiple sub-PRs.

xuyifangreeneyes · 2021-01-27T05:51:45Z

executor/analyze.go

-	reqBuilder := builder.SetHandleRangesForTables(e.ctx.GetSessionVars().StmtCtx, e.tableID.CollectIDs, e.handleCols != nil && !e.handleCols.IsInt(), ranges, nil)
+	reqBuilder := builder.SetHandleRangesForTables(e.ctx.GetSessionVars().StmtCtx, []int64{e.tableID.FatherID}, e.handleCols != nil && !e.handleCols.IsInt(), ranges, nil)


Why do we use FatherID rather than PersistID here? I'm a bit confused. Thanks~

Please see this PR. It differs from the current PR in some places. Thanks~

Reminiscent added 2 commits January 21, 2021 16:28

Implemented the basic framework of "merge the partition-level stats to global-level stats"

4a6efc8

Merge branch 'master' of https://github.com/pingcap/tidb into BuildGlobalStats

fe15ba8

modify collectIDs to fatherID

05ffe8b

github-actions bot added component/statistics sig/execution SIG execution labels Jan 21, 2021

add some comments and refine the code

8d965de

Reminiscent changed the title ~~[WIP] Build the global statistics for the partition table~~ planner, statistics: build the global statistics for the partition table Jan 22, 2021

Reminiscent added 2 commits January 25, 2021 11:09

add some comments

74c57b7

refine the code deal with some todo lists

Merge branch 'master' of https://github.com/pingcap/tidb into BuildGlobalStats

f69d571

Reminiscent marked this pull request as ready for review January 25, 2021 03:11

Reminiscent requested review from a team as code owners January 25, 2021 03:11

Reminiscent requested review from XuHuaiyu and removed request for a team January 25, 2021 03:11

Reminiscent added 2 commits January 25, 2021 11:14

fix UT

f0f071b

fix UT

6c40450

ti-srebot added the type/enhancement The issue or PR belongs to an enhancement. label Jan 25, 2021

ti-srebot added the sig/planner SIG: Planner label Jan 25, 2021

qw4990 reviewed Jan 26, 2021

Reminiscent mentioned this pull request Jan 27, 2021

planner: change the content of AnalyzeTableID to build global-stats #22554

Merged

xuyifangreeneyes reviewed Jan 27, 2021

rebelice mentioned this pull request Jan 28, 2021

statistics: merge the partition-level histograms to a global-level histogram #22603

Merged

xuyifangreeneyes mentioned this pull request Jan 29, 2021

planner, statistics: allow (auto) analyze single partition in dynamic-only mode #22625

Merged

Reminiscent mentioned this pull request Feb 2, 2021

statistics: merge the partition-level stats to global-level stats #22667

Merged

Reminiscent closed this Feb 18, 2021

This was referenced Feb 20, 2021

statistics: merge poped topn when generating the global histogram #22848

Merged

*: modify the switch to control global stats #22866

Merged

Reminiscent deleted the BuildGlobalStats branch August 5, 2021 01:57

5 participants
