Skip to content

Read-in-order over query plan#42829

Merged
KochetovNicolai merged 35 commits intomasterfrom
read-in-order-from-query-plan
Nov 11, 2022
Merged

Read-in-order over query plan#42829
KochetovNicolai merged 35 commits intomasterfrom
read-in-order-from-query-plan

Conversation

@KochetovNicolai
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implement read-in-order optimization on top of query plan. It is enabled by default. Set query_plan_read_in_order = 0 to use previous AST-based version.

@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-improvement Pull request with some product improvements label Oct 31, 2022
@nickitat nickitat self-assigned this Oct 31, 2022
@kitaisreal kitaisreal self-assigned this Nov 1, 2022
LimitBy
Expression (Before LIMIT BY)
Union
Union
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed after liftUpUnion

Comment on lines +16 to +17
SELECT a FROM t_max_rows_to_read WHERE a > 10 ORDER BY a LIMIT 5 SETTINGS max_rows_to_read = 12;
SELECT a FROM t_max_rows_to_read WHERE a = 10 OR a = 20 SETTINGS max_rows_to_read = 12;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, both of this queries works.

Comment on lines -62 to -63
Sorting (Stream): a ASC
Sorting (Stream): a ASC
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, read-in-order optimization happens after plan is built.
Current implementation does not propagate sorting property upwards, so it was changed only for reading step.

-- enable optimization -> sorting order is propagated from subquery -> merge sort
-- QUERY: set optimize_sorting_by_input_stream_properties=1;set max_threads=1;EXPLAIN PIPELINE SELECT a FROM (SELECT a FROM optimize_sorting) ORDER BY a
MergeSortingTransform
MergingSortedTransform 3 → 1
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the comment, there should have been MergingSortedTransform initially. MergeSortingTransform is a full sort.
It was magically fixed.

Sorting (Sorting for ORDER BY)
Sorting (Global): a ASC
Sorting (Stream): a ASC
Sorting (Chunk): a ASC, b ASC
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an incorrect property, but it won't be used cause plan is already built.
We should probably fix it later.

ReadFromStorage
Header: dummy UInt8
Expression
Union
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed after liftUpUnion

@KochetovNicolai KochetovNicolai marked this pull request as ready for review November 4, 2022 17:40
@KochetovNicolai
Copy link
Copy Markdown
Member Author

Tsan should be fixed in #43009

@kitaisreal
Copy link
Copy Markdown
Contributor

Something is wrong with performance tests, probably it does not work for such queries https://s3.amazonaws.com/clickhouse-test-reports/42829/7ac258c2a77fa813052ac67cc67977b05368fa1d/performance_comparison_[4/4]/report.html.
We need to add a lot of additional performance tests to cover all scenarious.

Copy link
Copy Markdown
Member

@CurtizJ CurtizJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general ok, but let's wait for review from someone who is more familiar with code near plan optimizations.

return reading;
}

if (auto * merge = typeid_cast<ReadFromMerge *>(step))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be supported for other storages like Buffer and MaterializedView, because they have a ReadFromMergeTree inside plan, which is built to read from those storages, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. For MV it is naturally supported. For Buffer I had to support a simple optimization with union.
Anyway, we had a test for such engines.

using FixedColumns = std::unordered_set<const ActionsDAG::Node *>;

/// Right now we find only simple cases like 'and(..., and(..., and(column = value, ...), ...'
void appendFixedColumnsFromFilterExpression(const ActionsDAG::Node & filter_expression, FixedColumns & fiexd_columns)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo fiexd_columns

else if (name == "equals")
{
const ActionsDAG::Node * maybe_fixed_column = nullptr;
bool is_singe = true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo is_singe

if (!sort_column_node)
break;

if (!dag)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible? Shouldn't we build dag just for input columns even without any functions?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible. This DAG is from SELECT. For example, SELECT * FROM tab ORDER BY a, b will not have any expression steps, and DAG will be empty.

Comment on lines +364 to +369
auto it = matches.find(child);
if (it == matches.end())
{
stack.push(Frame{child, {}});
break;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a comment why do we do this if node is not in matches.


bool found_all_children = true;
for (const auto * child : frame.mapped_children)
if (!child)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it possible to get nullptr in mapped_children?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Will write a comment.

/// This structure stores a node mapping from one DAG to another.
/// The rule is following:
/// * Input nodes are mapped by name.
/// * Function is mapped to function if all children are mapped and function names are same.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that function is mapped to itself?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but from different DAGs.
I mean, this is the idea - to find equal calculations if DAGs

@KochetovNicolai
Copy link
Copy Markdown
Member Author

KochetovNicolai commented Nov 8, 2022

Lol, one query from order_by_read_in_order perftest became slower cause read-in-order actually did not work for it

EXPLAIN PIPELINE
SELECT *
FROM datasets.hits_100m_obfuscated
WHERE UserID = 1988954671305023629
ORDER BY
    CounterID ASC,
    EventDate ASC
LIMIT 100
SETTINGS query_plan_read_in_order = 1

Query id: 7a3efb48-35f5-428d-a1f6-51668b44a779

┌─explain────────────────────────┐
│ (Expression)                   │
│ ExpressionTransform            │
│   (Limit)                      │
│   Limit                        │
│     (Sorting)                  │
│       (Expression)             │
│       ExpressionTransform      │
│         (ReadFromMergeTree)    │
│         MergeTreeInOrder 0 → 1 │
└────────────────────────────────┘


EXPLAIN PIPELINE
SELECT *
FROM datasets.hits_100m_obfuscated
WHERE UserID = 1988954671305023629
ORDER BY
    CounterID ASC,
    EventDate ASC
LIMIT 100
SETTINGS query_plan_read_in_order = 0

Query id: 73f61e3e-89ac-488c-97b7-2522428895b7

┌─explain─────────────────────────────┐
│ (Expression)                        │
│ ExpressionTransform                 │
│   (Limit)                           │
│   Limit                             │
│     (Sorting)                       │
│     MergingSortedTransform 16 → 1   │
│       (Expression)                  │
│       ExpressionTransform × 16      │
│         (ReadFromMergeTree)         │
│         MergeTreeInOrder × 16 0 → 1 │
└─────────────────────────────────────┘

Looks like ordinary read worked faster cause condition on pk is good and we full sort not so many data. parallel execution wins in this case.

@KochetovNicolai
Copy link
Copy Markdown
Member Author

KochetovNicolai commented Nov 8, 2022

And the same for a query from monotonous_order_by

EXPLAIN PIPELINE
SELECT *
FROM
(
    SELECT
        CounterID,
        EventDate
    FROM datasets.hits_100m_obfuscated
)
ORDER BY
    toFloat32(toFloat64(toFloat32(toFloat64(CounterID)))) DESC,
    toFloat32(toFloat64(toFloat32(toFloat64(EventDate)))) ASC
SETTINGS max_threads = 3, query_plan_read_in_order = 1

Query id: 0d9bbfc9-2e36-4a91-910e-9602d736542f

┌─explain────────────────────────────────────┐
│ (Expression)                               │
│ ExpressionTransform                        │
│   (Sorting)                                │
│   FinishSortingTransform                   │
│     PartialSortingTransform                │
│       MergingSortedTransform 3 → 1         │
│         (Expression)                       │
│         ExpressionTransform × 3            │
│           (ReadFromMergeTree)              │
│           ReverseTransform                 │
│             MergeTreeReverse 0 → 1         │
│               ReverseTransform             │
│                 MergeTreeReverse 0 → 1     │
│                   ReverseTransform         │
│                     MergeTreeReverse 0 → 1 │
└────────────────────────────────────────────┘

15 rows in set. Elapsed: 0.003 sec. 

EXPLAIN PIPELINE
SELECT *
FROM
(
    SELECT
        CounterID,
        EventDate
    FROM datasets.hits_100m_obfuscated
)
ORDER BY
    toFloat32(toFloat64(toFloat32(toFloat64(CounterID)))) DESC,
    toFloat32(toFloat64(toFloat32(toFloat64(EventDate)))) ASC
SETTINGS max_threads = 3, query_plan_read_in_order = 0

Query id: de260a83-1190-4f6f-9f4f-2ae530aaa21a

┌─explain───────────────────────────────┐
│ (Expression)                          │
│ ExpressionTransform                   │
│   (Sorting)                           │
│   MergingSortedTransform 3 → 1        │
│     MergeSortingTransform × 3         │
│       LimitsCheckingTransform × 3     │
│         PartialSortingTransform × 3   │
│           (Expression)                │
│           ExpressionTransform × 3     │
│             (ReadFromMergeTree)       │
│             MergeTreeThread × 3 0 → 1 │
└───────────────────────────────────────┘

But in this case it's harder to explain why. Probably, computation of toFloat32(toFloat64(toFloat32(toFloat64( is slow, and parallel execution wins again. Also, optimized query need to add FinishSorting to sort by EventDate.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Nov 8, 2022

Lol, one query from order_by_read_in_order perftest became slower cause read-in-order actually did not work for it

@KochetovNicolai

But in the second case:

EXPLAIN PIPELINE
SELECT *
FROM datasets.hits_100m_obfuscated
WHERE UserID = 1988954671305023629
ORDER BY
    CounterID ASC,
    EventDate ASC
LIMIT 100
SETTINGS query_plan_read_in_order = 0

Query id: 73f61e3e-89ac-488c-97b7-2522428895b7

┌─explain─────────────────────────────┐
│ (Expression)                        │
│ ExpressionTransform                 │
│   (Limit)                           │
│   Limit                             │
│     (Sorting)                       │
│     MergingSortedTransform 161   │
│       (Expression)                  │
│       ExpressionTransform × 16      │
│         (ReadFromMergeTree)         │
│         MergeTreeInOrder × 16 01 │
└─────────────────────────────────────┘

reading in order also worked, because we have MergeTreeInOrder and MergingSortedTransform instead of MergeTreeThread, MergeSorting and PartialSorting. The difference is in number of threads.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Nov 8, 2022

The case with monotonous_order_by is reasonable. I think the cause is that sorting with FinishSorting without or with high limit is never faster than regular sorting, because it has the same algorithmic complexity, but slower reading method.

In that case reading in order was not enable previously, because the chain of monotonic function was not supported.

Copy link
Copy Markdown
Contributor

@kitaisreal kitaisreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is good. Just added small comments for clarification.

&& !query.final()
&& join_allow_read_in_order;

if (storage && optimize_read_in_order)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is removed ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a fix of a bug from #39157
I was reproduced again with a new implementation, and I had to fix it in a different way.

if (getContext()->getSettingsRef().allow_experimental_analyzer)
{
InterpreterSelectQueryAnalyzer interpreter(ast.getExplainedQuery(), options, getContext());
context = interpreter.getContext();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we use context from interpreter, not context from EXPLAIN QUERY ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a stupid bug that some settings were not applied for EXPLAIN.
InterpreterSelectQuery copied context and changed settings only for it, and the following plan.optimize did not see this settings change.
I don't know if this is applied to InterpreterSelectQueryAnalyzer as well, but I think it's better to copy context anyway.

for (const auto & key_name : key_names)
order_descr.emplace_back(key_name);

SortingStep::Settings sort_settings;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to add constructor in SortingStep::Settings from Settings. Currently such construction is worse then was before, because you can easy do not initialize some setting, and there will be no error.


const Settings & settings = query_context->getSettingsRef();

SortingStep::Settings sort_settings;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy pases several times in Interpreter and Planner, need to be extracted in separate method, or added constructor.

size_t tryDistinctReadInOrder(QueryPlan::Node * node, QueryPlan::Nodes & nodes);
size_t tryDistinctReadInOrder(QueryPlan::Node * node);

/// Put some steps under union, so that plan optimisation could be applied to union parts separately.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need example in comment. It is impossible to understand what some steps are and what we can expect from this function to do with query plan, without reading its internals.

using Matches = std::unordered_map<const ActionsDAG::Node *, Match>;
};

MatchedTrees::Matches matchTrees(const ActionsDAG & inner_dag, const ActionsDAG & outer_dag)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is generally well written, we need to just add some internal comments, for each step that follows documentation above.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider moving it somewhere to common interface, but likely will do it later (after aggregation-in-order)

///
/// So far, 0 means any direction is possible. It is ok for constant prefix.
int read_direction = 0;
size_t next_descr_column = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to rename next_descr_column into next_description_column.

while (next_descr_column < description.size() && next_sort_key < sorting_key_columns.size())
{
const auto & sorting_key_column = sorting_key_columns[next_sort_key];
const auto & descr = description[next_descr_column];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to rename descr into sort_column_description.

const auto & sorting_key = reading->getStorageMetadata()->getSortingKey();
const auto & sorting_key_columns = sorting_key.column_names;

return buildInputOrderInfo(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably if we decide to split this function call in multiple lines, better to have each function argument on the same line.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or on the different lines.


/// This optimisation is obsolete and will be removed.
/// optimizeReadInOrder covers it.
size_t tryReuseStorageOrderingForWindowFunctions(QueryPlan::Node * parent_node, QueryPlan::Nodes & /*nodes*/)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remove this. It should be supported by default from code above, and even better.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to have it till next stable release just in case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants