
Support Parallel Order-Preserving Result Set Materialization#3700

Merged

Mytherin merged 33 commits into duckdb:master from Mytherin:parallelinsertorderpreserving on May 25, 2022
Conversation

@Mytherin (Collaborator) commented May 24, 2022

This PR adds support for parallel order-preserving result set materialization (fixes #2260, #2653, duckdb/duckdb-web#142).

This enables full end-to-end parallelism: an entire query, including result set materialization, can now be executed fully in parallel. Previously, the final pipeline (the pipeline materializing the result) was always executed in a single-threaded fashion. For most analytical queries this was not a problem, but for queries that produce large results or that have a slow final pipeline, this could become a significant bottleneck (see #2245).

Batch Indexes

Order-preserving parallelism works by having pipeline sources emit a "batch index": an index that identifies the batch currently being processed. For example, in a regular table scan, the batch index can be thought of as the row group index.

When assembling the result of a query, we can then use the fact that each thread processes data "in-order" to reconstruct the same result as single-threaded execution - it is only the batch indexes that might be out of sync between the different threads. This allows us to restore the original order without an additional sorting step - we can scan all the data in the result ordered by batch index.
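The reconstruction step can be sketched in a few lines of plain Python (hypothetical names, not DuckDB's actual internals): each worker tags every chunk it produces with the batch index it came from, and the final result is obtained by ordering the chunks, not the data, by batch index.

```python
# Sketch of order-preserving materialization via batch indexes.
# Hypothetical structure, not DuckDB's C++ internals: each worker scans
# a disjoint set of batches and tags every output chunk with its batch index.

def scan_batches(batches, worker_batches):
    """Simulate one worker producing (batch_index, chunk) pairs."""
    return [(idx, batches[idx]) for idx in worker_batches]

def materialize(per_worker_output):
    """Merge worker outputs by batch index to restore the source order.
    No sort of the data itself is needed; only the (small) list of
    chunks is ordered by batch index."""
    tagged = [pair for output in per_worker_output for pair in output]
    tagged.sort(key=lambda pair: pair[0])  # order by batch index only
    result = []
    for _, chunk in tagged:
        result.extend(chunk)
    return result

batches = [[0, 1], [2, 3], [4, 5], [6, 7]]  # four "row groups"
# Two workers pick up batches in an interleaved, out-of-order fashion.
worker_a = scan_batches(batches, [2, 0])
worker_b = scan_batches(batches, [3, 1])
assert materialize([worker_a, worker_b]) == [0, 1, 2, 3, 4, 5, 6, 7]
```

However the batches are distributed across threads, merging by batch index reproduces the single-threaded scan order.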

This allows us not only to materialize result sets in parallel, but also to execute operators that rely on maintaining order in parallel. For example, we can now execute a LIMIT n clause (without a corresponding ORDER BY clause) in parallel by having each thread materialize at most n tuples, and then selecting the n tuples with the lowest batch indexes.
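The parallel LIMIT described above can be sketched as follows (a simplified simulation with hypothetical names): each worker only ever buffers its first n tuples, and the final step keeps the n tuples with the lowest batch indexes, matching what a single-threaded scan would return.

```python
import heapq

# Sketch of a parallel LIMIT n without ORDER BY (hypothetical names,
# not DuckDB's implementation): every worker materializes at most n
# tuples together with their batch indexes; the final step keeps the
# n tuples with the lowest batch indexes.

def worker_limit(tagged_rows, n):
    """A worker never needs to buffer more than n (batch_index, row) pairs."""
    return tagged_rows[:n]

def parallel_limit(per_worker, n):
    # Merge the per-worker buffers by batch index and keep the first n rows.
    merged = heapq.merge(*[sorted(buf) for buf in per_worker])
    return [row for _, row in list(merged)[:n]]

# Rows tagged with the batch index they were scanned from.
worker_a = worker_limit([(1, "c"), (1, "d"), (3, "g")], 3)
worker_b = worker_limit([(0, "a"), (0, "b"), (2, "e"), (2, "f")], 3)
assert parallel_limit([worker_a, worker_b], 3) == ["a", "b", "c"]
```

Because each worker buffers at most n tuples, the memory cost stays bounded by n per thread regardless of table size.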

In order for this to work, sources need to implement the batch_index method, which is added both to PhysicalOperator and to the TableFunction API. This PR implements it for built-in table scans, Parquet scans and Pandas DataFrame scans. If a source does not support the batch_index method, execution falls back to the single-threaded path.
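The contract a source has to satisfy can be illustrated with a small sketch (illustrative Python names, not the actual C++ API): alongside the data of each scan unit, the source reports which batch that data belongs to, here simply the row-group number.

```python
# Hypothetical sketch of what a source provides for order-preserving
# parallelism: each scan call returns a chunk together with its batch
# index (here, the row-group number). Names are illustrative only.

class RowGroupScan:
    def __init__(self, row_groups):
        self.row_groups = row_groups

    def max_threads(self):
        # At most one thread per row group is useful.
        return len(self.row_groups)

    def scan(self, state):
        """Return (batch_index, chunk) for the next row group, or None
        when the source is exhausted."""
        if state["next"] >= len(self.row_groups):
            return None
        batch_index = state["next"]
        state["next"] += 1
        return batch_index, self.row_groups[batch_index]

scan = RowGroupScan([[1, 2], [3, 4], [5, 6]])
state = {"next": 0}
assert scan.scan(state) == (0, [1, 2])
assert scan.scan(state) == (1, [3, 4])
```

A source that cannot assign such indexes (for example, one reading from an unordered stream) simply does not implement the method, triggering the single-threaded fallback.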

Result Set Collectors

In order to facilitate parallel result set materialization, a new class is added: the PhysicalResultCollector. This class is placed at the root of a query tree and functions as a regular sink. However, instead of producing chunks after completion, it produces a QueryResult. When a materialized result is requested, the result collectors are used to construct the materialized query result.
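The collector's role can be sketched as follows (a hypothetical Python rendering of the C++ PhysicalResultCollector, not the actual class): it behaves as a sink that workers push tagged chunks into, and once execution finishes it hands back a materialized result rather than streaming chunks to a parent operator.

```python
# Sketch of the result-collector idea (hypothetical Python rendering of
# the C++ PhysicalResultCollector): a sink at the root of the plan that
# consumes (batch_index, chunk) pairs from workers and, on finalize,
# produces a materialized result instead of emitting chunks upward.

class ResultCollectorSink:
    def __init__(self):
        self.collected = []

    def sink(self, batch_index, chunk):
        # Called by workers; each call hands over one tagged chunk.
        self.collected.append((batch_index, chunk))

    def get_result(self):
        # Finalize: order chunks by batch index into the final result.
        rows = []
        for _, chunk in sorted(self.collected, key=lambda p: p[0]):
            rows.extend(chunk)
        return rows

collector = ResultCollectorSink()
collector.sink(1, ["y"])
collector.sink(0, ["x"])
assert collector.get_result() == ["x", "y"]
```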

Note that streaming query results are not yet parallelized.

In addition, a custom result collector can be provided. This is not used yet, but should in the future allow e.g. constructing Pandas DataFrames directly in parallel, without first going through a materialized query result.

Preserve Insertion Order

In addition to the changes above, this PR introduces a new setting: preserve_insertion_order. The setting defaults to true. When it is set to false, the system no longer preserves insertion order and is free to re-order results as much as the SQL standard allows. This means that e.g. a query with a LIMIT but without an ORDER BY will return non-deterministic results (i.e. SELECT * FROM tbl LIMIT 5 will return any 5 rows from tbl, not necessarily the first 5 rows).
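The semantics can be illustrated with a small simulation (plain Python standing in for the setting, not the DuckDB implementation): with order preservation on, LIMIT n yields the first n rows in scan order; with it off, any n rows of the table are a valid answer, so only set membership is guaranteed.

```python
# Simulation of the preserve_insertion_order semantics (plain Python,
# not the DuckDB setting itself). The shuffle stands in for the
# nondeterministic order in which threads finish their batches.
import random

def limit(table, n, preserve_insertion_order=True, seed=None):
    rows = list(table)
    if not preserve_insertion_order:
        random.Random(seed).shuffle(rows)  # stand-in for thread timing
    return rows[:n]

tbl = list(range(100))
assert limit(tbl, 5) == [0, 1, 2, 3, 4]          # deterministic: first 5 rows
unordered = limit(tbl, 5, preserve_insertion_order=False, seed=42)
assert len(unordered) == 5 and set(unordered) <= set(tbl)  # any 5 rows
```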

BuildPipelines

As a general clean-up of operators, the BuildPipelines method has been converted from a single giant method in the Executor into a method that can be overridden by individual operators. This allows individual operators to more easily extend and modify how pipelines are constructed, without all of that logic living in one central place.
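The shape of this refactor can be sketched as follows (a hypothetical Python rendering, not the actual C++ code): a default BuildPipelines that extends the current pipeline, with operators such as a hash join overriding it to split off child pipelines.

```python
# Sketch of moving pipeline construction into the operators themselves
# (hypothetical Python rendering of the refactor): instead of one giant
# Executor method switching on operator types, each operator can
# override build_pipelines to describe how it splits the plan.

class PhysicalOperator:
    def __init__(self, children=()):
        self.children = list(children)

    def build_pipelines(self, current, pipelines):
        # Default: a streaming operator stays in the current pipeline,
        # and its children extend that same pipeline.
        current.append(type(self).__name__)
        for child in self.children:
            child.build_pipelines(current, pipelines)

class TableScan(PhysicalOperator):
    pass

class HashJoin(PhysicalOperator):
    def build_pipelines(self, current, pipelines):
        # Override: the probe side stays in the current pipeline, while
        # the build side becomes its own child pipeline.
        current.append("HashJoin")
        probe, build = self.children
        probe.build_pipelines(current, pipelines)
        child = []
        build.build_pipelines(child, pipelines)
        pipelines.append(child)

plan = HashJoin([TableScan(), TableScan()])
root, pipelines = [], []
plan.build_pipelines(root, pipelines)
pipelines.append(root)
assert pipelines == [["TableScan"], ["HashJoin", "TableScan"]]
```

Each operator now owns its own pipeline-construction logic, so adding a new pipeline-breaking operator no longer requires touching a central Executor method.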

Benchmark

Running the following query results in the following timings (previous versions of DuckDB would always run at the single-threaded timing since the final pipeline could not be parallelized):

```sql
CREATE TABLE integers AS SELECT * FROM generate_series(0, 100000000, 1) tbl(i);
CREATE TABLE other_table AS SELECT 337 i UNION ALL SELECT 948247 UNION ALL SELECT 17797934;
SELECT * FROM integers WHERE i IN (SELECT * FROM other_table);
```

| System     | Time (s) |
|------------|----------|
| DuckDB 1T  | 0.53     |
| DuckDB 8T  | 0.07     |
| PostgreSQL | 5.60     |

Mytherin added 30 commits May 12, 2022 11:25
…uce batch index and add support for parallel limit
```sql
CREATE TABLE other_table AS SELECT 337 i UNION ALL SELECT 948247 UNION ALL SELECT 17797934 UNION ALL SELECT 99999998 UNION ALL SELECT 99999999;
COPY (SELECT * FROM range(50000000) t(i)) TO '${BENCHMARK_DIR}/integers1.parquet';
COPY (SELECT * FROM range(50000000, 100000000) t(i)) TO '${BENCHMARK_DIR}/integers2.parquet';
CREATE VIEW integers AS SELECT * FROM '${BENCHMARK_DIR}/integers*.parquet';
```
Contributor:
Doesn't this rely on the order in which the globber returns the file names? My experience with Windows suggests that this can be fragile...

Collaborator (Author):
Yes, that is an excellent point, and one I already fixed in a test.

Successfully merging this pull request may close these issues: Parallel result set materialization