ARROW-14191: [C++][Dataset] Dataset writes should respect backpressure #11286
Conversation
…ng rows from an ExecPlan to disk

This PR adds a write node. The write node takes in `FileSystemDatasetWriteOptions` and the projected schema and writes the incoming data to disk. It is a sink node, but it is a bit different from the existing sink node. The existing sink node transfers batches via an AsyncGenerator, which puts ownership of the batches outside of the push-based flow of the exec plan. I added a new ConsumingSinkNode which consumes the batches as part of the push-based flow. This makes it possible to block the exec plan from finishing until all data has been written to disk.

In addition, this PR refines the AsyncTaskGroup a little. `WaitForTasksToFinish` was not a very clearly named method: once called, it actually transitioned the task group from a "top level tasks can be added" state to a "no more top level tasks can be added" state, and that was not clear from the name. The new name (`End`) is hopefully clearer.

This PR does not solve the backpressure problem. Instead, a serial async task group was created. This will run all tasks in order (the default task group allows them to run in parallel) but not necessarily on the calling thread (i.e. unlike SerialTaskGroup, we do not block in the AddTask method). This allows tasks to pile up in a queue and, if the write is slow, this will become a pile-up point which will eventually run out of memory (provided there is enough data being written). That problem is solved in a follow-up, #11286.

The `AsyncSerializedTaskGroup` and `AsyncTaskGroup` classes have very similar APIs, but I did not create an interface / abstract base class because I don't yet envision any case where they would be interchangeable. The distinction is "can these tasks run in parallel or not"; it is not a performance / resource question.

As a consequence of using the ExecPlan, dataset writes no longer have reliable ordering. If you pass in three batches and they are all destined for the same file, then the batches may be written in any order in the destination file. This is because the ExecPlan creates a thread task for each input batch, so they could arrive at the write node in any order.

Closes #11017 from westonpace/feature/ARROW-13542-write-node

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
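The "serialized but non-blocking" behaviour described above is easy to misread, so here is a minimal conceptual sketch in Python asyncio. It is only an analogue of the C++ `AsyncSerializedTaskGroup`, not the Arrow implementation, and every name in it is made up: tasks run strictly in submission order, submitting never blocks the caller, and the unbounded queue is exactly the pile-up point that the follow-up backpressure work addresses.

```python
import asyncio


class ToySerializedTaskGroup:
    """Toy analogue of a serialized async task group (not the Arrow C++ class).

    Tasks run one at a time, in submission order, but add_task() returns
    immediately instead of blocking the caller. The unbounded queue is the
    pile-up point: a slow consumer lets it grow without limit.
    """

    def __init__(self):
        self._queue = asyncio.Queue()  # unbounded: no backpressure here
        self._runner = asyncio.create_task(self._run())

    def add_task(self, make_coro):
        # Enqueue and return immediately (unlike a blocking serial group).
        self._queue.put_nowait(make_coro)

    async def end(self):
        # "End": no more top-level tasks may be added; wait for the queue to drain.
        self._queue.put_nowait(None)
        await self._runner

    async def _run(self):
        while True:
            make_coro = await self._queue.get()
            if make_coro is None:
                return
            await make_coro()  # run strictly in submission order


async def main():
    group = ToySerializedTaskGroup()

    async def write_batch(i):
        await asyncio.sleep(0.01)  # stand-in for a slow disk write
        print("wrote batch", i)

    for i in range(5):
        # The producer never blocks here, so "batches" could pile up in memory.
        group.add_task(lambda i=i: write_batch(i))
    await group.end()


asyncio.run(main())
```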
bkietz left a comment:
This looks good in principle, just one nit on the test
aocsa left a comment:
This looks good to me. Just a couple of comments related to code style and tests.
cpp/src/arrow/dataset/file_base.cc (outdated)
Maybe this is a good time to make the code style uniform: since this is a class, all the data members should follow the `name_` style.
Good point. I fixed all data members to use the `name_` style.
python/pyarrow/tests/test_dataset.py (outdated)
Just a comment: if the Python test is slow, why not write this test in C++? I think there is more control in C++, so even a test with a large workload is achievable there, or the test could be made conditional so it doesn't always run.
There are already tests in C++ for the scanner backpressure and the dataset writer backpressure. You are correct that we have more control. I was able to use the thread pool's "wait for idle" method to know when backpressure had been hit.
I wanted a Python test to pull everything together and make sure it is actually being utilized correctly (I think it is easy sometimes for Python to get missed due to a configuration parameter or something else). I'd be ok with removing this test but I don't think we need to add anything to C++. @bkietz thoughts?
I'd say this is sufficient for this PR
@bkietz Any other concerns? Otherwise I will rebase tomorrow morning and merge if all looks good.
Benchmark runs are scheduled for baseline = 02f11b9 and contender = cf50b31. cf50b31 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
This PR adds the backpressure feature (introduced in ARROW-13611) to the dataset write node (introduced in ARROW-13542).
To note: the Python test here is unfortunately a bit slow, for two reasons. First, we don't expose the "how many rows can the dataset writer hold before backpressure applies" threshold as a configurable option (it's not clear in what circumstances a user would ever change it), and it defaults to 64Mi rows.
Second, there is no signal sent from the C++ side when backpressure has been applied. So we are forced to poll and guess when it seems we have stopped reading from the source.
I'm open to suggestions (e.g. don't include the test, make it a large-memory test or something so it doesn't always run, any ideas we can use to test this better, etc.).
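For context on where this backpressure matters from the user's point of view, here is a minimal usage sketch with pyarrow. The paths are hypothetical and this is not the PR's test: `write_dataset` streams batches from a source dataset to disk, and with this change the read side is paused once the writer holds too many unwritten rows, instead of letting the backlog accumulate in memory.

```python
import pyarrow.dataset as ds

# Hypothetical paths; the source is assumed to be much larger than available RAM.
source = ds.dataset("/data/events_parquet", format="parquet")

# Streams batches from `source` into the target directory. With this PR, if the
# destination (for example a slow network filesystem) cannot keep up, reading
# from the source is paused once the writer holds too many unwritten rows,
# rather than buffering the whole backlog in memory.
ds.write_dataset(source, "/slow_volume/events_copy", format="parquet")
```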