ARROW-13227: [Documentation][Compute] Document ExecNode #11309

bkietz · 2021-10-04T16:35:37Z

No description provided.

github-actions · 2021-10-04T16:36:03Z

https://issues.apache.org/jira/browse/ARROW-13227

pitrou

Very nice doc!

docs/source/cpp/simple_graph.svg

docs/source/cpp/streaming_execution.rst

docs/source/cpp/compute.rst

pitrou · 2021-10-06T16:47:38Z

docs/source/cpp/streaming_execution.rst

+compute functions <invoking compute functions>` is not feasible
+in either memory or computation time. Doing so causes all intermediate
+data to be fully materialized. To facilitate arbitrarily large inputs
+and more efficient resource usage, arrow also provides a streaming query


Say "Arrow C++" rather than "arrow" here?

If you prefer, but I think that's implicit in the declared namespace of this doc

edponce · 2021-10-14T19:15:55Z

docs/source/cpp/streaming_execution.rst

+
+.. image:: simple_graph.svg
+
+:class:`ExecNode` is provided to reify the graph of operations in a query.


Maybe state explicitly that ExecNode represents the nodes of the graph which can perform processing, ...

I think that's implied by the name ExecNode and the following sentence. I'm not sure how to make this more clear without making the sentence confusingly wordy

docs/source/cpp/streaming_execution.rst

westonpace

This is a good addition. In the future we might want to add a bit more introduction. At the moment a fairly naive user will probably be a bit lost at "is provided to reify the graph of operations in a query" (e.g. "what do graphs have to do with streaming execution"?)

Although a naive user is admittedly probably going to be using SQL or some other front end. It probably still wouldn't hurt to set the context a little. I don't think we need to address that today though. This adds valuable information and it would be good to get it in place for the 6.0.0 release.

My comments are all nits, so take them or leave them. This could be merged as is and I would be content.

westonpace · 2021-10-15T00:04:03Z

docs/source/cpp/streaming_execution.rst

+    if (need_stop) {
+      // stop all nodes in the graph
+      plan->StopProducing();
+    }


This is kind of confusing. What is need_stop? Why would it be used here? If this is an example of how you would cancel a running plan then I think the pseudocode would be more along the lines of...

    while (!plan->finished() && !user_requested_cancellation) {
      WaitForSignal();
    }
    if (!plan->finished()) {
      plan->StopProducing();
    }

...but I'm not sure that is any more clear. Maybe it is easiest to just remove this block.

That's fair. I'll make this a bit more clear by phrasing it as an optional callback
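
For illustration, the cancellation flow discussed above might be sketched as follows. This is a minimal, self-contained sketch and not the Arrow API: `MockPlan`, `IsFinished`, and `WaitOrCancel` are hypothetical stand-ins, and polling with a short sleep is assumed in place of the `WaitForSignal()` placeholder from the pseudocode.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a running plan; this is NOT arrow::compute::ExecPlan.
struct MockPlan {
  std::atomic<bool> finished{false};
  std::atomic<bool> stop_requested{false};

  bool IsFinished() const { return finished.load(); }

  // Ask every node to stop. In this mock, stopping simply marks the plan finished.
  void StopProducing() {
    stop_requested.store(true);
    finished.store(true);
  }
};

// Wait until the plan finishes on its own or the user asks to cancel.
// Returns true if the plan had to be stopped early, false otherwise.
bool WaitOrCancel(MockPlan& plan,
                  const std::atomic<bool>& user_requested_cancellation) {
  while (!plan.IsFinished() && !user_requested_cancellation.load()) {
    // Assumed polling interval; a real implementation would wait on a signal.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  if (!plan.IsFinished()) {
    plan.StopProducing();
    return true;
  }
  return false;
}
```

The point of the second check is that `StopProducing()` is only called when the loop exited because of a cancellation request, not because the plan completed normally.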

westonpace · 2021-10-15T00:13:18Z

docs/source/cpp/streaming_execution.rst

+to the default registry with the name ``"scan"`` by calling
+``arrow::dataset::internal::Initialize()``::


Technically you also need to call arrow::dataset::internal::Initialize() before you use the write factory used above.

I've added more calls to Initialize() so that any of these snippets should be well formed

Ideally we can resolve ARROW-13773 and remove these :)

westonpace · 2021-10-15T00:15:56Z

docs/source/cpp/streaming_execution.rst

+  :class:`ExecNodeOptions`.
+
+:struct:`Declaration`
+  ``dplyr``-inspired helper for efficient construction of an :class:`ExecPlan`.


Unless the user has an R background the dplyr reference isn't helping much. Maybe just "A helper for efficient construction of sequences of ExecNodes"

I think giving something to look up is preferable; if anyone isn't familiar the worst that can happen is they read the example to see what Declaration can be used for

westonpace · 2021-10-15T00:16:38Z

docs/source/cpp/streaming_execution.rst

+    MakeExecNode("write", plan.get(), {project_node},
+                 WriteNodeOptions{/*base_dir=*/"/dat", /*...*/});
+
+:struct:`Declaration` is a `dplyr <https://dplyr.tidyverse.org>`-inspired


Same comment about dplyr here.

westonpace · 2021-10-15T00:18:39Z

docs/source/cpp/streaming_execution.rst

+  ``dplyr``-inspired helper for efficient construction of an :class:`ExecPlan`.
+
+:struct:`ExecBatch`
+  A lightweight container for a single chunk of data in the Arrow format. In


Suggested change

-  A lightweight container for a single chunk of data in the Arrow format. In
+  A lightweight container for columns of data in the Arrow format. In

"A single chunk" makes me think "contiguous"

ExecBatch's columns are a contiguous chunk: if you have a float32 column with no nulls, that's stored in a single buffer.

Right, but you have multiple columns and each one is its own set of contiguous chunks. I.e. it's not a pandas block.

docs/source/cpp/streaming_execution.rst

Co-authored-by: Weston Pace <weston.pace@gmail.com>

bkietz · 2021-10-15T18:42:07Z

+1, merging

ursabot · 2021-10-15T18:44:22Z

Benchmark runs are scheduled for baseline = 444cdac and contender = 8650c23. 8650c23 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.18% ⬆️0.18%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

ARROW-13227: [Documentation][Compute] Document ExecNode

7f887da

bkietz requested a review from pitrou October 4, 2021 16:35

github-actions bot added the Component: C++ label Oct 4, 2021

apache deleted a comment from github-actions bot Oct 4, 2021

pitrou reviewed Oct 6, 2021

bkietz added 2 commits October 12, 2021 15:42

SVG typo

60fae24

address review comments

3c9030b

edponce suggested changes Oct 14, 2021

edponce reviewed Oct 14, 2021

arrow -> Arrow

2f31d13

westonpace approved these changes Oct 15, 2021

bkietz and others added 9 commits October 15, 2021 09:57

add seealso for push vs pull model paper

e1fb731

Update docs/source/cpp/streaming_execution.rst

5471a80

Co-authored-by: Weston Pace <weston.pace@gmail.com>

Update docs/source/cpp/streaming_execution.rst

5357656

Co-authored-by: Weston Pace <weston.pace@gmail.com>

Update docs/source/cpp/streaming_execution.rst

6a23c68

Co-authored-by: Weston Pace <weston.pace@gmail.com>

add more calls to dataset::Initialize(), clarify usage of StopProducing

b0c8813

Update docs/source/cpp/streaming_execution.rst

eddb8d9

Co-authored-by: Weston Pace <weston.pace@gmail.com>

Update docs/source/cpp/streaming_execution.rst

b0656bc

Co-authored-by: Weston Pace <weston.pace@gmail.com>

Update docs/source/cpp/streaming_execution.rst

39c6c53

Co-authored-by: Weston Pace <weston.pace@gmail.com>

Update docs/source/cpp/streaming_execution.rst

bdf8183

Co-authored-by: Weston Pace <weston.pace@gmail.com>

bkietz closed this in 8650c23 Oct 15, 2021

bkietz deleted the 13227-Document-ExecNode-ExecPla branch October 15, 2021 18:43

asfimport mentioned this pull request Oct 17, 2021

[C++][Compute] Document ExecNode, ExecPlan #18728

Closed


5 participants
