Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Aug 3, 2021

Adds a sink node that accumulates, sorts, and emits sorted batches. This is a sink node as currently we don't have a good design for handling order-dependent operations in the rest of the pipeline.

@github-actions
Copy link

github-actions bot commented Aug 3, 2021

@lidavidm
Copy link
Member Author

lidavidm commented Aug 3, 2021

Draft for now since it'll require rebasing on top of #10793/ARROW-13482

@lidavidm
Copy link
Member Author

lidavidm commented Aug 6, 2021

Now reworked to tag batches instead of being a sink node. Next, I can try to implement a simple order-dependent aggregate kernel and see how far that takes us.

@lidavidm
Copy link
Member Author

lidavidm commented Aug 6, 2021

Added a hash_arg_min_max kernel that makes use of the OrderByNode's tag. See ARROW-12873.

@lidavidm
Copy link
Member Author

Rebased on top of the registry PR.

@lidavidm lidavidm marked this pull request as ready for review August 10, 2021 13:58
@lidavidm lidavidm changed the title ARROW-13540: [C++] Add order by sink node ARROW-13540: [C++] Add order by node Aug 10, 2021
@lidavidm
Copy link
Member Author

lidavidm commented Aug 10, 2021

TODO: revert back to making OrderBy a SinkNode and split arg_min_max and related things into a separate JIRA

@lidavidm lidavidm changed the title ARROW-13540: [C++] Add order by node ARROW-13540: [C++] Add order by sink node Aug 11, 2021
@lidavidm
Copy link
Member Author

Converted back to a sink node.

@bkietz bkietz self-requested a review August 11, 2021 15:59
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments, overall this looks solid

Comment on lines 115 to 119
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment needs to be refactored

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: using this above please

Suggested change
Finishes(ResultWith(::testing::ElementsAreArray(expected))));
Finishes(ResultWith(ElementsAreArray(expected))));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be worthwhile to extract TableFromExecBatches into exec/util.h

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this would be more readable as

Suggested change
auto maybe_sorted = SortData();
Status st = DoFinish();
if (ErrorIfNotOk(st)) {
producer_.Push(std::move(st));
}
SinkNode::Finish();
}
Status DoFinish() {
ARROW_ASSIGN_OR_RAISE(auto sorted, SortData());
//...

.AddToPlan(plan.get()));

ASSERT_THAT(StartAndCollect(plan.get(), sink_gen),
Finishes(ResultWith(::testing::ElementsAreArray({ExecBatchFromJSON(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Finishes(ResultWith(::testing::ElementsAreArray({ExecBatchFromJSON(
Finishes(ResultWith(ElementsAreArray({ExecBatchFromJSON(

#include "arrow/compute/exec/exec_plan.h"

#include <mutex>
#include <unordered_map>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <unordered_map>

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants