Skip to content

Conversation

@ZMZ91
Copy link
Contributor

@ZMZ91 ZMZ91 commented Jun 27, 2021

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@lidavidm lidavidm changed the title add support for take implementation on dense union type ARROW-13005: [C++] Add support for take implementation on dense union type Jun 28, 2021
@github-actions
Copy link

@ZMZ91
Copy link
Contributor Author

ZMZ91 commented Jun 29, 2021

ARROW-13005: [C++] Implementation of take function for dense union data type.

@pitrou
Copy link
Member

pitrou commented Jun 29, 2021

@ZMZ91 Thanks for tackling this slightly delicate issue. I've posted two comments about the chosen approach.

@ZMZ91
Copy link
Contributor Author

ZMZ91 commented Jul 7, 2021

Hi @pitrou and @bkietz, just update the pr due to your comments. Please help check the latest code. Thanks a lot.

@ZMZ91
Copy link
Contributor Author

ZMZ91 commented Jul 8, 2021

Thank you @bkietz. All comments resolved. Please help check.

@ZMZ91
Copy link
Contributor Author

ZMZ91 commented Jul 9, 2021

Hi @bkietz and @pitrou, just a question not quite related to this pr and about dense union array. I've seen the limitation for array data is under 2GB. Does it apply to nested arrays or just the sub arrays inside? For example, dense union array, it has child id array, value offset array and a bunch of child arrays. Each of them should be under 2GB. And is it restricted that the sum under 2GB?

@bkietz
Copy link
Member

bkietz commented Jul 9, 2021

The 2GB limitation applies to buffers and is type specific. It derives from usage of signed 32 bit integers to encode offsets, so for example a StringArray cannot have more than 2GB total character data because otherwise we could not express the final offset. IIUC it's allowed for a child of the union to be an Int8Array with 2**33 elements but a dense union's offsets could not point to most of those elements.

In short, this is not a constraint you need to enforce in this PR.

@ZMZ91 ZMZ91 requested a review from bkietz July 10, 2021 02:07
@ZMZ91
Copy link
Contributor Author

ZMZ91 commented Jul 14, 2021

Hi @bkietz and @pitrou, could you help check what else to update in this pr? Or we may merge it? Thanks.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

I'll merge on green CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants