Skip to content

GH-33500: [Python] add Table.to/from_struct_array#38520

Merged
jorisvandenbossche merged 12 commits intoapache:mainfrom
judahrand:struct-array
Jan 8, 2024
Merged

GH-33500: [Python] add Table.to/from_struct_array#38520
jorisvandenbossche merged 12 commits intoapache:mainfrom
judahrand:struct-array

Conversation

@judahrand
Copy link
Copy Markdown
Contributor

@judahrand judahrand commented Oct 30, 2023

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions
Copy link
Copy Markdown

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@judahrand judahrand changed the title Add Table.to/from_struct_array GH-33500 [python] add Table.to/from_struct_array Oct 30, 2023
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #33500 has been automatically assigned in GitHub to PR creator.

@judahrand judahrand force-pushed the struct-array branch 3 times, most recently from 3739dc8 to 9cd3212 Compare October 30, 2023 23:23
@judahrand judahrand changed the title GH-33500 [python] add Table.to/from_struct_array GH-33500 [Python] add Table.to/from_struct_array Oct 30, 2023
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #33500 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title GH-33500 [Python] add Table.to/from_struct_array GH-33500: [Python] add Table.to/from_struct_array Oct 31, 2023
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #33500 has been automatically assigned in GitHub to PR creator.

@judahrand
Copy link
Copy Markdown
Contributor Author

@jorisvandenbossche I think this is ready for a review whenever you have the time.

@judahrand
Copy link
Copy Markdown
Contributor Author

@kou Do you know if there is someone else that I could request a review from? It'd be good to get this merged if possible.

Copy link
Copy Markdown
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@AlenkaF What do you think about this?

Comment on lines +900 to +911
def test_table_from_struct_array():
struct_array = pa.array(
[{"ints": 1}, {"floats": 1.0}],
type=pa.struct([("ints", pa.int32()), ("floats", pa.float32())]),
)
result = pa.Table.from_struct_array(struct_array)
assert result.equals(pa.Table.from_arrays(
[
pa.array([1, None], type=pa.int32()),
pa.array([None, 1.0], type=pa.float32()),
], ["ints", "floats"]
))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add test_table_from_struct_array_chunked_array() like this?

Copy link
Copy Markdown
Contributor Author

@judahrand judahrand Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +914 to +927
def test_table_to_struct_array():
table = pa.Table.from_arrays(
[
pa.array([1, None], type=pa.int32()),
pa.array([None, 1.0], type=pa.float32()),
], ["ints", "floats"]
)
result = table.to_struct_array()
assert result.equals(pa.chunked_array(
pa.array(
[{"ints": 1}, {"floats": 1.0}],
type=pa.struct([("ints", pa.int32()), ("floats", pa.float32())]),
),
))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add a test with max_chunksize?

Copy link
Copy Markdown
Contributor Author

@judahrand judahrand Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions github-actions bot added awaiting review Awaiting review awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Nov 13, 2023
Copy link
Copy Markdown
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this!
Just one nit comment from me, otherwise it looks good to me 👍

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review awaiting merge Awaiting merge labels Nov 13, 2023
@jorisvandenbossche jorisvandenbossche removed the awaiting committer review Awaiting committer review label Dec 1, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Dec 11, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 12, 2023

def to_struct_array(self, max_chunksize=None):
"""
Convert to a struct array.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can be more explicit that it will be a chunked array of struct type?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Dec 12, 2023
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 12, 2023
@jorisvandenbossche jorisvandenbossche merged commit 60b89ff into apache:main Jan 8, 2024
@jorisvandenbossche jorisvandenbossche removed the awaiting change review Awaiting change review label Jan 8, 2024
@jorisvandenbossche
Copy link
Copy Markdown
Member

Thanks @judahrand!

@judahrand judahrand deleted the struct-array branch January 8, 2024 16:11
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 60b89ff.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
### Rationale for this change

### What changes are included in this PR?

### Are these changes tested?

### Are there any user-facing changes?

* Closes: apache#33500

Lead-authored-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python] Table.from_struct_array function

4 participants