Skip to content

[Data] Replace BlockAccessor.append_column with BlockAccessor.fill_column#52112

Merged
bveeramani merged 6 commits intomasterfrom
arrow-column
Apr 9, 2025
Merged

[Data] Replace BlockAccessor.append_column with BlockAccessor.fill_column#52112
bveeramani merged 6 commits intomasterfrom
arrow-column

Conversation

@bveeramani
Copy link
Copy Markdown
Member

@bveeramani bveeramani commented Apr 8, 2025

Why are these changes needed?

If you're using PyArrow 12 or earlier and pass in a list of scalars to BlockAccessor.append_column, the method raises an error. To address this issue, this PR re-implements the method.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani requested a review from a team as a code owner April 8, 2025 21:25
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani added the go add ONLY when ready to merge, run all tests label Apr 8, 2025

return self._table.append_column(name, [data])
array = pa.nulls(len(self._table), type=type)
array = pc.fill_null(array, value)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value have to be converted to Pyarrow Scalar

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of explicitly converting value to a scalar rather than letting PyArrow internally handle the conversion?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work w/o converting to scalar?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. We test both cases in test_arrow_block::test_fill_column

f"`{self.__class__.__name__}.append_column()` doesn't support "
"array-like data."
)
import pyarrow as pa
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a top-level pyarrow import already, no need for local one

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani enabled auto-merge (squash) April 9, 2025 06:41
@github-actions github-actions bot disabled auto-merge April 9, 2025 06:41
@bveeramani bveeramani merged commit 49f7db6 into master Apr 9, 2025
5 checks passed
@bveeramani bveeramani deleted the arrow-column branch April 9, 2025 23:01
han-steve pushed a commit to han-steve/ray that referenced this pull request Apr 11, 2025
…_column` (ray-project#52112)

<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

If you're using PyArrow 12 or earlier and pass in a list of scalars to
`BlockAccessor.append_column`, the method raises an error. To address
this issue, this PR re-implements the method.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Steve Han <stevehan2001@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-backlog go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants