[Data] Fix TensorArray to Arrow tensor conversion#59449
[Data] Fix TensorArray to Arrow tensor conversion#59449alexeykudinkin merged 20 commits intoray-project:masterfrom
TensorArray to Arrow tensor conversion#59449Conversation
…ct ndarrays; Abstracted `AVSTA._ravel_tensors` Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request addresses an issue with converting TensorArray to Arrow tensors, particularly for data originating from pandas with nullable types. The core changes involve adding a specific handling path for np.object_ dtype arrays in ArrowTensorArray._from_numpy, refactoring tensor raveling logic into a reusable _ravel_tensors method, and updating an API call to use a keyword argument for clarity. The changes are logical and well-targeted. My review includes suggestions to address a TODO for ensuring correctness, a minor performance optimization, and a recommendation to add assertions to the new test case to make it more robust.
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
This reverts commit b031dfd. Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
TensorArray to Arrow tensor conversionTensorArray to Arrow tensor conversion
srinathk10
left a comment
There was a problem hiding this comment.
LGTM. Tests need fixing
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Cleaning up useless permutations Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a long-standing issue with converting Pandas tensors containing null values into Arrow tensors. The core of the fix, which involves cycling object-dtyped NumPy arrays through Pyarrow to handle nulls correctly, is sound and well-implemented in the new _ensure_scalar_ndarray function.
The accompanying changes are also positive:
- The bug fix in
_is_ndarray_variable_shaped_tensoris correct and crucial. - Refactoring the tensor raveling logic into
_ravel_tensorsimproves code clarity and maintainability. - The API is improved by making
column_namea keyword-only argument inArrowTensorArray.from_numpy. - The new tests in
test_tensor_extension.pyprovide good coverage for the fix.
I've left a few minor comments regarding typos in comments and test code.
One thing to note is the removal of tensor_format parametrization from several tests. While this simplifies the test suite, it seems to reduce test coverage for the v1 tensor format, as the tests will now only run against the default v2 format. This might be intentional, but it would be good to confirm.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
## Description This change addresses a long-standing problem when Pandas tensors holding null values couldn't be converted into Arrow ones. More details are captured in ray-project#59445. Following changes are made to address that: - Fixed `_is_ndarray_variable_shaped_tensor` - Numpy tensors with `dtype='o'` are cycled t/h Pyarrow to be converted into proper ndarrays - Path raveling and formatting are unified b/w fixed-shape and var-shaped tensors ## Related issues Addresses ray-project#59445 ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com> Signed-off-by: Alexey Kudinkin <alexey.kudinkin@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
## Description This change addresses a long-standing problem when Pandas tensors holding null values couldn't be converted into Arrow ones. More details are captured in ray-project#59445. Following changes are made to address that: - Fixed `_is_ndarray_variable_shaped_tensor` - Numpy tensors with `dtype='o'` are cycled t/h Pyarrow to be converted into proper ndarrays - Path raveling and formatting are unified b/w fixed-shape and var-shaped tensors ## Related issues Addresses ray-project#59445 ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com> Signed-off-by: Alexey Kudinkin <alexey.kudinkin@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
## Description This change addresses a long-standing problem when Pandas tensors holding null values couldn't be converted into Arrow ones. More details are captured in ray-project#59445. Following changes are made to address that: - Fixed `_is_ndarray_variable_shaped_tensor` - Numpy tensors with `dtype='o'` are cycled t/h Pyarrow to be converted into proper ndarrays - Path raveling and formatting are unified b/w fixed-shape and var-shaped tensors ## Related issues Addresses ray-project#59445 ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com> Signed-off-by: Alexey Kudinkin <alexey.kudinkin@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
Description
This change addresses a long-standing problem when Pandas tensors holding null values couldn't be converted into Arrow ones.
More details are captured in #59445.
Following changes are made to address that:
_is_ndarray_variable_shaped_tensordtype='o'are cycled t/h Pyarrow to be converted into proper ndarraysRelated issues
Addresses #59445
Additional information