[Data] Fix TensorArray to Arrow tensor conversion #59449

Merged
alexeykudinkin merged 20 commits into ray-project:master from alexeykudinkin:ak/pd-tnsr-fix on Dec 17, 2025

Conversation

@alexeykudinkin
Contributor

@alexeykudinkin alexeykudinkin commented Dec 15, 2025

Description

This change addresses a long-standing problem where Pandas tensors holding null values couldn't be converted into Arrow tensors.

More details are captured in #59445.

The following changes address it:

  • Fixed _is_ndarray_variable_shaped_tensor
  • NumPy tensors with dtype='O' (object) are round-tripped through PyArrow to be converted into proper ndarrays
  • Path raveling and formatting are unified between fixed-shape and variable-shaped tensors

Related issues

Addresses #59445

…ct ndarrays;

Abstracted `AVSTA._ravel_tensors`

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin requested review from a team as code owners December 15, 2025 20:46
@alexeykudinkin alexeykudinkin added the "go" label (add ONLY when ready to merge, run all tests) Dec 15, 2025
@alexeykudinkin alexeykudinkin linked an issue Dec 15, 2025 that may be closed by this pull request
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue with converting TensorArray to Arrow tensors, particularly for data originating from pandas with nullable types. The core changes involve adding a specific handling path for np.object_ dtype arrays in ArrowTensorArray._from_numpy, refactoring tensor raveling logic into a reusable _ravel_tensors method, and updating an API call to use a keyword argument for clarity. The changes are logical and well-targeted. My review includes suggestions to address a TODO for ensuring correctness, a minor performance optimization, and a recommendation to add assertions to the new test case to make it more robust.
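
As a rough illustration of what a shared raveling step can look like, the sketch below flattens a list of tensors into one contiguous buffer plus per-tensor shapes and offsets, so fixed-shape and variable-shaped inputs go through the same path. The function `ravel_tensors` is a hypothetical stand-in, not the `_ravel_tensors` method from this PR, whose body is not shown here.

```python
import numpy as np


def ravel_tensors(tensors):
    """Hypothetical sketch: flatten tensors into a single 1-D buffer
    and keep the metadata needed to recover each tensor."""
    shapes = [t.shape for t in tensors]
    sizes = np.array([t.size for t in tensors], dtype=np.int64)
    # offsets[i] is the start of tensor i in the flat buffer;
    # offsets[-1] is the total number of elements.
    offsets = np.concatenate([[0], np.cumsum(sizes)]).astype(np.int64)
    flat = np.concatenate([np.ravel(t) for t in tensors])
    return flat, shapes, offsets


# Fixed-shape input: every tensor is 2x2.
fixed = [np.arange(4).reshape(2, 2), np.arange(4, 8).reshape(2, 2)]
# Variable-shaped input: shapes differ per tensor.
ragged = [np.arange(6).reshape(2, 3), np.arange(2).reshape(1, 2)]

flat_fixed, shapes_fixed, offsets_fixed = ravel_tensors(fixed)
flat_ragged, shapes_ragged, offsets_ragged = ravel_tensors(ragged)
```

Each original tensor can be recovered as `flat[offsets[i]:offsets[i + 1]].reshape(shapes[i])`.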

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
This reverts commit b031dfd.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin changed the title from "[WIP][Data] Fix TensorArray to Arrow tensor conversion" to "[Data] Fix TensorArray to Arrow tensor conversion" Dec 16, 2025
@ray-gardener ray-gardener bot added the "data" label (Ray Data-related issues) Dec 16, 2025
Contributor

@srinathk10 srinathk10 left a comment

LGTM. Tests need fixing

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Cleaning up useless permutations

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a long-standing issue with converting Pandas tensors containing null values into Arrow tensors. The core of the fix, which involves cycling object-dtyped NumPy arrays through PyArrow to handle nulls correctly, is sound and well-implemented in the new _ensure_scalar_ndarray function.

The accompanying changes are also positive:

  • The bug fix in _is_ndarray_variable_shaped_tensor is correct and crucial.
  • Refactoring the tensor raveling logic into _ravel_tensors improves code clarity and maintainability.
  • The API is improved by making column_name a keyword-only argument in ArrowTensorArray.from_numpy.
  • The new tests in test_tensor_extension.py provide good coverage for the fix.
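
For readers unfamiliar with keyword-only arguments, the sketch below shows the general Python mechanism. `from_numpy_sketch` is a hypothetical stand-in with an assumed signature, not the real `ArrowTensorArray.from_numpy`.

```python
from typing import Optional, Sequence


def from_numpy_sketch(arr: Sequence, *, column_name: Optional[str] = None) -> str:
    """Hypothetical stand-in: everything after the bare `*` must be
    passed by keyword."""
    label = column_name if column_name is not None else "<unnamed>"
    return f"{label}: {len(arr)} rows"


# Passing column_name by keyword works.
ok = from_numpy_sketch([1, 2, 3], column_name="image")

# Passing it positionally raises TypeError, which keeps call sites explicit.
try:
    from_numpy_sketch([1, 2, 3], "image")
    rejected = False
except TypeError:
    rejected = True
```

The bare `*` means a caller cannot accidentally pass some other value in the `column_name` position.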

I've left a few minor comments regarding typos in comments and test code.

One thing to note is the removal of tensor_format parametrization from several tests. While this simplifies the test suite, it seems to reduce test coverage for the v1 tensor format, as the tests will now only run against the default v2 format. This might be intentional, but it would be good to confirm.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Contributor

@matthewdeng matthewdeng left a comment

STAMP

@alexeykudinkin alexeykudinkin enabled auto-merge (squash) December 17, 2025 02:45
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@github-actions github-actions bot disabled auto-merge December 17, 2025 19:37
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) December 17, 2025 19:37
@alexeykudinkin alexeykudinkin merged commit 2015205 into ray-project:master Dec 17, 2025
7 checks passed
zzchun pushed a commit to zzchun/ray that referenced this pull request Dec 18, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Signed-off-by: peterxcli <peterxcli@gmail.com>
Labels

data: Ray Data-related issues
go: add ONLY when ready to merge, run all tests

Development

Successfully merging this pull request may close these issues.

[Data] Failure when converting to Arrow from TensorArray

3 participants