[Data] Remove deprecated `TENSOR_COLUMN_NAME` constant and associated dead code

**Description**

Remove the `TENSOR_COLUMN_NAME` constant (`"__value__"`) from Ray Data. Ray Data historically used this constant to wrap raw numpy arrays into single-column tables, but that behavior has been deprecated since Ray 2.5. The constant and its associated code paths are now dead code and should be removed.

**Background**

`TENSOR_COLUMN_NAME` (defined as `"__value__"`) was introduced to handle cases where users passed raw numpy arrays to Ray Data APIs. When Ray Data encountered a raw numpy array, it automatically wrapped the array into a single-column table whose column was named `"__value__"`.

Since Ray 2.5, passing raw numpy arrays to APIs like `map_batches()` raises an explicit error:

```python
if isinstance(batch, np.ndarray):
    raise ValueError(
        "Standalone numpy arrays are not allowed in Ray 2.5. "
        "Return a dict of field -> array, e.g., `{'data': array}` instead of `array`."
    )
```

See: [`block.py:463-469`](https://github.com/ray-project/ray/blob/master/python/ray/data/block.py#L463-L469)
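To make the effect of that guard concrete, here is a minimal sketch of the two UDF shapes. `check_batch`, `bad_udf`, and `good_udf` are hypothetical names used only for illustration; `check_batch` imitates the quoted check and is not the Ray implementation.

```python
import numpy as np


def check_batch(batch):
    """Hypothetical stand-in for the quoted guard: reject a bare ndarray."""
    if isinstance(batch, np.ndarray):
        raise ValueError(
            "Standalone numpy arrays are not allowed; "
            "return {'data': array} instead."
        )
    return batch


def bad_udf(batch):
    # Returns a bare array: the pattern that used to be wrapped as "__value__".
    return batch["data"] * 2


def good_udf(batch):
    # Returns a dict of column name -> array, with an explicit column name.
    return {"data": batch["data"] * 2}


batch = {"data": np.arange(3)}

# The dict-returning UDF passes the guard unchanged.
result = check_batch(good_udf(batch))

# The array-returning UDF is rejected.
try:
    check_batch(bad_udf(batch))
    rejected = False
except ValueError:
    rejected = True
```

In real usage a function shaped like `good_udf` would be passed to `map_batches()`. Because the column name is chosen by the caller, nothing on this path needs a default such as `"__value__"`.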

Current public APIs already use explicit column names:
- `from_numpy()` uses `"data"` as the column name
- `from_items()` uses `"item"` as the column name
- `range_tensor()` uses `"data"` as the column name
- `read_numpy()` uses `"data"` as the column name

The remaining usages of `TENSOR_COLUMN_NAME` are:
1. **Dead code** in `TableBlockBuilder.add()` that wraps numpy arrays (never triggered by current code paths)
2. **Backwards-compatibility logic** in `_convert_batch_type_to_numpy()` that auto-unwraps single tensor columns
3. **Tensor detection logic** in `_should_convert_to_tensor()` that checks `column_name == TENSOR_COLUMN_NAME`
4. **Row extraction helpers**: `_build_tensor_row()` in the pandas and Arrow block accessors

**Implementation Boundaries & Constraints**

- **Target Files**:
  - `python/ray/data/constants.py` - Remove `TENSOR_COLUMN_NAME` definition
  - `python/ray/data/_internal/table_block.py` - Remove numpy array handling in `TableBlockBuilder.add()` (lines 79-80)
  - `python/ray/data/util/data_batch_conversion.py` - Remove backwards-compat logic in `_convert_batch_type_to_pandas()` and `_convert_batch_type_to_numpy()`
  - `python/ray/data/_internal/tensor_extensions/utils.py` - Remove `column_name == TENSOR_COLUMN_NAME` check in `_should_convert_to_tensor()`
  - `python/ray/data/_internal/pandas_block.py` - Remove or update `_build_tensor_row()`
  - `python/ray/data/_internal/arrow_block.py` - Remove default parameter `col_name: str = TENSOR_COLUMN_NAME` from `_build_tensor_row()`
  - `python/ray/data/tests/unit/test_data_batch_conversion.py` - Update tests that reference `TENSOR_COLUMN_NAME`
  - `python/ray/data/tests/conftest.py` - Update test fixtures that use `TENSOR_COLUMN_NAME`

- **Do Not Touch**:
  - `python/ray/air/constants.py` - This is in the AIR module (separate cleanup)
  - `python/ray/air/util/data_batch_conversion.py` - This is in the AIR module (separate cleanup)
  - `python/ray/train/` - Predictor classes that use `TENSOR_COLUMN_NAME` are in the Train module (separate cleanup)

- **Breaking Change Assessment**:
  - This is **not** a user-facing breaking change because:
    1. `TENSOR_COLUMN_NAME` is not exported in any `__init__.py`
    2. Current public APIs (`from_numpy`, etc.) already use different column names like `"data"`
    3. Passing raw numpy arrays to `map_batches()` has raised an error since Ray 2.5
  - Users who hardcoded `"__value__"` in their code were relying on undocumented internal behavior
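One way to confirm the cleanup is complete is to scan the in-scope tree for leftover references while skipping the "Do Not Touch" paths. The following is a rough sketch using only the standard library; `find_references`, `PATTERN`, and `EXCLUDED` are hypothetical names, and the default `subdir` assumes the repository layout listed above.

```python
import re
from pathlib import Path

# Match either the constant's name or its literal value.
PATTERN = re.compile(r"TENSOR_COLUMN_NAME|__value__")

# Paths this issue marks as out of scope (trailing slash avoids prefix clashes).
EXCLUDED = ("python/ray/air/", "python/ray/train/")


def find_references(repo_root, subdir="python/ray/data"):
    """Return (relative_path, line_number, stripped_line) for each match."""
    root = Path(repo_root)
    hits = []
    for path in sorted((root / subdir).rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        if rel.startswith(EXCLUDED):
            continue  # leave AIR and Train for their own cleanups
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits
```

Calling `find_references(".")` from the repository root before and after the change gives a quick checklist; an empty result for `python/ray/data` suggests no references remain in the Data module. Matching on `__value__` can also flag unrelated strings, so the hits still need a manual review.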

**Contributing expectations**

Please follow the [Ray Data Contributing Guide](https://docs.ray.io/en/latest/data/contributing/contributing-guide.html) for development setup and testing instructions.


[Data] Remove deprecated `TENSOR_COLUMN_NAME` constant and associated dead code #60547