[Ray Data] Add a force overwrite option to the rename_columns() #56878

@codingl2k1

Description

Retrieving the schema is expensive, so I would like to force-rename columns without checking it first: if the target column already exists, it should be overwritten.

Currently,

import ray.data
from ray.data.expressions import col

# build a ds with two columns: id, id2
ds = ray.data.range(10).with_column("id2", col("id"))
ds.rename_columns({"id2": "id"}).show()

This will raise an exception:

    return super().dump(obj)
           ^^^^^^^^^^^^^^^^^
             ~^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 1711, in pyarrow.lib._Tabular.__getitem__
  File "pyarrow/table.pxi", line 1796, in pyarrow.lib._Tabular.column
  File "pyarrow/table.pxi", line 1735, in pyarrow.lib._Tabular._ensure_integer_index
KeyError: 'Field "id" exists 2 times in schema'

Use case

I want this:

import ray.data
from ray.data.expressions import col

# build a ds with two columns: id, id2
ds = ray.data.range(10).with_column("id2", col("id"))
ds.rename_columns({"id2": "id"}, force_overwrite=True).show()
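The requested semantics can be sketched on a single row represented as a plain dict. This is only an illustration of the intended behavior, not Ray Data's implementation; `rename_with_overwrite` is a hypothetical helper name, and `force_overwrite` above is the proposed (not existing) parameter.

```python
from typing import Any, Dict


def rename_with_overwrite(row: Dict[str, Any], names: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: rename the keys of one row, letting a renamed
    column replace any existing column that already has the target name."""
    # Target names that will actually be written, i.e. whose source column exists.
    targets = {new for old, new in names.items() if old in row}
    # Keep columns that are neither renamed nor about to be overwritten.
    out = {k: v for k, v in row.items() if k not in names and k not in targets}
    # Write the renamed columns under their new names.
    for old, new in names.items():
        if old in row:
            out[new] = row[old]
    return out


row = {"id": 0, "id2": 5}
renamed = rename_with_overwrite(row, {"id2": "id"})
print(renamed)
```

As a stopgap, a function like this could be applied row by row with `ds.map(lambda r: rename_with_overwrite(r, {"id2": "id"}))`, at the cost of per-row Python execution. Note that the sketch moves an overwritten column to the end of the row instead of preserving its original position.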
