[Ray Data] Add a force overwrite option to the rename_columns() #56878
Closed
Labels: community-backlog, data (Ray Data-related issues), enhancement (Request for new feature and/or capability), performance, usability
Description
Retrieving the schema is expensive, so checking up front whether the target column already exists is costly. I just want to force-rename the columns: if the target column already exists, overwrite it.
Currently,
import ray.data
from ray.data.expressions import col
# build a ds with two columns: id, id2
ds = ray.data.range(10).with_column("id2", col("id"))
ds.rename_columns({"id2": "id"}).show()
This will raise an exception:
return super().dump(obj)
^^^^^^^^^^^^^^^^^
~^^^^^^^^^^^^^
File "pyarrow/table.pxi", line 1711, in pyarrow.lib._Tabular.__getitem__
File "pyarrow/table.pxi", line 1796, in pyarrow.lib._Tabular.column
File "pyarrow/table.pxi", line 1735, in pyarrow.lib._Tabular._ensure_integer_index
KeyError: 'Field "id" exists 2 times in schema'
Use case
I want this:
import ray.data
from ray.data.expressions import col
# build a ds with two columns: id, id2
ds = ray.data.range(10).with_column("id2", col("id"))
ds.rename_columns({"id2": "id"}, force_overwrite=True).show()
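To make the requested semantics concrete, here is a minimal sketch of a "force rename" on a single dict-of-columns batch. The helper name `force_rename` is hypothetical and is not part of Ray; it only illustrates the behavior that `force_overwrite=True` is meant to have (drop the existing target column, then rename), without looking at the dataset schema.

```python
from typing import Any, Dict, Mapping


def force_rename(batch: Dict[str, Any], names: Mapping[str, str]) -> Dict[str, Any]:
    """Rename columns of a dict-of-columns batch, overwriting existing targets.

    Hypothetical helper, not a Ray API. `names` maps old column name -> new name.
    """
    # Only renames whose source column is present can overwrite anything,
    # so a missing source never causes an existing column to be dropped.
    targets = {new for old, new in names.items() if old in batch}

    out: Dict[str, Any] = {}
    for name, values in batch.items():
        if name in names:
            # Renamed column: stored under its new name. If two sources map
            # to the same target, the later one in batch order wins.
            out[names[name]] = values
        elif name not in targets:
            # Untouched column that does not collide with a rename target.
            out[name] = values
        # Otherwise the column is an existing rename target: it is dropped,
        # i.e. overwritten by the renamed column.
    return out


batch = {"id": [0, 1, 2], "id2": [10, 11, 12]}
renamed = force_rename(batch, {"id2": "id"})
print(renamed)
```

Assuming batches arrive as a dict of column name to values, which is what `map_batches` provides with `batch_format="numpy"`, this could be applied as `ds.map_batches(lambda b: force_rename(b, {"id2": "id"}), batch_format="numpy")`. I have not measured how that compares with a native option in cost.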