[Ray Data] Support expressions on GroupedData.map_groups #57907
Closed
Labels
P1 (issue that should be fixed within a few weeks), community-backlog, data (Ray Data-related issues), enhancement (request for new feature and/or capability), usability
Description
The with_column API accepts expressions as well as UDFs. I would expect to be able to use a predefined expression UDF (a function decorated with udf) on GroupedData.map_groups in the same way.
Use case
import ray
import numpy as np
from ray.data.datatype import DataType
from ray.data.expressions import udf, col

# Get first value per group.
ds = ray.data.from_items([
    {"group": 1, "value": 1},
    {"group": 1, "value": 2},
    {"group": 2, "value": 3},
    {"group": 2, "value": 4},
])

@udf(return_dtype=DataType.int32())
def _first_elem(x):
    return x[0]

# Current approach: pass a plain function to map_groups.
ds.groupby("group").map_groups(lambda g: {"result": np.array([g["value"][0]])})

# Use udfs instead (mirrors top-level API).
# Proposed usage; this is the behavior being requested, not an existing API.
ds.groupby("group").with_column(_first_elem(col("value")))
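To make the requested semantics concrete, here is a minimal pure-Python sketch that does not use Ray at all. It only illustrates what "apply a column-level function once per group" would produce for the data above. The helper name group_with_column and its parameters are hypothetical, and keeping the group key in the output is an assumption, not something the issue specifies.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def group_with_column(
    rows: List[Row],
    key: str,
    source: str,
    name: str,
    fn: Callable[[List[Any]], Any],
) -> List[Row]:
    """Hypothetical helper: apply fn to each group's `source` values.

    Returns one row per group containing the group key and the result
    stored under `name`. This is an illustration, not Ray's implementation.
    """
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for row in rows:
        # Collect the source column's values per group, in input order.
        groups[row[key]].append(row[source])
    # Sort by key so the output order is deterministic.
    return [{key: k, name: fn(values)} for k, values in sorted(groups.items())]


rows = [
    {"group": 1, "value": 1},
    {"group": 1, "value": 2},
    {"group": 2, "value": 3},
    {"group": 2, "value": 4},
]

# Plain-Python stand-in for the _first_elem UDF from the use case.
result = group_with_column(
    rows, key="group", source="value", name="result", fn=lambda v: v[0]
)
print(result)
```

With the sample rows this yields one row per group holding the first value of that group, which is the same information the map_groups lambda returns today.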