[Ray Data] Support expressions on GroupedData.map_groups #57907

@codingl2k1

Description

Since the with_column API supports expressions as well as UDFs, I would like to be able to use a predefined expression UDF with GroupedData.map_groups too.

Use case

import ray
from ray.data.datatype import DataType
from ray.data.expressions import udf, col
import numpy as np

# Get first value per group.
ds = ray.data.from_items([
    {"group": 1, "value": 1},
    {"group": 1, "value": 2},
    {"group": 2, "value": 3},
    {"group": 2, "value": 4}])

@udf(return_dtype=DataType.int32())
def _first_elem(x):
    return x[0]

# Current approach: pass a plain callable to map_groups.
ds.groupby("group").map_groups(lambda g: {"result": np.array([g["value"][0]])})

# Proposed: use expression UDFs instead (mirrors the top-level API).
ds.groupby("group").with_column(_first_elem(col("value")))
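
Until something like the proposed call exists, one possible workaround is a small adapter that turns a column-level function into a callable for map_groups. This is only a sketch under assumptions: `first_elem`, `as_group_fn`, and `run_with_ray` are hypothetical names that are not part of Ray Data, and it assumes the numpy batch format, where each group arrives as a dict of column name to NumPy array (the same shape the lambda above relies on). It wraps a plain Python function rather than an `@udf` expression, because applying expression objects per group is exactly what this issue asks for.

```python
import numpy as np


def first_elem(values):
    """Plain-Python counterpart of `_first_elem`: first value of a group's column."""
    return values[0]


def as_group_fn(fn, in_col, out_col):
    """Hypothetical adapter (not a Ray API): wrap a column-level function
    so it can be passed to GroupedData.map_groups."""

    def apply(group):
        # `group` is assumed to be a dict of column name -> np.ndarray
        # holding all rows of a single group.
        # Emit one row per group, as the lambda in the use case does.
        return {out_col: np.array([fn(group[in_col])])}

    return apply


def run_with_ray():
    """Run the adapter on the dataset from the use case; requires Ray."""
    import ray  # imported lazily so the helpers above work without Ray

    ds = ray.data.from_items([
        {"group": 1, "value": 1},
        {"group": 1, "value": 2},
        {"group": 2, "value": 3},
        {"group": 2, "value": 4},
    ])
    grouped = ds.groupby("group").map_groups(
        as_group_fn(first_elem, "value", "result"),
        batch_format="numpy",
    )
    return grouped.take_all()
```

Calling `run_with_ray()` should yield one `result` row per group. The adapter keeps the reusable logic (`first_elem`) separate from the group plumbing, which is roughly the ergonomics the proposed API would provide natively.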

Metadata

Assignees

No one assigned

    Labels

    P1 (issue that should be fixed within a few weeks), community-backlog, data (Ray Data-related issues), enhancement (request for new feature and/or capability), usability

