[Data] error filtering using isin filters. #57849

@dyami0123

What happened + What you expected to happen

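When filtering a dataset with an is_in expression, an AttributeError is raised. I expected the filter to return only the matching rows.

It looks like a typo was introduced in this commit: the pandas evaluator calls left.is_in(...) instead of left.isin(...). A pandas Series has an isin method but no is_in method.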
ad2362e#diff-3a1c07f04f2ce517cd3b74f2387e8a88d73cb72571f08f84d8139301754d5624R31-R46

Operation.IN: lambda left, right: left.is_in(right),
Operation.NOT_IN: lambda left, right: ~left.is_in(right),
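
The two entries above are the ones that fail on pandas blocks. As a minimal sketch of what the corrected lambdas would look like, the snippet below uses pandas only, with no Ray import. The names `pandas_in` and `pandas_not_in` are hypothetical stand-ins for the `Operation.IN` and `Operation.NOT_IN` entries; this is an illustration of the suspected one-word fix, not the actual patch.

```python
import pandas as pd

series = pd.Series(["A", "B", "C"], name="_hash")

# pandas spells the membership test `isin`; the Series has no `is_in` attribute,
# which is what the AttributeError in the traceback reports.
assert not hasattr(series, "is_in")

# Hypothetical stand-ins for the two map entries, using the pandas spelling.
pandas_in = lambda left, right: left.isin(right)
pandas_not_in = lambda left, right: ~left.isin(right)

in_mask = pandas_in(series, ["A"])
not_in_mask = pandas_not_in(series, ["A"])

print(in_mask.tolist())      # [True, False, False]
print(not_in_mask.tolist())  # [False, True, True]
```

Both results are boolean masks aligned with the input Series, which is the shape the filter in the traceback expects back from eval_expr.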

  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/dataset.py", line 5864, in to_pandas
    for batch in self.iter_batches(batch_format="pandas", batch_size=None):
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/iterator.py", line 190, in _create_iterator
    ) = self._to_ref_bundle_iterator()
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/iterator/iterator_impl.py", line 27, in _to_ref_bundle_iterator
    ref_bundles_iterator, stats = self._base_dataset._execute_to_iterator()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/dataset.py", line 6515, in _execute_to_iterator
    bundle_iter, stats, executor = self._plan.execute_to_iterator()
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/exceptions.py", line 89, in handle_trace
    raise e.with_traceback(None) from SystemException()
ray.exceptions.RayTaskError(AttributeError): ray::Filter(NoneType)() (pid=52569, ip=127.0.0.1)
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 201, in _udf_timed_iter
    output = next(input)
             ^^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 102, in __call__
    yield from self._post_process(results)
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 84, in _shape_blocks
    for result in results:
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 412, in _apply_transform
    yield from self._block_fn(blocks, ctx)
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 229, in filter_block_fn
    filtered_block = block_accessor.filter(predicate_expr)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/pandas_block.py", line 633, in filter
    mask = eval_expr(predicate_expr, self._table)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 194, in eval_expr
    return _eval_expr_recursive(expr, batch, _PANDAS_EXPR_OPS_MAP)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 159, in _eval_expr_recursive
    return ops[expr.op](
           ^^^^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 58, in <lambda>
    Operation.IN: lambda left, right: left.is_in(right),
                                      ^^^^^^^^^^
  File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'is_in'

Versions / Dependencies

ray: 2.50.0

Reproduction script

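import pandas as pd
from ray.data import from_pandas
from ray.data.expressions import col

# 300 rows: the same three (value, _hash) pairs repeated 100 times.
frame = pd.DataFrame({"value": [1, 2, 3], "_hash": ["A", "B", "C"]})
ray_ds = from_pandas(pd.concat([frame for _ in range(100)]))

# On ray 2.50.0 this raises AttributeError: 'Series' object has no attribute 'is_in'.
result = ray_ds.filter(expr=col("_hash").is_in(["A"])).to_pandas()

print(result)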

Issue Severity

High: It blocks me from completing my task.

Metadata

Labels

bug (Something that is supposed to be working; but isn't), community-backlog, data (Ray Data-related issues), stability
