[Data] error filtering using isin filters. #57849
Closed
Labels
bug (Something that is supposed to be working; but isn't), community-backlog, data (Ray Data-related issues), stability
Description
What happened + What you expected to happen
When filtering a dataset with an is_in expression, an AttributeError is raised.
It looks like there was a typo in this commit: the pandas evaluation path calls left.is_in(...) instead of left.isin(...), and a pandas Series has no is_in method.
ad2362e#diff-3a1c07f04f2ce517cd3b74f2387e8a88d73cb72571f08f84d8139301754d5624R31-R46
ray/python/ray/data/_expression_evaluator.py
Lines 47 to 48 in edfd287
Operation.IN: lambda left, right: left.is_in(right),
Operation.NOT_IN: lambda left, right: ~left.is_in(right),
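To illustrate the suspected typo, here is a minimal pandas-only sketch. The function names pandas_in and pandas_not_in are hypothetical stand-ins for the two map entries above, not Ray's actual code. The remark that is_in is the PyArrow compute spelling is offered as a likely explanation for the mix-up, not something confirmed by the commit.

```python
import pandas as pd

s = pd.Series(["A", "B", "C"])

# pandas spells the membership test `isin`; `is_in` is the name used by
# pyarrow.compute, so a pandas Series does not expose it.
assert not hasattr(s, "is_in")


# Hypothetical replacements for the Operation.IN / Operation.NOT_IN entries.
def pandas_in(left: pd.Series, right) -> pd.Series:
    return left.isin(right)


def pandas_not_in(left: pd.Series, right) -> pd.Series:
    return ~left.isin(right)


print(pandas_in(s, ["A"]).tolist())      # [True, False, False]
print(pandas_not_in(s, ["A"]).tolist())  # [False, True, True]
```

With isin in place, both lambdas return a boolean mask that the pandas block filter could use directly.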
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/dataset.py", line 5864, in to_pandas
for batch in self.iter_batches(batch_format="pandas", batch_size=None):
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/iterator.py", line 190, in _create_iterator
) = self._to_ref_bundle_iterator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/iterator/iterator_impl.py", line 27, in _to_ref_bundle_iterator
ref_bundles_iterator, stats = self._base_dataset._execute_to_iterator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/dataset.py", line 6515, in _execute_to_iterator
bundle_iter, stats, executor = self._plan.execute_to_iterator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/exceptions.py", line 89, in handle_trace
raise e.with_traceback(None) from SystemException()
ray.exceptions.RayTaskError(AttributeError): ray::Filter(NoneType)() (pid=52569, ip=127.0.0.1)
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 201, in _udf_timed_iter
output = next(input)
^^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 102, in __call__
yield from self._post_process(results)
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 84, in _shape_blocks
for result in results:
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 412, in _apply_transform
yield from self._block_fn(blocks, ctx)
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 229, in filter_block_fn
filtered_block = block_accessor.filter(predicate_expr)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_internal/pandas_block.py", line 633, in filter
mask = eval_expr(predicate_expr, self._table)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 194, in eval_expr
return _eval_expr_recursive(expr, batch, _PANDAS_EXPR_OPS_MAP)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 159, in _eval_expr_recursive
return ops[expr.op](
^^^^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/ray/data/_expression_evaluator.py", line 58, in <lambda>
Operation.IN: lambda left, right: left.is_in(right),
^^^^^^^^^^
File "/Users/dandrews/miniconda3/envs/fa-dev/lib/python3.11/site-packages/pandas/core/generic.py", line 6299, in __getattr__
return object.__getattribute__(self, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'is_in'
Versions / Dependencies
ray: 2.50.0
Reproduction script
import pandas as pd
from ray.data import from_pandas
from ray.data.expressions import col
ray_ds = from_pandas(pd.concat([pd.DataFrame({"value": [1, 2, 3], "_hash": ["A", "B", "C"]}) for _ in range(100)]))
result = ray_ds.filter(expr=col("_hash").is_in(["A"])).to_pandas()
print(result)
Issue Severity
High: It blocks me from completing my task.
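Until a fix lands, one possible workaround is to skip the expression evaluator and apply the pandas filter per batch. This is only a sketch: keep_rows_in and filter_with_map_batches are hypothetical helper names, and it assumes the dataset can be processed with batch_format="pandas" through Dataset.map_batches.

```python
import pandas as pd


def keep_rows_in(batch: pd.DataFrame, column: str, allowed) -> pd.DataFrame:
    """Keep only rows whose `column` value is in `allowed`, using pandas isin."""
    return batch[batch[column].isin(list(allowed))]


def filter_with_map_batches(ds, column: str, allowed):
    """Apply keep_rows_in to every batch of `ds` (assumed to be a ray.data.Dataset)."""
    return ds.map_batches(
        lambda batch: keep_rows_in(batch, column, allowed),
        batch_format="pandas",
    )


# Local check on a plain DataFrame, no Ray cluster needed.
df = pd.DataFrame({"value": [1, 2, 3], "_hash": ["A", "B", "C"]})
print(keep_rows_in(df, "_hash", ["A"]))
```

With the reproduction script above, this would be called as filter_with_map_batches(ray_ds, "_hash", ["A"]).to_pandas().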