Closed
Labels
P0, bug, data, stability
Description
What happened + What you expected to happen
A fix might be to update _backfill_missing_fields to cast existing fields to the unified type when they don't match. Currently it only handles nested structs and tensor types, but not primitive type mismatches like int64 -> float64.
Versions / Dependencies
Reproduction script
    import ray


    def generator_fn(batch):
        for i, row_id in enumerate(batch["id"]):
            if i % 2 == 0:
                # Yield struct with fields (a: int64, b: string)
                yield {"data": [{"a": 1, "b": "hello"}]}
            else:
                # Yield struct with fields (a: float64, c: int32)
                # Field 'a' has different type, field 'b' missing, field 'c' new
                yield {"data": [{"a": 1.5, "c": 100}]}


    ds = ray.data.range(4, override_num_blocks=1)
    ds = ds.map_batches(generator_fn, batch_size=4)
    ds.materialize()

Issue Severity
High: It blocks me from completing my task.