Closed
Labels
bug (Something isn't working), regression (Something that used to work no longer does)
Description
Describe the bug
A LargeList(Struct({"foo": LargeUtf8})) cannot be coerced to List(Struct({"foo": Utf8})). However, the coercion works fine for LargeList(LargeUtf8) -> List(Utf8) and for Struct({"foo": LargeUtf8}) -> Struct({"foo": Utf8}).
To Reproduce
import polars as pl
from deltalake import DeltaTable
tmp_path = "test_table__"
df = pl.DataFrame({"foo": [1], "bar": [[{"foo": "!"}]]})
df.write_delta(tmp_path, mode="overwrite", overwrite_schema=True)
DeltaTable(tmp_path).merge(
    df.to_arrow(compat_level=1),
    predicate="s.foo = t.foo",
    source_alias="s",
    target_alias="t",
    large_dtypes=None,
).when_matched_update_all().execute()

DeltaError: Generic DeltaTable error: type_coercion
caused by
Error during planning: Failed to coerce then ([LargeList(Field { name: "item", data_type: Struct([Field { name: "foo", data_type: Utf8View, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]) and else (None) to common types in CASE WHEN expression
Expected behavior
DataFusion should be able to coerce between Large/view and normal Arrow types inside deeply nested types.
Additional context
Luckily, we can still downcast in Python using large_dtypes=False, but DataFusion should be able to coerce any deeply nested dtype.