Closed
Labels
bug: Something isn't working
Description
Describe the bug
Queries that GROUP BY a column of type List<Dictionary<_, _>> fail with the following error:
Expected infallible creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("[Large]ListArray's child datatype Utf8 does not correspond to the List's datatype Dictionary(Int8, Utf8)")
This happens during a roundtrip from ArrayRef -> Row -> ArrayRef: it panics in convert_row. I believe this is because encoding a dictionary array to Row does not seem to preserve the dictionary encoding (see here).
The error arises in Arrow, but in DataFusion it may happen here, since DataFusion uses Arrow's RowConverter and performs roundtrip conversions.
There is already an open issue on Arrow: apache/arrow-rs#7165
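To make the suspected mechanism concrete, here is a toy model in plain Rust. It is only a sketch of the hypothesis above, not Arrow's code: `ToyType`, `ToyDict`, `encode_rows`, `decode_child` and `build_list` are all hypothetical names invented for illustration, and the dictionary key type is left out. It assumes that row encoding stores the looked-up strings rather than the keys, so decoding can only rebuild a plain Utf8 child, which then fails validation against the declared List<Dictionary> type.

```rust
// Toy model only: these types are hypothetical and are NOT Arrow's API.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Utf8,
    Dictionary(Box<ToyType>), // key type omitted for brevity
    List(Box<ToyType>),
}

/// A dictionary-encoded column: `keys[i]` indexes into `values`.
struct ToyDict {
    keys: Vec<i8>,
    values: Vec<String>,
}

/// In this model, row encoding writes the looked-up string bytes,
/// so the keys (and the dictionary encoding) are not preserved.
fn encode_rows(dict: &ToyDict) -> Vec<Vec<u8>> {
    dict.keys
        .iter()
        .map(|&k| dict.values[k as usize].as_bytes().to_vec())
        .collect()
}

/// Decoding only sees string bytes, so it can only produce a Utf8 child.
fn decode_child(rows: &[Vec<u8>]) -> (ToyType, Vec<String>) {
    let values = rows
        .iter()
        .map(|r| String::from_utf8_lossy(r).into_owned())
        .collect();
    (ToyType::Utf8, values)
}

/// Models the validation step: the child's type must equal the type
/// that the list declares for its elements.
fn build_list(declared: &ToyType, child_type: &ToyType) -> Result<(), String> {
    match declared {
        ToyType::List(expected) if **expected == *child_type => Ok(()),
        ToyType::List(expected) => Err(format!(
            "child datatype {:?} does not correspond to the List's datatype {:?}",
            child_type, expected
        )),
        other => Err(format!("{:?} is not a list type", other)),
    }
}
```

In this model, passing the decoded Utf8 child to `build_list` together with a declared `List(Dictionary(Utf8))` returns an `Err`, which is the same shape of mismatch as the message quoted above. A fix along these lines would have to re-encode the decoded child as a dictionary (or cast it) before the list is rebuilt.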
To Reproduce
use std::sync::Arc;

use datafusion::arrow::array::{Int32Array, ListBuilder, StringDictionaryBuilder};
use datafusion::arrow::datatypes::{DataType, Field, Int8Type, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::test]
async fn df_list_of_dict_should_error() -> Result<()> {
    // build List<Dictionary<Int8,Utf8>>
    let mut dict_builder = StringDictionaryBuilder::<Int8Type>::new();
    for s in ["foo", "bar", "baz", "foo"] {
        dict_builder.append(s)?;
    }
    let mut list_builder = ListBuilder::new(dict_builder);
    list_builder.values().append("foo")?;
    list_builder.values().append("bar")?;
    list_builder.append(true); // close the first list entry
    list_builder.values().append("baz")?;
    list_builder.append(true); // close the second list entry
    let list_dict = list_builder.finish();

    // two-column batch: an Int32 column and the List<Dictionary> column
    let schema = Arc::new(Schema::new(vec![
        Field::new("a", DataType::Int32, false),
        Field::new("c", list_dict.data_type().clone(), false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1, 2])), Arc::new(list_dict)],
    )?;

    let ctx = SessionContext::new();
    ctx.register_batch("x", batch)?;

    // GROUP BY forces Aggregate (RowConverter pass)
    let df = ctx
        .sql(
            r#"
            SELECT c, COUNT(*) AS cnt
            FROM x
            GROUP BY c
            "#,
        )
        .await?;
    df.collect().await?;
    Ok(())
}
Expected behavior
The query should complete without throwing an error.
Additional context
No response