Query grouping by column with datatype List<Dictionary<(),()>> is failing #17012

@LiaCastaneda

Describe the bug

Queries that Group By columns of type List<Dictionary<(),()>> fail with the following error:

Expected infallible creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("[Large]ListArray's child datatype Utf8 does not correspond to the List's datatype Dictionary(Int8, Utf8)")

This happens during a round trip from ArrayRef -> Row -> ArrayRef; it panics in convert_row. I believe this is because encoding a dictionary array to Row does not seem to preserve the dictionary encoding (see here).

The error originates in Arrow, but in DataFusion it may happen here, since DataFusion uses Arrow's RowConverter and performs round-trip conversions.

There is already an open issue for this in arrow-rs: apache/arrow-rs#7165
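To make the suspected mechanism concrete, here is a toy model in plain Rust with no Arrow dependency. Everything in it (`DataType`, `decoded_type`, `build_list`, `roundtrip_list`) is a hypothetical stand-in and not an Arrow or DataFusion API. It only mimics the shape of the problem as I understand it: the list child comes back from the row format without its dictionary encoding, while the list is rebuilt with the originally declared type.

```rust
// Toy model only: these types and functions are hypothetical stand-ins,
// not Arrow or DataFusion APIs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Utf8,
    Dictionary(Box<DataType>, Box<DataType>), // (key type, value type)
    List(Box<DataType>),                      // child type
}

/// Type of the array a row decoder yields for `dt` in this model:
/// dictionary encoding is dropped, so only the value type survives.
fn decoded_type(dt: &DataType) -> DataType {
    match dt {
        DataType::Dictionary(_, value) => decoded_type(value),
        DataType::List(child) => DataType::List(Box::new(decoded_type(child))),
        other => other.clone(),
    }
}

/// Mimics the consistency check behind the error message:
/// the child's type must equal the child type declared by the list.
fn build_list(declared: &DataType, child: &DataType) -> Result<DataType, String> {
    match declared {
        DataType::List(expected) if **expected == *child => Ok(declared.clone()),
        DataType::List(expected) => Err(format!(
            "child datatype {:?} does not correspond to the List's datatype {:?}",
            child, expected
        )),
        other => Err(format!("{:?} is not a list type", other)),
    }
}

/// Array -> rows -> array for a list column in this model: the child is
/// decoded without dictionary encoding, but the list is rebuilt using the
/// originally declared type.
fn roundtrip_list(declared: &DataType) -> Result<DataType, String> {
    let child = match declared {
        DataType::List(child) => decoded_type(child),
        other => return Err(format!("{:?} is not a list type", other)),
    };
    build_list(declared, &child)
}
```

In this model, `roundtrip_list` succeeds for a list of `Utf8` and fails for a list of `Dictionary(Int8, Utf8)`, because the decoded child is `Utf8` while the declared child is still the dictionary type.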

To Reproduce

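use std::sync::Arc;

use datafusion::arrow::array::{Int32Array, ListBuilder, StringDictionaryBuilder};
use datafusion::arrow::datatypes::{DataType, Field, Int8Type, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::test]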
async fn df_list_of_dict_should_error() -> Result<()> {
    // build List<Dictionary<Int8,Utf8>>
    let mut dict_builder = StringDictionaryBuilder::<Int8Type>::new();
    for s in ["foo","bar","baz","foo"] { dict_builder.append(s)?; }
    let mut list_builder = ListBuilder::new(dict_builder);
    list_builder.values().append("foo")?;
    list_builder.values().append("bar")?;
    list_builder.append(true);
    list_builder.values().append("baz")?;
    list_builder.append(true);
    let list_dict = list_builder.finish();

    let schema = Arc::new(Schema::new(vec![
        Field::new("a", DataType::Int32, false),
        Field::new("c", list_dict.data_type().clone(), false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1,2])), Arc::new(list_dict)],
    )?;

    let ctx = SessionContext::new();
    ctx.register_batch("x", batch)?;

    // GROUP BY forces Aggregate (RowConverter pass)
    let df = ctx.sql(
        r#"
        SELECT c, COUNT(*) AS cnt
        FROM   x
        GROUP  BY c
        "#,
    ).await?;

    df.collect().await?;

    Ok(())
}

Expected behavior

The query should complete without an error.

Additional context

No response
