
Field Metadata Lost on COUNT DISTINCT queries resulting in Internal Error: Physical input schema should be the same as the one converted from logical input schema #12687

@alamb


Describe the bug

During the upgrade of DataFusion in InfluxDB, @itsjunetime found that Field-level metadata (aka Field::with_metadata) is being lost somewhere during DataFusion logical planning.

This manifests itself as an error during physical planning:

Internal Error: Physical input schema should be the same as the one converted from logical input schema

@itsjunetime found a workaround in #12631 (which is to ignore the Field metadata). This ticket tracks fixing it for real.

To Reproduce

(we are working on a reproducer)

Expected behavior

No error during physical planning

Additional context

Copying @itsjunetime's description here: #12560 (comment)

I'm running into this behavior after #11989, specifically seeing schema mismatches where the only thing that is different is that a field's metadata disappears at some point (so the schemas are the same except for a field's metadata). E.g.:

&physical_input_schema = Schema {
    fields: [
        Field {
            name: "alias1",
            data_type: Utf8,
            nullable: true,
            dict_id: 0,
            dict_is_ordered: false,
            metadata: {},
        },
    ],
    metadata: {},
}
&physical_input_schema_from_logical = Schema {
    fields: [
        Field {
            name: "alias1",
            data_type: Utf8,
            nullable: true,
            dict_id: 0,
            dict_is_ordered: false,
            metadata: {
                "some_key": "some_value"
            },
        },
    ],
    metadata: {},
}

I've yet to figure out exactly where the metadata is being dropped and I haven't figured out a reproducer either. I suggested comparing only the fields' non-metadata attributes here, but @jayzhan211 pointed out that that's more of a workaround than an actual fix, as it's still a problem if the metadata is disappearing.
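
To make the mismatch and the suggested workaround concrete, here is a rough sketch. It is an illustration only: `Field` and `Schema` below are simplified stand-in structs (they omit `dict_id` and `dict_is_ordered`), not the real arrow-rs types, and `alias1_schema` and `schemas_match_ignoring_field_metadata` are hypothetical helpers, not DataFusion functions or the actual change in #12631.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for Arrow's `Field` and `Schema` (assumption:
// these are NOT the arrow-rs types, just enough to show the comparison).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: BTreeMap<String, String>,
}

/// Hypothetical helper: true when both schemas agree on everything
/// except the per-field metadata maps.
fn schemas_match_ignoring_field_metadata(a: &Schema, b: &Schema) -> bool {
    a.metadata == b.metadata
        && a.fields.len() == b.fields.len()
        && a.fields.iter().zip(b.fields.iter()).all(|(x, y)| {
            x.name == y.name && x.data_type == y.data_type && x.nullable == y.nullable
        })
}

/// Hypothetical helper: builds a one-column schema like the dump above,
/// with the given field-level metadata entries.
fn alias1_schema(field_metadata: &[(&str, &str)]) -> Schema {
    Schema {
        fields: vec![Field {
            name: "alias1".to_string(),
            data_type: "Utf8".to_string(),
            nullable: true,
            metadata: field_metadata
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        }],
        metadata: BTreeMap::new(),
    }
}

fn main() {
    // Physical schema: the field metadata has been dropped.
    let physical = alias1_schema(&[]);
    // Schema converted from the logical plan: the metadata is still present.
    let from_logical = alias1_schema(&[("some_key", "some_value")]);

    // Strict equality fails, which corresponds to the internal error.
    assert_ne!(physical, from_logical);
    // The metadata-insensitive comparison accepts the pair (the workaround).
    assert!(schemas_match_ignoring_field_metadata(&physical, &from_logical));
}
```

As noted above, a relaxed comparison like this only hides the symptom: the metadata is still lost somewhere in planning.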

The issue that I'm running into, though, seems to be somewhat different than the issue that others (like @phillipleblanc) are running into, where some fields completely disappear from the schema (see here). I don't think these are the same issue, exactly (since they manifest differently), but they may have the same root cause/solution, so I think it's fair to keep them all under this issue unless needed otherwise.

I'll work on getting a fix or reproducer today


Labels

bug (Something isn't working), regression (Something that used to work no longer does)
