
Null fields are omitted by infer_json_schema_from_seekable #4814

@kskalski


Describe the bug
Fields that contain only null values are omitted from the inferred schema. This makes a reader in strict mode fail, because it encounters a field that is not included in the schema. In any case, silently removing fields that are present in the input seems wrong; perhaps this should be controlled by an option to the inference function, or filtering out null fields should be left to the user.
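The behaviour described above can be modelled with a toy, stdlib-only sketch. Everything in it (`Value`, `InferredType`, `infer_schema`, the `keep_null_fields` flag and `validate_strict`) is hypothetical and is not arrow's API; it only illustrates the suggestion of making null-field handling an option. A field seen only with `null` is either dropped or kept with a `Null` type, and a strict check rejects any record key that is missing from the schema.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified JSON value (not arrow's or serde_json's type).
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    Null,
    Int(i64),
    Str(String),
}

// Hypothetical inferred column type; `Null` means "only nulls seen so far".
#[derive(Debug, Clone, Copy, PartialEq)]
enum InferredType {
    Null,
    Int64,
    Utf8,
}

fn type_of(v: &Value) -> InferredType {
    match v {
        Value::Null => InferredType::Null,
        Value::Int(_) => InferredType::Int64,
        Value::Str(_) => InferredType::Utf8,
    }
}

// Null is absorbed by any concrete type; in this sketch, conflicting
// concrete types fall back to Utf8.
fn merge(a: InferredType, b: InferredType) -> InferredType {
    match (a, b) {
        (InferredType::Null, t) | (t, InferredType::Null) => t,
        (x, y) if x == y => x,
        _ => InferredType::Utf8,
    }
}

// With `keep_null_fields == false`, fields that were only ever null are
// dropped (the behaviour reported here); with `true`, they are kept as `Null`.
fn infer_schema(records: &[Vec<(&str, Value)>], keep_null_fields: bool) -> BTreeMap<String, InferredType> {
    let mut schema: BTreeMap<String, InferredType> = BTreeMap::new();
    for record in records {
        for (name, value) in record {
            let t = type_of(value);
            schema
                .entry(name.to_string())
                .and_modify(|existing| *existing = merge(*existing, t))
                .or_insert(t);
        }
    }
    if !keep_null_fields {
        schema.retain(|_, t| *t != InferredType::Null);
    }
    schema
}

// Strict-mode check: every key in the record must appear in the schema.
fn validate_strict(record: &[(&str, Value)], schema: &BTreeMap<String, InferredType>) -> Result<(), String> {
    for (name, _) in record {
        if !schema.contains_key(*name) {
            return Err(format!("field {:?} not found in schema", name));
        }
    }
    Ok(())
}

fn main() {
    // Same record as DATA2 in the test: {"a": 1, "b": "str", "c": null}
    let records = vec![vec![
        ("a", Value::Int(1)),
        ("b", Value::Str("str".to_string())),
        ("c", Value::Null),
    ]];

    let dropped = infer_schema(&records, false);
    let kept = infer_schema(&records, true);

    // Dropping "c" makes the strict check reject the very record the schema was inferred from.
    assert!(validate_strict(&records[0], &dropped).is_err());
    assert!(validate_strict(&records[0], &kept).is_ok());
    assert_eq!(kept.get("c"), Some(&InferredType::Null));
}
```

Because `Null` merges into any concrete type, only fields that are null in every sampled record end up typed as `Null`; keeping them costs one extra all-null column rather than a strict-mode failure.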

To Reproduce

#[cfg(test)]
mod tests {
    const DATA2: &str = r#"{"a": 1, "b": "str", "c": null}"#;

    #[test]
    fn test_json_infers_null_schema() {
        let input_buf = std::io::Cursor::new(DATA2.as_bytes());
        let mut buf_reader = std::io::BufReader::new(input_buf);
        let schema = arrow::json::reader::infer_json_schema_from_seekable(&mut buf_reader, None).unwrap();
        let field = schema
            .field_with_name("a")
            .expect("should contain numeric field");
        assert_eq!(&arrow::datatypes::DataType::Int64, field.data_type());
        let field = schema
            .field_with_name("c")
            .expect("should contain null field");
        assert_eq!(&arrow::datatypes::DataType::Null, field.data_type());
    }
}

produces

thread parquet::tests::test_json_infers_null_schema panicked at lakeshore-history/src/parquet.rs:197:14:
should contain null field: SchemaError("Unable to get field named \"c\". Valid fields: [\"a\", \"b\"]")
stack backtrace:

Expected behavior
The test passes.

Additional context
Tested with arrow 46.
