Closed
Description
Describe the bug
Fields that contain only null values are skipped during schema inference. As a result, a reader in strict mode fails when it encounters a field that is missing from the inferred schema. In any case, silently dropping fields that are present in the input seems wrong: this should perhaps be controlled by an option on the inference function, or filtering out null fields should be left to the user.
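To make the suggested option concrete, here is a minimal sketch of inference with a switch for null-only fields. It deliberately does not use arrow: `Value`, `InferredType`, `NullOnlyFields`, and `infer_schema` are hypothetical names invented for illustration, and the type-merging rule is an assumption, not arrow's actual logic.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a JSON value (hypothetical, not arrow's type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Str(String),
}

/// Simplified stand-in for an inferred column type (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InferredType {
    Null,
    Int64,
    Utf8,
}

/// Hypothetical option: what to do with fields that only ever hold null.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NullOnlyFields {
    /// Omit them from the schema (the behavior reported in this issue).
    Drop,
    /// Keep them in the schema with a Null type.
    KeepAsNull,
}

/// Infers a field-name -> type map from a slice of flat records.
pub fn infer_schema(
    records: &[Vec<(&str, Value)>],
    null_only: NullOnlyFields,
) -> BTreeMap<String, InferredType> {
    let mut schema: BTreeMap<String, InferredType> = BTreeMap::new();
    for record in records {
        for (name, value) in record {
            let seen = match value {
                Value::Null => InferredType::Null,
                Value::Int(_) => InferredType::Int64,
                Value::Str(_) => InferredType::Utf8,
            };
            // Every field starts as Null until a concrete value is seen.
            let entry = schema.entry(name.to_string()).or_insert(InferredType::Null);
            // Assumed merge rule: a concrete type replaces Null, and
            // conflicting concrete types fall back to Utf8.
            *entry = match (*entry, seen) {
                (InferredType::Null, t) => t,
                (t, InferredType::Null) => t,
                (a, b) if a == b => a,
                _ => InferredType::Utf8,
            };
        }
    }
    // Only remove null-only fields when the caller explicitly asks for it.
    if null_only == NullOnlyFields::Drop {
        schema.retain(|_, ty| *ty != InferredType::Null);
    }
    schema
}
```

With the record `{"a": 1, "b": "str", "c": null}`, `NullOnlyFields::Drop` yields a schema with only `a` and `b`, while `NullOnlyFields::KeepAsNull` also keeps `c` with a Null type, so a strict reader would find every input field in the schema.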
To Reproduce
#[cfg(test)]
mod tests {
    const DATA2: &str = r#"{"a": 1, "b": "str", "c": null}"#;

    #[test]
    fn test_json_infers_null_schema() {
        let input_buf = std::io::Cursor::new(DATA2.as_bytes());
        let mut buf_reader = std::io::BufReader::new(input_buf);
        let schema =
            arrow::json::reader::infer_json_schema_from_seekable(&mut buf_reader, None).unwrap();

        let field = schema
            .field_with_name("a")
            .expect("should contain numeric field");
        assert_eq!(&arrow::datatypes::DataType::Int64, field.data_type());

        let field = schema
            .field_with_name("c")
            .expect("should contain null field");
        assert_eq!(&arrow::datatypes::DataType::Null, field.data_type());
    }
}

produces
thread parquet::tests::test_json_infers_null_schema panicked at lakeshore-history/src/parquet.rs:197:14:
should contain null field: SchemaError("Unable to get field named \"c\". Valid fields: [\"a\", \"b\"]")
stack backtrace:
Expected behavior
The test passes: the inferred schema contains field "c" with data type Null.
Additional context
Tested with arrow 46.