-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix arrow-avro type resolver register bug #8046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@jecsand838 can you please help review this PR? |
arrow-avro/src/codec.rs
Outdated
| // under the License. | ||
|
|
||
| use crate::schema::{Attributes, ComplexType, PrimitiveType, Record, Schema, TypeName}; | ||
| use crate::schema::{Attributes, ComplexType, Enum, PrimitiveType, Record, Schema, TypeName}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to bring in Enum for this.
| use crate::schema::{Attributes, ComplexType, Enum, PrimitiveType, Record, Schema, TypeName}; | |
| use crate::schema::{Attributes, ComplexType, PrimitiveType, Record, Schema, TypeName}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will remove this import and clean up relevant unneeded code.
| #[test] | ||
| fn test_nested_record_type_reuse_without_namespace() { | ||
| let nested_record = create_record("Nested", None, vec![ | ||
| create_primitive_field("nested_int", PrimitiveType::Int), | ||
| ]); | ||
| let main_record = create_record("Record", None, vec![ | ||
| create_record_field("nested", nested_record), | ||
| create_ref_field("nestedRecord", "Nested"), | ||
| create_array_field("nestedArray", Schema::TypeName(TypeName::Ref("Nested"))), | ||
| create_map_field("nestedMap", Schema::TypeName(TypeName::Ref("Nested"))), | ||
| ]); | ||
| let schema = Schema::Complex(ComplexType::Record(main_record)); | ||
|
|
||
| run_schema_test(schema, |avro_field| { | ||
| let codec = &avro_field.data_type().codec; | ||
|
|
||
| if let Codec::Struct(fields) = codec { | ||
| assert_eq!(fields.len(), 4); | ||
| let field_names: Vec<&str> = fields.iter().map(|f| f.name()).collect(); | ||
| assert_eq!(field_names, vec!["nested", "nestedRecord", "nestedArray", "nestedMap"]); | ||
| } | ||
|
|
||
| validate_struct_field(codec, 0, "nested", |c| validate_nested_record(c, "nested_int", |codec| matches!(codec, Codec::Int32))); | ||
| validate_struct_field(codec, 1, "nestedRecord", |c| validate_nested_record(c, "nested_int", |codec| matches!(codec, Codec::Int32))); | ||
| validate_struct_field(codec, 2, "nestedArray", |c| { | ||
| validate_collection_items(c, |item_codec| validate_nested_record(item_codec, "nested_int", |codec| matches!(codec, Codec::Int32))); | ||
| }); | ||
| validate_struct_field(codec, 3, "nestedMap", |c| { | ||
| validate_collection_items(c, |value_codec| validate_nested_record(value_codec, "nested_int", |codec| matches!(codec, Codec::Int32))); | ||
| }); | ||
| }); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably simplify these tests using an approach like this.
This is the test I used to replicate the bug you reported.
#[test]
fn test_nested_record_resolution() {
let schema_str = r#"
{
"type": "record",
"name": "Sample",
"namespace": "test",
"fields": [
{ "name": "key", "type": "string" },
{
"name": "nested",
"type": {
"type": "record",
"name": "Nested",
"fields": [
{ "name": "nested_int", "type": "int" }
]
}
},
{ "name": "nestedAgain", "type": "Nested" },
{ "name": "nestedArray", "type": { "type": "array", "items": "Nested" } },
{ "name": "nestedMap", "type": { "type": "map", "values": "Nested" } }
]
}
"#;
let schema: Schema = serde_json::from_str(schema_str).unwrap();
let mut resolver = Resolver::default();
let avro_data_type = make_data_type(&schema, None, &mut resolver, false, false).unwrap();
if let Codec::Struct(fields) = avro_data_type.codec() {
assert_eq!(fields.len(), 5);
// key
assert_eq!(fields[0].name(), "key");
assert!(matches!(fields[0].data_type().codec(), Codec::Utf8));
// nested
assert_eq!(fields[1].name(), "nested");
let nested_data_type = fields[1].data_type();
if let Codec::Struct(nested_fields) = nested_data_type.codec() {
assert_eq!(nested_fields.len(), 1);
assert_eq!(nested_fields[0].name(), "nested_int");
assert!(matches!(nested_fields[0].data_type().codec(), Codec::Int32));
} else {
panic!(
"'nested' field is not a struct but {:?}",
nested_data_type.codec()
);
}
// nestedAgain
assert_eq!(fields[2].name(), "nestedAgain");
let nested_again_data_type = fields[2].data_type();
assert_eq!(
nested_again_data_type.codec().data_type(),
nested_data_type.codec().data_type()
);
// nestedArray
assert_eq!(fields[3].name(), "nestedArray");
if let Codec::List(item_type) = fields[3].data_type().codec() {
assert_eq!(
item_type.codec().data_type(),
nested_data_type.codec().data_type()
);
} else {
panic!("'nestedArray' field is not a list");
}
// nestedMap
assert_eq!(fields[4].name(), "nestedMap");
if let Codec::Map(value_type) = fields[4].data_type().codec() {
assert_eq!(
value_type.codec().data_type(),
nested_data_type.codec().data_type()
);
} else {
panic!("'nestedMap' field is not a map");
}
} else {
panic!("Top-level schema is not a struct");
}
}|
Thank you for catching this bug and getting a fix for it up! I left a few nits, but LMGTM. The recommendation regarding the test should eliminate the need for the helpers and be easier to lint imo. Only other thing I'd recommend is creating an Avro file using this schema in |
|
@jecsand838 Thanks for the good suggestions! I'll apply the feedback and push again in a few hours. |
|
LGTM! Thank you again for this @yongkyunlee! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @yongkyunlee and @jecsand838
I skimmed the PR and while I am not a technical expert (I am relying on @jecsand838 for this) it looks well commented and tested to me
Thanks again!
|
@alamb Thanks! Could you merge the PR? |
Which issue does this PR close?
Rationale for this change
Simple bug fix where the order of
nameandnamespaceis wrong in Resolver field registration method.What changes are included in this PR?
One-line bug fix and corresponding tests.
Are these changes tested?
Are there any user-facing changes?
No