Skip to content

Conversation

@yongkyunlee
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Simple bug fix where the order of name and namespace is wrong in Resolver field registration method.

What changes are included in this PR?

One-line bug fix and corresponding tests.

Are these changes tested?

  • Added a test in the same file to test the fix. Made sure it doesn't pass with the original code.

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 5, 2025
@alamb
Copy link
Contributor

alamb commented Aug 5, 2025

@jecsand838 can you please help review this PR?

// under the License.

use crate::schema::{Attributes, ComplexType, PrimitiveType, Record, Schema, TypeName};
use crate::schema::{Attributes, ComplexType, Enum, PrimitiveType, Record, Schema, TypeName};
Copy link
Contributor

@jecsand838 jecsand838 Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to bring in Enum for this.

Suggested change
use crate::schema::{Attributes, ComplexType, Enum, PrimitiveType, Record, Schema, TypeName};
use crate::schema::{Attributes, ComplexType, PrimitiveType, Record, Schema, TypeName};

Copy link
Contributor Author

@yongkyunlee yongkyunlee Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will remove this import and clean up relevant unneeded code.

Comment on lines 1014 to 1045
#[test]
fn test_nested_record_type_reuse_without_namespace() {
let nested_record = create_record("Nested", None, vec![
create_primitive_field("nested_int", PrimitiveType::Int),
]);
let main_record = create_record("Record", None, vec![
create_record_field("nested", nested_record),
create_ref_field("nestedRecord", "Nested"),
create_array_field("nestedArray", Schema::TypeName(TypeName::Ref("Nested"))),
create_map_field("nestedMap", Schema::TypeName(TypeName::Ref("Nested"))),
]);
let schema = Schema::Complex(ComplexType::Record(main_record));

run_schema_test(schema, |avro_field| {
let codec = &avro_field.data_type().codec;

if let Codec::Struct(fields) = codec {
assert_eq!(fields.len(), 4);
let field_names: Vec<&str> = fields.iter().map(|f| f.name()).collect();
assert_eq!(field_names, vec!["nested", "nestedRecord", "nestedArray", "nestedMap"]);
}

validate_struct_field(codec, 0, "nested", |c| validate_nested_record(c, "nested_int", |codec| matches!(codec, Codec::Int32)));
validate_struct_field(codec, 1, "nestedRecord", |c| validate_nested_record(c, "nested_int", |codec| matches!(codec, Codec::Int32)));
validate_struct_field(codec, 2, "nestedArray", |c| {
validate_collection_items(c, |item_codec| validate_nested_record(item_codec, "nested_int", |codec| matches!(codec, Codec::Int32)));
});
validate_struct_field(codec, 3, "nestedMap", |c| {
validate_collection_items(c, |value_codec| validate_nested_record(value_codec, "nested_int", |codec| matches!(codec, Codec::Int32)));
});
});
}
Copy link
Contributor

@jecsand838 jecsand838 Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably simplify these tests using an approach like this.

This is the test I used to replicate the bug you reported.

    #[test]
    fn test_nested_record_resolution() {
        let schema_str = r#"
        {
          "type": "record",
          "name": "Sample",
          "namespace": "test",
          "fields": [
            { "name": "key", "type": "string" },
            {
              "name": "nested",
              "type": {
                "type": "record",
                "name": "Nested",
                "fields": [
                  { "name": "nested_int", "type": "int" }
                ]
              }
            },
            { "name": "nestedAgain", "type": "Nested" },
            { "name": "nestedArray", "type": { "type": "array", "items": "Nested" } },
            { "name": "nestedMap", "type": { "type": "map", "values": "Nested" } }
          ]
        }
        "#;

        let schema: Schema = serde_json::from_str(schema_str).unwrap();

        let mut resolver = Resolver::default();
        let avro_data_type = make_data_type(&schema, None, &mut resolver, false, false).unwrap();

        if let Codec::Struct(fields) = avro_data_type.codec() {
            assert_eq!(fields.len(), 5);

            // key
            assert_eq!(fields[0].name(), "key");
            assert!(matches!(fields[0].data_type().codec(), Codec::Utf8));

            // nested
            assert_eq!(fields[1].name(), "nested");
            let nested_data_type = fields[1].data_type();
            if let Codec::Struct(nested_fields) = nested_data_type.codec() {
                assert_eq!(nested_fields.len(), 1);
                assert_eq!(nested_fields[0].name(), "nested_int");
                assert!(matches!(nested_fields[0].data_type().codec(), Codec::Int32));
            } else {
                panic!(
                    "'nested' field is not a struct but {:?}",
                    nested_data_type.codec()
                );
            }

            // nestedAgain
            assert_eq!(fields[2].name(), "nestedAgain");
            let nested_again_data_type = fields[2].data_type();
            assert_eq!(
                nested_again_data_type.codec().data_type(),
                nested_data_type.codec().data_type()
            );

            // nestedArray
            assert_eq!(fields[3].name(), "nestedArray");
            if let Codec::List(item_type) = fields[3].data_type().codec() {
                assert_eq!(
                    item_type.codec().data_type(),
                    nested_data_type.codec().data_type()
                );
            } else {
                panic!("'nestedArray' field is not a list");
            }

            // nestedMap
            assert_eq!(fields[4].name(), "nestedMap");
            if let Codec::Map(value_type) = fields[4].data_type().codec() {
                assert_eq!(
                    value_type.codec().data_type(),
                    nested_data_type.codec().data_type()
                );
            } else {
                panic!("'nestedMap' field is not a map");
            }
        } else {
            panic!("Top-level schema is not a struct");
        }
    }

@jecsand838
Copy link
Contributor

@yongkyunlee

Thank you for catching this bug and getting a fix for it up!

I left a few nits, but LMGTM. The recommendation regarding the test should eliminate the need for the helpers and be easier to lint imo.

Only other thing I'd recommend is creating an Avro file using this schema in arrow-avro/test/data and adding a test for it to arrow-avro/src/reader/mod.rs. That way we can be sure it works E2E.

@yongkyunlee
Copy link
Contributor Author

@jecsand838 Thanks for the good suggestions! I'll apply the feedback and push again in a few hours.

@jecsand838
Copy link
Contributor

LGTM! Thank you again for this @yongkyunlee!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @yongkyunlee and @jecsand838

I skimmed the PR and while I am not a technical expert (I am relying on @jecsand838 for this) it looks well commented and tested to me

Thanks again!

@yongkyunlee
Copy link
Contributor Author

@alamb Thanks! Could you merge the PR?

@alamb alamb merged commit c6887ff into apache:main Aug 6, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[avro] Bug in resolving avro schema with named type

3 participants