Support default value semantics in Iceberg  #2039

@wmoustafa

Hive tables written in the Avro file format are hard to migrate to Iceberg because Iceberg does not support default value semantics. If a field is assigned a non-null default value in the Avro schema, users should get that default value when reading the field. After migration to Iceberg, they instead get a null if the field is optional and an exception (shown below) if the field is required. This issue tracks the work required to support default values in Iceberg. Ideally, the semantics and expectations would remain the same regardless of the underlying file format (e.g., support should extend to ORC as well).

java.lang.IllegalArgumentException: Missing required field: xyz
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)
at org.apache.iceberg.avro.BuildAvroProjection.record(BuildAvroProjection.java:98)
at org.apache.iceberg.avro.BuildAvroProjection.record(BuildAvroProjection.java:42)
at org.apache.iceberg.avro.AvroCustomOrderSchemaVisitor.visit(AvroCustomOrderSchemaVisitor.java:51)
at org.apache.iceberg.avro.AvroSchemaUtil.buildAvroProjection(AvroSchemaUtil.java:104)
at org.apache.iceberg.avro.ProjectionDatumReader.setSchema(ProjectionDatumReader.java:68)
at org.apache.iceberg.shaded.org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:132)
at org.apache.iceberg.shaded.org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:106)
at org.apache.iceberg.shaded.org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:98)
at org.apache.iceberg.shaded.org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:66)
at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:100)
at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:77)
at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:103)
at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:81)
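
To make the gap concrete, here is a minimal, self-contained sketch of the two behaviors described above. It is not Iceberg or Avro code: `DefaultValueSketch`, `Field`, `projectWithoutDefaults`, and `projectWithDefaults` are hypothetical names invented for illustration, rows are modeled as plain maps, and a `null` default is treated as "no default declared", which is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DefaultValueSketch {

    /** Hypothetical field descriptor; not an Iceberg or Avro class. */
    public static final class Field {
        final String name;
        final boolean required;
        final Object defaultValue; // simplification: null means "no default declared"

        public Field(String name, boolean required, Object defaultValue) {
            this.name = name;
            this.required = required;
            this.defaultValue = defaultValue;
        }
    }

    /** Behavior reported in this issue: declared defaults are ignored. */
    public static Map<String, Object> projectWithoutDefaults(
            Map<String, Object> fileRow, List<Field> expected) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field f : expected) {
            if (fileRow.containsKey(f.name)) {
                out.put(f.name, fileRow.get(f.name));
            } else if (f.required) {
                // Mirrors the message in the stack trace above.
                throw new IllegalArgumentException("Missing required field: " + f.name);
            } else {
                out.put(f.name, null); // optional field silently becomes null
            }
        }
        return out;
    }

    /** Requested behavior: a field missing from the file takes its declared default. */
    public static Map<String, Object> projectWithDefaults(
            Map<String, Object> fileRow, List<Field> expected) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field f : expected) {
            if (fileRow.containsKey(f.name)) {
                out.put(f.name, fileRow.get(f.name));
            } else if (f.defaultValue != null) {
                out.put(f.name, f.defaultValue); // fall back to the schema default
            } else if (f.required) {
                throw new IllegalArgumentException("Missing required field: " + f.name);
            } else {
                out.put(f.name, null);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A row as stored in the data file: it has no "xyz" or "note" column.
        Map<String, Object> fileRow = new LinkedHashMap<>();
        fileRow.put("id", 1);

        // The schema the reader expects, with defaults declared for the new fields.
        List<Field> expected = List.of(
                new Field("id", true, null),
                new Field("xyz", true, "n/a"),
                new Field("note", false, "none"));

        System.out.println(projectWithDefaults(fileRow, expected));
        try {
            System.out.println(projectWithoutDefaults(fileRow, expected));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the only difference between the two methods is the extra branch that consults the declared default before deciding between null and an exception. A real implementation would also have to define how defaults are stored in table metadata and how they apply across file formats, which this example does not attempt.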
