Currently the native Parquet scan does not support matching columns by field ID when reading, so the example below would fall back (with PR #2563 in place) or return null.
This issue tracks making the native Parquet scan match columns by field ID.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, Metadata, MetadataBuilder, StringType, StructType}
import scala.collection.JavaConverters._

// Attach a Parquet field ID to a column through schema metadata.
val FIELD_ID_METADATA_KEY = "parquet.field.id"
def withId(id: Int): Metadata =
  new MetadataBuilder().putLong(FIELD_ID_METADATA_KEY, id).build()

// The read schema uses different column names than the write schema,
// so the columns can only be resolved through their field IDs.
val readSchema =
  new StructType()
    .add("a", StringType, true, withId(0))
    .add("b", IntegerType, true, withId(1))
val writeSchema =
  new StructType()
    .add("random", IntegerType, true, withId(0))
    .add("name", StringType, true, withId(1))

val writeData = Seq(Row(100, "text"), Row(200, "more"))
spark.createDataFrame(writeData.asJava, writeSchema)
  .write.mode("overwrite").parquet("/tmp/spark1/data")

// Field-ID matching in vanilla Spark is gated by
// spark.sql.parquet.fieldId.read.enabled (Spark 3.3+).
val df = spark.read.schema(readSchema).parquet("/tmp/spark1/data")
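The resolution semantics being tracked can be sketched independently of Spark: columns are matched by their field ID rather than by name, and a requested ID that is absent from the file yields nulls. A minimal illustrative sketch (the `FileColumn` type and `resolveByFieldId` helper are hypothetical, not the actual scan implementation):

```scala
// Illustrative stand-in for a column read from a Parquet file.
case class FileColumn(fieldId: Int, name: String, values: Seq[Any])

// Resolve each requested (fieldId, outputName) pair against the file's
// columns by ID, ignoring names entirely; an ID missing from the file
// produces a column of nulls.
def resolveByFieldId(
    fileColumns: Seq[FileColumn],
    requested: Seq[(Int, String)],
    numRows: Int): Map[String, Seq[Any]] = {
  val byId = fileColumns.map(c => c.fieldId -> c).toMap
  requested.map { case (id, outName) =>
    outName -> byId.get(id).map(_.values).getOrElse(Seq.fill(numRows)(null))
  }.toMap
}

// Mirrors the Spark example: the file holds ("random", id 0) and
// ("name", id 1), while the read schema asks for ("a", id 0) and ("b", id 1).
val fileCols = Seq(
  FileColumn(0, "random", Seq(100, 200)),
  FileColumn(1, "name", Seq("text", "more")))
val out = resolveByFieldId(fileCols, Seq(0 -> "a", 1 -> "b"), numRows = 2)
// out("a") == Seq(100, 200); out("b") == Seq("text", "more")
```

Despite the name mismatch ("a" vs. "random", "b" vs. "name"), the data resolves correctly because only the field IDs participate in matching, which is the behavior this issue asks the native scan to implement.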