[VL][Spark 3.3+] support matching columns by fieldId natively instead of falling back #2619
Closed as not planned
Description
Currently the native Parquet scan does not support matching columns by fieldId when reading, so the example below would either fall back (with PR #2563 ready) or return null.

This issue tracks adding fieldId-based column matching to the native Parquet scan.
```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, Metadata, MetadataBuilder, StringType, StructType}
import scala.collection.JavaConverters._

val FIELD_ID_METADATA_KEY = "parquet.field.id"

def withId(id: Int): Metadata =
  new MetadataBuilder().putLong(FIELD_ID_METADATA_KEY, id).build()

// The read schema uses different column names than the write schema,
// but the same field IDs, so resolution must happen by ID, not by name.
val readSchema = new StructType()
  .add("a", StringType, true, withId(0))
  .add("b", IntegerType, true, withId(1))

val writeSchema = new StructType()
  .add("random", IntegerType, true, withId(0))
  .add("name", StringType, true, withId(1))

val writeData = Seq(Row(100, "text"), Row(200, "more"))

spark.createDataFrame(writeData.asJava, writeSchema)
  .write.mode("overwrite").parquet("/tmp/spark1/data")

val df = spark.read.schema(readSchema).parquet("/tmp/spark1/data")
```
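For context, vanilla Spark 3.3+ (SPARK-38094) gates fieldId-based matching behind a config flag that defaults to off; a minimal sketch of enabling it for the read above, assuming the same `spark` session and schemas from the repro:

```scala
// Assumption: this flag exists in vanilla Spark 3.3+ and defaults to false.
// When enabled, the Parquet reader resolves columns by "parquet.field.id"
// metadata in the read schema rather than by column name.
spark.conf.set("spark.sql.parquet.fieldId.read.enabled", "true")

// Re-running the read now matches readSchema's "a"/"b" to the file's
// id-0/id-1 columns by field ID instead of failing a name lookup.
val byId = spark.read.schema(readSchema).parquet("/tmp/spark1/data")
```

The ask in this issue is for Velox's native scan to honor the same fieldId semantics so queries like this stay on the native path instead of falling back.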