We should give advanced users options to control partitioning on ingest. We can expose either `numPartitions` or `partitionSize` to control the number of partitions. `partitionSize` can be expressed either as a number of files or in MB. A boolean flag such as `partitionByFileSize` would then toggle whether partitions are equalized by number of files or by total size in MB.
These settings are optional. By default, we equalize by number of files and use `spark.executor.instances * spark.executor.cores * 3` partitions, as discussed in #1255.
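
To make the proposal concrete, here is a minimal sketch in plain Python (no Spark dependency) of how these options could resolve to a partition count and a file-to-partition assignment. Everything in it is illustrative, not an existing API: the `IngestPartitionOptions` class, the function names, and the snake_case option names are hypothetical stand-ins for the proposed settings. It also assumes that `numPartitions` and `partitionSize` are mutually exclusive, that `partitionByFileSize` decides whether `partitionSize` is read as a file count or as MB, and that size balancing uses a simple greedy largest-first strategy.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IngestPartitionOptions:
    """Hypothetical container for the proposed advanced settings."""
    num_partitions: Optional[int] = None   # proposed numPartitions
    partition_size: Optional[int] = None   # proposed partitionSize (files or MB)
    partition_by_file_size: bool = False   # proposed partitionByFileSize


def default_num_partitions(executor_instances: int, executor_cores: int) -> int:
    # Default from the issue: spark.executor.instances * spark.executor.cores * 3
    return executor_instances * executor_cores * 3


def resolve_num_partitions(opts: IngestPartitionOptions,
                           file_sizes_mb: List[float],
                           executor_instances: int,
                           executor_cores: int) -> int:
    # Assumption: the two sizing options cannot be combined.
    if opts.num_partitions is not None and opts.partition_size is not None:
        raise ValueError("set only one of num_partitions or partition_size")
    if opts.num_partitions is not None:
        return max(1, opts.num_partitions)
    if opts.partition_size is not None:
        # Assumption: the flag selects the unit of partition_size (MB vs. files).
        total = sum(file_sizes_mb) if opts.partition_by_file_size else len(file_sizes_mb)
        return max(1, math.ceil(total / opts.partition_size))
    return default_num_partitions(executor_instances, executor_cores)


def assign_partitions(file_sizes_mb: List[float], n: int,
                      by_size: bool) -> List[List[int]]:
    """Return n partitions, each a list of indices into file_sizes_mb."""
    partitions: List[List[int]] = [[] for _ in range(n)]
    if not by_size:
        # Equalize by number of files: round-robin assignment.
        for i in range(len(file_sizes_mb)):
            partitions[i % n].append(i)
        return partitions
    # Equalize by size: place the largest files first, always into the
    # partition with the smallest running total (greedy heuristic).
    heap = [(0.0, p) for p in range(n)]
    order = sorted(range(len(file_sizes_mb)),
                   key=lambda i: file_sizes_mb[i], reverse=True)
    for i in order:
        load, p = heapq.heappop(heap)
        partitions[p].append(i)
        heapq.heappush(heap, (load + file_sizes_mb[i], p))
    return partitions
```

For example, with five files of 100, 10, 10, 10, and 70 MB, `partitionSize = 100` and `partitionByFileSize = true` would yield two partitions of 100 MB each under this sketch, whereas round-robin by file count would put three files in one partition and two in the other, regardless of size.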