User specified options for partition distribution on distributed ingest #1256

@rfecher

Description

We should give advanced users options to control partitioning on ingest. We can expose numPartitions or partitionSize to control the number of partitions; partitionSize can be expressed either as a number of files or in MB. A boolean flag such as partitionByFileSize would then toggle whether partitions are equalized by number of files or by total size in MB.
These settings are for advanced users. By default we equalize by number of files and use spark.executor.instances * spark.executor.cores * 3 partitions, as discussed in #1255.
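
A minimal sketch of how these options could interact, written in plain Python rather than against the actual ingest code. All function and parameter names here are hypothetical, and two points are assumptions not stated above: that numPartitions and partitionSize are mutually exclusive, and that partitionByFileSize also decides whether partitionSize is read as a file count or as MB. The size-balanced assignment uses a greedy largest-first heuristic, which is one possible choice, not necessarily what the implementation would use.

```python
import heapq
import math
from typing import List, Optional, Tuple

FileEntry = Tuple[str, int]  # (path, size in bytes)
MB = 1024 * 1024


def default_num_partitions(executor_instances: int, executor_cores: int) -> int:
    # Default from the issue: spark.executor.instances * spark.executor.cores * 3
    return executor_instances * executor_cores * 3


def resolve_num_partitions(
    files: List[FileEntry],
    default_partitions: int,
    num_partitions: Optional[int] = None,
    partition_size: Optional[float] = None,
    partition_by_file_size: bool = False,
) -> int:
    # Assumption: the two explicit options cannot be combined.
    if num_partitions is not None and partition_size is not None:
        raise ValueError("set numPartitions or partitionSize, not both")
    if num_partitions is not None:
        n = num_partitions
    elif partition_size is not None:
        # Assumption: the flag selects the unit of partitionSize (MB vs. files).
        if partition_by_file_size:
            total = sum(size for _, size in files) / MB
        else:
            total = len(files)
        n = math.ceil(total / partition_size)
    else:
        n = default_partitions
    # Never create more partitions than files, and always at least one.
    return max(1, min(n, len(files)))


def assign_partitions(
    files: List[FileEntry], n: int, partition_by_file_size: bool = False
) -> List[List[FileEntry]]:
    partitions: List[List[FileEntry]] = [[] for _ in range(n)]
    if not partition_by_file_size:
        # Equalize by number of files: simple round-robin.
        for i, entry in enumerate(files):
            partitions[i % n].append(entry)
        return partitions
    # Equalize by total size: place the largest remaining file
    # into the currently lightest partition.
    heap = [(0, i) for i in range(n)]
    heapq.heapify(heap)
    for entry in sorted(files, key=lambda f: f[1], reverse=True):
        load, idx = heapq.heappop(heap)
        partitions[idx].append(entry)
        heapq.heappush(heap, (load + entry[1], idx))
    return partitions
```

With one large file and several small ones, round-robin keeps file counts even but can leave byte loads skewed, while the size-based mode trades even counts for even loads. That trade-off is the reason for exposing the flag only to advanced users.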
