Workloads with many small files (Iceberg micro-partitions, Firehose delivery streams) create excessive scheduling overhead — one split per file means thousands of splits competing for a handful of drivers. Meanwhile, distribution strategies ignore file size entirely, leading to unbalanced node assignments where one node gets a few large files and another gets thousands of tiny ones. This work adds split coalescing, size-aware distribution, and sub-file splitting to make external source execution efficient across diverse file layouts.
Split coalescing — Group many small splits into composite units so the scheduler processes them in batches rather than individually, reducing per-split overhead without changing operator semantics.
Cost-aware distribution — Use file size metadata to balance split assignments across nodes, ensuring each node receives roughly equal total work instead of equal split counts.
File splitting — Enable sub-file parallelism by splitting large files into byte-range chunks, so a single oversized file does not become a sequential bottleneck.
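The split-coalescing idea above can be sketched as a greedy packer: sort splits by size and fill composite groups up to a target byte budget, so the scheduler hands out a few composites instead of thousands of tiny splits. This is an illustrative sketch, not the actual implementation; the `Split` class, `coalesce` function, and the target-size policy are assumptions for demonstration.

```python
from dataclasses import dataclass

@dataclass
class Split:
    path: str
    size: int  # bytes

def coalesce(splits, target_bytes):
    """Greedily pack splits into composite groups of at most target_bytes.

    Sorting small-first keeps tiny files together; a single split larger
    than the budget still gets its own group rather than being dropped.
    """
    groups, current, current_size = [], [], 0
    for s in sorted(splits, key=lambda s: s.size):
        if current and current_size + s.size > target_bytes:
            groups.append(current)          # flush the full composite
            current, current_size = [], 0
        current.append(s)
        current_size += s.size
    if current:
        groups.append(current)
    return groups

splits = [Split("a", 1), Split("b", 1), Split("c", 1), Split("d", 10)]
print(len(coalesce(splits, target_bytes=3)))  # → 2
```

With a 3-byte budget, the three 1-byte splits coalesce into one composite and the 10-byte split stays alone, so the scheduler sees 2 units instead of 4.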
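Cost-aware distribution can be illustrated with the classic longest-processing-time heuristic: place each split, largest first, on the node with the least total bytes so far. A min-heap keyed on assigned bytes makes each placement O(log n). The function name and node-numbering scheme here are hypothetical; real schedulers would also weigh locality and soft affinity.

```python
import heapq

def distribute(split_sizes, num_nodes):
    """Assign split sizes to nodes, balancing total bytes (greedy LPT)."""
    # Heap of (assigned_bytes, node_id); pop always yields the lightest node.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    for size in sorted(split_sizes, reverse=True):
        total, node = heapq.heappop(heap)
        assignment[node].append(size)
        heapq.heappush(heap, (total + size, node))
    return assignment

# One large file plus ten tiny ones: naive round-robin by count would give
# one node ~10x the bytes of the other; size-aware placement balances them.
result = distribute([10] + [1] * 10, num_nodes=2)
print([sum(v) for v in result.values()])  # → [10, 10]
```

By balancing bytes rather than split counts, the node holding the single large file is skipped until the other node catches up, which is exactly the imbalance the intro describes.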