
ESQL: Add parallel execution for Arrow Flight multi-endpoint sources #143345

Merged
costin merged 2 commits into elastic:main from costin:ws-a/flight-parallel-splits
Mar 1, 2026

@costin
Member

@costin costin commented Mar 1, 2026

FlightConnector always connected to the original endpoint, ignoring
per-split locations returned by getFlightInfo(). When a Flight server
advertises multiple endpoints (each serving a partition of the data),
the connector now creates separate clients for each distinct location,
enabling true parallel reads across drivers.

Relates #143327
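The location-aware routing described above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the actual FlightConnector code: `Endpoint`, `Split`, and `LocationRouter` are hypothetical stand-ins for Arrow's `FlightEndpoint`/`Location` types, and the client type is left generic. It assumes the first advertised location is used for each endpoint, and that an endpoint with an empty location list is read from the original endpoint (the Arrow Flight protocol defines an empty list as "redeem the ticket on the service that issued it").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a FlightEndpoint: an opaque ticket plus the locations that can serve it. */
record Endpoint(String ticket, List<String> locations) {}

/** Hypothetical split: one ticket bound to the single location chosen to read it. */
record Split(String ticket, String location) {}

/** Sketch of location-aware routing with one lazily created client per distinct location. */
final class LocationRouter<C> {
    private final String originalLocation;
    private final Function<String, C> clientFactory;
    // Concurrent map because several drivers may ask for clients at the same time.
    private final Map<String, C> clients = new ConcurrentHashMap<>();

    LocationRouter(String originalLocation, Function<String, C> clientFactory) {
        this.originalLocation = originalLocation;
        this.clientFactory = clientFactory;
    }

    /** Maps each endpoint to a split; endpoints without locations fall back to the original location. */
    List<Split> splits(List<Endpoint> endpoints) {
        List<Split> result = new ArrayList<>(endpoints.size());
        for (Endpoint endpoint : endpoints) {
            String location = endpoint.locations().isEmpty() ? originalLocation : endpoint.locations().get(0);
            result.add(new Split(endpoint.ticket(), location));
        }
        return result;
    }

    /** Returns the client for the split's location, creating it only on first use. */
    C clientFor(Split split) {
        return clients.computeIfAbsent(split.location(), clientFactory);
    }

    int clientCount() {
        return clients.size();
    }
}
```

With four endpoints where two share `grpc://a:1` and one lists no location, this yields three clients (`a`, `b`, and the original endpoint). A production version would also need to close the cached clients when the query finishes.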

@costin costin added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.4.0 ES|QL|DS ES|QL datasources labels Mar 1, 2026
@costin costin requested a review from bpintea March 1, 2026 16:28
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @costin, I've created a changelog YAML for you.

FlightConnector always connected to the original endpoint, ignoring
per-split locations returned by getFlightInfo(). When a Flight server
advertises multiple endpoints (each serving a partition of the data),
the connector now creates separate clients for each distinct location,
enabling true parallel reads across drivers.

- FlightConnector: location-aware client routing with per-split clients
- EmployeeFlightServer: multi-endpoint partitioning mode for tests
- FlightSplitProviderTests: split discovery for single and multi-endpoint
- AsyncConnectorFactoryFlightTests: parallel multi-split execution tests

Developed using AI-assisted tooling
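A multi-endpoint partitioning mode for a test server could, for example, cut the row set into contiguous ranges, one per endpoint. The sketch below assumes even, contiguous partitioning; `RowRange` and `RowPartitioner` are hypothetical names, and the actual EmployeeFlightServer may partition its data differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical half-open row range [start, end) served by one endpoint. */
record RowRange(int start, int end) {
    int size() {
        return end - start;
    }
}

final class RowPartitioner {
    private RowPartitioner() {}

    /** Splits rows into `parts` contiguous ranges whose sizes differ by at most one. */
    static List<RowRange> partition(int rows, int parts) {
        if (rows < 0 || parts <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and parts > 0");
        }
        List<RowRange> ranges = new ArrayList<>(parts);
        int base = rows / parts;
        int remainder = rows % parts;
        int start = 0;
        for (int i = 0; i < parts; i++) {
            // The first `remainder` partitions each take one extra row.
            int size = base + (i < remainder ? 1 : 0);
            ranges.add(new RowRange(start, start + size));
            start += size;
        }
        return ranges;
    }
}
```

For 10 rows and 3 endpoints this gives `[0, 4)`, `[4, 7)`, and `[7, 10)`, so every row is served by exactly one endpoint.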
@costin costin force-pushed the ws-a/flight-parallel-splits branch from 4d8e481 to f6162c8 Compare March 1, 2026 18:36

Contributor

@bpintea bpintea left a comment


LGTM.
One note: the FlightConnector so far still never executes FlightSplits, right? Split.SINGLE is the only one passed so far.

@costin
Member Author

costin commented Mar 1, 2026

> LGTM. One note: the FlightConnector so far still never executes FlightSplits, right? Split.SINGLE is the only one passed so far.

Right, will address this in a follow-up PR shortly.

@costin costin enabled auto-merge (squash) March 1, 2026 19:33
@costin costin disabled auto-merge March 1, 2026 19:51
@costin costin enabled auto-merge (squash) March 1, 2026 19:51
@costin costin merged commit 44f33b6 into elastic:main Mar 1, 2026
35 checks passed
@costin costin deleted the ws-a/flight-parallel-splits branch March 1, 2026 20:07
costin added a commit to costin/elasticsearch that referenced this pull request Mar 1, 2026
Adds graceful degradation, cost-aware distribution, sub-file splitting,
and transport serialization tests for external data sources. Builds on
the Arrow Flight parallel execution merged in elastic#143345.

- DataNodeComputeHandler: graceful degradation with per-node isolation
- WeightedRoundRobinStrategy: LPT-based cost-aware split distribution
- AdaptiveStrategy: auto-select weighted distribution when size available
- ComputeService: register weighted_round_robin strategy
- FileSplitProvider: sub-file splitting for row-based formats
- RangeStorageObject: byte-range view over StorageObject for split reads
- ExternalSourceOperatorFactory: RangeStorageObject integration
- FlightSplitCollectionSerializationTests: transport serialization coverage

Developed using AI-assisted tooling
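LPT (Longest Processing Time first) is a classic greedy scheduling heuristic: sort jobs by descending cost, then hand each one to the currently least-loaded worker. The sketch below pairs it with the adaptive fallback described for AdaptiveStrategy (weighted distribution only when every split has a size, plain round-robin otherwise). `SizedSplit` and `SplitAssigner` are hypothetical names, using `-1` for "size unknown" is an assumption, and this is not the WeightedRoundRobinStrategy source.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical split carrying an estimated size in bytes; -1 means "size unknown". */
record SizedSplit(String id, long sizeBytes) {}

final class SplitAssigner {
    private SplitAssigner() {}

    /** Uses LPT when every split has a known size, otherwise plain round-robin. */
    static List<List<SizedSplit>> assign(List<SizedSplit> splits, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be > 0");
        }
        List<List<SizedSplit>> buckets = new ArrayList<>(nodes);
        for (int i = 0; i < nodes; i++) {
            buckets.add(new ArrayList<>());
        }
        boolean allSized = splits.stream().allMatch(s -> s.sizeBytes() >= 0);
        if (!allSized) {
            // Fallback: without sizes, spread splits evenly by count.
            for (int i = 0; i < splits.size(); i++) {
                buckets.get(i % nodes).add(splits.get(i));
            }
            return buckets;
        }
        // LPT step 1: order splits from largest to smallest.
        Comparator<SizedSplit> bySize = Comparator.comparingLong(SizedSplit::sizeBytes);
        List<SizedSplit> sorted = new ArrayList<>(splits);
        sorted.sort(bySize.reversed());
        // LPT step 2: min-heap of {load, nodeIndex}; ties go to the lower node index.
        PriorityQueue<long[]> heap = new PriorityQueue<>(
            Comparator.<long[]>comparingLong(e -> e[0]).thenComparingLong(e -> e[1]));
        for (int i = 0; i < nodes; i++) {
            heap.add(new long[] {0L, i});
        }
        for (SizedSplit split : sorted) {
            long[] least = heap.poll(); // currently least-loaded node
            buckets.get((int) least[1]).add(split);
            least[0] += split.sizeBytes();
            heap.add(least);
        }
        return buckets;
    }
}
```

With sizes 7, 5, 4, 3, 1 on two nodes, LPT produces loads of 10 and 10, whereas count-based round-robin over the same input order would give 12 and 8. LPT is a heuristic rather than an exact solver, but its makespan is known to stay within 4/3 of the optimum.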
costin added a commit to costin/elasticsearch that referenced this pull request Mar 1, 2026
Apply review fixes on top of the merged elastic#143345: host validation in
FlightConnector, FQN-to-import cleanup, and page.releaseBlocks() calls
in tests to prevent resource leaks.

- FlightConnector: validate URI host is not null/blank
- FlightConnector: use IOUtils import instead of FQN
- AsyncConnectorFactoryFlightTests: use ExternalSourceDrainUtils import,
  add page.releaseBlocks() in multi-split tests
- EmployeeFlightServer: use Booleans import instead of FQN
- EmployeeFlightServerTests: use StandardCharsets/Set/TreeSet imports

Relates elastic#143327

Developed using AI-assisted tooling
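A host check of this kind might look like the following, assuming Flight locations arrive as URI strings. `FlightLocationValidator.requireHost` is a hypothetical helper, not the code added to FlightConnector. `java.net.URI.getHost()` returns `null` for URIs without an authority (e.g. `grpc:///data`) and for authorities it cannot parse as a server name, so a null check is needed in addition to the blank check.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class FlightLocationValidator {
    private FlightLocationValidator() {}

    /** Parses a location URI and rejects it when the host is missing or blank. */
    static URI requireHost(String location) {
        final URI uri;
        try {
            uri = new URI(location);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid Flight location [" + location + "]", e);
        }
        String host = uri.getHost();
        if (host == null || host.isBlank()) {
            throw new IllegalArgumentException("Flight location [" + location + "] has no host");
        }
        return uri;
    }
}
```

Failing fast here surfaces a malformed endpoint at planning time, instead of as a connection error on whichever driver happens to pick up that split.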
costin added a commit to costin/elasticsearch that referenced this pull request Mar 2, 2026
Adds graceful degradation, cost-aware distribution, sub-file splitting,
and transport serialization tests for external data sources. Builds on
the Arrow Flight parallel execution merged in elastic#143345.

- DataNodeComputeHandler: graceful degradation with per-node isolation
- WeightedRoundRobinStrategy: LPT-based cost-aware split distribution
- AdaptiveStrategy: auto-select weighted distribution when size available
- ComputeService: register weighted_round_robin strategy
- FileSplitProvider: sub-file splitting for row-based formats
- RangeStorageObject: byte-range view over StorageObject for split reads,
  with Check.notNull() validation and Math.addExact() overflow protection
- ExternalSourceOperatorFactory: RangeStorageObject integration
- FlightSplitCollectionSerializationTests: transport serialization coverage

Developed using AI-assisted tooling
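Sub-file splitting and a byte-range view can be sketched together: cut a file into consecutive byte ranges, then read each range through a view that translates range-relative positions into absolute ones. `ByteSource`, `ArraySource`, `ByteRangeSplitter`, and `RangeView` are hypothetical stand-ins for StorageObject, FileSplitProvider, and RangeStorageObject, and `Objects.requireNonNull` takes the place of the Elasticsearch-specific `Check.notNull()`. Aligning ranges to record boundaries, which row-based formats need so that no row is split across two readers, is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for StorageObject: positional reads over a sized byte source. */
interface ByteSource {
    long length();

    /** Reads up to len bytes at position into dst; returns bytes read, or -1 at end of source. */
    int read(long position, byte[] dst, int off, int len);
}

/** In-memory source used for the example. */
record ArraySource(byte[] data) implements ByteSource {
    public long length() {
        return data.length;
    }

    public int read(long position, byte[] dst, int off, int len) {
        if (position >= data.length) {
            return -1;
        }
        int n = (int) Math.min(len, data.length - position);
        System.arraycopy(data, (int) position, dst, off, n);
        return n;
    }
}

/** Cuts a file into consecutive {offset, length} byte ranges of at most targetSize bytes. */
final class ByteRangeSplitter {
    private ByteRangeSplitter() {}

    static List<long[]> split(long fileLength, long targetSize) {
        if (fileLength < 0 || targetSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and targetSize > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long length = Math.min(targetSize, fileLength - offset);
            ranges.add(new long[] {offset, length});
            offset += length; // never exceeds fileLength, so no overflow
        }
        return ranges;
    }
}

/** Read-only view of [offset, offset + length) of another source, addressed from 0. */
final class RangeView implements ByteSource {
    private final ByteSource delegate;
    private final long offset;
    private final long length;

    RangeView(ByteSource delegate, long offset, long length) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
        if (offset < 0 || length < 0) {
            throw new IllegalArgumentException("offset and length must be >= 0");
        }
        // addExact throws ArithmeticException instead of wrapping to a negative end.
        long end = Math.addExact(offset, length);
        if (end > delegate.length()) {
            throw new IllegalArgumentException("range end " + end + " exceeds source length");
        }
        this.offset = offset;
        this.length = length;
    }

    public long length() {
        return length;
    }

    public int read(long position, byte[] dst, int off, int len) {
        if (position < 0) {
            throw new IllegalArgumentException("position must be >= 0");
        }
        if (position >= length) {
            return -1;
        }
        // Never read past the end of the range, even if the caller asks for more.
        int capped = (int) Math.min(len, length - position);
        return delegate.read(offset + position, dst, off, capped);
    }
}
```

Reading a `RangeView` over bytes 4 to 8 of `"abcdefghij"` returns `"efgh"` even when the caller passes a larger buffer, and a range such as `(Long.MAX_VALUE, 1)` is rejected with an `ArithmeticException` rather than producing a negative end offset.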
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
…lastic#143345)

FlightConnector always connected to the original endpoint, ignoring
per-split locations returned by getFlightInfo(). When a Flight server
advertises multiple endpoints (each serving a partition of the data),
the connector now creates separate clients for each distinct location,
enabling true parallel reads across drivers.

Relates elastic#143327

Labels

:Analytics/ES|QL AKA ESQL >enhancement ES|QL|DS ES|QL datasources Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

3 participants