ESQL: Add parallel execution for Arrow Flight multi-endpoint sources #143345
Merged
costin merged 2 commits into elastic:main on Mar 1, 2026
Collaborator
Pinging @elastic/es-analytical-engine (Team:Analytics)
Collaborator
Hi @costin, I've created a changelog YAML for you.
FlightConnector always connected to the original endpoint, ignoring per-split locations returned by getFlightInfo(). When a Flight server advertises multiple endpoints (each serving a partition of the data), the connector now creates separate clients for each distinct location, enabling true parallel reads across drivers.

- FlightConnector: location-aware client routing with per-split clients
- EmployeeFlightServer: multi-endpoint partitioning mode for tests
- FlightSplitProviderTests: split discovery for single and multi-endpoint
- AsyncConnectorFactoryFlightTests: parallel multi-split execution tests

Developed using AI-assisted tooling
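As a rough illustration of the location-aware routing described above, the sketch below keeps one client per distinct endpoint location and falls back to the original endpoint when a split carries no location. It is not the FlightConnector code: `LocationClientRouter`, `clientFor`, the generic client type `C`, and the factory function are all hypothetical names chosen for this example, and a real implementation would build Arrow Flight clients instead of arbitrary `AutoCloseable` objects.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: caches one client per distinct endpoint location.
// C stands in for a real Flight client type; no name here comes from the PR.
final class LocationClientRouter<C extends AutoCloseable> implements AutoCloseable {
    private final Map<URI, C> clients = new LinkedHashMap<>();
    private final Function<URI, C> clientFactory;
    private final URI defaultLocation;

    LocationClientRouter(URI defaultLocation, Function<URI, C> clientFactory) {
        this.defaultLocation = defaultLocation;
        this.clientFactory = clientFactory;
    }

    // Returns the client for the split's location, creating it on first use.
    // A null location means "use the original endpoint".
    synchronized C clientFor(URI splitLocation) {
        URI target = splitLocation != null ? splitLocation : defaultLocation;
        return clients.computeIfAbsent(target, clientFactory);
    }

    // Number of distinct locations that currently have a client.
    synchronized int clientCount() {
        return clients.size();
    }

    // Closes every cached client, reporting the first failure and suppressing the rest.
    @Override
    public synchronized void close() throws Exception {
        Exception first = null;
        for (C client : clients.values()) {
            try {
                client.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        clients.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

With this shape, two splits that point at the same location share a client, while splits on different locations get separate clients that drivers can read from concurrently.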
4d8e481 to f6162c8
bpintea approved these changes Mar 1, 2026
Contributor bpintea left a comment
LGTM.
One note: the FlightConnector so far still never executes FlightSplits, right? Split.SINGLE is the only one passed so far.
Member
Author
Right, will address this in a follow-up PR shortly.
costin added a commit to costin/elasticsearch that referenced this pull request Mar 1, 2026

Adds graceful degradation, cost-aware distribution, sub-file splitting, and transport serialization tests for external data sources. Builds on the Arrow Flight parallel execution merged in elastic#143345.

- DataNodeComputeHandler: graceful degradation with per-node isolation
- WeightedRoundRobinStrategy: LPT-based cost-aware split distribution
- AdaptiveStrategy: auto-select weighted distribution when size available
- ComputeService: register weighted_round_robin strategy
- FileSplitProvider: sub-file splitting for row-based formats
- RangeStorageObject: byte-range view over StorageObject for split reads
- ExternalSourceOperatorFactory: RangeStorageObject integration
- FlightSplitCollectionSerializationTests: transport serialization coverage

Developed using AI-assisted tooling
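For readers unfamiliar with LPT (longest processing time first), the sketch below shows the general greedy technique the commit message names: sort splits by descending estimated size, then repeatedly give the next split to the node with the smallest accumulated load. It is only an illustration under assumptions: `LptSplitAssigner` and `assign` are hypothetical names, splits are reduced to plain size estimates, and nothing here is taken from WeightedRoundRobinStrategy itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of LPT (longest processing time first) assignment.
final class LptSplitAssigner {

    // Returns, for each node, the indexes of the splits assigned to it.
    // sizes[i] is the estimated cost (for example, bytes) of split i.
    static List<List<Integer>> assign(long[] sizes, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        // Order split indexes by descending size; the sort is stable, so ties keep index order.
        Integer[] order = new Integer[sizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(sizes[b], sizes[a]));

        long[] loads = new long[nodeCount];
        List<List<Integer>> assignment = new ArrayList<>();
        // Min-heap of node indexes keyed by current load, then by node index for determinism.
        Comparator<Integer> byLoadThenIndex =
            (a, b) -> loads[a] != loads[b] ? Long.compare(loads[a], loads[b]) : Integer.compare(a, b);
        PriorityQueue<Integer> leastLoaded = new PriorityQueue<>(byLoadThenIndex);
        for (int node = 0; node < nodeCount; node++) {
            assignment.add(new ArrayList<>());
            leastLoaded.add(node);
        }

        for (int split : order) {
            // The node is removed from the heap before its load changes, then re-inserted.
            int node = leastLoaded.poll();
            assignment.get(node).add(split);
            loads[node] += sizes[split];
            leastLoaded.add(node);
        }
        return assignment;
    }
}
```

For sizes `{7, 5, 4, 3, 1}` on two nodes this yields split indexes `[0, 3]` and `[1, 2, 4]`, a load of 10 on each node.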
costin added a commit to costin/elasticsearch that referenced this pull request Mar 1, 2026

Apply review fixes on top of the merged elastic#143345: host validation in FlightConnector, FQN-to-import cleanup, and page.releaseBlocks() calls in tests to prevent resource leaks.

- FlightConnector: validate URI host is not null/blank
- FlightConnector: use IOUtils import instead of FQN
- AsyncConnectorFactoryFlightTests: use ExternalSourceDrainUtils import, add page.releaseBlocks() in multi-split tests
- EmployeeFlightServer: use Booleans import instead of FQN
- EmployeeFlightServerTests: use StandardCharsets/Set/TreeSet imports

Relates elastic#143327

Developed using AI-assisted tooling
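The host check mentioned in the first bullet could look roughly like the sketch below. This is an assumption about the general shape only: `FlightUriValidation` and `requireHost` are hypothetical names, and the exception type and message are not taken from the commit.

```java
import java.net.URI;

// Hypothetical sketch of a null/blank host check on an endpoint URI.
final class FlightUriValidation {

    // Returns the host of the given URI, or throws if the URI has no usable host.
    static String requireHost(URI uri) {
        String host = uri.getHost();
        if (host == null || host.isBlank()) {
            throw new IllegalArgumentException("Flight URI must include a host: [" + uri + "]");
        }
        return host;
    }
}
```

Failing early here gives a clear error at connector setup rather than a less obvious failure when the client first tries to connect.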
costin added a commit to costin/elasticsearch that referenced this pull request Mar 2, 2026

Adds graceful degradation, cost-aware distribution, sub-file splitting, and transport serialization tests for external data sources. Builds on the Arrow Flight parallel execution merged in elastic#143345.

- DataNodeComputeHandler: graceful degradation with per-node isolation
- WeightedRoundRobinStrategy: LPT-based cost-aware split distribution
- AdaptiveStrategy: auto-select weighted distribution when size available
- ComputeService: register weighted_round_robin strategy
- FileSplitProvider: sub-file splitting for row-based formats
- RangeStorageObject: byte-range view over StorageObject for split reads, with Check.notNull() validation and Math.addExact() overflow protection
- ExternalSourceOperatorFactory: RangeStorageObject integration
- FlightSplitCollectionSerializationTests: transport serialization coverage

Developed using AI-assisted tooling
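To make the byte-range idea concrete, here is a minimal sketch of a range view with null validation and `Math.addExact()` overflow protection. It is an assumption-laden stand-in, not RangeStorageObject: `ByteRangeView` is a hypothetical name, a plain `byte[]` replaces the StorageObject, and `Objects.requireNonNull` replaces the `Check.notNull()` helper named in the commit message.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch: a validated [offset, offset + length) view over a byte array.
final class ByteRangeView {
    private final byte[] source; // stand-in for the underlying storage object
    private final long offset;
    private final long length;

    ByteRangeView(byte[] source, long offset, long length) {
        this.source = Objects.requireNonNull(source, "source");
        if (offset < 0 || length < 0) {
            throw new IllegalArgumentException("offset and length must be non-negative");
        }
        // addExact throws ArithmeticException instead of silently wrapping on overflow.
        long end = Math.addExact(offset, length);
        if (end > source.length) {
            throw new IllegalArgumentException("range [" + offset + ", " + end + ") exceeds source length " + source.length);
        }
        this.offset = offset;
        this.length = length;
    }

    long length() {
        return length;
    }

    // Copies only the bytes inside the range; the casts are safe because end <= source.length.
    byte[] read() {
        return Arrays.copyOfRange(source, (int) offset, (int) (offset + length));
    }
}
```

Without `addExact`, an offset near `Long.MAX_VALUE` could wrap to a negative end position and slip past the bounds check.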
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026

…lastic#143345)

FlightConnector always connected to the original endpoint, ignoring per-split locations returned by getFlightInfo(). When a Flight server advertises multiple endpoints (each serving a partition of the data), the connector now creates separate clients for each distinct location, enabling true parallel reads across drivers.

Relates elastic#143327
FlightConnector always connected to the original endpoint, ignoring
per-split locations returned by getFlightInfo(). When a Flight server
advertises multiple endpoints (each serving a partition of the data),
the connector now creates separate clients for each distinct location,
enabling true parallel reads across drivers.
Relates #143327