ESQL: External source parallel execution and distribution #143349
Merged
costin merged 1 commit into elastic:main on Mar 2, 2026
Collaborator
Pinging @elastic/es-analytical-engine (Team:Analytics)
Collaborator
Hi @costin, I've created a changelog YAML for you.
Force-pushed from 0b37dc1 to b93afad
bpintea approved these changes Mar 2, 2026
Comment on lines +29 to +34

    if (offset < 0) {
        throw new IllegalArgumentException("offset must be >= 0, got: " + offset);
    }
    if (length < 0) {
        throw new IllegalArgumentException("length must be >= 0, got: " + length);
    }

    @Override
    public InputStream newStream(long position, long rangeLength) throws IOException {
        return delegate.newStream(offset + position, rangeLength);
Contributor
Should we use exact math here?
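To illustrate the concern, here is a minimal sketch of a byte-range view whose position arithmetic uses Math.addExact(), so an overflowing offset + position raises ArithmeticException instead of wrapping to a negative value. It is not the PR's RangeStorageObject: the RangeReadable interface, the RangeView class, and the inMemory() helper are hypothetical stand-ins, and Objects.requireNonNull() is used where the PR description mentions Check.notNull().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class RangeViewSketch {

    /** Hypothetical stand-in for StorageObject; only the method visible in the diff is modeled. */
    interface RangeReadable {
        InputStream newStream(long position, long rangeLength) throws IOException;
    }

    /** Byte-range view: positions passed to newStream are relative to offset in the delegate. */
    static final class RangeView implements RangeReadable {
        private final RangeReadable delegate;
        private final long offset;
        private final long length;

        RangeView(RangeReadable delegate, long offset, long length) {
            this.delegate = Objects.requireNonNull(delegate, "delegate");
            if (offset < 0) {
                throw new IllegalArgumentException("offset must be >= 0, got: " + offset);
            }
            if (length < 0) {
                throw new IllegalArgumentException("length must be >= 0, got: " + length);
            }
            this.offset = offset;
            this.length = length;
        }

        long length() {
            return length;
        }

        @Override
        public InputStream newStream(long position, long rangeLength) throws IOException {
            // addExact throws ArithmeticException on overflow instead of silently wrapping.
            return delegate.newStream(Math.addExact(offset, position), rangeLength);
        }
    }

    /** In-memory delegate used only for this demo (assumes small arrays). */
    static RangeReadable inMemory(byte[] data) {
        return (position, rangeLength) ->
            new ByteArrayInputStream(data, Math.toIntExact(position), Math.toIntExact(rangeLength));
    }

    public static void main(String[] args) throws IOException {
        RangeReadable file = inMemory("0123456789".getBytes(StandardCharsets.US_ASCII));
        RangeView view = new RangeView(file, 4, 6);
        try (InputStream in = view.newStream(1, 3)) {
            // Reads delegate bytes 5..7, so this prints 567.
            System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
        }
    }
}
```

With plain `offset + position`, a caller passing a position near Long.MAX_VALUE would hand the delegate a negative start; with addExact the failure is immediate and explicit.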
Adds graceful degradation, cost-aware distribution, sub-file splitting, and transport serialization tests for external data sources. Builds on the Arrow Flight parallel execution merged in elastic#143345.

- DataNodeComputeHandler: graceful degradation with per-node isolation
- WeightedRoundRobinStrategy: LPT-based cost-aware split distribution
- AdaptiveStrategy: auto-select weighted distribution when size available
- ComputeService: register weighted_round_robin strategy
- FileSplitProvider: sub-file splitting for row-based formats
- RangeStorageObject: byte-range view over StorageObject for split reads, with Check.notNull() validation and Math.addExact() overflow protection
- ExternalSourceOperatorFactory: RangeStorageObject integration
- FlightSplitCollectionSerializationTests: transport serialization coverage

Developed using AI-assisted tooling
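For readers unfamiliar with LPT (longest processing time first), the following is a rough sketch of the general heuristic, not the code in WeightedRoundRobinStrategy: sort splits by descending cost, then give each one to the node with the smallest load so far. The Split record, the use of size in bytes as the cost, and the distribute() method are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class LptDistributionSketch {

    /** Hypothetical split: a name plus an estimated size in bytes used as its cost. */
    record Split(String name, long sizeBytes) {}

    /** Assigns splits to nodes with the LPT heuristic; returns node -> assigned splits. */
    static Map<String, List<Split>> distribute(List<Split> splits, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Map<String, List<Split>> assignment = new LinkedHashMap<>();
        // Min-heap of {currentLoad, nodeIndex}; ties are broken by node index.
        PriorityQueue<long[]> heap = new PriorityQueue<>(
            Comparator.<long[]>comparingLong(a -> a[0]).thenComparingLong(a -> a[1])
        );
        for (int i = 0; i < nodes.size(); i++) {
            assignment.put(nodes.get(i), new ArrayList<>());
            heap.add(new long[] { 0L, i });
        }
        // Largest splits first, so small ones can even out the remaining imbalance.
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong(Split::sizeBytes).reversed());
        for (Split split : sorted) {
            long[] least = heap.poll();
            assignment.get(nodes.get((int) least[1])).add(split);
            least[0] = Math.addExact(least[0], split.sizeBytes());
            heap.add(least);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
            new Split("a", 70), new Split("b", 50), new Split("c", 40),
            new Split("d", 30), new Split("e", 10)
        );
        distribute(splits, List.of("n1", "n2")).forEach((node, assigned) ->
            System.out.println(node + " -> " + assigned));
    }
}
```

In this example n1 receives a and d while n2 receives b, c, and e, for a load of 100 on each node. A plain round-robin over the same input order would give n1 a, c, e (120) and n2 b, d (80).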
Force-pushed from 951be6d to 0f78ca7
Member
Author
Addressed both review comments:
Also rebased onto latest
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 2, 2026
…locations
* upstream/main: (94 commits)
Mute org.elasticsearch.xpack.esql.qa.mixed.EsqlClientYamlIT test {p0=esql/40_tsdb/TS Command grouping on text field} elastic#142544
Mute org.elasticsearch.index.store.StoreDirectoryMetricsIT testDirectoryMetrics elastic#143419
Mute org.elasticsearch.xpack.esql.qa.multi_node.GenerativeIT test elastic#143023
TS_INFO information retrieval command (elastic#142721)
ESQL: External source parallel execution and distribution (elastic#143349)
Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=DOC_VALUES]} elastic#143414
Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=NONE]} elastic#143413
Mute org.elasticsearch.index.mapper.blockloader.FlattenedFieldRootBlockLoaderTests testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=STORED]} elastic#143412
Removing ingest random sampling (elastic#143289)
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#143023
[Transform] Clean up internal tests (elastic#143246)
Skip time series field type merge for non-TS agg queries (elastic#143262)
Enable zero-copy SIMD vector scoring on searchable snapshots (frozen tier) (elastic#141718)
Mute org.elasticsearch.xpack.search.CrossClusterAsyncSearchIT testCancelViaExpirationOnRemoteResultsWithMinimizeRoundtrips elastic#143407
Fix MemorySegmentUtilsTests (elastic#143391)
Unmute testWorkflowsRestrictionAllowsAccess (elastic#143308)
Cancel async query on expiry (elastic#143016)
ESQL: Finish migrating error testing (elastic#143322)
Reduce LuceneOperator.Status memory consumption with large QueryDSL queries (elastic#143175)
ESQL: Generative testing with full text functions (elastic#142961)
...
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
…3349)
Adds parallel execution, graceful degradation, cost-aware distribution,
and sub-file splitting for external data sources. This enables ESQL to
distribute external source queries across data nodes with resilience
and load balancing.
Arrow Flight connectors can now discover multiple endpoints and read
partitions in parallel. When nodes fail during distributed execution,
partial results are returned instead of failing the entire query.
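The partial-results behaviour can be pictured with a small sketch in which each node's work runs in isolation and a failure is recorded rather than propagated. This is only an illustration under assumptions: the PartialResultsSketch class, the Outcome record, and runOnNodes() are hypothetical and do not reflect how DataNodeComputeHandler is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class PartialResultsSketch {

    /** Results from nodes that succeeded, plus the failure cause for each node that did not. */
    record Outcome<T>(List<T> results, Map<String, Throwable> failures) {}

    static <T> Outcome<T> runOnNodes(List<String> nodes, Function<String, T> task, ExecutorService executor) {
        // Start every node's task before waiting on any of them.
        Map<String, CompletableFuture<T>> futures = new LinkedHashMap<>();
        for (String node : nodes) {
            futures.put(node, CompletableFuture.supplyAsync(() -> task.apply(node), executor));
        }
        List<T> results = new ArrayList<>();
        Map<String, Throwable> failures = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<T>> entry : futures.entrySet()) {
            try {
                results.add(entry.getValue().join());
            } catch (CompletionException e) {
                // One node failing does not discard what the other nodes produced.
                failures.put(entry.getKey(), e.getCause());
            }
        }
        return new Outcome<>(results, failures);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Outcome<String> outcome = runOnNodes(List.of("n1", "n2", "n3"), node -> {
                if (node.equals("n2")) {
                    throw new IllegalStateException("node unavailable: " + node);
                }
                return "rows-from-" + node;
            }, executor);
            System.out.println("results:  " + outcome.results());
            System.out.println("failures: " + outcome.failures().keySet());
        } finally {
            executor.shutdown();
        }
    }
}
```

Returning the failures alongside the results lets the caller decide whether to surface a warning or treat the query as failed.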
The adaptive strategy automatically selects weighted round-robin
distribution when split size information is available, balancing load
across nodes proportionally. Large row-based files (CSV, NDJSON) can
be split into byte-range chunks for finer-grained parallelism.
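As a sketch of what byte-range splitting of a newline-delimited file involves, the example below cuts a file into fixed-size ranges and then reads each range so that every line is returned exactly once. The ownership rule used here (a range owns the lines whose first byte falls inside it) is a common convention assumed for illustration; the PR text does not say how FileSplitProvider aligns chunks to row boundaries, and ByteRange, split(), and readLines() are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ByteRangeSplitSketch {

    /** Hypothetical split descriptor: a half-open byte range [offset, offset + length). */
    record ByteRange(long offset, long length) {}

    /** Cuts a file of fileSize bytes into consecutive ranges of at most targetChunkSize bytes. */
    static List<ByteRange> split(long fileSize, long targetChunkSize) {
        if (fileSize < 0 || targetChunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and targetChunkSize > 0");
        }
        List<ByteRange> ranges = new ArrayList<>();
        long offset = 0;
        while (offset < fileSize) {
            long length = Math.min(targetChunkSize, fileSize - offset);
            ranges.add(new ByteRange(offset, length));
            offset += length;
        }
        return ranges;
    }

    /** Returns the lines whose first byte lies inside the range (in-memory demo, small data only). */
    static List<String> readLines(byte[] data, ByteRange range) {
        int end = (int) Math.min(data.length, range.offset() + range.length());
        int pos = (int) range.offset();
        if (pos > 0) {
            // Step back one byte and skip past the next newline, so a line that began
            // in the previous range is left to that range.
            pos--;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++;
        }
        List<String> lines = new ArrayList<>();
        while (pos < end) {
            int lineEnd = pos;
            // A line that starts in this range is read to its end, even past the range boundary.
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] csv = "a,1\nbb,2\nccc,3\n".getBytes(StandardCharsets.UTF_8);
        for (ByteRange range : split(csv.length, 6)) {
            System.out.println(range + " -> " + readLines(csv, range));
        }
    }
}
```

For the 15-byte sample and a 6-byte target, the ranges start at offsets 0, 6, and 12. The first yields two lines, the second yields the third line, and the last yields none, because the line it overlaps began in the second range.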
Relates #143327