ESQL: Add distribution property tests for external sources#143341

Merged: costin merged 10 commits into elastic:main from costin:ws-d/distribution-property-tests on Mar 1, 2026

@costin (Member) commented Feb 28, 2026

Randomized property tests that verify mathematical invariants of the external source split assignment algorithm. These are pure unit tests (no cluster needed) that catch correctness regressions in the distribution strategies.

The tests validate: completeness (every split assigned exactly once), bounded load imbalance (no node exceeds ceil(splits/nodes)), determinism (same inputs always produce same assignments), and correct behavior at boundary conditions (single split, zero eligible nodes, more nodes than splits, empty splits).
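
The invariants listed above can be illustrated with a minimal, self-contained sketch. It does not use the actual `RoundRobinStrategy` or the Elasticsearch test framework: `SplitAssignmentSketch`, `roundRobin`, and `invariantsHold` are hypothetical names, the modulo-based assignment is only an assumed stand-in for the real algorithm, and rejecting zero nodes with an exception is an assumption about the boundary behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for a round-robin split assignment; not the Elasticsearch implementation. */
final class SplitAssignmentSketch {

    /** Assigns split i to node (i % nodeCount) and returns one list of split ids per node. */
    static List<List<Integer>> roundRobin(int splitCount, int nodeCount) {
        if (nodeCount <= 0) {
            // Assumption: the sketch rejects the zero-eligible-nodes case outright.
            throw new IllegalArgumentException("need at least one eligible node");
        }
        List<List<Integer>> perNode = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++) {
            perNode.add(new ArrayList<>());
        }
        for (int split = 0; split < splitCount; split++) {
            perNode.get(split % nodeCount).add(split);
        }
        return perNode;
    }

    /** Checks completeness, bounded imbalance, and determinism for one input. */
    static boolean invariantsHold(int splitCount, int nodeCount) {
        List<List<Integer>> assignment = roundRobin(splitCount, nodeCount);

        // Completeness: every split id appears exactly once across all nodes.
        int[] seen = new int[splitCount];
        for (List<Integer> splits : assignment) {
            for (int split : splits) {
                seen[split]++;
            }
        }
        for (int count : seen) {
            if (count != 1) {
                return false;
            }
        }

        // Bounded imbalance: no node holds more than ceil(splits / nodes).
        int bound = (splitCount + nodeCount - 1) / nodeCount;
        for (List<Integer> splits : assignment) {
            if (splits.size() > bound) {
                return false;
            }
        }

        // Determinism: the same inputs produce an equal assignment.
        return assignment.equals(roundRobin(splitCount, nodeCount));
    }

    public static void main(String[] args) {
        Random random = new Random(42L); // fixed seed so a failure can be reproduced
        for (int i = 0; i < 1_000; i++) {
            int splits = random.nextInt(200);   // 0 covers the empty-splits case
            int nodes = 1 + random.nextInt(16); // may exceed the number of splits
            if (!invariantsHold(splits, nodes)) {
                throw new AssertionError("invariant violated for splits=" + splits + ", nodes=" + nodes);
            }
        }
        System.out.println("all invariants held");
    }
}
```

Drawing the split and node counts at random, rather than hard-coding a few cases, is what lets a property test of this kind reach the boundary conditions (empty splits, a single split, more nodes than splits) without a dedicated test for each.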

Builds on #143336

Part of #143330

Developed using AI-assisted tooling

External source distributed execution has no end-to-end test coverage.
Existing tests only assert plan structure at the unit level. This adds
a multi-node integration test that runs the same csv-spec queries across
all three distribution strategies (coordinator_only, round_robin, adaptive)
and asserts identical results — any divergence flags a split assignment,
exchange, or aggregation bug.

- multi-node build.gradle: add S3/GCS fixtures, datasource plugins
- ExternalDistributedClusters: 3-node cluster with S3 fixture wiring
- ExternalDistributedSpecIT: parameterized test runner cross-producting
  external-basic specs with storage backends and distribution modes

Developed using AI-assisted tooling

Randomized tests that verify mathematical invariants of the split
assignment algorithm: every split assigned exactly once, bounded load
imbalance (no node exceeds ceil(splits/nodes)), deterministic output,
and correct behavior at boundary conditions (single split, zero nodes,
more nodes than splits).

Builds on PR-D1 (elastic#143336) distributed integration test infrastructure.

- ExternalDistributionPropertyTests: 11 property-based tests covering
  RoundRobinStrategy, AdaptiveStrategy, and CoordinatorOnlyStrategy

Developed using AI-assisted tooling
@costin added the >test, Team:Analytics, :Analytics/ES|QL, and ES|QL|DS labels on Feb 28, 2026
@costin requested a review from bpintea on February 28, 2026 17:48
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

When test resources (iceberg-fixtures) are bundled inside a JAR by the
internal-test-artifact plugin, the existing code silently skipped loading
or asserted file: protocol URLs and failed. Expand S3FixtureUtils and
AbstractExternalSourceSpecTestCase to enumerate and load fixtures from
both filesystem and JAR sources.

- S3FixtureUtils: added forEachFixtureEntry() and resolveLocalFixturesPath()
  utilities; loadFixturesFromResources() now dispatches to loadFixturesFromJar()
- AbstractExternalSourceSpecTestCase: loadGcsFixtures(), generateCompressedFixtures(),
  resolveLocalFixturesPath() now use the shared utility instead of direct file walks

Developed using AI-assisted tooling
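
The dual filesystem/JAR enumeration described in this commit can be sketched roughly as follows. This is not the code in `S3FixtureUtils`: `FixtureEntriesSketch`, `listEntries`, and `toPath` are hypothetical names, and the sketch assumes the JAR lives on the local filesystem.

```java
import java.io.IOException;
import java.net.JarURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Stream;

/** Hypothetical helper: lists fixture files under a resource root given as a file: or jar: URL. */
final class FixtureEntriesSketch {

    /** Returns the sorted, '/'-separated relative names of all regular files under root. */
    static List<String> listEntries(URL root) throws IOException {
        List<String> names = new ArrayList<>();
        String protocol = root.getProtocol();
        if ("file".equals(protocol)) {
            final Path dir = toPath(root);
            try (Stream<Path> walk = Files.walk(dir)) {
                walk.filter(Files::isRegularFile)
                    .forEach(p -> names.add(dir.relativize(p).toString().replace('\\', '/')));
            }
        } else if ("jar".equals(protocol)) {
            // openConnection() only parses the URL here; the JAR is opened explicitly below.
            JarURLConnection connection = (JarURLConnection) root.openConnection();
            String entry = connection.getEntryName();
            String prefix = entry == null ? "" : (entry.endsWith("/") ? entry : entry + "/");
            // Assumption: the JAR itself is a local file.
            try (JarFile jar = new JarFile(toPath(connection.getJarFileURL()).toFile())) {
                Enumeration<JarEntry> entries = jar.entries();
                while (entries.hasMoreElements()) {
                    JarEntry jarEntry = entries.nextElement();
                    if (jarEntry.isDirectory() == false && jarEntry.getName().startsWith(prefix)) {
                        names.add(jarEntry.getName().substring(prefix.length()));
                    }
                }
            }
        } else {
            throw new IOException("Unsupported fixture URL protocol: " + protocol);
        }
        Collections.sort(names);
        return names;
    }

    private static Path toPath(URL url) throws IOException {
        try {
            return Path.of(url.toURI());
        } catch (URISyntaxException e) {
            throw new IOException("Bad fixture URL: " + url, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fixtures");
        Files.writeString(dir.resolve("employees.csv"), "id,name\n");
        System.out.println(listEntries(dir.toUri().toURL()));
    }
}
```

Dispatching on the URL protocol keeps callers unaware of how the resources are packaged, which is the property the commit relies on when the same test runs from a source checkout and from a test-artifact JAR.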
@costin force-pushed the ws-d/distribution-property-tests branch from 968d15f to e6fb78b on February 28, 2026 18:44
Resolve conflicts in S3FixtureUtils and AbstractExternalSourceSpecTestCase
by keeping JAR-aware fixture loading. Fix AzureFixtureUtils to use
forEachFixtureEntry() for JAR support (same pattern as S3/GCS).

When test resources are bundled inside a JAR (CI), there is no
filesystem path for the LOCAL storage backend. Exclude LOCAL from
the test matrix when resolveLocalFixturesPath returns null. Also
filter Azure SDK reactor-netty thread leaks in the distributed
integration test suite.

- AbstractExternalSourceSpecTestCase: BACKENDS list now computed
  at class-load time, excluding LOCAL when fixtures are in a JAR
- ExternalDistributedSpecIT: added AzureReactorThreadFilter for
  reactor-http-nio and boundedElastic threads from Azure SDK

Developed using AI-assisted tooling
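
A thread-leak filter of the kind mentioned in this commit usually comes down to a name check. The sketch below is an assumption-laden illustration, not the PR's `AzureReactorThreadFilter`: `ThreadFilterLike` is a hypothetical stand-in for the test framework's filter interface, `NamePrefixFilter` is an invented name, and prefix matching is assumed. Only the thread-name fragments come from this conversation.

```java
import java.util.List;

/** Sketch only: mirrors the shape of a test-framework thread filter without depending on it. */
final class ThreadNameFilterSketch {

    /** Hypothetical stand-in interface; true means "ignore this thread in the leak check". */
    interface ThreadFilterLike {
        boolean reject(Thread t);
    }

    /** Ignores any thread whose name starts with one of the configured prefixes. */
    static final class NamePrefixFilter implements ThreadFilterLike {
        private final List<String> prefixes;

        NamePrefixFilter(List<String> prefixes) {
            this.prefixes = List.copyOf(prefixes);
        }

        @Override
        public boolean reject(Thread t) {
            String name = t.getName();
            return prefixes.stream().anyMatch(name::startsWith);
        }
    }

    public static void main(String[] args) {
        ThreadFilterLike filter = new NamePrefixFilter(
            List.of("reactor-http-nio", "boundedElastic", "azure-sdk-")
        );
        // The threads are never started; only their names matter to the filter.
        Thread sdkThread = new Thread(() -> {}, "azure-sdk-global-thread-0");
        Thread otherThread = new Thread(() -> {}, "worker-1");
        System.out.println(filter.reject(sdkThread) + " " + filter.reject(otherThread));
    }
}
```
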
ExternalDistributedClusters called ElasticsearchCluster.local().nodes(3)
directly, which fails in serverless mode where changing the node count
is not supported. Delegate to Clusters.testCluster() — the same pattern
used by all other multi-node tests — so the serverless infrastructure
can substitute its own cluster builder.

- ExternalDistributedClusters: delegate to Clusters.testCluster()

Developed using AI-assisted tooling

In serverless mode the datasource connector plugins (esql-datasource-s3,
esql-datasource-gcs, etc.) are not installed, causing "Unsupported
storage scheme" errors. Add a runtime probe that detects missing
connectors and skips the test via assumeTrue.

- ExternalDistributedSpecIT: override shouldSkipTest with connector check

Developed using AI-assisted tooling
@costin enabled auto-merge (squash) on March 1, 2026 18:32
@bpintea disabled auto-merge on March 1, 2026 18:45

throw new IllegalStateException("Failed to resolve fixtures path", e);
}
}
return "/tmp";
Contributor

You'll probably want to get the temporary directory system/os-dependent? A la System.getProperty("java.io.tmpdir")

public class ExternalDistributedSpecIT extends AbstractExternalSourceSpecTestCase {

/** Filters Azure SDK reactor-netty threads started by the Azure blob fixture. */
public static class AzureReactorThreadFilter implements ThreadFilter {
Contributor

Could this be replaced by existing org.elasticsearch.test.AzureReactorThreadFilter? Though I see here we filter an additional "azure-sdk-". Might be worth dedup'ing.
(Here or a following PR?)

Member Author

Good catch!

costin added 2 commits March 1, 2026 21:33
- ExternalDistributedClusters: use System.getProperty("java.io.tmpdir")
  instead of hardcoded "/tmp" for the fixtures path fallback
- ExternalDistributedSpecIT: reuse framework AzureReactorThreadFilter,
  keep a small AzureSdkThreadFilter for the extra "azure-sdk-" prefix

Developed using AI-assisted tooling
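
The first review fix amounts to replacing a hardcoded fallback with the JVM's temporary-directory property. A minimal sketch, assuming a nullable resolved path (`FixturesPathSketch` and `fixturesPathOrTmp` are hypothetical names, not the ones used in `ExternalDistributedClusters`):

```java
/** Sketch of an OS-independent fallback for a fixtures path. */
final class FixturesPathSketch {

    /** Returns the resolved path, or the JVM's temp directory when none was resolved. */
    static String fixturesPathOrTmp(String resolved) {
        // java.io.tmpdir is a standard system property, so no "/tmp" literal is needed.
        return resolved != null ? resolved : System.getProperty("java.io.tmpdir");
    }

    public static void main(String[] args) {
        System.out.println(fixturesPathOrTmp(null));
    }
}
```
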
…roperty-tests

# Conflicts:
#	x-pack/plugin/esql/qa/server/multi-node/src/javaRestTest/java/org/elasticsearch/xpack/esql/qa/multi_node/ExternalDistributedClusters.java
#	x-pack/plugin/esql/qa/server/multi-node/src/javaRestTest/java/org/elasticsearch/xpack/esql/qa/multi_node/ExternalDistributedSpecIT.java
@costin merged commit 30c94b2 into elastic:main on Mar 1, 2026
2 of 5 checks passed
@costin deleted the ws-d/distribution-property-tests branch on March 1, 2026 19:36
tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026
…43341)

* ESQL: Add spec-driven distributed integration tests for external sources

External source distributed execution has no end-to-end test coverage.
Existing tests only assert plan structure at the unit level. This adds
a multi-node integration test that runs the same csv-spec queries across
all three distribution strategies (coordinator_only, round_robin, adaptive)
and asserts identical results — any divergence flags a split assignment,
exchange, or aggregation bug.

- multi-node build.gradle: add S3/GCS fixtures, datasource plugins
- ExternalDistributedClusters: 3-node cluster with S3 fixture wiring
- ExternalDistributedSpecIT: parameterized test runner cross-producting
  external-basic specs with storage backends and distribution modes

Developed using AI-assisted tooling

* ESQL: Add distribution property tests for external sources

Randomized tests that verify mathematical invariants of the split
assignment algorithm: every split assigned exactly once, bounded load
imbalance (no node exceeds ceil(splits/nodes)), deterministic output,
and correct behavior at boundary conditions (single split, zero nodes,
more nodes than splits).

Builds on PR-D1 (elastic#143336) distributed integration test infrastructure.

- ExternalDistributionPropertyTests: 11 property-based tests covering
  RoundRobinStrategy, AdaptiveStrategy, and CoordinatorOnlyStrategy

Developed using AI-assisted tooling

* ESQL: Handle JAR-packaged fixture resources in tests

When test resources (iceberg-fixtures) are bundled inside a JAR by the
internal-test-artifact plugin, the existing code silently skipped loading
or asserted file: protocol URLs and failed. Expand S3FixtureUtils and
AbstractExternalSourceSpecTestCase to enumerate and load fixtures from
both filesystem and JAR sources.

- S3FixtureUtils: added forEachFixtureEntry() and resolveLocalFixturesPath()
  utilities; loadFixturesFromResources() now dispatches to loadFixturesFromJar()
- AbstractExternalSourceSpecTestCase: loadGcsFixtures(), generateCompressedFixtures(),
  resolveLocalFixturesPath() now use the shared utility instead of direct file walks

Developed using AI-assisted tooling

* ESQL: Skip LOCAL backend tests when fixtures are JAR-packaged

When test resources are bundled inside a JAR (CI), there is no
filesystem path for the LOCAL storage backend. Exclude LOCAL from
the test matrix when resolveLocalFixturesPath returns null. Also
filter Azure SDK reactor-netty thread leaks in the distributed
integration test suite.

- AbstractExternalSourceSpecTestCase: BACKENDS list now computed
  at class-load time, excluding LOCAL when fixtures are in a JAR
- ExternalDistributedSpecIT: added AzureReactorThreadFilter for
  reactor-http-nio and boundedElastic threads from Azure SDK

Developed using AI-assisted tooling

* ESQL: Use Clusters.testCluster for distributed tests

ExternalDistributedClusters called ElasticsearchCluster.local().nodes(3)
directly, which fails in serverless mode where changing the node count
is not supported. Delegate to Clusters.testCluster() — the same pattern
used by all other multi-node tests — so the serverless infrastructure
can substitute its own cluster builder.

- ExternalDistributedClusters: delegate to Clusters.testCluster()

Developed using AI-assisted tooling

* ESQL: Skip external distributed tests when connectors unavailable

In serverless mode the datasource connector plugins (esql-datasource-s3,
esql-datasource-gcs, etc.) are not installed, causing "Unsupported
storage scheme" errors. Add a runtime probe that detects missing
connectors and skips the test via assumeTrue.

- ExternalDistributedSpecIT: override shouldSkipTest with connector check

Developed using AI-assisted tooling

* ESQL: Address review feedback on distributed tests

- ExternalDistributedClusters: use System.getProperty("java.io.tmpdir")
  instead of hardcoded "/tmp" for the fixtures path fallback
- ExternalDistributedSpecIT: reuse framework AzureReactorThreadFilter,
  keep a small AzureSdkThreadFilter for the extra "azure-sdk-" prefix

Developed using AI-assisted tooling
Labels

- :Analytics/ES|QL: AKA ESQL
- ES|QL|DS: ES|QL datasources
- Team:Analytics: Meta label for analytical engine team (ESQL/Aggs/Geo)
- >test: Issues or PRs that are addressing/adding tests
- v9.4.0

3 participants