Skip to content

[ES|QL] Add schema reconciliation for multi-file external sources#145220

Merged
costin merged 5 commits intoelastic:mainfrom
costin:esql/schema-reconciliation
Mar 31, 2026
Merged

[ES|QL] Add schema reconciliation for multi-file external sources#145220
costin merged 5 commits intoelastic:mainfrom
costin:esql/schema-reconciliation

Conversation

@costin
Copy link
Copy Markdown
Member

@costin costin commented Mar 30, 2026

Adds schema reconciliation for external file sources that span multiple files
with potentially different schemas. Users select a strategy via the WITH clause:

FROM "s3://bucket/data/*.parquet" WITH {"schema_resolution": "strict"}
FROM "s3://bucket/data/*.parquet" WITH {"schema_resolution": "union_by_name"}

Problem

FIRST_FILE_WINS (the current default) silently ignores schema differences
across files. This causes wrong column values when files have different column
ordering, or runtime errors when types don't match — with no clear message
pointing to the root cause.

Strategies

  • STRICT — all files must share the exact same schema. Fails at planning
    time with a descriptive error naming the offending file and column, and a
    hint to use union_by_name.
  • UNION_BY_NAME — merges schemas into a superset by column name. Missing
    columns are NULL-filled. Only lossless type widening is allowed (INTEGER→LONG,
    INTEGER→DOUBLE, DATETIME→DATE_NANOS). LONG→DOUBLE is rejected (lossy above
    2^53), matching DuckDB and Spark consensus.

The default remains first_file_wins for backward compatibility.

Design rationale

Informed by a survey of DuckDB, Spark, ClickHouse, and Cribl — see
SCHEMA_RECONCILIATION_DESIGN.md. Both strategies scan all file metadata
eagerly at planning time (footer reads only, parallelized with bounded
concurrency). This also collects per-file statistics for aggregate pushdown
Phase 2 at zero extra cost.

Type widening uses a custom schemaWiden() — not EsqlDataTypeConverter.commonType()
which allows lossy LONG→DOUBLE. Column matching is case-sensitive, consistent
with ESQL's field resolution semantics.

Developed with AI-assisted tooling

@costin costin requested a review from bpintea March 30, 2026 16:41
@costin costin enabled auto-merge (squash) March 30, 2026 16:41
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 30, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @costin, I've created a changelog YAML for you.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 30, 2026

Caution

Review failed

An error occurred during the review process. Please try again later.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • 🛠️ Update Documentation: Commit on current branch
  • 🛠️ Update Documentation: Create PR

Comment @coderabbitai help to get the list of available commands and usage tips.

costin added 4 commits March 31, 2026 00:59
Adds planning-time schema reconciliation for external file sources that
span multiple files with potentially different schemas. Supports STRICT
(exact match) and UNION_BY_NAME (merge by column name with safe type
widening) strategies via the schema_resolution WITH clause parameter.

Core changes:
- SchemaReconciliation: reconciliation algorithms with ColumnMapping that
  handles both planning-time use and wire serialization (CastType enum
  ordinals, no strings on the wire)
- SchemaAdaptingIterator: execution-time adapter that reorders columns,
  inserts NULL blocks for missing columns, and casts blocks for type
  widening (INTEGER→LONG, INTEGER→DOUBLE, DATETIME→DATE_NANOS)
- ExternalSourceResolver: parallel metadata reading with bounded
  concurrency (semaphore + MAX_PARALLEL_METADATA_READS)
- FileSplit carries nullable ColumnMapping; FileSplitProvider shares
  the same instance across all splits from the same file via dedup cache

Developed with AI-assisted tooling
Wire SchemaAdaptingIterator into the async execution path
and fix partition column dimensionality in both sync/async
paths by using attributes.subList(0, columnCount()). Demote
per-query schema reconciliation timing log to debug. Add
ColumnMapping round-trip serialization tests and unit tests
for SchemaAdaptingIterator covering all cast types, null
fill, reorder, empty page, failure cleanup, and close
delegation.
- Sync factory: mapping-aware column projection in openFileSplit
- SchemaAdaptingIterator: constructor invariant for schema/mapping size
- Async factory: wire adaptSchema into startMultiFileRead path
- Remove unused logger field and trivial FileSplit adaptSchema overload
@costin costin force-pushed the esql/schema-reconciliation branch from c8489aa to 2f1bcad Compare March 31, 2026 09:45
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 31, 2026

🔍 Preview links for changed docs

⏳ Building and deploying preview... View progress

This comment will be updated with preview links when the build is complete.

@github-actions
Copy link
Copy Markdown
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Copy link
Copy Markdown
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, except the serialization.

if (ordinal < 0 || ordinal >= VALUES.length) {
throw new IllegalArgumentException("Unknown cast ordinal: " + ordinal);
}
return VALUES[ordinal];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably cause bwc issues when we update this enum. We usually serialize the strings to prevent that.

@costin costin force-pushed the esql/schema-reconciliation branch from 190084c to e5ef28e Compare March 31, 2026 12:51
Switch from hand-rolled ordinal byte encoding to the standard
ES writeEnum/readEnum pattern. Add assertEnumSerialization test
to pin ordinal-to-value mapping per ES convention.
@costin costin force-pushed the esql/schema-reconciliation branch from e5ef28e to b69c1d8 Compare March 31, 2026 13:03
Copy link
Copy Markdown
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖-assisted review.

@costin costin merged commit ef3486e into elastic:main Mar 31, 2026
33 of 35 checks passed
@costin costin deleted the esql/schema-reconciliation branch March 31, 2026 14:26
@bpintea
Copy link
Copy Markdown
Contributor

bpintea commented Mar 31, 2026

@costin, just noticed, the PR description needs updating too.

szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 31, 2026
…rics

* upstream/main: (21 commits)
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352
  [DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324)
  Fix dense_vector default index options when using BFLOAT16 (elastic#145202)
  Use checked exceptions in entitlement constructor rules (elastic#145234)
  ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334)
  Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243)
  Make Settings.Builder.remove() fluent (elastic#145294)
  Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211)
  Fix flaky SecurityFeatureResetTests (elastic#145063)
  [DOCS] Fix conflict markers in ESQL processing command list (elastic#145338)
  Skip certain metric assertions on Windows (elastic#144933)
  [ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220)
  Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784)
  ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109)
  [Transform] Track Linked Projects (elastic#144399)
  Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316)
  Clean up TickerScheduleEngineTests (elastic#145303)
  [CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317)
  Add CRUD doc to the DistributedArchitectureGuide (elastic#144710)
  ...
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Apr 1, 2026
…astic#145220)

* [ES|QL] Add schema reconciliation for multi-file external sources

Adds planning-time schema reconciliation for external file sources that
span multiple files with potentially different schemas. Supports STRICT
(exact match) and UNION_BY_NAME (merge by column name with safe type
widening) strategies via the schema_resolution WITH clause parameter.

Core changes:
- SchemaReconciliation: reconciliation algorithms with ColumnMapping that
  handles both planning-time use and wire serialization (CastType enum
  ordinals, no strings on the wire)
- SchemaAdaptingIterator: execution-time adapter that reorders columns,
  inserts NULL blocks for missing columns, and casts blocks for type
  widening (INTEGER→LONG, INTEGER→DOUBLE, DATETIME→DATE_NANOS)
- ExternalSourceResolver: parallel metadata reading with bounded
  concurrency (semaphore + MAX_PARALLEL_METADATA_READS)
- FileSplit carries nullable ColumnMapping; FileSplitProvider shares
  the same instance across all splits from the same file via dedup cache

Developed with AI-assisted tooling

* Fix schema reconciliation gaps for multi-file sources

Wire SchemaAdaptingIterator into the async execution path
and fix partition column dimensionality in both sync/async
paths by using attributes.subList(0, columnCount()). Demote
per-query schema reconciliation timing log to debug. Add
ColumnMapping round-trip serialization tests and unit tests
for SchemaAdaptingIterator covering all cast types, null
fill, reorder, empty page, failure cleanup, and close
delegation.

* Harden schema reconciliation gaps and clean up async factory

- Sync factory: mapping-aware column projection in openFileSplit
- SchemaAdaptingIterator: constructor invariant for schema/mapping size
- Async factory: wire adaptSchema into startMultiFileRead path
- Remove unused logger field and trivial FileSplit adaptSchema overload

* Update docs/changelog/145220.yaml

* Use writeEnum/readEnum for CastType serialization

Switch from hand-rolled ordinal byte encoding to the standard
ES writeEnum/readEnum pattern. Add assertEnumSerialization test
to pin ordinal-to-value mapping per ES convention.
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026
…astic#145220)

* [ES|QL] Add schema reconciliation for multi-file external sources

Adds planning-time schema reconciliation for external file sources that
span multiple files with potentially different schemas. Supports STRICT
(exact match) and UNION_BY_NAME (merge by column name with safe type
widening) strategies via the schema_resolution WITH clause parameter.

Core changes:
- SchemaReconciliation: reconciliation algorithms with ColumnMapping that
  handles both planning-time use and wire serialization (CastType enum
  ordinals, no strings on the wire)
- SchemaAdaptingIterator: execution-time adapter that reorders columns,
  inserts NULL blocks for missing columns, and casts blocks for type
  widening (INTEGER→LONG, INTEGER→DOUBLE, DATETIME→DATE_NANOS)
- ExternalSourceResolver: parallel metadata reading with bounded
  concurrency (semaphore + MAX_PARALLEL_METADATA_READS)
- FileSplit carries nullable ColumnMapping; FileSplitProvider shares
  the same instance across all splits from the same file via dedup cache

Developed with AI-assisted tooling

* Fix schema reconciliation gaps for multi-file sources

Wire SchemaAdaptingIterator into the async execution path
and fix partition column dimensionality in both sync/async
paths by using attributes.subList(0, columnCount()). Demote
per-query schema reconciliation timing log to debug. Add
ColumnMapping round-trip serialization tests and unit tests
for SchemaAdaptingIterator covering all cast types, null
fill, reorder, empty page, failure cleanup, and close
delegation.

* Harden schema reconciliation gaps and clean up async factory

- Sync factory: mapping-aware column projection in openFileSplit
- SchemaAdaptingIterator: constructor invariant for schema/mapping size
- Async factory: wire adaptSchema into startMultiFileRead path
- Remove unused logger field and trivial FileSplit adaptSchema overload

* Update docs/changelog/145220.yaml

* Use writeEnum/readEnum for CastType serialization

Switch from hand-rolled ordinal byte encoding to the standard
ES writeEnum/readEnum pattern. Add assertEnumSerialization test
to pin ordinal-to-value mapping per ES convention.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement ES|QL|DS ES|QL datasources Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants