ESQL: Add pluggable partition detection and virtual columns#143120

Merged
costin merged 7 commits into elastic:main from costin:esql/ds-distributed/stage-2.5
Feb 26, 2026

@costin costin commented Feb 25, 2026

Introduce a PartitionDetector SPI with Hive, template, and auto
strategies so partition columns are discovered from file paths and
injected as virtual columns into the schema and execution pages.

  • PartitionDetector, PartitionConfig, HivePartitionDetector,
    TemplatePartitionDetector, AutoPartitionDetector: pluggable detection
  • VirtualColumnInjector: injects constant blocks for partition values
  • ExternalSourceResolver: enriches schema with ReferenceAttribute
    partition columns (not FieldAttribute) preserving path order
  • GlobExpander: template-aware glob rewriting with filter hints
  • PartitionMetadata, FileSplit, SourceOperatorContext: order-preserving
    maps/sets via LinkedHashMap/LinkedHashSet with Maps preallocation
  • AsyncExternalSourceOperatorFactory, FileSourceFactory: wired virtual
    column injection into the read pipeline
  • FileSplitProvider, PartitionFilterHintExtractor: switch patterns
  • Config key strings extracted to PartitionConfig constants

Relates #142996

Developed using AI-assisted tooling

@costin costin added the >enhancement, Team:Analytics, :Analytics/ES|QL, and v9.4.0 labels Feb 25, 2026
@costin costin requested a review from bpintea February 25, 2026 22:22
@elasticsearchmachine

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine

Hi @costin, I've created a changelog YAML for you.

elasticsearchmachine and others added 4 commits February 25, 2026 22:35
The rebase onto main dropped imports from FileSplitProvider and
SourceOperatorContext that were introduced by the stage-2 commit.

- FileSplitProvider: restored Expression, comparison type, and
  BiFunction imports needed by partition filter evaluation
- SourceOperatorContext: restored Nullable and ExternalSplit imports,
  added missing split field to Builder

Developed using AI-assisted tooling
Add @SuppressWarnings for the /**/ glob pattern in string literal
that checkstyle misinterprets as an empty javadoc comment.

Developed using AI-assisted tooling
@bpintea bpintea removed their assignment Feb 26, 2026
@bpintea bpintea left a comment

LGTM.
It'd be great to have QA tests added too, though.

return doExpandCommaSeparated(pathList, provider, hints, hivePartitioning, null, null);
}

static FileSet expandCommaSeparated(

nit: not used.

Map<String, Object> partitionValues,
BlockFactory blockFactory
) {
if (fullOutput == null || fullOutput.isEmpty()) {

Check.isTrue(Strings.hasText(fullOutput), "...") and .notNull() for the others.

final class VirtualColumnInjector {

private final List<Attribute> fullOutput;
private final Set<String> partitionColumnNames;

Nit: could be local var.

@costin costin merged commit 538e40d into elastic:main Feb 26, 2026
35 checks passed
@costin costin deleted the esql/ds-distributed/stage-2.5 branch February 26, 2026 15:54
costin added a commit to costin/elasticsearch that referenced this pull request Feb 26, 2026
Remove unused 6-arg expandCommaSeparated overload from GlobExpander.
In VirtualColumnInjector, replace manual null/empty checks with
Check.isTrue and Check.notNull, and demote partitionColumnNames
from a field to a constructor-local variable.

Developed using AI-assisted tooling
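
The cleanup described in the commit message above could look roughly like the following sketch. `InjectorSketch` is a hypothetical, heavily simplified stand-in for `VirtualColumnInjector`: columns are plain `Object[]` arrays instead of blocks, `java.util.Objects` and an explicit exception stand in for `Check.notNull` and `Check.isTrue`, and it assumes partition values are non-null and that data columns arrive in output order. None of this is taken from the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical, simplified stand-in for the PR's VirtualColumnInjector. */
final class InjectorSketch {

    // Per output position: the constant to inject, or null for a column read from the file.
    // Assumption of this sketch: partition values themselves are never null.
    private final List<Object> constants;

    InjectorSketch(List<String> fullOutput, Map<String, Object> partitionValues) {
        // Stand-ins for Check.notNull / Check.isTrue so the sketch has no Elasticsearch dependency.
        Objects.requireNonNull(fullOutput, "fullOutput must not be null");
        Objects.requireNonNull(partitionValues, "partitionValues must not be null");
        if (fullOutput.isEmpty()) {
            throw new IllegalArgumentException("fullOutput must not be empty");
        }
        // Only needed while building the plan, so a local variable rather than a field.
        Set<String> partitionColumnNames = partitionValues.keySet();
        List<Object> plan = new ArrayList<>(fullOutput.size());
        for (String name : fullOutput) {
            plan.add(partitionColumnNames.contains(name) ? partitionValues.get(name) : null);
        }
        this.constants = plan;
    }

    /** Widens a page of data columns to the full output, filling partition columns with constants. */
    Object[][] inject(Object[][] dataColumns, int rowCount) {
        Object[][] page = new Object[constants.size()][];
        int next = 0; // index of the next data column to consume
        for (int i = 0; i < constants.size(); i++) {
            Object constant = constants.get(i);
            if (constant == null) {
                page[i] = dataColumns[next++];
            } else {
                Object[] block = new Object[rowCount];
                Arrays.fill(block, constant);
                page[i] = block;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        Map<String, Object> partitions = new LinkedHashMap<>();
        partitions.put("year", 2026);
        InjectorSketch injector = new InjectorSketch(List.of("message", "year"), partitions);
        Object[][] page = injector.inject(new Object[][] { { "a", "b" } }, 2);
        System.out.println(Arrays.deepToString(page)); // prints [[a, b], [2026, 2026]]
    }
}
```

Keeping `partitionColumnNames` local means the injector holds only the per-position plan it needs at read time, which matches the reviewer's nit.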
PeteGillinElastic pushed a commit to PeteGillinElastic/elasticsearch that referenced this pull request Feb 27, 2026
…143120)