
ES|QL - dense_vector support for COUNT, PRESENT, ABSENT aggregator functions #139914

Merged
carlosdelest merged 40 commits into elastic:main from carlosdelest:enhancement/esql-support-dense-vector-agg-functions on Jan 12, 2026


@carlosdelest (Member) commented Dec 22, 2025

Adds support for COUNT, PRESENT, ABSENT aggregator functions for the dense_vector type.
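For reference, here is a minimal Java sketch of the semantics these three aggregations are expected to have on a dense_vector column. It is a hypothetical model, not Elasticsearch code: the column is assumed to be a `List<float[]>` with `null` for rows that have no vector, and each vector counts as a single value regardless of its dimensions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical illustration (not Elasticsearch code): a dense_vector column is modeled
 * as a list of float[] values, where a null entry means the row has no vector.
 */
public class DenseVectorAggSketch {

    // COUNT: each non-null vector counts as exactly one value, whatever its dimensions.
    static long count(List<float[]> column) {
        return column.stream().filter(Objects::nonNull).count();
    }

    // PRESENT: true if at least one row has a non-null vector.
    static boolean present(List<float[]> column) {
        return column.stream().anyMatch(Objects::nonNull);
    }

    // ABSENT: the inverse of PRESENT; true if no row has a non-null vector.
    static boolean absent(List<float[]> column) {
        return present(column) == false;
    }

    public static void main(String[] args) {
        List<float[]> column = Arrays.asList(new float[] { 1f, 2f, 3f }, null, new float[] { 4f, 5f, 6f });
        System.out.println(count(column));   // 2
        System.out.println(present(column)); // true
        System.out.println(absent(column));  // false
    }
}
```

The key point for dense_vector is in `count`: a three-dimensional vector contributes 1, not 3.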

Closes #135688

@carlosdelest carlosdelest added >enhancement Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch Team:Search - Relevance The Search organization Search Relevance team :Search Relevance/ES|QL Search functionality in ES|QL v9.4.0 labels Dec 22, 2025
@elasticsearchmachine (Collaborator)

Hi @carlosdelest, I've created a changelog YAML for you.

@github-actions (Contributor)

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.


When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

carlosdelest and others added 7 commits December 23, 2025 08:59
…ort-dense-vector-agg-functions

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Absent.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Count.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Present.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/AbsentErrorTests.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/CountErrorTests.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/PresentErrorTests.java
@carlosdelest (Member, Author)

@elasticsearchmachine run elasticsearch-ci/bwc-snapshots-part3

@carlosdelest carlosdelest added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 8, 2026
@carlosdelest carlosdelest marked this pull request as ready for review January 8, 2026 06:30
@elasticsearchmachine elasticsearchmachine removed Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search - Relevance The Search organization Search Relevance team labels Jan 8, 2026
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@carlosdelest carlosdelest requested review from a team, afoucret, ioanatia and kkharbas January 8, 2026 06:33
"cartesian_shape",
"date",
"date_nanos",
"dense_vector",
Reviewer (Member):

Don't throw tomatoes at me, but we don't have any CSV tests for absent_over_time/count_over_time/present_over_time 🙈

But since these inherit from absent/count/present, I'm not 100% sure they are needed?

@carlosdelest (Member, Author):

No 🍅s, thanks for pointing that out. I added tests and data in 5d287e0

@carlosdelest (Member, Author):

Actually, implementing the aggregations over time is not that straightforward, so I've opened a new issue for supporting them: #140522.

if (field.foldable()) {
if (field instanceof Literal l) {
- if (l.value() != null && (l.value() instanceof List<?>) == false) {
+ if (l.value() != null && ((l.value() instanceof List<?>) == false || l.dataType() == DENSE_VECTOR)) {
Reviewer (Member):

Can we get a CSV test for this? I think this is the right behaviour, but I'd like a test to catch any regression.
For example, the following returns 3, as we are dealing with a multi-valued field:

ROW a = [1, 2, 3]
| STATS b = count(a)

But if we deal with a dense vector, we should be returning 1. That is the case in your PR, but we need a test:

ROW a = [1, 2, 3]::dense_vector
| STATS b = count(a)

This returns 1, which is correct.
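A minimal Java sketch of the per-row counting rule the two queries above exercise, mirroring the reviewed condition on the folded `Literal`. The `DataType` enum, the `Literal` record, and `valuesPerRow` are hypothetical stand-ins, not the actual ES|QL classes, and the sketch only models how many values a single row contributes to COUNT.

```java
import java.util.List;

/**
 * Hypothetical sketch (not the Elasticsearch implementation) of how a folded literal
 * contributes to COUNT for one row, following the condition in the reviewed diff.
 */
public class LiteralCountSketch {

    // Hypothetical stand-ins for the ES|QL types used in the two example queries.
    enum DataType { INTEGER, DENSE_VECTOR }

    record Literal(Object value, DataType dataType) {}

    static long valuesPerRow(Literal l) {
        if (l.value() == null) {
            return 0; // null contributes no values
        }
        // A single value, or a dense_vector (whose List holds its dimensions), counts once.
        if ((l.value() instanceof List<?>) == false || l.dataType() == DataType.DENSE_VECTOR) {
            return 1;
        }
        // Any other List is a multi-valued field: every element is counted.
        return ((List<?>) l.value()).size();
    }

    public static void main(String[] args) {
        // ROW a = [1, 2, 3] | STATS b = count(a)
        System.out.println(valuesPerRow(new Literal(List.of(1, 2, 3), DataType.INTEGER)));         // 3
        // ROW a = [1, 2, 3]::dense_vector | STATS b = count(a)
        System.out.println(valuesPerRow(new Literal(List.of(1f, 2f, 3f), DataType.DENSE_VECTOR))); // 1
    }
}
```

The ordering of the checks matters: the dense_vector test has to short-circuit before the multi-valued branch, otherwise a vector's dimensions would be counted as separate values.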

Reviewer (Member):

Yeah, we should be pretty careful with these. A paranoid level of CSV tests is good here. I haven't checked what you have, but this is absolutely a place where too many tests are better than just enough.

@carlosdelest (Member, Author):

That makes sense - I added a CSV test for this case in 528e2a5

import java.util.List;

public class CountAggregatorFunction implements AggregatorFunction {
public static AggregatorFunctionSupplier supplier() {
@carlosdelest (Member, Author):

Refactored into a named class so it can be overridden by subclasses.
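To illustrate why a named class helps: an anonymous class or lambda cannot be subclassed, whereas a named class can expose a protected hook that subclasses override. The minimal Java sketch below shows that general pattern only. `CountFunction`, `SingleValueCountFunction`, `valuesAt`, and `run` are hypothetical names, not the Elasticsearch compute API, and the assumption that a dense_vector variant would count each non-empty position once is for illustration.

```java
/**
 * Hypothetical sketch of the "named class so subclasses can override" pattern.
 * None of these names come from the Elasticsearch compute module.
 */
public class OverridableCountSketch {

    // Named base class: accumulates a count, one position (row) at a time.
    static class CountFunction {
        long total;

        // Hook: how many values a position with `valueCount` entries contributes.
        // The default counts every entry, as for a multi-valued field.
        protected long valuesAt(int valueCount) {
            return valueCount;
        }

        void addPosition(int valueCount) {
            total += valuesAt(valueCount);
        }
    }

    // Hypothetical subclass: each non-empty position counts as one value,
    // the way a whole dense_vector is a single value.
    static class SingleValueCountFunction extends CountFunction {
        @Override
        protected long valuesAt(int valueCount) {
            return valueCount > 0 ? 1 : 0;
        }
    }

    // Feeds each position's entry count into the function and returns the total.
    static long run(CountFunction f, int... positions) {
        for (int p : positions) {
            f.addPosition(p);
        }
        return f.total;
    }

    public static void main(String[] args) {
        System.out.println(run(new CountFunction(), 3, 0, 2));            // 5
        System.out.println(run(new SingleValueCountFunction(), 3, 0, 2)); // 2
    }
}
```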


This reverts commit b891cdf.
This reverts commit 2277a99.
This reverts commit 6a4cd10.
This reverts commit 0336ac0.
…ort-dense-vector-agg-functions

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/dense_vector.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@carlosdelest carlosdelest merged commit d2f5917 into elastic:main Jan 12, 2026
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Jan 12, 2026
…i-project-tests

* upstream/main: (23 commits)
  Fix `testAckListenerReceivesNacksIfPublicationTimesOut` (elastic#140514)
  Reduce priority of clear-cache tasks (elastic#139685)
  Add docs and tests about `StreamOutput` to memory (elastic#140365)
  ES|QL - dense_vector support for COUNT, PRESENT, ABSENT aggregator functions (elastic#139914)
  Add release notes for v9.2.4 release (elastic#140487)
  Add release notes for v9.1.10 release (elastic#140488)
  Add conncectors release notes for 9.1.10, 9.2.4 (elastic#140499)
  Add parameter support in PromQL query durations (elastic#139873)
  Improve testing of STS credentials reloading (elastic#140114)
  Fix zstd native binary publishing script to support newer versions (elastic#140485)
  Add FlattenedFieldBinaryVsSortedSetDocValuesSyntheticSourceIT (elastic#140489)
  Store fallback match only text fields in binary doc values (elastic#140189)
  [DiskBBQ] Use the new merge executor for intra-merge parallelism (elastic#139942)
  ESQL: introduce support for mapping-unavailable fields (elastic#140463)
  Add ESNextOSQVectorsScorerTests (elastic#140436)
  Disable high cardinality tests on release builds (elastic#140503)
  ESQL: TRange timezone support (elastic#139911)
  Directly compressing `StreamOutput` (elastic#140502)
  ES|QL - fix dense vector enrich bug (elastic#139774)
  Use CrossProjectModeDecider in RemoteClusterService (elastic#140481)
  ...
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Jan 21, 2026


Development

Successfully merging this pull request may close these issues.

ES|QL - Support PRESENT, ABSENT, COUNT aggregation functions for dense_vector

5 participants