ES|QL - dense_vector support for COUNT, PRESENT, ABSENT aggregator functions#139914
Conversation
…ense-vector-agg-functions' into enhancement/esql-support-dense-vector-agg-functions
|
Hi @carlosdelest, I've created a changelog YAML for you. |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
…ttps://github.com/carlosdelest/elasticsearch into enhancement/esql-support-dense-vector-agg-functions
…ort-dense-vector-agg-functions
…ort-dense-vector-agg-functions # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Absent.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Count.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Present.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/AbsentErrorTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/CountErrorTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/PresentErrorTests.java
|
@elasticsearchmachine run elasticsearch-ci/bwc-snapshots-part3 |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
| "cartesian_shape", | ||
| "date", | ||
| "date_nanos", | ||
| "dense_vector", |
There was a problem hiding this comment.
don't throw tomatoes at me, but we don't have any CSV tests for absent_over_time/count_over_time/present_over_time 🙈
but since these inherit from absent/count and present I am not 100% sure they are needed?
There was a problem hiding this comment.
No 🍅s, thanks for pointing that out. I added tests and data in 5d287e0
There was a problem hiding this comment.
Actually, it's not that straightforward to implement aggregations over time. I've decided to open a new issue for supporting them in #140522
| if (field.foldable()) { | ||
| if (field instanceof Literal l) { | ||
| if (l.value() != null && (l.value() instanceof List<?>) == false) { | ||
| if (l.value() != null && ((l.value() instanceof List<?>) == false || l.dataType() == DENSE_VECTOR)) { |
There was a problem hiding this comment.
can we get a csv test for this? I think this is the right behaviour, but I'd like a test to catch any regression.
for example, the following returns 3 as we are dealing with a multi-valued field:
ROW a = [1,2, 3]
| STATS b = count(a)
but if we deal with a dense vector we should be returning 1, which is the case in your PR but we need a test:
ROW a = [1,2, 3]::dense_vector
| STATS b = count(a)
returns 1 which is correct.
There was a problem hiding this comment.
Yeah, we should be pretty careful with these. A paranoid level of csv tests is good here. i haven't checked what you have, but this is absolutely a place where too many tests is better than just enough.
There was a problem hiding this comment.
That makes sense - I added a CSV test for this case in 528e2a5
| import java.util.List; | ||
|
|
||
| public class CountAggregatorFunction implements AggregatorFunction { | ||
| public static AggregatorFunctionSupplier supplier() { |
There was a problem hiding this comment.
Refactored into a named class so it can be overriden by subclasses
| "cartesian_shape", | ||
| "date", | ||
| "date_nanos", | ||
| "dense_vector", |
There was a problem hiding this comment.
No 🍅s, thanks for pointing that out. I added tests and data in 5d287e0
| if (field.foldable()) { | ||
| if (field instanceof Literal l) { | ||
| if (l.value() != null && (l.value() instanceof List<?>) == false) { | ||
| if (l.value() != null && ((l.value() instanceof List<?>) == false || l.dataType() == DENSE_VECTOR)) { |
There was a problem hiding this comment.
That makes sense - I added a CSV test for this case in 528e2a5
This reverts commit b891cdf.
This reverts commit 2277a99.
This reverts commit 6a4cd10.
This reverts commit 0336ac0.
This reverts commit 5d287e0.
…ort-dense-vector-agg-functions # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/dense_vector.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
…i-project-tests * upstream/main: (23 commits) Fix `testAckListenerReceivesNacksIfPublicationTimesOut` (elastic#140514) Reduce priority of clear-cache tasks (elastic#139685) Add docs and tests about `StreamOutput` to memory (elastic#140365) ES|QL - dense_vector support for COUNT, PRESENT, ABSENT aggregator functions (elastic#139914) Add release notes for v9.2.4 release (elastic#140487) Add release notes for v9.1.10 release (elastic#140488) Add conncectors release notes for 9.1.10, 9.2.4 (elastic#140499) Add parameter support in PromQL query durations (elastic#139873) Improve testing of STS credentials reloading (elastic#140114) Fix zstd native binary publishing script to support newer versions (elastic#140485) Add FlattenedFieldBinaryVsSortedSetDocValuesSyntheticSourceIT (elastic#140489) Store fallback match only text fields in binary doc values (elastic#140189) [DiskBBQ] Use the new merge executor for intra-merge parallelism (elastic#139942) ESQL: introduce support for mapping-unavailable fields (elastic#140463) Add ESNextOSQVectorsScorerTests (elastic#140436) Disable high cardinality tests on release builds (elastic#140503) ESQL: TRange timezone support (elastic#139911) Directly compressing `StreamOutput` (elastic#140502) ES|QL - fix dense vector enrich bug (elastic#139774) Use CrossProjectModeDecider in RemoteClusterService (elastic#140481) ...
Adds support for
COUNT,PRESENT,ABSENTaggregator functions for thedense_vectortype.Closes #135688