Speed up dense/sparse vector stats #111729

Merged
jimczi merged 4 commits into elastic:main from jimczi:vector_stats_optim on Aug 12, 2024

@jimczi (Contributor) commented Aug 9, 2024

This change ensures that we don't try to compute stats on mappings that don't have dense or sparse vector fields. We don't need to go through all the fields on every segment; instead, we can extract the vector fields upfront and limit the work to only those indices that define these types.
This PR is marked as a performance bug because deployments with many fields or segments are affected when computing index stats, even if they don't define a sparse or dense vector field.

Closes #111715
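
The idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `VectorStatsSketch`, its methods, and the modeling of a mapping as a map from field name to type string and of a segment as a map from field name to vector count are all assumptions made for illustration. It only shows the shape of the optimization: collect the vector fields once, return early when there are none, and otherwise visit only those fields in each segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types and method names are illustrative, not Elasticsearch APIs.
final class VectorStatsSketch {

    /** Collects the names of vector fields from the mapping once, before touching any segment. */
    static List<String> vectorFields(Map<String, String> fieldTypesByName) {
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, String> entry : fieldTypesByName.entrySet()) {
            String type = entry.getValue();
            if ("dense_vector".equals(type) || "sparse_vector".equals(type)) {
                fields.add(entry.getKey());
            }
        }
        return fields;
    }

    /**
     * Sums vector counts across segments. Each segment is modeled as a map from
     * field name to the number of vectors stored for that field in the segment.
     */
    static long vectorCount(Map<String, String> fieldTypesByName, List<Map<String, Integer>> segments) {
        List<String> fields = vectorFields(fieldTypesByName);
        if (fields.isEmpty()) {
            return 0L; // No vector fields in the mapping: skip every segment.
        }
        long total = 0L;
        for (Map<String, Integer> segment : segments) {
            for (String field : fields) { // Only the vector fields, not every field in the segment.
                total += segment.getOrDefault(field, 0);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of("title", "text", "embedding", "dense_vector");
        List<Map<String, Integer>> segments = List.of(Map.of("embedding", 3), Map.of("embedding", 2));
        System.out.println(vectorCount(mapping, segments));
    }
}
```

With this shape, an index whose mapping has no vector fields costs one pass over the mapping instead of a pass over every field of every segment.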

jimczi added 2 commits August 9, 2024 08:58
@jimczi jimczi requested a review from kderusso August 9, 2024 00:03
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Aug 9, 2024
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine (Collaborator)

Hi @jimczi, I've created a changelog YAML for you.

@kderusso (Member) left a comment

Thanks for making this change so quickly!

@jimczi jimczi merged commit 59cf661 into elastic:main Aug 12, 2024
@jimczi jimczi deleted the vector_stats_optim branch August 12, 2024 00:02
@elasticsearchmachine (Collaborator)

💚 Backport successful

Branch: 8.15

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Aug 12, 2024
elasticsearchmachine pushed a commit that referenced this pull request Aug 12, 2024
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Sep 5, 2024
dnhatn added a commit that referenced this pull request Sep 10, 2024
If a segment doesn't contain any documents with a dense_vector field, 
but the mapping defines it, an NPE can occur when retrieving the
dense_vector stats.

Relates #111729
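
The failure mode in this commit message can be sketched in isolation. The following is a hypothetical illustration, not the actual fix: `NullSafeVectorStatsSketch` and its modeling of a segment as a map from field name to stored vectors are assumptions. The point is only that a per-segment lookup can return nothing for a field that the mapping defines, so the result has to be checked before it is dereferenced.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a segment is modeled as a map from field name to its stored vectors.
final class NullSafeVectorStatsSketch {

    static long denseVectorCount(List<String> mappedVectorFields, List<Map<String, List<float[]>>> segments) {
        long total = 0L;
        for (Map<String, List<float[]>> segment : segments) {
            for (String field : mappedVectorFields) {
                // The lookup yields null when no document in this segment has the field,
                // even though the mapping defines it.
                List<float[]> vectors = segment.get(field);
                if (vectors != null) {
                    total += vectors.size();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<float[]>> withVectors = Map.of("embedding", List.of(new float[] {0.1f, 0.2f}));
        Map<String, List<float[]>> withoutVectors = Map.of(); // segment with no dense_vector documents
        List<Map<String, List<float[]>>> segments = List.of(withVectors, withoutVectors);
        System.out.println(denseVectorCount(List.of("embedding"), segments));
    }
}
```

Without the null check, the second segment in `main` would trigger a `NullPointerException` on `vectors.size()`.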
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Sep 10, 2024
elasticsearchmachine pushed a commit that referenced this pull request Sep 10, 2024

Labels

>bug, :Search Relevance/Vectors, Team:Search Relevance, v8.15.1, v8.16.0

Development

Successfully merging this pull request may close these issues.

Engine#getSparseVectorValueCount seems rather expensive

3 participants