Add the total dense vector count in the indices stats output #98275
jimczi merged 22 commits into elastic:main
This change adds the total dense vector count to the output of the indices stats. This is useful for observability in order to track the number of indexed vectors in a cluster.
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-data-management (Team:Data Management)
I was unsure where the count belongs, so I opted for the simplest solution: adding it to the docs section. If there is a strong consensus that it doesn't belong there, I am willing to create a new section. Note, however, that all the currently failing tests expect the docs section not to contain the extra field. Until we settle on the best section for the new metric, I will hold off on fixing these tests.
For me this kinda depends on what, if any, related data we may want to add in the future. For example, the total number of byte and float vectors reported separately, or maybe the total size of byte and float vectors. (What else would be interesting for observability purposes?) But maybe these things are not all that interesting, or are more appropriate in a different (lower-level) API.
@jimczi could I understand what actions are useful for this o11y? Are we talking about just knowing the number of vectors for telemetry? Or do we want to know how much off-heap RAM would be required given a vector count (if this is the case, we need to know dims & kind or store size...)? It seems like a "doc_field_stats" object should be added if all we want to do is count the number of fields a document has that fit within a certain mapped category kind.
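To illustrate why a bare vector count is not enough for the RAM question raised above, here is a minimal sketch of the arithmetic. It assumes the only cost is the raw vector values (4 bytes per dimension for float vectors, 1 byte per dimension for byte vectors) and ignores any index structures built on top of them. The class, enum, and method names are hypothetical and are not part of Elasticsearch.

```java
// Hypothetical helper for illustration only; not an Elasticsearch class.
final class VectorFootprintSketch {

    // Element kinds and their size per dimension, in bytes.
    enum ElementKind {
        FLOAT(4),
        BYTE(1);

        final int bytesPerDimension;

        ElementKind(int bytesPerDimension) {
            this.bytesPerDimension = bytesPerDimension;
        }
    }

    // Raw size of the vector values alone: count * dims * bytes per dimension.
    // Assumption: graph or other index overhead is NOT included.
    static long rawVectorBytes(long vectorCount, int dims, ElementKind kind) {
        if (vectorCount < 0 || dims <= 0) {
            throw new IllegalArgumentException("vectorCount must be >= 0 and dims must be > 0");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        long perVector = Math.multiplyExact((long) dims, (long) kind.bytesPerDimension);
        return Math.multiplyExact(vectorCount, perVector);
    }
}
```

For example, `VectorFootprintSketch.rawVectorBytes(1_000_000L, 768, VectorFootprintSketch.ElementKind.FLOAT)` multiplies 1,000,000 vectors by 768 dimensions by 4 bytes. The same count stored as byte vectors would need a quarter of that, which is why the count alone cannot answer the sizing question.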
Yes, this is just to know the number of vectors indexed per deployment for telemetry.
Not sure I understand, do you mean a top level section? This is not about the number of fields though, we want to know the total number of vectors indexed.
I explained my idea poorly. I am talking about indexed field kind stats. So, we would have "text_value_count", "keyword_value_count", "numeric_value_count", or "dense_vector_value_count".
Ah, thank you for the explanation, although I'm uncertain about the necessity of the value count for the other types. In my opinion, for a detailed examination of the fields and their costs, the disk usage API should be the preferred option. Continuing on the concept introduced in the This would enable the incorporation of additional statistics related to
I proceeded with the implementation of the new section concept and introduced the dense_vector section at the root level. This approach leaves room for additional statistics related to indexed dense vectors without disrupting other sections. As a result, tests and external expectations for the docs section remain unaffected.
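As a rough illustration of the design described above, the following sketch shows a self-contained stats holder that sums per-shard counts and renders itself as its own root-level section, leaving the docs section untouched. It is a simplified stand-in, not the class added by this PR: the class name, the `add` and `toJsonSection` methods, and the `value_count` field name are assumptions made for the example; only the `dense_vector` section name comes from the discussion.

```java
// Hypothetical, simplified stand-in; not the actual Elasticsearch implementation.
final class DenseVectorCountSketch {

    private final long valueCount;

    DenseVectorCountSketch(long valueCount) {
        if (valueCount < 0) {
            throw new IllegalArgumentException("valueCount must be >= 0");
        }
        this.valueCount = valueCount;
    }

    long valueCount() {
        return valueCount;
    }

    // Combine two counts (for example, from two shards) into a new total.
    // addExact throws ArithmeticException rather than overflowing silently.
    DenseVectorCountSketch add(DenseVectorCountSketch other) {
        return new DenseVectorCountSketch(Math.addExact(valueCount, other.valueCount));
    }

    // Render as a separate root-level section, so the "docs" section keeps
    // its existing shape. The field name "value_count" is an assumption.
    String toJsonSection() {
        return "\"dense_vector\":{\"value_count\":" + valueCount + "}";
    }
}
```

Because the count lives in its own section, further vector statistics could later be added next to it without changing what existing consumers of the docs section see.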
server/src/main/java/org/elasticsearch/index/engine/Engine.java
server/src/test/java/org/elasticsearch/index/shard/DenseVectorStatsTests.java
benwtrent left a comment:
I like the new top level thing
server/src/test/java/org/elasticsearch/action/admin/cluster/node/stats/NodeStatsTests.java
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>