Adds new bit element_type for dense_vectors #110059
elasticsearchmachine merged 28 commits into elastic:main
Conversation
Pinging @elastic/es-search (Team:Search)
Hi @benwtrent, I've created a changelog YAML for you. Note that since this PR is labelled
I am planning on adding docs soonish
...inless/src/yamlRestTest/resources/rest-api-spec/test/painless/146_dense_vector_bit_basic.yml
server/src/main/java/org/elasticsearch/index/codec/vectors/ES815BitFlatVectorsFormat.java
@mayya-sharipova just pushed:
@elasticmachine update branch
@benwtrent Thanks, the code LGTM. One thing that concerns me how we use because we don't actually use One suggestion I have is to NOT allow users to provide any "similarity" metric for bit vectors in mappings, and to say in the documentation that "similarity" is not defined for bit vectors. What do you think of this suggestion?
So, our Lets take some
For
@mayya-sharipova also, if users don't provide any similarity, we default to
@benwtrent Thanks for the detailed explanation. I guess it is ok to keep
@elasticmachine update branch
…114407)

**Description:**
This PR addresses the issue described in [#114402](#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of the doc value. This mismatch causes an array out-of-bounds exception when reconstructing the document.

**Changes:**
- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added a YAML test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**
- Closes [#114402](#114402)
- Introduced in [#110059](#110059)
…lastic#114407)

**Description:**
This PR addresses the issue described in [elastic#114402](elastic#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of the doc value. This mismatch causes an array out-of-bounds exception when reconstructing the document.

**Changes:**
- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added a YAML test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**
- Closes [elastic#114402](elastic#114402)
- Introduced in [elastic#110059](elastic#110059)

(cherry picked from commit 465c65c)
…114407) (#114756)

**Description:**
This PR addresses the issue described in [#114402](#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of the doc value. This mismatch causes an array out-of-bounds exception when reconstructing the document.

**Changes:**
- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added a YAML test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**
- Closes [#114402](#114402)
- Introduced in [#110059](#110059)

Co-authored-by: Rassyan <yjkhngds@gmail.com>
…ld (#114407) (#114759)

* Fix Synthetic Source Handling for `bit` Type in `dense_vector` Field (#114407)

  **Description:**
  This PR addresses the issue described in [#114402](#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of the doc value. This mismatch causes an array out-of-bounds exception when reconstructing the document.

  **Changes:**
  - Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
  - Added a YAML test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

  **Related Issues:**
  - Closes [#114402](#114402)
  - Introduced in [#110059](#110059)

  (cherry picked from commit 465c65c)

* fixing backport of search capabilities
* fixing license header
* adding capabilities to RestSearchAction
* fixing backport
* spotless
* muting test for ccs
* adding capabilities to the ccs test runner

---------

Co-authored-by: Rassyan <yjkhngds@gmail.com>
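The sizing mismatch described in the commit message above can be illustrated with a minimal sketch. This is not the actual Elasticsearch fix: the class `SyntheticSourceBitSketch` and the methods `storedElementCount` and `readVector` are hypothetical names chosen for illustration. The only facts taken from the text are that a `bit` vector with `dims` dimensions is stored as `dims / 8` bytes, and that sizing the output array by `dims` overruns the stored value.

```java
public final class SyntheticSourceBitSketch {

    // Hypothetical helper: number of stored elements for a vector field.
    // A `bit` vector packs 8 dimensions into each stored byte; any other
    // element type is treated here as one stored element per dimension.
    static int storedElementCount(String elementType, int dims) {
        return "bit".equals(elementType) ? dims / Byte.SIZE : dims;
    }

    // Hypothetical helper: copy a stored doc value into an output array
    // sized by the stored element count rather than by `dims`.
    static byte[] readVector(byte[] docValue, String elementType, int dims) {
        int count = storedElementCount(elementType, dims);
        if (docValue.length < count) {
            throw new IllegalStateException(
                "doc value holds " + docValue.length + " bytes, expected " + count);
        }
        byte[] out = new byte[count];
        System.arraycopy(docValue, 0, out, 0, count);
        return out;
    }

    public static void main(String[] args) {
        byte[] stored = {(byte) 0xFF, 0x00}; // 16 bit dimensions in 2 bytes
        System.out.println(readVector(stored, "bit", 16).length);
    }
}
```

Treating the same two stored bytes as a `byte` vector with `dims = 16` would ask for 16 elements from a 2-byte value, which is the kind of overrun the commit message reports.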
This commit adds `bit` vector support by adding `element_type: bit` for vectors. This new element type works for indexed and non-indexed vectors. Additionally, it works with `hnsw` and `flat` index types. No quantization-based codec works with this element type; this is consistent with `byte` vectors.

`bit` vectors accept up to `32768` dimensions and expect vectors being indexed to be encoded either as a hexadecimal string or as a `byte[]` array where each element of the `byte` array represents `8` bits of the vector.

`bit` vectors support script usage and regular query usage. When indexed, all comparisons are `xor` and `popcount` summations (aka Hamming distance), and the scores are transformed and normalized given the vector dimensions. Note, indexed bit vectors require `l2_norm` to be the similarity.

For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is `sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.

Note, the dimensions expected by this element_type must always be divisible by `8`, and the `byte[]` vectors provided for indexing must have size `dim/8`, where each byte element represents `8` bits of the vector.

closes: #48322
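As a rough illustration of the encoding and distance rules in the description, the sketch below decodes a hexadecimal string into a packed `byte[]`, checks the `dim/8` length rule, and computes Hamming distance with `xor` and `popcount`. The class `BitVectorSketch` and all of its method names are hypothetical and are not taken from the PR. In particular, `normalizedScore` assumes a `(dims - hamming) / dims` normalization; the description only says scores are "transformed and normalized given the vector dimensions", so treat that formula as an assumption.

```java
public final class BitVectorSketch {

    // Decode a hex string into packed bytes and enforce the size rules from
    // the description: dims divisible by 8, and exactly dims / 8 bytes.
    static byte[] fromHex(String hex, int dims) {
        if (dims % Byte.SIZE != 0) {
            throw new IllegalArgumentException("dims must be divisible by 8: " + dims);
        }
        if (hex.length() != (dims / Byte.SIZE) * 2) {
            throw new IllegalArgumentException("expected " + (dims / Byte.SIZE) + " bytes of hex");
        }
        byte[] packed = new byte[dims / Byte.SIZE];
        for (int i = 0; i < packed.length; i++) {
            packed[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return packed;
    }

    // Hamming distance: xor each byte pair, then count the set bits.
    static int hamming(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    // Per the description, l2norm for bit vectors is sqrt(l1norm),
    // and l1norm equals the Hamming distance.
    static double l2norm(byte[] a, byte[] b) {
        return Math.sqrt(hamming(a, b));
    }

    // ASSUMPTION: one plausible normalization into [0, 1]; the exact
    // transformation used for indexed vectors is not stated in the text.
    static float normalizedScore(byte[] a, byte[] b) {
        int dims = a.length * Byte.SIZE;
        return (dims - hamming(a, b)) / (float) dims;
    }

    public static void main(String[] args) {
        byte[] a = fromHex("ff00", 16);
        byte[] b = fromHex("0f00", 16);
        System.out.println(hamming(a, b));
        System.out.println(l2norm(a, b));
        System.out.println(normalizedScore(a, b));
    }
}
```

The `& 0xFF` mask matters because Java bytes are signed: without it, a negative xor result would be sign-extended to 32 bits and `Integer.bitCount` would count the extension bits as well.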