Skip to content

Fix Synthetic Source Handling for bit Type in dense_vector Field#114407

Merged
benwtrent merged 10 commits intoelastic:mainfrom
Rassyan:fix-bit-vector-synthetic-source
Oct 14, 2024
Merged

Fix Synthetic Source Handling for bit Type in dense_vector Field#114407
benwtrent merged 10 commits intoelastic:mainfrom
Rassyan:fix-bit-vector-synthetic-source

Conversation

@Rassyan
Copy link
Copy Markdown
Contributor

@Rassyan Rassyan commented Oct 9, 2024

Description:

This PR addresses the issue described in #114402, where the synthetic_source feature does not correctly handle the bit type in dense_vector fields when index is set to false. The root cause of the issue was that the bit type was not properly accounted for, leading to an array that is 8 times the size of the actual dims value of docvalue. This mismatch will causes an array out-of-bounds exception when reconstructing the document.

Changes:

  • Adjusted the synthetic_source logic to correctly handle the bit type by ensuring the array size accounts for the 8x difference in dimensions.
  • Added yaml test to cover the bit type scenario in dense_vector fields with index set to false.

Related Issues:

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team v9.0.0 labels Oct 9, 2024
@Rassyan
Copy link
Copy Markdown
Contributor Author

Rassyan commented Oct 9, 2024

@benwtrent

Could you please take a look at this PR? I would greatly appreciate your review and feedback.
Thank you for your time and consideration.

Best regards,
Rassyan

@benwtrent benwtrent self-requested a review October 9, 2024 13:33
@benwtrent benwtrent self-assigned this Oct 9, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed needs:triage Requires assignment of a team area label labels Oct 9, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@benwtrent
Copy link
Copy Markdown
Member

@Rassyan thank you for the PR, a couple of things:

  • We need a new capability added here: org.elasticsearch.rest.action.search.SearchCapabilities so that we can appropriately handle test cases.
  • Add a changelog file for the bug fix (there are many examples of bug fix change logs).
  • Update your yaml test to require the capability (there are many examples of other yaml tests where this happens).
  • Run spotlessApply to fix formatting.

@Rassyan
Copy link
Copy Markdown
Contributor Author

Rassyan commented Oct 9, 2024

@benwtrent

Thank you for the feedback. I will start working on the changes immediately.

Update your yaml test to require the capability (there are many examples of other yaml tests where this happens)

Could you please point me to specific examples of yaml tests that demonstrate this?

Thank you again for your guidance.

Best regards,

Rassyan

@benwtrent
Copy link
Copy Markdown
Member

@Rassyan search for something like:

  - requires:
      capabilities:

In *.yml files to see it in action.

@benwtrent
Copy link
Copy Markdown
Member

buildkite test this please

Copy link
Copy Markdown
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good. Could we add some more testing lower down in the mapper testing? The yaml test and the fix looks good otherwise.

@benwtrent
Copy link
Copy Markdown
Member

buildkite test this please

@benwtrent
Copy link
Copy Markdown
Member

buildkite test this please

Copy link
Copy Markdown
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for going through the additional testing. This fix is great :)

Comment on lines +1467 to +1473
List<String> order = Arrays.stream(params).map(p -> p.name).toList();
Iterator<Map.Entry<String, Object>> iterator = fieldNode.entrySet().stream().sorted((e1, e2) -> {
int i1 = order.indexOf(e1.getKey());
int i2 = order.indexOf(e2.getKey());
return Integer.compare(i1, i2);
}).iterator();
while (iterator.hasNext()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am against this, let's remove to try and find a better fix if we still run into weird parsing ordering failures.

It would be better to update the dense mapper validations than to fundamentally change how all mappers iterate values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand your concerns. I will move the commits related to the mapper bug to a separate PR for further discussion. This PR will remain focused solely on fixing the Synthetic Source issue. Does that sound good to you?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move the commits related to the mapper bug to a separate PR for further discussion.

thank you

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, new PR #114516

@Rassyan Rassyan force-pushed the fix-bit-vector-synthetic-source branch from 0f5ca00 to 2473b13 Compare October 10, 2024 15:56
@benwtrent
Copy link
Copy Markdown
Member

@Rassyan we should be able to update testing here like before without any of the parsing changes. Please revert back to your testing updates, including my updates to synthetic source so we can figure out the testing failures there.

@Rassyan
Copy link
Copy Markdown
Contributor Author

Rassyan commented Oct 10, 2024

@Rassyan we should be able to update testing here like before without any of the parsing changes. Please revert back to your testing updates, including my updates to synthetic source so we can figure out the testing failures there.

moved to new PR #114516

@benwtrent
Copy link
Copy Markdown
Member

buildkite test this please

@benwtrent
Copy link
Copy Markdown
Member

@Rassyan I have updated the test, I verified it all passes ok, even for 1k dimensions and bits. I think this is good :). I will merge when green.

@benwtrent
Copy link
Copy Markdown
Member

buildkite test this please

@benwtrent benwtrent added the auto-backport Automatically create backport pull requests when merged label Oct 14, 2024
@benwtrent benwtrent merged commit 465c65c into elastic:main Oct 14, 2024
benwtrent pushed a commit to benwtrent/elasticsearch that referenced this pull request Oct 14, 2024
…lastic#114407)

**Description:**

This PR addresses the issue described in [elastic#114402](elastic#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of docvalue. This mismatch will causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added yaml test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [elastic#114402](elastic#114402)
- Introduced in [elastic#110059](elastic#110059)
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💔 Backport failed

Status Branch Result
8.x
8.15 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 114407

@benwtrent
Copy link
Copy Markdown
Member

💚 All backports created successfully

Status Branch Result
8.15

Questions ?

Please refer to the Backport tool documentation

benwtrent pushed a commit to benwtrent/elasticsearch that referenced this pull request Oct 14, 2024
…lastic#114407)

**Description:**

This PR addresses the issue described in [elastic#114402](elastic#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of docvalue. This mismatch will causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added yaml test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [elastic#114402](elastic#114402)
- Introduced in [elastic#110059](elastic#110059)

(cherry picked from commit 465c65c)
elasticsearchmachine pushed a commit that referenced this pull request Oct 14, 2024
…114407) (#114756)

**Description:**

This PR addresses the issue described in [#114402](#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of docvalue. This mismatch will causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added yaml test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [#114402](#114402)
- Introduced in [#110059](#110059)

Co-authored-by: Rassyan <yjkhngds@gmail.com>
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Oct 14, 2024
…lastic#114407)

**Description:**

This PR addresses the issue described in [elastic#114402](elastic#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of docvalue. This mismatch will causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added yaml test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [elastic#114402](elastic#114402)
- Introduced in [elastic#110059](elastic#110059)
davidkyle pushed a commit that referenced this pull request Oct 15, 2024
…114407)

**Description:**

This PR addresses the issue described in [#114402](#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of docvalue. This mismatch will causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added yaml test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [#114402](#114402)
- Introduced in [#110059](#110059)
elasticsearchmachine pushed a commit that referenced this pull request Oct 15, 2024
…ld (#114407) (#114759)

* Fix Synthetic Source Handling for `bit` Type in `dense_vector` Field (#114407)

**Description:**

This PR addresses the issue described in [#114402](#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of docvalue. This mismatch will causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added yaml test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [#114402](#114402)
- Introduced in [#110059](#110059)

(cherry picked from commit 465c65c)

* fixing backport of search capabilities

* fixing license header

* adding capabilities to RestSearchAction

* fixing backport

* spotless

* muting teset for ccs

* adding capabilities to the ccs test runner

---------

Co-authored-by: Rassyan <yjkhngds@gmail.com>
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
…lastic#114407)

**Description:**

This PR addresses the issue described in [elastic#114402](elastic#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of the actual `dims` value of docvalue. This mismatch will causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added yaml test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [elastic#114402](elastic#114402)
- Introduced in [elastic#110059](elastic#110059)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.15.4 v8.16.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Synthetic Source Bug with bit Type in dense_vector Field When index is Set to false

3 participants