Skip to content

Investigate and Address Dense Vector Mapper Parsing Issue#114516

Closed
Rassyan wants to merge 6 commits intoelastic:mainfrom
Rassyan:fix-bit-vector-mapper-parse-bug
Closed

Investigate and Address Dense Vector Mapper Parsing Issue#114516
Rassyan wants to merge 6 commits intoelastic:mainfrom
Rassyan:fix-bit-vector-mapper-parse-bug

Conversation

@Rassyan
Copy link
Copy Markdown
Contributor

@Rassyan Rassyan commented Oct 10, 2024

Description:

This PR is created to investigate and address the issue related to the DenseVectorFieldMapper parsing. The current implementation has led to some parsing inconsistencies, which need to be resolved in a more elegant manner.

Example Issue:

Consider the following example:

PUT test
{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "dims": 8000,
        "index": false,
        "element_type": "bit"
      }
    }
  }
}

This request results in the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "The number of dimensions for field [vector] should be in the range [1, 4096] but was [8000]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping: The number of dimensions for field [vector] should be in the range [1, 4096] but was [8000]",
    "caused_by": {
      "type": "mapper_parsing_exception",
      "reason": "The number of dimensions for field [vector] should be in the range [1, 4096] but was [8000]"
    }
  },
  "status": 400
}

This issue highlights the need for better validation and parsing logic within the DenseVectorFieldMapper.

Proposed Changes:

  • Investigate the Root Cause: Conduct a thorough investigation to understand the root cause of the parsing inconsistencies.
  • Update Validations: Instead of changing the parsing order, update the validations within the DenseVectorFieldMapper to ensure that all necessary fields are correctly parsed and validated.
  • Add Tests: Introduce additional tests to cover edge cases and ensure that the parsing logic is robust and reliable.

Rationale:

The goal is to find a more elegant and sustainable solution to the parsing issue without fundamentally changing how all mappers iterate values. This approach aligns with the feedback received and aims to improve the overall reliability of the DenseVectorFieldMapper.

Next Steps:

  • Review the proposed investigation and validation updates.
  • Discuss potential impacts and further improvements.
  • Implement the agreed-upon changes and validate through comprehensive testing.

@elasticsearchmachine elasticsearchmachine added v9.0.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 10, 2024
@benwtrent
Copy link
Copy Markdown
Member

@Rassyan the way to fix this is to move the validation out of the parse logic for the dims parameter, and instead call addValidator that then validates the parsed dim value. This should fix it. To me, this is a separate bug indeed, please remove all synthetic source updates, but we should add a test like:

    public void testLargeDimsBit() throws IOException {
        createMapperService(fieldMapping(b -> {
            b.field("type", "dense_vector");
            b.field("dims", 1024 * Byte.SIZE);
            b.field("element_type", ElementType.BIT.toString());
        }));
    }

That simply does what you do and verifies that it passes. This pr should be pretty simple:

  • adjust the dims parameter to add a validator instead of validating during parsing
  • add a test to verify.

All other changes related to synthetic source and such will be a separate bug fix (like your other PR).

@benwtrent
Copy link
Copy Markdown
Member

@Rassyan I am taking care of the parsing bug here: #114533

There were some minor "gotcha's" that cropped up so I figured it would be better for me to do the PR. Once that one is merged, we can switch back to your synthetic source fix :) and see what fallout remains.

@Rassyan
Copy link
Copy Markdown
Contributor Author

Rassyan commented Oct 11, 2024

Hi @benwtrent,

Thank you for taking care of the parsing bug in #114533. I appreciate your help in addressing the minor "gotcha's" that cropped up.

I would like to first complete the fix for #114407, as it seems to be independent of the current mapper issue. I will closely monitor the progress of your PR #114533. Once it is merged, I am willing to address any remaining mapper bugs (parsing order) in this PR #114516.

Thanks again for your support!

Best regards,
Rassyan

@mosche mosche added the :Search Relevance/Vectors Vector search label Oct 11, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 11, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Oct 11, 2024
@benwtrent
Copy link
Copy Markdown
Member

I think all the issues are now addressed over the original PR combined with my PR. Closing this one.

@benwtrent benwtrent closed this Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants