Fix default index options when dimensions are unset for legacy indices #130540
Merged
jimczi merged 4 commits into elastic:main on Jul 3, 2025
In elastic#129825, we modified the `dense_vector` field type to delay setting index options until the field's dimensions are known. However, this introduced a discrepancy for indices created before that change, which previously defaulted to `int8_hnsw` even when dimensions were not set.

This discrepancy leads to an assertion failure in mixed-version clusters, where the serialized mappings differ between nodes:

```
[2025-07-02T20:37:29,852][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v9.0.4-2] fatal error in thread [elasticsearch[v9.0.4-2][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: provided source [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine"}}}}] differs from mapping [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine","index_options":{"type":"int8_hnsw","m":16,"ef_construction":100}}}}}]
```

This commit resolves the issue by ensuring that indices created before the change continue to default to `int8_hnsw` index options, even if dimensions remain unset.
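The version-gated default described above can be sketched as a small, self-contained model. This is not the Elasticsearch implementation: the class `DefaultIndexOptionsSketch`, the integer threshold `DEFERRED_DEFAULTS_VERSION`, and the `defaultForDims` placeholder are hypothetical. Only the `int8_hnsw` values (`m` 16, `ef_construction` 100) come from the mapping in the log.

```java
import java.util.Optional;

final class DefaultIndexOptionsSketch {
    // Hypothetical version number standing in for the first index version
    // that defers default index options until dims are known (#129825).
    static final int DEFERRED_DEFAULTS_VERSION = 2;

    record IndexOptions(String type, int m, int efConstruction) {}

    // The options that appear in the serialized mapping from the assertion log.
    static final IndexOptions INT8_HNSW = new IndexOptions("int8_hnsw", 16, 100);

    // Returns the default index_options for an indexed dense_vector field,
    // or an empty Optional when the choice must wait until dims are known.
    static Optional<IndexOptions> defaultIndexOptions(int indexCreatedVersion, Integer dims) {
        if (indexCreatedVersion < DEFERRED_DEFAULTS_VERSION) {
            // Legacy index: keep the old behaviour and default to int8_hnsw even
            // when dims is unset, so old and new nodes serialize the same mapping.
            return Optional.of(INT8_HNSW);
        }
        if (dims == null) {
            // Newer index: defer until the first vector reveals the dimensions.
            return Optional.empty();
        }
        return Optional.of(defaultForDims(dims));
    }

    // Hypothetical placeholder for whatever dims-dependent default newer
    // versions choose; this sketch does not model that decision.
    static IndexOptions defaultForDims(int dims) {
        return INT8_HNSW;
    }
}
```

Under these assumptions, a legacy index (`defaultIndexOptions(1, null)`) still gets `int8_hnsw`, while a newer index with unset dims (`defaultIndexOptions(2, null)`) returns an empty `Optional` and defers the choice. Checking the index version before looking at dims means the result for old indices never depends on whether dims are known, which is what keeps the serialized mapping identical across node versions.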
Pinging @elastic/es-search-relevance (Team:Search Relevance)
💔 Backport failed
You can use sqren/backport to manually backport by running
jimczi added a commit to jimczi/elasticsearch that referenced this pull request on Jul 3, 2025
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request on Mar 2, 2026
PR elastic#130540 introduced a bug where indexing a document with a dotted field name (e.g. `"my_vectors.vector1"`) matched by a dynamic template for `dense_vector` or `rank_vectors` without explicit dims would fail with `IllegalStateException: Missing intermediate object`.

The bug only manifests when dims are not set in the mapping or template, which triggers the code path that determines dimensions dynamically from the first indexed document.

The root cause is that when `DenseVectorFieldMapper` and `RankVectorsFieldMapper` dynamically determine dims, they rebuild the mapper using the document parser's content path. For dynamically mapped fields with dotted names, the content path includes the field name itself rather than just the parent object path, producing a duplicated full path in the rebuilt mapper.

This was fixed on main/9.4 by PR elastic#142754, which refactored dynamic mapper tracking to use `Mapper.Builder` objects. This commit provides a targeted fix for 9.3.x by deriving the correct parent path from the existing mapper's `fullPath` instead of relying on the content path. The same fix is applied to both `DenseVectorFieldMapper` and `RankVectorsFieldMapper`, which shared the same bug.
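The path fix can be illustrated with a simplified string model. The class `ParentPathSketch` and its `join` and `parentPath` methods are hypothetical and treat paths as plain dot-separated strings; the real mappers work with Elasticsearch's own content-path and builder types, which this sketch does not reproduce.

```java
final class ParentPathSketch {
    // Simplified model of how a rebuilt mapper's full path is formed:
    // parent path + "." + leaf name, or just the leaf name at the root.
    static String join(String parentPath, String leafName) {
        return parentPath.isEmpty() ? leafName : parentPath + "." + leafName;
    }

    // Derive the parent path from the existing mapper's full path by
    // stripping the leaf name, instead of trusting the parser's content path.
    static String parentPath(String fullPath, String leafName) {
        if (fullPath.equals(leafName)) {
            return ""; // the field (possibly with a dotted leaf name) sits at the root
        }
        String suffix = "." + leafName;
        if (!fullPath.endsWith(suffix)) {
            throw new IllegalArgumentException(fullPath + " does not end with leaf " + leafName);
        }
        return fullPath.substring(0, fullPath.length() - suffix.length());
    }
}
```

In this model, `parentPath("my_vectors.vector1", "vector1")` returns `"my_vectors"`, and joining it back with the leaf name restores the original full path. By contrast, joining a content path that already contains the field name, as in `join("my_vectors.vector1", "my_vectors.vector1")`, yields `"my_vectors.vector1.my_vectors.vector1"`, a duplicated full path of the kind the commit message describes.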
mayya-sharipova added a commit that referenced this pull request on Mar 3, 2026
Closes #130085