
Fix default index options when dimensions are unset for legacy indices #130540

Merged
jimczi merged 4 commits into elastic:main from jimczi:default_dense_vector_dims
Jul 3, 2025
@jimczi jimczi commented Jul 3, 2025

In #129825, we modified the dense_vector field type to delay setting index options until the field's dimensions are known. However, this introduced a discrepancy for indices created before that change, which would previously default to int8_hnsw even when dimensions were not set.

This discrepancy leads to an assertion failure in mixed-version clusters, where the serialized mappings differ between nodes:

[2025-07-02T20:37:29,852][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v9.0.4-2] fatal error in thread [elasticsearch[v9.0.4-2][clusterApplierService#updateTask][T#1]], exiting java.lang.AssertionError: provided source [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine"}}}}] differs from mapping [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine","index_options":{"type":"int8_hnsw","m":16,"ef_construction":100}}}}}]
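To make the divergence in the assertion message easier to spot, here is a minimal sketch that parses the two serialized mappings quoted in the log and lists the parameters present in one but not the other. The helper name `field_params` is hypothetical and used only for illustration; the JSON strings are copied from the message itself.

```python
import json

# The two serialized mappings, copied from the assertion message.
provided_source = (
    '{"_doc":{"properties":{"vector":{"type":"dense_vector",'
    '"index":true,"similarity":"cosine"}}}}'
)
applied_mapping = (
    '{"_doc":{"properties":{"vector":{"type":"dense_vector",'
    '"index":true,"similarity":"cosine",'
    '"index_options":{"type":"int8_hnsw","m":16,"ef_construction":100}}}}}'
)


def field_params(serialized: str, field: str = "vector") -> dict:
    """Return the parameter dict of one field from a serialized mapping."""
    return json.loads(serialized)["_doc"]["properties"][field]


provided = field_params(provided_source)
applied = field_params(applied_mapping)

# Parameters that appear on only one side of the comparison.
only_in_applied = sorted(set(applied) - set(provided))
only_in_provided = sorted(set(provided) - set(applied))

print(only_in_applied)   # ['index_options']
print(only_in_provided)  # []
```

The only difference is the `index_options` block, which matches the explanation that one side fills in the int8_hnsw default while the other leaves it unset.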

This commit resolves the issue by ensuring that indices created before the change continue to default to int8_hnsw index options, even if dimensions remain unset.

Closes #130085
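The behavior described above can be sketched as a version-gated default. This is an illustrative Python model, not the Elasticsearch implementation (which is Java): the constant `DELAYED_INDEX_OPTIONS_VERSION`, the integer version scheme, and both function names are assumptions made for the example. The branch for newer indices with known dimensions is a placeholder, since this description does not say how that choice is made.

```python
from typing import Optional

# Hypothetical stand-in for the index version at which #129825 took effect.
# The real constant name and value are not given in this description.
DELAYED_INDEX_OPTIONS_VERSION = 2

# Default values as they appear in the assertion message.
INT8_HNSW_DEFAULTS = {"type": "int8_hnsw", "m": 16, "ef_construction": 100}


def default_index_options(index_created_version: int,
                          dims: Optional[int]) -> Optional[dict]:
    """Pick default index options for an indexed dense_vector field."""
    if index_created_version < DELAYED_INDEX_OPTIONS_VERSION:
        # Legacy index: keep defaulting to int8_hnsw even when dims are unset,
        # so the serialized mapping matches what older nodes produce.
        return dict(INT8_HNSW_DEFAULTS)
    if dims is None:
        # Newer index: delay the decision until the dimensions are known.
        return None
    # Placeholder (assumption): the dims-dependent choice is not described here.
    return dict(INT8_HNSW_DEFAULTS)


def serialize_field(index_created_version: int,
                    dims: Optional[int] = None) -> dict:
    """Serialize a dense_vector field the way the log message shows it."""
    field = {"type": "dense_vector", "index": True, "similarity": "cosine"}
    if dims is not None:
        field["dims"] = dims
    options = default_index_options(index_created_version, dims)
    if options is not None:
        field["index_options"] = options
    return field


legacy = serialize_field(index_created_version=1)
newer = serialize_field(index_created_version=2)
print("index_options" in legacy)  # True
print("index_options" in newer)   # False
```

Gating on the index creation version rather than on the node version keeps the serialized mapping of an existing index stable regardless of which node serializes it.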

@jimczi jimczi requested a review from pmpailis July 3, 2025 09:57
@jimczi jimczi added the >test, :Search Relevance/Vectors, v9.1.0, and v9.2.0 labels Jul 3, 2025
@elasticsearchmachine

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Jul 3, 2025

@pmpailis pmpailis left a comment


Nice catch, thanks @jimczi !

@jimczi jimczi added the auto-backport label Jul 3, 2025

@kderusso kderusso left a comment


Nice catch!

@jimczi jimczi merged commit f91124a into elastic:main Jul 3, 2025
32 checks passed
@jimczi jimczi deleted the default_dense_vector_dims branch July 3, 2025 13:13
@elasticsearchmachine

💔 Backport failed

| Status | Branch | Result |
|---|---|---|
| | 9.1 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 130540`

@jimczi jimczi removed the auto-backport label Jul 3, 2025
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jul 3, 2025
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Mar 2, 2026
PR elastic#130540 introduced a bug where indexing a
document with a dotted field name (e.g. "my_vectors.vector1") matched
by a dynamic template for dense_vector or rank_vectors without
explicit dims would fail with:
IllegalStateException: Missing intermediate object.

The bug only manifests when dims are not set in the mapping or
template, triggering the code path that dynamically determines
dimensions from the first indexed document.

The root cause is that when DenseVectorFieldMapper and
RankVectorsFieldMapper dynamically determine dims, they rebuild the
mapper using the document parser's content path. For dynamically
mapped fields with dotted names, the content path includes the field
name itself rather than just the parent object path, producing a
duplicated full path in the rebuilt mapper.

This was fixed on main/9.4 by PR elastic#142754, which refactored dynamic
mapper tracking to use Mapper.Builder objects. This commit provides a
targeted fix for 9.3.x by deriving the correct parent path from the
existing mapper's fullPath instead of relying on the content path.

The same fix is applied to both DenseVectorFieldMapper and
RankVectorsFieldMapper, which shared the same bug.
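The path problem described in this commit message can be illustrated with plain string handling. This is a hedged Python sketch, not the Java mapper code: the helper names `join_path` and `parent_from_full_path` are hypothetical, and the concrete values (a leaf name of `vector1` and a content path equal to the full dotted name) are assumptions chosen to show how a duplicated path could arise. The actual values inside Elasticsearch may differ.

```python
def join_path(parent: str, leaf: str) -> str:
    """Build a full field path from a parent object path and a leaf name."""
    return f"{parent}.{leaf}" if parent else leaf


def parent_from_full_path(full_path: str, leaf: str) -> str:
    """Derive the parent object path by stripping the leaf from a full path."""
    if full_path == leaf:
        return ""  # top-level field, no parent object
    suffix = "." + leaf
    if not full_path.endswith(suffix):
        raise ValueError(f"{full_path!r} does not end with leaf {leaf!r}")
    return full_path[: -len(suffix)]


# Illustrative values (assumptions): a dynamically mapped dotted field.
existing_full_path = "my_vectors.vector1"
leaf_name = "vector1"
# Per the commit message, the content path includes the field name itself.
content_path = "my_vectors.vector1"

# Rebuilding from the content path repeats part of the path.
buggy = join_path(content_path, leaf_name)

# Rebuilding from the existing mapper's full path keeps it unchanged.
fixed = join_path(parent_from_full_path(existing_full_path, leaf_name), leaf_name)

print(buggy)  # my_vectors.vector1.vector1
print(fixed)  # my_vectors.vector1
```

Deriving the parent from the existing mapper's full path makes the rebuild independent of where the document parser happens to be positioned.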
mayya-sharipova added a commit that referenced this pull request Mar 3, 2026

Labels

backport pending, :Search Relevance/Vectors, Team:Search Relevance, >test, v9.1.0, v9.2.0

Development

Successfully merging this pull request may close these issues.

[CI] MixedClusterEsqlSpecIT failling on main->9.0 bwc tests

4 participants