Update text_similarity_rank_retriever to support inference ID as an argument when reranking on chunks #137397

Merged
mridula-s109 merged 63 commits into elastic:main from mridula-s109:add-inferenceid-support-textsimilarity on Dec 19, 2025

mridula-s109 commented Oct 30, 2025

Adds support for automatically selecting the chunk size best suited to the provided `inference_id`.

Tested the following scenarios:

# Full chunking control
POST my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": { "standard": { "query": { "match": { "text": "query" } } } },
      "field": "text",
      "inference_id": ".rerank-v1-elasticsearch",
      "inference_text": "search query",
      "chunk_rescorer": {
        "size": 3,
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 50,
          "sentence_overlap": 0
        }
      },
      "rank_window_size": 10
    }
  }
}

# Partial chunking (only max_chunk_size)
POST my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": { "standard": { "query": { "match": { "text": "query" } } } },
      "field": "text",
      "inference_id": ".rerank-v1-elasticsearch",
      "inference_text": "search query",
      "chunk_rescorer": {
        "size": 3,
        "chunking_settings": {
          "max_chunk_size": 100
        }
      },
      "rank_window_size": 10
    }
  }
}

# Auto-resolve to defaults
POST my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": { "standard": { "query": { "match": { "text": "query" } } } },
      "field": "text",
      "inference_id": ".rerank-v1-elasticsearch",
      "inference_text": "search query",
      "chunk_rescorer": {
        "size": 3
      },
      "rank_window_size": 10
    }
  }
}

# No chunking
POST my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": { "standard": { "query": { "match": { "text": "query" } } } },
      "field": "text",
      "inference_id": ".rerank-v1-elasticsearch",
      "inference_text": "search query",
      "rank_window_size": 10
    }
  }
}
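
The four scenarios above differ only in the `chunk_rescorer` object. The following is a minimal Python sketch of a helper that produces the same request bodies, which may be handy when scripting these checks. `build_rerank_request` and its parameter names are hypothetical and not part of Elasticsearch or any client library; only the JSON keys are taken from the requests above. The guard that rejects `chunking_settings` without a chunk size is this sketch's own choice, not documented API behavior.

```python
import json
from typing import Any, Optional


def build_rerank_request(
    inference_id: str,
    inference_text: str,
    field: str = "text",
    match_query: str = "query",
    rank_window_size: int = 10,
    chunk_size: Optional[int] = None,
    chunking_settings: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a text_similarity_reranker search body (hypothetical helper)."""
    if chunking_settings is not None and chunk_size is None:
        # Sketch-only guard: the scenarios always pair chunking_settings with "size".
        raise ValueError("chunking_settings requires chunk_size in this sketch")

    reranker: dict[str, Any] = {
        "retriever": {"standard": {"query": {"match": {field: match_query}}}},
        "field": field,
        "inference_id": inference_id,
        "inference_text": inference_text,
        "rank_window_size": rank_window_size,
    }
    if chunk_size is not None:
        # Omitting chunking_settings corresponds to the "auto-resolve" scenario.
        chunk_rescorer: dict[str, Any] = {"size": chunk_size}
        if chunking_settings:
            chunk_rescorer["chunking_settings"] = dict(chunking_settings)
        reranker["chunk_rescorer"] = chunk_rescorer
    return {"retriever": {"text_similarity_reranker": reranker}}


if __name__ == "__main__":
    # Partial chunking: only max_chunk_size is supplied.
    body = build_rerank_request(
        ".rerank-v1-elasticsearch",
        "search query",
        chunk_size=3,
        chunking_settings={"max_chunk_size": 100},
    )
    print(json.dumps(body, indent=2))
```

Leaving both `chunk_size` and `chunking_settings` unset yields the "No chunking" body, and setting only `chunk_size` yields the "Auto-resolve to defaults" body.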

mridula-s109 commented

@kderusso, this is a WIP. I would love to hear your thoughts on the POC when you have a moment.

kderusso left a comment

I performed a very high-level review of the approach. Please update this PR with the suggested changes.

mridula-s109 commented Nov 12, 2025

I am still unable to run the YAML tests locally, but I'm opening this up for early feedback. Work in progress.

mridula-s109 marked this pull request as ready for review November 12, 2025 16:19
elasticsearchmachine added the needs:triage label Nov 12, 2025
mridula-s109 added the Team:Search Relevance label Nov 12, 2025

kderusso left a comment

High-level functional review; still needs to be addressed.

mridula-s109 requested a review from Copilot December 16, 2025 20:56

Copilot AI left a comment

Pull request overview

This PR enhances the text_similarity_rank_retriever to automatically determine optimal chunking settings based on the inference endpoint's window size when chunking settings are not explicitly provided. The retriever now queries the inference endpoint for its window size and uses it to configure chunking, while still allowing users to override these defaults with explicit settings.

Key changes:

  • Automatic resolution of chunking settings from inference endpoint window size when not explicitly provided
  • Support for partial chunking configuration where only max_chunk_size is specified
  • Introduction of async query rewriting to fetch window size from inference endpoints
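
The precedence described in this overview can be sketched roughly as follows. This is an illustrative Python model, not the PR's Java implementation: `resolve_chunking_settings`, `fetch_window_size`, and the fallback constants are hypothetical names. The fallback strategy and overlap are placeholders borrowed from the test request in the PR description, and using the window size directly as `max_chunk_size` is an assumption (the real code may, for instance, reserve room for the query).

```python
from typing import Any, Callable, Optional

# Placeholder fallbacks (assumptions, not the PR's actual defaults).
ASSUMED_STRATEGY = "sentence"
ASSUMED_SENTENCE_OVERLAP = 0


def resolve_chunking_settings(
    explicit: Optional[dict[str, Any]],
    fetch_window_size: Callable[[], int],
) -> dict[str, Any]:
    """Fill in missing chunking settings; explicit values always win."""
    resolved = dict(explicit or {})

    # Only consult the inference endpoint when the user gave no chunk size.
    if "max_chunk_size" not in resolved:
        resolved["max_chunk_size"] = fetch_window_size()

    resolved.setdefault("strategy", ASSUMED_STRATEGY)
    # sentence_overlap is only meaningful for the sentence strategy.
    if resolved["strategy"] == "sentence":
        resolved.setdefault("sentence_overlap", ASSUMED_SENTENCE_OVERLAP)
    return resolved


if __name__ == "__main__":
    # Stand-in for the asynchronous endpoint lookup; 512 is an arbitrary value.
    def window() -> int:
        return 512

    print(resolve_chunking_settings(None, window))                     # auto-resolved
    print(resolve_chunking_settings({"max_chunk_size": 100}, window))  # partial override
```

Passing the lookup as a callable keeps the sketch from contacting the endpoint when the user has already supplied `max_chunk_size`, mirroring the "when not explicitly provided" wording above.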

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| 70_text_similarity_rank_retriever.yml | Added integration tests covering auto-resolution, partial configuration, and explicit override scenarios |
| TextSimilarityRankFeaturePhaseRankCoordinatorContextTests.java | Added unit tests for chunking settings resolution logic and removed obsolete tests |
| ChunkScorerConfigTests.java | New test file covering serialization, deserialization, and chunking settings creation |
| TextSimilarityRerankingRankFeaturePhaseRankShardContext.java | Added validation to ensure chunking settings are resolved before shard execution |
| TextSimilarityRankRetrieverBuilder.java | Implemented async query rewriting to fetch window size and resolve chunking settings |
| TextSimilarityRankFeaturePhaseRankCoordinatorContext.java | Added chunking settings resolution logic and integrated window size fetching |
| ChunkScorerConfig.java | Modified to support null chunking settings and improved settings creation methods |
| InferenceFeatures.java | Added feature flag for the new chunking behavior |
| 137397.yaml | Added changelog entry for the enhancement |

mridula-s109 commented

@kderusso Thanks for your patience! I have addressed the comments and verified that the functionality works as intended for different reranking models. Please let me know if you have any concerns or see any optimisations needed.

kderusso left a comment

Looks good, thanks for all your hard work iterating!

Can you please also update https://github.com/elastic/elasticsearch/blob/main/docs/reference/elasticsearch/rest-apis/retrievers/text-similarity-reranker-retriever.md to say that we default to chunking settings that will fit into the model associated with inference_id's token window? (This can be done as a followup if you want).

github-actions bot commented Dec 19, 2025

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

mridula-s109 force-pushed the add-inferenceid-support-textsimilarity branch from ef75bea to e6f41d3 December 19, 2025 13:01
mridula-s109 commented

> Looks good, thanks for all your hard work iterating!
>
> Can you please also update https://github.com/elastic/elasticsearch/blob/main/docs/reference/elasticsearch/rest-apis/retrievers/text-similarity-reranker-retriever.md to say that we default to chunking settings that will fit into the model associated with inference_id's token window? (This can be done as a followup if you want).

Thanks @kderusso for approving. I have addressed the doc update. Please have a look and let me know if you have any suggestions; otherwise, I'm happy to go ahead with the merge.

mridula-s109 enabled auto-merge (squash) December 19, 2025 14:35
mridula-s109 merged commit e577424 into elastic:main Dec 19, 2025
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Dec 19, 2025
* upstream/main:
  Update text_similarity_rank_retriever to support inference ID as an argument when reranking on chunks (elastic#137397)
  Mute org.elasticsearch.test.rest.yaml.CssSearchYamlTestSuiteIT test {p0=search.retrievers/result-diversification/10_mmr_result_diversification_retriever/Test MMR result diversification multiple indexes} elastic#139826
  Mute org.elasticsearch.index.mapper.SkipperSettingsTests testTSDBSkipperSettingDefaults elastic#139824
  Unmute test fix elastic#129517 (elastic#139782)
  Add back support for deserializing old refresh token in test (elastic#139811)
  Add documentation for exponential_histogram field type (elastic#139684)
  make ES|QL sample CSV test looser (elastic#139814)
  Add `frozen_after` field to data stream lifecycle (elastic#139042)
  Quieten many `ERROR` logs to `WARN` (elastic#139799)

Labels

>enhancement, :Search Relevance/Search, Team:Search Relevance, v9.4.0

4 participants