fix: apply score threshold after reranking in hybrid search #35263

Merged
fatelei merged 2 commits into langgenius:main from aayushbaluni:fix/35233-rerank-score-threshold on Apr 16, 2026

@aayushbaluni

Contributor

Summary

Fixes #35233

In hybrid search with reranking enabled, the score threshold is applied to the pre-rerank/fusion score instead of the post-rerank score. This causes documents with high reranked scores (0.84-0.96) to be filtered out because their pre-rerank scores were below the threshold.

Root Cause

RetrievalService.embedding_search passes score_threshold to the vector DB query for HYBRID_SEARCH, so the query filters on pre-rerank similarity scores. By the time reranking produces new scores, the documents below that threshold have already been discarded and cannot be recovered.
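
To make the ordering problem concrete, here is a minimal toy reproduction. It does not use the real Dify classes: `Doc`, `vector_search`, and `rerank` are hypothetical stand-ins, and the scores are made-up values chosen to echo the 0.84-0.96 range from the report.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    score: float  # score assigned by the most recent pipeline step


def vector_search(docs, score_threshold=None):
    """Hypothetical vector DB query: drops docs below the threshold, if one is given."""
    if score_threshold is None:
        return list(docs)
    return [d for d in docs if d.score >= score_threshold]


def rerank(docs, rerank_scores):
    """Hypothetical reranker: replaces each score and sorts best-first."""
    rescored = [Doc(d.text, rerank_scores[d.text]) for d in docs]
    return sorted(rescored, key=lambda d: d.score, reverse=True)


# Illustrative data only: "a" and "c" score low before reranking, high after.
corpus = [Doc("a", 0.35), Doc("b", 0.72), Doc("c", 0.41)]
rerank_scores = {"a": 0.96, "b": 0.70, "c": 0.84}
threshold = 0.6

# Buggy order: the threshold is applied inside the vector query, before reranking.
survivors = vector_search(corpus, score_threshold=threshold)
buggy_result = rerank(survivors, rerank_scores)
buggy_texts = [d.text for d in buggy_result]

# Docs whose reranked score clears the threshold but which never reached the reranker.
lost = [t for t, s in rerank_scores.items() if s >= threshold and t not in buggy_texts]
```

In this toy run only "b" reaches the reranker, so "a" and "c" end up in `lost` even though their reranked scores are the highest.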

Fix

  • For HYBRID_SEARCH, pass score_threshold=None to the vector DB search step so no pre-filtering occurs
  • Let the reranking pipeline (RerankModelRunner, WeightRerankRunner) apply the threshold to the final reranked/fused scores
  • When no rerank runner is configured, fall back to filtering by vector score threshold after retrieval
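
The three bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `retrieve`, `search_with_threshold`, and `toy_reranker` are hypothetical names, and documents are plain `(text, score)` tuples rather than Dify's document objects.

```python
from typing import Callable, Optional

Doc = tuple[str, float]  # (text, score); a simplification for this sketch


def retrieve(corpus: list[Doc]) -> list[Doc]:
    """Hypothetical vector DB step, called without any score threshold."""
    return list(corpus)


def search_with_threshold(
    corpus: list[Doc],
    score_threshold: Optional[float],
    reranker: Optional[Callable[[list[Doc]], list[Doc]]] = None,
) -> list[Doc]:
    # 1. No pre-filtering: every candidate reaches the next step.
    candidates = retrieve(corpus)
    # 2. If a reranker is configured, its scores become the final scores.
    # 3. Otherwise fall back to the vector scores from retrieval.
    scored = reranker(candidates) if reranker is not None else candidates
    scored = sorted(scored, key=lambda d: d[1], reverse=True)
    if score_threshold is None:
        return scored
    # The threshold is applied once, to whichever scores are final.
    return [d for d in scored if d[1] >= score_threshold]


# Illustrative data only.
corpus = [("a", 0.35), ("b", 0.72), ("c", 0.41)]
new_scores = {"a": 0.96, "b": 0.70, "c": 0.84}


def toy_reranker(docs: list[Doc]) -> list[Doc]:
    """Hypothetical reranker that looks up a fixed score per document."""
    return [(text, new_scores[text]) for text, _ in docs]


with_rerank = search_with_threshold(corpus, 0.6, reranker=toy_reranker)
without_rerank = search_with_threshold(corpus, 0.6)
```

With the toy reranker all three documents pass the 0.6 threshold on their reranked scores; without it, only "b" passes on its vector score.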

Made with Cursor

The score threshold was applied to pre-rerank/fusion scores before
the reranker ran. Documents with high reranked scores (0.84-0.96)
were incorrectly filtered out because their pre-rerank scores were
below the threshold.

Move score threshold filtering to after the reranking step so it
uses the final scores users see in the UI.

Fixes langgenius#35233

Made-with: Cursor
@aayushbaluni aayushbaluni requested a review from JohnJyong as a code owner April 15, 2026 10:36
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 15, 2026
Comment thread api/core/rag/datasource/retrieval_service.py
…search

Per reviewer feedback, set embedding_score_threshold to 0.0 rather
than None when deferring threshold to post-rerank filtering.

Made-with: Cursor
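
The commit message above does not record the reviewer's reasoning for preferring 0.0 over None. One plausible motivation, offered here only as an assumption, is that a float threshold stays safe in downstream comparisons that expect a number. The sketch below uses a hypothetical `filter_by_score` helper and assumes scores are non-negative, so 0.0 filters nothing out.

```python
def filter_by_score(scores, score_threshold):
    """Hypothetical downstream filter that assumes a float threshold."""
    return [s for s in scores if s >= score_threshold]


scores = [0.35, 0.72, 0.41]  # illustrative, non-negative similarity scores

# 0.0 defers filtering: every non-negative score passes.
embedding_score_threshold = 0.0
kept_with_zero = filter_by_score(scores, embedding_score_threshold)

# None would need explicit handling; a bare comparison raises TypeError in Python 3.
try:
    filter_by_score(scores, None)
    none_is_safe = True
except TypeError:
    none_is_safe = False
```

If a backend can return negative scores, 0.0 is no longer a neutral value and this assumption would need revisiting.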
@github-actions

Contributor

Pyrefly Diff

No changes detected.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Apr 16, 2026
@fatelei fatelei added this pull request to the merge queue Apr 16, 2026
Merged via the queue into langgenius:main with commit 54e51be Apr 16, 2026
28 checks passed
@d5devgodai-blip

Hi, I am using Dify Cloud and tested this after the PR was merged but the behavior appears unchanged — score threshold is still being applied before reranking in hybrid search. Could you confirm whether this fix has been deployed to Dify Cloud, or if I need to wait for a specific release? Thanks.

@fatelei

fatelei commented Apr 20, 2026

Contributor

Hi, I am using Dify Cloud and tested this after the PR was merged but the behavior appears unchanged — score threshold is still being applied before reranking in hybrid search. Could you confirm whether this fix has been deployed to Dify Cloud, or if I need to wait for a specific release? Thanks.

Yes, it will be in the next release.

Labels

lgtm This PR has been approved by a maintainer size:M This PR changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Score threshold not applied to reranked score in hybrid search

3 participants