Dify version
1.13.3
Cloud or Self Hosted
Cloud
Steps to reproduce
Environment
- Platform: Dify Cloud
- Retrieval mode: Hybrid search
- Embedding model: text-embedding-3-large (OpenAI)
- Rerank model: jina-reranker-v1-base-en (Jina)
Describe the bug
When using hybrid search with a rerank model configured, setting a score threshold of 0.60 returns empty results. However, lowering the threshold to 0.45 returns chunks that display scores of 0.84 – 0.96.
Chunks displaying scores of 0.84 – 0.96 should easily pass a threshold of 0.60, so the two results contradict each other.
This indicates that the threshold is applied to a different score than the one displayed in the result. The user sets a threshold expecting it to filter on the displayed score, but it does not.
Steps to reproduce
- Create a knowledge base with hybrid search enabled
- Configure Jina reranker
- Set score threshold to 0.60 on retrieval node
- Send a query
- Retrieval node returns empty result
- Lower threshold to 0.45 on a second retrieval node with the same query
- Returns 10 chunks with displayed scores of 0.84 – 0.96
Proof
| Node | Threshold | Chunks returned | Displayed score range |
| --- | --- | --- | --- |
| Node 1 | 0.60 | 0 | n/a |
| Node 2 | 0.45 | 10 | 0.84 – 0.96 |
The same chunks that were blocked at threshold 0.60 display scores of 0.84 – 0.96 when retrieved at threshold 0.45. This means the threshold and the displayed score are not measuring the same thing.
✔️ Expected Behavior
The score threshold should filter against the reranked score — which is the score displayed to the user in the result.
If a chunk displays a score of 0.84, it must pass a threshold of 0.60. The threshold value and the displayed score must be consistent so that users can set a meaningful threshold based on what they see.
❌ Actual Behavior
The threshold filters against the raw hybrid score before reranking, while the displayed score is the reranked score. These are two different scoring systems with different value distributions, yet the threshold is silently applied to the one the user never sees, with no indication in the UI.
This means:
- A user sets threshold 0.60 expecting to filter out chunks scoring below 0.60
- But chunks scoring 0.84 – 0.96 (displayed) are blocked
- The user has no visibility into the raw hybrid score that is actually used for filtering
- The displayed score gives a false impression of what passed the threshold
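The suspected ordering can be reproduced in a few lines. This is a minimal sketch, not Dify's actual code: the `Chunk` class, the `filter_then_rerank` function, and the raw scores are all hypothetical, chosen only so that the output mirrors the two nodes in the report.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    raw_score: float     # hypothetical hybrid-search score, never shown to the user
    rerank_score: float  # hypothetical reranker score, shown in the result


def filter_then_rerank(chunks, threshold):
    """Suspected order: apply the threshold to the raw score, then display the rerank score."""
    kept = [c for c in chunks if c.raw_score >= threshold]
    kept.sort(key=lambda c: c.rerank_score, reverse=True)
    return [(c.text, c.rerank_score) for c in kept]


# Illustrative values only; the raw scores are invented for this sketch.
chunks = [
    Chunk("chunk A", raw_score=0.52, rerank_score=0.96),
    Chunk("chunk B", raw_score=0.47, rerank_score=0.84),
]

print(filter_then_rerank(chunks, 0.60))  # []
print(filter_then_rerank(chunks, 0.45))  # [('chunk A', 0.96), ('chunk B', 0.84)]
```

With these made-up numbers, a threshold of 0.60 hides chunks whose displayed score is 0.84 or higher, which is the symptom described above.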
Impact
- Valid queries return empty results at reasonable threshold values
- The score threshold setting is misleading — it does not behave as the UI implies
- Users are forced to use unnecessary workarounds such as multiple retrieval nodes with different thresholds to compensate for this behavior
Current workaround
Two retrieval nodes and a Code node are required:
| Node | Threshold | Purpose |
| --- | --- | --- |
| Node 1 | 0.60 | Primary retrieval |
| Node 2 | 0.45 | Fallback when Node 1 returns empty |
| Code node | 0.65 | Manual filter on displayed reranked score |
This workaround should not be necessary.
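For reference, the Code node filter looks roughly like the sketch below. It assumes that both retrieval results are wired in as the input variables `primary` and `fallback` (hypothetical names), and that each chunk carries its displayed score under `metadata.score`. That field layout is an assumption and should be checked against the actual node output.

```python
def main(primary: list, fallback: list) -> dict:
    # Manual threshold on the displayed (reranked) score.
    min_rerank_score = 0.65

    # Use Node 1's chunks when it returned any, otherwise fall back to Node 2.
    candidates = primary if primary else fallback

    kept = []
    for chunk in candidates:
        # Assumed layout: {"content": ..., "metadata": {"score": ...}}
        score = (chunk.get("metadata") or {}).get("score")
        if score is not None and score >= min_rerank_score:
            kept.append(chunk)

    return {"result": kept}
```

Chunks without a score are dropped rather than passed through, so a change in the output layout shows up as an empty result instead of unfiltered chunks.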
Suggested fix
When a rerank model is configured, the score threshold should be applied against the reranked score — the same score that is displayed to the user. The threshold and the displayed score must always refer to the same value.
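A minimal sketch of the requested ordering is shown below. The names `rerank_then_filter` and `fake_scores` are hypothetical, and the `rerank` callable is a stand-in for the real reranker call, not Dify's or Jina's API.

```python
def rerank_then_filter(candidates, rerank, threshold):
    """Score every candidate with the reranker first, then apply the threshold to that score."""
    scored = [(text, rerank(text)) for text in candidates]
    kept = [(text, score) for text, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


# Stand-in reranker: a lookup table of invented scores.
fake_scores = {"chunk A": 0.96, "chunk B": 0.84, "chunk C": 0.30}

results = rerank_then_filter(list(fake_scores), fake_scores.get, 0.60)

# The property users expect: every returned chunk's displayed score meets the threshold.
assert all(score >= 0.60 for _, score in results)
print(results)  # [('chunk A', 0.96), ('chunk B', 0.84)]
```

The final assertion states the invariant this report asks for: no returned chunk may display a score below the configured threshold, and no chunk displaying a score above it may be dropped.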
Related issue
#3146