Doubt about the topk and threshold usage in rerank stage settings #3146

@zfanswer

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.5.10

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

[image]
Enable the rerank model, set the threshold and top-k values, then debug the recall behavior.

✔️ Expected Behavior

The top-k and threshold values should only affect the rerank stage, as the documentation describes.
(Sorry, I couldn't find this explanation in the English docs.)
[image: screenshot of the documentation]
The last line in the screenshot says that the top-k and threshold values only affect the rerank stage.
(One more doubt: I assume Dify has some mechanism that recalls a limited set of items from the vector DB and sends them to the rerank stage.)
So the RAG flow with rerank would be:
similarity search returns a batch of doc chunks ==> somehow select n (n > k) chunks ==> rerank stage ==> filter by the threshold
==> take the top-k from the reordered and filtered chunk list ==> final references
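
The expected flow above can be sketched roughly as follows. This is only an illustration of the idea, not Dify's code: the `Chunk` class, the `expected_flow` function, the parameter `n`, and all scores are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    sim_score: float     # score from the vector similarity search (made up)
    rerank_score: float  # score a rerank model might assign (made up)


def expected_flow(chunks, n, top_k, threshold):
    """Hypothetical pipeline: top_k and threshold apply only after reranking."""
    # First stage: recall n candidates (n > top_k) by similarity alone.
    candidates = sorted(chunks, key=lambda c: c.sim_score, reverse=True)[:n]
    # Rerank stage: reorder the candidates by the rerank score.
    reranked = sorted(candidates, key=lambda c: c.rerank_score, reverse=True)
    # Only now filter by the threshold and cut to top_k.
    kept = [c for c in reranked if c.rerank_score >= threshold]
    return kept[:top_k]


chunks = [
    Chunk("a", sim_score=0.90, rerank_score=0.40),
    Chunk("b", sim_score=0.85, rerank_score=0.55),
    Chunk("c", sim_score=0.80, rerank_score=0.95),
    Chunk("d", sim_score=0.75, rerank_score=0.20),
]
result = expected_flow(chunks, n=4, top_k=2, threshold=0.5)
print([c.text for c in result])  # ['c', 'b']
```

Here chunk "c" is only third by similarity, but it still reaches the final references because the rerank stage sees all n candidates.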

❌ Actual Behavior

[image: screenshot of the code]
From the code, it seems the recall logic applies top-k and threshold in both the first retrieval stage and the rerank stage.
So the current RAG flow with rerank is:
similarity search ==> filter by threshold and take the top-k chunks ==> rerank stage produces a reordered chunk list ==> filter by threshold again and take the top-k chunks ==> final references
I think reranking only the top-k chunks returned by the first retrieval stage is largely pointless. If users tend to choose a very small top-k such as 2 or 3, the reranker can only reorder that tiny set and can never surface a chunk that the first stage already dropped.
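
A small sketch of why this matters, using made-up scores. It is not Dify's code: `top_k_above`, `current_flow`, `wider_recall_flow`, and `n_candidates` are hypothetical names, and the rerank scores are invented for illustration.

```python
def top_k_above(scored, top_k, threshold):
    """Keep items scoring at least `threshold`, best first, cut to `top_k`."""
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]


# name -> (similarity score, rerank score); all values are made up.
scores = {"a": (0.90, 0.40), "b": (0.85, 0.55), "c": (0.80, 0.95), "d": (0.75, 0.20)}


def current_flow(top_k, threshold):
    # First stage is already filtered by threshold and cut to top_k.
    first = top_k_above([(n, s[0]) for n, s in scores.items()], top_k, threshold)
    # The reranker only ever sees those top_k chunks.
    reranked = [(n, scores[n][1]) for n, _ in first]
    return [n for n, _ in top_k_above(reranked, top_k, threshold)]


def wider_recall_flow(n_candidates, top_k, threshold):
    # First stage recalls n_candidates chunks with no threshold applied.
    first = top_k_above([(n, s[0]) for n, s in scores.items()], n_candidates, 0.0)
    reranked = [(n, scores[n][1]) for n, _ in first]
    # top_k and threshold are applied only to the rerank scores.
    return [n for n, _ in top_k_above(reranked, top_k, threshold)]


current = current_flow(top_k=2, threshold=0.5)
wider = wider_recall_flow(n_candidates=4, top_k=2, threshold=0.5)
print(current)  # ['b']
print(wider)    # ['c', 'b']
```

With top-k = 2, the current flow never shows chunk "c" to the reranker, even though it has the highest rerank score. The sketch also shows a second effect: the same threshold is compared against similarity scores in the first stage and rerank scores in the second, which are not necessarily on the same scale.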
What do you think?

Labels

🙋‍♂️ question: This issue does not contain proper reproduce steps or it only has limited words without details.
