ggml-ci
Hmm, yeah, I think I misunderstood. So just to confirm: the server does support having some slots running embd and some slots running completion at the same time, right? I'm asking because I can't find
I'm not really sure what the state of this functionality is, and AFAIK most people use the "embedding + completion" mode just for testing purposes (i.e. to avoid starting 2 separate instances of
ngxson left a comment
OK, thanks for the explanation. That sounds good.
ref #3815 (comment)
I think 0d6f6a7 messed up the endpoint checks. Per the readme, the `--embeddings` flag should restrict the server to just the `/embeddings` endpoint, while `--reranking` should enable the `/rerank` endpoint.
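For reference, here is a minimal sketch of the gating the readme describes. It is not the server's actual code: `server_flags` and `endpoint_allowed` are hypothetical names, and it assumes that `/rerank` is checked before the embeddings-only restriction and that no other combination of flags restricts anything.

```cpp
#include <string>

// Hypothetical stand-in for the two CLI flags (not the real server params struct).
struct server_flags {
    bool embeddings = false; // --embeddings: restrict to the /embeddings endpoint
    bool reranking  = false; // --reranking: enable the /rerank endpoint
};

// Hypothetical check: should a request to `path` be served under `flags`?
bool endpoint_allowed(const server_flags & flags, const std::string & path) {
    if (path == "/rerank") {
        // /rerank is opt-in and only served when --reranking is set
        return flags.reranking;
    }
    if (flags.embeddings) {
        // embeddings-only mode: everything except /embeddings is rejected
        return path == "/embeddings";
    }
    // assumption: no restriction is stated for the remaining cases
    return true;
}
```

With `--embeddings` set, a request to `/completion` is rejected and `/embeddings` is served. Without `--reranking`, `/rerank` is rejected regardless of the other flag.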