Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model #1186
Ying1123 merged 2 commits into sgl-project:main from zhaochenyang20:support_qwn2
Conversation
@Ying1123 I added gte in the generation model test. Note that I changed the prefill tolerance accordingly and added a ROUGE-L metric instead of asserting that the output_strs match exactly.
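For reference, a minimal sketch of what a ROUGE-L-based check could look like; the `rouge_score` dependency and the 0.9 threshold are illustrative assumptions, not necessarily what the test uses:

```python
# Illustrative sketch: compare generated strings against references with
# ROUGE-L instead of requiring exact string equality.
from rouge_score import rouge_scorer


def assert_rougel_close(output_strs, reference_strs, threshold=0.9):
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    for out, ref in zip(output_strs, reference_strs):
        score = scorer.score(ref, out)["rougeL"].fmeasure
        # The threshold here is an assumed placeholder value.
        assert score >= threshold, f"ROUGE-L {score:.3f} below threshold for: {out!r}"
```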
```python
import multiprocessing as mp

try:
    mp.set_start_method("spawn")
```
Why would this be needed?
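For context (not from this thread): the `spawn` start method is commonly required when worker processes use CUDA, because a forked child inherits a CUDA context it cannot reuse. A typical guarded call looks like the sketch below; the `RuntimeError` handling is an assumption about avoiding a failure when the start method has already been set:

```python
import multiprocessing as mp

# "spawn" starts children with a fresh interpreter, avoiding issues with a
# forked CUDA context. set_start_method raises RuntimeError if the start
# method was already set, so guard it when this may run more than once.
try:
    mp.set_start_method("spawn")
except RuntimeError:
    pass
```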
@zhaochenyang20

```python
prompt = "hello world"
response = client.embeddings.create(
```

transformer:

```python
max_length = 8192
batch_dict = tokenizer(prompt, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
embeddings = F.normalize(embeddings, p=2, dim=1)
```
@llmforever Hello. Sorry, I hadn't noticed this before. Do you still need to fix this? Actually, we have a unit test for this in test/srt/models/test_embedding_models.py. Also, I don't understand what you meant by "perform not so well". Could you provide your running scripts and your serving command for SGLang? And does e5-mistral also have this problem, or only gte?
Yeah, the embedding could be different due to a lot of reasons. @llmforever You can check this unit test: https://github.com/sgl-project/sglang/blob/main/test/srt/models/test_embedding_models.py. We set a tolerance value for the embedding difference there.

Also, please try the e5-mistral model and give us the embedding difference: https://huggingface.co/intfloat/e5-mistral-7b-instruct

@Ying1123 Do you think the difference provided is tolerable?
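For illustration, a tolerance-based comparison of two embedding vectors could look like the sketch below; the tolerance values are placeholders, not the ones used in the actual unit test:

```python
import torch


def compare_embeddings(emb_a, emb_b, cos_sim_tol=0.999, max_abs_tol=1e-2):
    """Rough sketch: check that two embedding vectors are close enough.

    emb_a / emb_b are 1-D float lists or tensors (e.g. HF transformers vs.
    the SGLang endpoint). Tolerance values here are illustrative only.
    """
    a = torch.tensor(emb_a, dtype=torch.float32)
    b = torch.tensor(emb_b, dtype=torch.float32)
    cos_sim = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
    max_abs_diff = (a - b).abs().max().item()
    print(f"cosine similarity: {cos_sim:.6f}, max abs diff: {max_abs_diff:.6f}")
    assert cos_sim >= cos_sim_tol and max_abs_diff <= max_abs_tol
```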
I tested about 10 cases; accuracy drops from 80% to less than 10%, so I think the difference is not tolerable. But the result with the e5-mistral-7b-instruct model is the same. Can you please help me look at this? Here is the code I use to generate the embeddings.

For transformers:

```python
import torch
from torch import Tensor

input_texts = ['hello']
max_length = 8192
batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
embeddings = F.normalize(embeddings, p=2, dim=1)
```

For SGLang:

```python
input_texts = ['hello']
queries = client.embeddings.create(
```
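For completeness, the steps this snippet elides between tokenization and normalization usually follow the model card's last-token pooling recipe. A minimal sketch based on the gte-Qwen2-7B-instruct model card (not the commenter's actual code):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Sketch based on the gte-Qwen2-7B-instruct model card, not the commenter's exact code.
tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-Qwen2-7B-instruct", trust_remote_code=True)
model = AutoModel.from_pretrained("Alibaba-NLP/gte-Qwen2-7B-instruct", trust_remote_code=True)


def last_token_pool(last_hidden_states, attention_mask):
    # With right padding, take the hidden state of the last non-padding token.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(last_hidden_states.shape[0], device=last_hidden_states.device)
    return last_hidden_states[batch_idx, sequence_lengths]


input_texts = ["hello"]
batch_dict = tokenizer(input_texts, max_length=8192, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)
```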
@Ying1123 I think he is reporting an intolerable difference, hmm? I will check it these days.
Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (sgl-project#1186)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>

Motivation
Currently, SGLang only supports the e5-mistral embedding model. I added the Alibaba-NLP/gte-Qwen2-7B-instruct model in this PR.
Also, SGLang previously determined whether a model is an embedding model through its `hf_config.architectures`. But the gte model has the same architecture as a CausalLM model, so I added a new parameter in `server_args` and changed the forward function of `Qwen2ForCausalLM`.
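As a usage sketch, assuming the new parameter is exposed on the launch command as `--is-embedding` and that the server exposes its OpenAI-compatible `/v1` endpoint on port 30000 (adjust to your setup):

```python
# Launch the server first, e.g. (flag name assumed):
#   python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct --is-embedding --port 30000
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
response = client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    input="hello world",
)
embedding = response.data[0].embedding  # list of floats
print(len(embedding))
```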
Modifications

- Changed the forward function of `Qwen2ForCausalLM`.
- Added `is_embedding` in `server_args`.

Checklist