Currently, it seems that a single instance of SGLang only supports serving one model at a time. Your suggestion is excellent. To ensure compatibility with the OpenAI API, I will likely add support for the /v1/models/{model_id} endpoint. The response from this new endpoint should be consistent with the output of /get_model_info.
Originally posted by @yhyang201 in #7179 (comment)
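A minimal sketch of what the proposed endpoint could look like, assuming the single-model case described above. The field names (`model_path`, `is_generation`) and the `retrieve_model` helper are illustrative stand-ins, not SGLang's actual schema; the response shape follows the OpenAI "model" object (`id`, `object`, `owned_by`) while reusing the same data that `/get_model_info` would report.

```python
def get_model_info() -> dict:
    # Stand-in for the existing /get_model_info response of the single
    # served model; the real server would return its own metadata here.
    return {"model_path": "meta-llama/Llama-3-8B", "is_generation": True}


def retrieve_model(model_id: str) -> dict:
    """Hypothetical handler for GET /v1/models/{model_id}.

    Returns an OpenAI-style model object when the id matches the one
    served model, or a 404-style error object otherwise.
    """
    info = get_model_info()
    if model_id != info["model_path"]:
        return {
            "error": {
                "message": f"The model '{model_id}' does not exist",
                "type": "invalid_request_error",
                "code": 404,
            }
        }
    # Keep the payload consistent with /get_model_info, as suggested above.
    return {
        "id": info["model_path"],
        "object": "model",
        "owned_by": "sglang",
        "is_generation": info["is_generation"],
    }
```

A request for the served model id would return the model object; any other id yields the error object, mirroring how the OpenAI API responds to an unknown model.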