Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
Model Input Dumps
No response
🐛 Describe the bug
With the GLM-4 model, prefix caching is automatically disabled because the engine treats it as a multimodal LLM (MLLM).
This may be related to the following code:
vllm/vllm/model_executor/models/chatglm.py, lines 758 to 782 in d427e5c:

```python
@MULTIMODAL_REGISTRY.register_image_input_mapper(mm_input_mapper_for_glmv)
@MULTIMODAL_REGISTRY.register_max_image_tokens(get_max_glmv_image_tokens)
@INPUT_REGISTRY.register_dummy_data(dummy_data_for_glmv)
@INPUT_REGISTRY.register_input_processor(input_processor_for_glmv)
class ChatGLMForCausalLM(ChatGLMBaseModel, SupportsLoRA, SupportsPP,
                         SupportsMultiModal):
    # Ensure that the LoRA support check passes when the class is not
    # initialized, but set all these attributes to empty.
    packed_modules_mapping = {}
    supported_lora_modules = []
    embedding_modules = {}
    embedding_padding_modules = []

    def __new__(
        cls,
        vllm_config: VllmConfig,
        prefix: str = "",
    ) -> None:
        config = vllm_config.model_config.hf_config
        # Initialize VL
        if hasattr(config, "visual"):
            return ChatGLMV(vllm_config=vllm_config, prefix=prefix)
        # Initialize LLM
        else:
            return ChatGLM(vllm_config=vllm_config, prefix=prefix)
```
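The dispatch above keys only on whether the HF config carries a `visual` section, since GLM-4 and GLM-4V report the same `model_type`. A minimal sketch to confirm this, assuming the public THUDM checkpoints (substitute your own model paths):

```python
from transformers import AutoConfig

# Both configs report model_type == "chatglm"; only the presence of a
# "visual" section distinguishes the VL checkpoint from the text-only one.
cfg_text = AutoConfig.from_pretrained("THUDM/glm-4-9b-chat",
                                      trust_remote_code=True)
cfg_vl = AutoConfig.from_pretrained("THUDM/glm-4v-9b",
                                    trust_remote_code=True)

print(cfg_text.model_type, hasattr(cfg_text, "visual"))  # chatglm False -> ChatGLM
print(cfg_vl.model_type, hasattr(cfg_vl, "visual"))      # chatglm True  -> ChatGLMV
```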
vllm/vllm/engine/arg_utils.py, lines 1046 to 1051 in d427e5c:

```python
if (model_config.is_multimodal_model and not envs.VLLM_USE_V1
        and self.enable_prefix_caching):
    logger.warning("--enable-prefix-caching is currently not "
                   "supported for multimodal models in v0 and "
                   "has been disabled.")
    self.enable_prefix_caching = False
```
Because the registry decorators sit on `ChatGLMForCausalLM` itself, the architecture is marked multimodal no matter which branch `__new__` takes, so `model_config.is_multimodal_model` is `True` even for the text-only GLM-4. Unfortunately, the GLM-4 and GLM-4V models share the same `model_type` value. How can I override this behavior without changing the code?
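The only workaround I can think of is to re-register the architecture so it resolves to the text-only class before the engine is constructed, but I am not sure it is supported. This is an untested sketch: it assumes `vllm.ModelRegistry.register_model` will overwrite the existing `"ChatGLMModel"` entry (the architecture name in the GLM-4 HF config) and that the `ChatGLM` class from the snippet above is importable:

```python
from vllm import LLM, ModelRegistry
from vllm.model_executor.models.chatglm import ChatGLM

# Point the ChatGLM architecture directly at the text-only class so
# model_config.is_multimodal_model becomes False and the check in
# arg_utils.py no longer disables prefix caching.
# NOTE: this breaks GLM-4V loading within the same process.
ModelRegistry.register_model("ChatGLMModel", ChatGLM)

llm = LLM(model="THUDM/glm-4-9b-chat",
          trust_remote_code=True,
          enable_prefix_caching=True)
```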
Before submitting a new issue...