-
Notifications
You must be signed in to change notification settings - Fork 7.4k
Closed
Labels
community-backlogdataRay Data-related issuesRay Data-related issuesenhancementRequest for new feature and/or capabilityRequest for new feature and/or capabilityfeature-requestllm
Description
Description
Currently, Ray’s multimodal/vision model support for batch processing is limited to images only.
However, libraries like vLLM have multimodal support for video:
- https://github.com/QwenLM/Qwen2.5-VL?tab=readme-ov-file#inference-locally
- https://docs.vllm.ai/en/latest/features/multimodal_inputs.html#video-inputs
- https://github.com/vllm-project/vllm/blob/main/vllm/multimodal/utils.py#L277-L297
Problem
Today, if a user wants to batch process videos in Ray they would need to manually download and preprocess videos into frames and feed them as individual images. This preparation and preprocessing should be handled by Ray.
Proposal
Extend the existing PrepareImageStage to handle additional media types, making it a more generalised PrepareMediaStage. (Alternative, could be adding a PrepareVideoStage)
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
community-backlogdataRay Data-related issuesRay Data-related issuesenhancementRequest for new feature and/or capabilityRequest for new feature and/or capabilityfeature-requestllm