Description
Ray data supports making batch chat completion batch predictions with vLLM through vLLMEngineProcessorConfig and build_llm_processor. Would be great to have something similar for /v1/embeddings
Use case
Make batch predictions with embeddings models