
[Data] [LLM] Allow vLLM deployments to be shared by sequential processors #52277

@hainesmichaelc

Description


Allow sequential Ray Data processor steps to optionally reuse an existing vLLM deployment.

Use case

For sequential batch inference (e.g. using the output of one LLM completion as the prompt for a second request), the Ray Data LLM API makes it simple to define multiple processors. Often, however, the only thing that changes between sequential steps is the prompt. In that case it would be ideal to reuse the existing vLLM deployment rather than create a new instance, but Ray Data manages mapper state and resources for each stage separately, so this is not currently supported. The main cost is resources: every step requires its own dedicated deployment, which greatly increases the footprint of workloads with many sequential processing steps.
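A minimal sketch of the pattern described above, using the public `ray.data.llm` API. The model name, prompts, and column names are illustrative assumptions, and config parameter names can vary between Ray versions. The point is that each `build_llm_processor` call currently stands up its own vLLM engine, even when both stages use the same model:

```python
# Sketch: two chained Ray Data LLM processors where only the prompt differs.
# Requires `pip install "ray[llm]"`; model and prompts are illustrative.

def summarize_preprocess(row: dict) -> dict:
    # Stage 1: ask the model to summarize the raw text.
    return {
        "messages": [{"role": "user", "content": f"Summarize: {row['text']}"}],
        "sampling_params": {"temperature": 0.0, "max_tokens": 256},
    }

def title_preprocess(row: dict) -> dict:
    # Stage 2: feed stage 1's output back in as the new prompt.
    return {
        "messages": [{"role": "user", "content": f"Write a title for: {row['summary']}"}],
        "sampling_params": {"temperature": 0.0, "max_tokens": 32},
    }

try:
    from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

    config = vLLMEngineProcessorConfig(model_source="Qwen/Qwen2.5-7B-Instruct")

    # Each build_llm_processor call below creates a *separate* vLLM
    # deployment today, even though both stages use the same model --
    # this duplication is what the issue asks to avoid.
    summarize = build_llm_processor(
        config,
        preprocess=summarize_preprocess,
        postprocess=lambda row: {"summary": row["generated_text"]},
    )
    title = build_llm_processor(
        config,
        preprocess=title_preprocess,
        postprocess=lambda row: {"title": row["generated_text"]},
    )
    # ds = title(summarize(ds))  # chained stages, duplicated engines
except ImportError:
    pass  # ray[llm] not installed; the preprocess helpers above are plain Python.
```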

I am planning to work around this by deploying the model through Ray Serve and using HttpRequestProcessorConfig instead.
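A sketch of that workaround: deploy the model once behind an OpenAI-compatible HTTP endpoint (e.g. via Ray Serve LLM), then point every sequential step at the shared endpoint with `HttpRequestProcessorConfig`. The URL, model name, and payload shape are illustrative assumptions:

```python
# Sketch of the Ray Serve workaround: one shared deployment behind HTTP,
# so each sequential step only changes the prompt, not the engine.

def to_chat_payload(row: dict) -> dict:
    # Build an OpenAI-style chat payload; endpoint and model name are assumed.
    return {
        "payload": {
            "model": "my-served-model",
            "messages": [{"role": "user", "content": row["prompt"]}],
        }
    }

try:
    from ray.data.llm import HttpRequestProcessorConfig, build_llm_processor

    http_config = HttpRequestProcessorConfig(
        url="http://localhost:8000/v1/chat/completions",  # assumed Serve endpoint
        headers={"Authorization": "Bearer fake-key"},
        qps=4,
    )
    step = build_llm_processor(
        http_config,
        preprocess=to_chat_payload,
        postprocess=lambda row: {
            "output": row["http_response"]["choices"][0]["message"]["content"]
        },
    )
    # Additional sequential steps reuse the same endpoint, only with
    # different preprocess functions -- no extra vLLM instances.
except ImportError:
    pass  # ray[llm] not installed; to_chat_payload alone is plain Python.
```

The trade-off is that requests now go through HTTP serialization and a Serve proxy rather than the in-process batched engine, but the deployment is shared across all steps.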

Metadata


Assignees

No one assigned

Labels

- P2 (Important issue, but not time-critical)
- community-backlog
- data (Ray Data-related issues)
- enhancement (Request for new feature and/or capability)
- llm
