[data] OOM killer kicks in but vLLM gpu processes are not cleaned up #54364
Description
What happened + What you expected to happen
The issue is that when a stage holds a vLLM engine and the OOM killer kicks in, the vLLM GPU worker processes associated with the killed actor are not cleaned up. This is separate from issue #53124, which was mitigated by using Ray as the distributed executor backend.
This issue occurs regardless of whether we use distributed_executor_backend = "ray" or "mp". With "ray", the respawned actor may request a different GPU on a multi-GPU node, but the old process still lingers on its GPU and keeps its memory allocated.
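Until this is fixed, one way to spot the lingering workers is to cross-check nvidia-smi's compute-process list against live PIDs and then kill orphans by hand. A minimal sketch; the helper names are my own, and the CSV parsing assumes the output shape of `nvidia-smi --query-compute-apps=pid,used_memory --format=csv`, not anything Ray or vLLM provides:

```python
import csv
import io
import os
import subprocess

def parse_compute_apps(csv_text):
    """Parse `nvidia-smi --query-compute-apps=pid,used_memory --format=csv`
    output into a list of (pid, used_mib) tuples."""
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    apps = []
    for row in rows[1:]:  # skip the CSV header row
        pid = int(row[0].strip())
        mem_mib = int(row[1].strip().split()[0])  # "1024 MiB" -> 1024
        apps.append((pid, mem_mib))
    return apps

def pid_alive(pid):
    """True if a process with this PID exists; signal 0 probes without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True

def gpu_processes():
    """Return (pid, used_mib) for every compute process nvidia-smi reports."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory", "--format=csv"],
        text=True,
    )
    return parse_compute_apps(out)
```

In this failure mode the leaked worker is still alive (it is holding GPU memory), so the useful signal is a compute process whose PID no longer maps to any running Ray actor; the list from `gpu_processes()` gives you the PIDs to inspect.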
Versions / Dependencies
N/A
Reproduction script
import ray
from vllm import LLM

class UDF:
    def __init__(self):
        self.memory = []
        self.llm = LLM(
            model="unsloth/Llama-3.2-1B-Instruct",
            enforce_eager=True,
            # With "mp", the zombie process and the new process collide on the same GPU.
            # With "ray", the new actor may land on another GPU and avoid a collision
            # at first, but it can eventually hit the same GPU.
            distributed_executor_backend="ray",
        )

    def __call__(self, batch):
        # 400M four-byte characters, roughly 1.6 GB of heap per batch
        GIANT_OBJECT = "🤗" * 400_000_000
        self.memory.append(GIANT_OBJECT)
        return batch

ds = ray.data.range(2000)
ds = ds.map_batches(UDF, batch_size=2, concurrency=1, num_gpus=1)
ds = ds.materialize()
print(ds.take_all())

In this repro, the UDF instantiates an LLM engine in its constructor. Its __call__ method creates a roughly 1.6 GB string on every batch and appends it to actor state, steadily growing the heap until it triggers a CPU OOM.
When the CPU OOM killer kicks in and the actor is restarted, the GPU that was occupied by the previous, dead process remains occupied. I ran this on 4xL40S with 380GB of VRAM.
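One partial mitigation while the cleanup bug stands is to watch the actor's own resident memory inside __call__ and fail fast before the kernel OOM killer gets involved, the idea being that an ordinary Python exception lets the process tear down its vLLM workers instead of being SIGKILLed with them orphaned. A hedged, stdlib-only sketch; the budget value and helper names are illustrative, not a Ray or vLLM API:

```python
import resource
import sys

# ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
_RU_MAXRSS_UNIT = 1 if sys.platform == "darwin" else 1024

def rss_bytes():
    """Peak resident set size of the current process, in bytes."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * _RU_MAXRSS_UNIT

def check_heap_budget(budget_bytes, current=None):
    """Raise MemoryError once the heap exceeds the budget, well before the
    kernel OOM killer would fire. `current` is overridable for testing."""
    used = rss_bytes() if current is None else current
    if used > budget_bytes:
        raise MemoryError(f"actor heap {used} B exceeds budget {budget_bytes} B")
    return used
```

Calling `check_heap_budget(...)` at the top of the UDF's __call__ turns the runaway heap into a regular task failure; whether that is enough to avoid the leaked GPU process depends on how the actor is then restarted, so it is a workaround sketch, not a fix.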
Issue Severity
Medium: It is a significant difficulty but I can work around it.