
[bug] Local CPU memory leaking when using remote connector #2017

@ningziwen


Describe the bug
When using the Redis remote connector, I noticed a local memory leak: TTFT P50 degrades from ~0.4s to ~3s after several hours of testing with https://github.com/LMCache/LMBenchmark/blob/main/real-multi-round-qa/multi-round-qa.py#L104. The time it takes for TTFT to climb appears to scale linearly with the max_local_cpu_size config. The issue should not be specific to the Redis connector, since other connectors (such as the SageMaker HyperPod connector) interact with the local CPU cache in a similar way; I use the Redis connector as the example here because it has been around the longest.

In the logs, I'm seeing lots of No eviction candidates found in local cpu backend.

With the additional logging enabled in PR #1972, I'm seeing two distinct patterns of local CPU memory leaking:

  1. All items are pinned and never unpinned. Local CPU backend state: total_items=18, pinned_count=18, ref_count_distribution={1: 18}. I added some pin/unpin logs locally and traced the pins to vLLM: for some reason, wait_for_save(), which contains lookup_pin(), is not called after lookup() (see the first sketch after this list).

  2. No items are in the hot cache. Local CPU backend state: total_items=0, pinned_count=0, ref_count_distribution={}. I suspect this is because items fetched via batched_get() are never registered in the hot cache, whereas get() does register them (see the second sketch after this list):

    local_cpu_backend.submit_put_task(key, memory_obj)
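
For illustration, here is a minimal sketch of the case 1 pattern. The class and method bodies are toy stand-ins, not LMCache's actual implementation; only the log message and the final state mirror what I observed:

    import logging

    logger = logging.getLogger(__name__)

    class ToyLocalCPUBackend:
        """Toy hot cache with pin-based eviction protection (illustrative only)."""

        def __init__(self):
            self.hot_cache = {}  # key -> ref_count; pinned while ref_count > 0

        def lookup_pin(self, key):
            # Pinning bumps the ref count so the item cannot be evicted mid-use.
            self.hot_cache[key] = self.hot_cache.get(key, 0) + 1

        def unpin(self, key):
            self.hot_cache[key] -= 1

        def evict_one(self):
            for key, ref_count in self.hot_cache.items():
                if ref_count == 0:  # only unpinned items are eviction candidates
                    del self.hot_cache[key]
                    return key
            logger.warning("No eviction candidates found in local cpu backend.")
            return None

    backend = ToyLocalCPUBackend()
    for i in range(18):
        backend.lookup_pin(f"chunk-{i}")  # each lookup pins a chunk...
    # ...but the matching unpin never runs, reproducing the observed state:
    # total_items=18, pinned_count=18, ref_count_distribution={1: 18}.
    assert backend.evict_one() is None  # memory can never be reclaimed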
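
And a sketch of the suspected get()/batched_get() asymmetry behind case 2. The class is hypothetical; only the submit_put_task() call mirrors the real get() path quoted above:

    class ToyRemoteReader:
        """Hypothetical wrapper around a remote connector (illustrative only)."""

        def __init__(self, remote_connector, local_cpu_backend):
            self.remote_connector = remote_connector
            self.local_cpu_backend = local_cpu_backend

        def get(self, key):
            memory_obj = self.remote_connector.get(key)
            if memory_obj is not None:
                # Single get: register the object in the local hot cache so it
                # is tracked (total_items grows) and can be evicted later.
                self.local_cpu_backend.submit_put_task(key, memory_obj)
            return memory_obj

        def batched_get(self, keys):
            memory_objs = self.remote_connector.batched_get(keys)
            # Suspected leak: the objects come back without being registered,
            # so the hot cache reports total_items=0 while the CPU buffers
            # they occupy stay allocated and invisible to the evictor.
            return memory_objs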

To Reproduce

---

apiVersion: v1
kind: ConfigMap
metadata:
  name: lmcache-config
  namespace: lmcache
data:
  lmcache.yaml: |
    local_cpu: true  # Set to 'true' in production to enable both LMCache native CPU offload AND distributed caching
    chunk_size: 6082
    max_local_cpu_size: 5
    remote_url: "redis://redis.lmcache.svc.cluster.local:6379"

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: lmcache
spec:
  replicas: 40
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      namespace: lmcache
      labels:
        app: test
    spec:
      containers:
        - name: vllm
          image: lmcache/vllm-openai:v0.3.9post2
          command:
            - /opt/venv/bin/vllm
            - serve
            - meta-llama/Llama-3.1-8B-Instruct
            - --host
            - 0.0.0.0
            - --port
            - "8000"
            - --enable-prefix-caching
            - --max-model-len
            - "70000"
            - --tensor-parallel-size
            - "4"
            - --kv-transfer-config
            - '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'
          resources:
            limits:
              nvidia.com/gpu: "4"
            requests:
              nvidia.com/gpu: "4"
          startupProbe:
            failureThreshold: 60
            httpGet:
              path: /health
              port: 8000
              scheme: HTTP
            initialDelaySeconds: 15
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          env:
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  key: hf_token_llama
                  name: vllm-secrets
            - name: LMCACHE_CONFIG_FILE
              value: /etc/lmcache/lmcache.yaml
            - name: PYTHONHASHSEED
              value: "0"
            - name: PROMETHEUS_MULTIPROC_DIR
              value: "/tmp"
            - name: LMCACHE_LOG_LEVEL
              value: "DEBUG"
          volumeMounts:
            - name: lmcache-config
              mountPath: /etc/lmcache
      volumes:
        - name: lmcache-config
          configMap:
            name: lmcache-config

Expected behavior
TTFT should stay consistent, and items should flow through the local CPU hot cache and be evicted as needed.


Desktop
EKS GPU instances

