fix: auto-generate lmcache_instance_id when value is None #2732

DongDongJu merged 12 commits into LMCache:dev
Conversation
Code Review
This pull request addresses a bug where the `lmcache_instance_id` was not being auto-generated because the check for its existence was incorrect. The fix replaces `hasattr` with `getattr` to correctly check whether the instance ID has a value or is `None`. This change is correct and effectively resolves the issue, ensuring that a unique instance ID is generated when one is not provided.
Force-pushed 6b48980 to 4fe9741
```diff
 except Exception as e:
     self._init_failed = True
-    self._init_failed_reason = str(e)
+    self._init_failed_reason = traceback.format_exc()
```
One minor nit: traceback.format_exc() relies on implicit sys.exc_info() state. Since you already have e in scope, "".join(traceback.format_exception(e)) would be slightly more explicit and resilient to future refactors where the exception context might shift.
Got it, raised another revision leveraging format_exception instead
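For reference, a minimal self-contained sketch of the suggested pattern (the handler shape follows the diff above; `risky_init` and the variable names are stand-ins, not the actual LMCache code):

```python
import traceback

def risky_init():
    # Hypothetical initialization step that may raise.
    raise RuntimeError("backend unavailable")

try:
    risky_init()
except Exception as e:
    init_failed = True
    # traceback.format_exception(e) renders the traceback from the exception
    # object itself, so it does not rely on the implicit sys.exc_info() state
    # that traceback.format_exc() reads. (Passing only the exception object
    # requires Python 3.10+.)
    init_failed_reason = "".join(traceback.format_exception(e))
    print(init_failed_reason)
```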
Signed-off-by: Can Sun <sucan@amazon.com>
Force-pushed 4fe9741 to 5cfbf25
Signed-off-by: Can Sun <sucan@amazon.com>
Force-pushed a413268 to bdde32a
```diff
 if not lmcache_worker_ids:
     # start lmcache worker on all ranks
-    assert len(config.lmcache_worker_ports) == metadata.world_size
+    assert len(config.lmcache_worker_ports) >= metadata.world_size
```
Why do we change to >= here?
From our testing, this check is too strict. For simplicity we make 8 ports available, but TP may only be 4. The strict assertion fails the check in that case, even though the redundant ports can simply be ignored. All we need to guarantee is that the number of available ports is at least the number needed.
Same question. I don't quite follow: why is it too strict? Could you help me understand it better?
Here is the summary provided by AI:

The change from `==` to `>=` in both assertions is justified because:

- The strict `==` is unnecessarily restrictive. The code only needs `config.lmcache_worker_ports[self.worker_id]` (first branch) or `config.lmcache_worker_ports[index]` (second branch); it indexes into the list by position. As long as there are enough ports to cover all workers, the access is safe. Extra ports in the list are simply unused and harmless.
- It enables config reuse across different deployment topologies. A user may define a single `lmcache_worker_ports` list (e.g., `[8500, 8501, 8502, 8503]`) and reuse the same config file across deployments with varying `world_size` (e.g., TP=2 vs TP=4). With the strict `==`, they'd have to maintain separate configs for each topology even though the smaller deployment would just use a prefix of the port list.
- It's consistent with how `_get_lmcache_worker_ids` works. That function already returns a subset of worker IDs (e.g., `[0]` for MLA mode regardless of `world_size`). The number of active workers can be less than the total ports configured. The `>=` check correctly reflects this "at least enough" semantic.
- The correctness guarantee is preserved. The actual port selection logic (`config.lmcache_worker_ports[self.worker_id]` and `config.lmcache_worker_ports[index]`) is unchanged. The `>=` still guarantees no out-of-bounds access: `self.worker_id < world_size <= len(ports)` in the first branch, and `index < len(worker_ids) <= len(ports)` in the second branch.
- The second assertion (`len(ports) >= len(worker_ids)`) is more correct than the original. The original `len(worker_ids) == len(ports)` would fail if a user provides extra ports as a superset. Since `index = lmcache_worker_ids.index(self.worker_id)` is bounded by `len(lmcache_worker_ids)`, `len(ports) >= len(worker_ids)` is the precise safety condition.

In short: `>=` is the minimum necessary precondition for safe indexing, while `==` was an overly strict invariant that rejected valid configurations.
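To make the indexing argument concrete, a small illustrative sketch (names mirror the snippets quoted in this thread; this is not the actual LMCache source):

```python
# Illustrative only: port list and world size follow the examples in this
# thread, not real LMCache configuration code.
lmcache_worker_ports = [8001, 8002, 8003, 8004, 8005, 8006, 8007, 8008]
world_size = 4  # e.g. TP=4 on a g5.24xlarge

# Relaxed precondition: at least one port per rank; extras are ignored.
assert len(lmcache_worker_ports) >= world_size

# First branch: one worker per rank, indexed by worker_id.
for worker_id in range(world_size):
    # Safe because worker_id < world_size <= len(lmcache_worker_ports).
    port = lmcache_worker_ports[worker_id]

# Second branch: only a subset of ranks runs a worker (e.g. [0] in MLA mode).
lmcache_worker_ids = [0]
assert len(lmcache_worker_ports) >= len(lmcache_worker_ids)
for worker_id in lmcache_worker_ids:
    index = lmcache_worker_ids.index(worker_id)
    # Safe because index < len(lmcache_worker_ids) <= len(lmcache_worker_ports).
    port = lmcache_worker_ports[index]
```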
I still don't get it. Two questions:

- Is it related to this PR?
- In what case will `==` fail? (E.g., running LMCache with model X using parallel strategy Y.)
Thanks for the explanation!
1. Yes, it is related. We encountered two main issues when using the LMCache controller, both caused by assertion failures during cache engine initialization. One is that the instance ID is None when not provided via an env var. The second is this issue, caused by the too-strict assertion. We always provide 8 ports for workers (a p5 instance has 8 GPUs), but this currently causes initialization failures.
2. The condition fails when you create a model deployment whose worker count differs from the tensor-parallel size. In our case, we always provide 8 ports, but this fails the assertion on g5.24xlarge because it only has 4 GPUs. Besides, MLA (multi-head latent attention) models always use 1 LMCache worker regardless of TP size, so it is quite hard for us to determine the right port count for each customer's model deployment definition.

I think there is no harm in relaxing this restriction. We just need to guarantee there are enough ports to allocate to each worker. This also aligns with the P2P ports, which do not fail when the number of p2p init/lookup ports is greater than the TP rank count:
```python
# p2p_backend.py
self.peer_init_port = config.p2p_init_ports[self.tp_rank]
self.peer_lookup_port = config.p2p_lookup_ports[self.tp_rank]
```
Verified locally that the change works without assertion errors and vLLM responds successfully. Note that I made 8 ports available for use: 8001,8002,8003,8004,8005,8006,8007,8008.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "17"
  creationTimestamp: "2026-03-11T20:18:01Z"
  generation: 26
  labels:
    app: tinyllama-kvcache
    deploying-service: hyperpod-inference
  name: tinyllama-kvcache
  namespace: default
  resourceVersion: "17420107"
  uid: 2fb9b9c6-32fa-407c-92c9-a1e32c346089
spec:
  progressDeadlineSeconds: 3600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: tinyllama-kvcache
      deploying-service: hyperpod-inference
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2026-03-22T23:30:42-07:00"
      creationTimestamp: null
      labels:
        app: tinyllama-kvcache
        deploying-service: hyperpod-inference
        namespace: default
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - ml.g5.24xlarge
              - key: topology.kubernetes.io/region
                operator: In
                values:
                - us-west-2
            weight: 100
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - ml.g5.24xlarge
              - key: topology.kubernetes.io/region
                operator: In
                values:
                - us-west-2
      containers:
      - args:
        - /opt/ml/model
        - --max-model-len
        - "8192"
        - --tensor-parallel-size
        - "4"
        - --kv-transfer-config
        - '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
        env:
        - name: OPTION_ROLLING_BATCH
          value: vllm
        - name: SAGEMAKER_SUBMIT_DIRECTORY
          value: /opt/ml/model/code
        - name: MODEL_CACHE_ROOT
          value: /opt/ml/model
        - name: SAGEMAKER_MODEL_SERVER_WORKERS
          value: "1"
        - name: SAGEMAKER_MODEL_SERVER_TIMEOUT
          value: "3600"
        - name: LMCACHE_CONTROLLER_PULL_URL
          value: tinyllama-kvcache-default-routing-service.hyperpod-inference-system.svc.cluster.local:9000
        - name: LMCACHE_CONTROLLER_URL
          value: tinyllama-kvcache-default-routing-service.hyperpod-inference-system.svc.cluster.local:9000
        - name: LMCACHE_DISTRIBUTED_URL
          value: localhost:8009
        - name: LMCACHE_ENABLE_CONTROLLER
          value: "true"
        - name: LMCACHE_LMCACHE_WORKER_PORT
          value: "8001"
        - name: LMCACHE_LMCACHE_WORKER_PORTS
          value: 8001,8002,8003,8004,8005,8006,8007,8008
        - name: LMCACHE_LOCAL_CPU
          value: "true"
        - name: LMCACHE_MAX_LOCAL_CPU_SIZE
          value: "32"
        - name: LMCACHE_P2P_INIT_PORTS
          value: 9001,9002,9003,9004,9005,9006,9007,9008
        - name: LMCACHE_P2P_LOOKUP_PORTS
          value: 9011,9012,9013,9014,9015,9016,9017,9018
        - name: LMCACHE_USE_EXPERIMENTAL
          value: "true"
        - name: PROMETHEUS_MULTIPROC_DIR
          value: /tmp
        - name: PYTHONHASHSEED
        - name: LMCACHE_CONTROLLER_REPLY_URL
          value: tinyllama-kvcache-default-routing-service.hyperpod-inference-system.svc.cluster.local:9001
        - name: LMCACHE_LMCACHE_WORKER_HEARTBEAT_TIME
          value: "10"
        image: 006280839251.dkr.ecr.us-west-2.amazonaws.com/lmcache/vllm-openai:dev
        imagePullPolicy: Always
        name: tinyllama-kvcache
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            nvidia.com/gpu: "4"
          requests:
            cpu: "6"
            memory: 30Gi
            nvidia.com/gpu: "4"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/ml/model
          name: model-weights
          subPath: models/Llama-3.1-8B-Instruct
        - mountPath: /dev/shm
          name: shm
      - env:
        - name: MODEL_HOST
          value: localhost
        - name: MODEL_PORT
          value: "8000"
        - name: RESOURCE_NAME
          value: tinyllama-kvcache
        - name: NAMESPACE
          value: default
        image: 861276088053.dkr.ecr.us-west-2.amazonaws.com/nginx-reverse-proxy:v1.0.1
        imagePullPolicy: IfNotPresent
        name: sidecar-reverse-proxy
        ports:
        - containerPort: 8081
          name: http-proxy
          protocol: TCP
        - containerPort: 9113
          name: metrics
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/openresty/nginx/conf/nginx.conf
          name: sidecar-config
          subPath: nginx.conf
        - mountPath: /etc/nginx/lua/metrics.lua
          name: metrics-lua
          subPath: metrics.lua
      - args:
        - --config=/etc/otel/config.yaml
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/hyperpod/otel_collector:v1751654772563
        imagePullPolicy: IfNotPresent
        name: otel-collector
        ports:
        - containerPort: 4317
          name: otlp-grpc
          protocol: TCP
        - containerPort: 4318
          name: otlp-http
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
          requests:
            cpu: 50m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/otel/config.yaml
          name: otel-collector-config
          subPath: config.yaml
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: tinyllama-kvcache
            deploying-service: hyperpod-inference
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - emptyDir:
          medium: Memory
        name: shm
      - name: model-weights
        persistentVolumeClaim:
          claimName: inference-s3-pvc-tinyllama-kvcache-inf
      - configMap:
          defaultMode: 420
          name: iec-tinyllama-kvcache-nginx
        name: sidecar-config
      - configMap:
          defaultMode: 420
          name: iec-tinyllama-kvcache-lua
        name: metrics-lua
      - configMap:
          defaultMode: 420
          name: iec-tinyllama-kvcache-otel
        name: otel-collector-config
status:
  availableReplicas: 3
  conditions:
  - lastTransitionTime: "2026-03-23T04:22:56Z"
    lastUpdateTime: "2026-03-23T04:22:56Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2026-03-23T03:51:04Z"
    lastUpdateTime: "2026-03-23T17:57:59Z"
    message: ReplicaSet "tinyllama-kvcache-68db6c44df" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 26
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3
```
So this PR includes two fixes, not just the one for "fix: auto-generate lmcache_instance_id when value is None". Is this correct?

Yes, sorry about the confusion in the commit message. I introduced this change after I fixed the instance_id issue. I can update the commit message.
Signed-off-by: Can Sun <sucan@amazon.com>
Force-pushed 6a1f046 to 11a95b7
Signed-off-by: cansun <80425164+can-sun@users.noreply.github.com>
DongDongJu left a comment:

LGTM as well. Please follow up in the next PR with what you mentioned.

Thank you!
* fix: auto-generate lmcache_instance_id when value is None
* fix: include full traceback in LMCacheManager init failure logs
* fix: use traceback.format_exception(e) instead of format_exc()
* Relax worker port count assertion
* Revert worker port count assertion to strict equality

---------

Signed-off-by: Can Sun <sucan@amazon.com>
Signed-off-by: cansun <80425164+can-sun@users.noreply.github.com>
Problem

`_post_init` in `config_base.py` uses `hasattr()` to check if `lmcache_instance_id` needs auto-generation. Since the attribute always exists with default `None` from the config schema, the check always passes and auto-generation is skipped.

This causes `LMCacheWorker` initialization to fail with a bare `AssertionError` when `enable_controller=True` and no explicit `LMCACHE_LMCACHE_INSTANCE_ID` env var is set.

Fix

Use `getattr()` with a `None` fallback instead of `hasattr()` to properly detect unset values.
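A minimal sketch of the described change (the class shape and ID format are hypothetical placeholders; only the hasattr-to-getattr pattern is from this PR):

```python
import uuid

class LMCacheConfig:
    # Schema default: the attribute always exists, so hasattr() is always True.
    lmcache_instance_id = None

    def _post_init(self):
        # Before the fix (per the description), a check along the lines of
        # `if not hasattr(self, "lmcache_instance_id")` never triggered
        # auto-generation. getattr() with a None fallback detects the
        # unset value instead.
        if getattr(self, "lmcache_instance_id", None) is None:
            # Hypothetical ID format, for illustration only.
            self.lmcache_instance_id = f"lmcache-{uuid.uuid4().hex[:8]}"

cfg = LMCacheConfig()
cfg._post_init()
assert cfg.lmcache_instance_id is not None
```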