[serve][llm] Isolate prefix trees among deployments #58835
Merged
ruisearch42 merged 5 commits into ray-project:master on Dec 3, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
akyang-anyscale approved these changes on Dec 2, 2025
…iji/ray into fix-prefix-router-pd-dp Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Contributor
/gemini review
Contributor
Code Review
This pull request correctly addresses an issue with PrefixCacheAffinityRouter using a shared global prefix tree actor, which caused replica ID conflicts in multi-deployment scenarios. By introducing deployment-specific namespaces for the LlmPrefixTreeActor, each deployment now gets an isolated prefix tree, resolving the conflicts. The implementation is sound, and the new tests in TestMultiDeploymentIsolation thoroughly validate the fix by ensuring that prefix trees for different deployments are indeed isolated. I have one minor suggestion to simplify the namespace construction logic for improved conciseness.
nrghosh approved these changes on Dec 3, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
## Description

This PR fixes `PrefixCacheAffinityRouter` to use deployment-specific prefix tree actors instead of a single shared global actor. This resolves replica ID conflicts that occur when multiple deployments use the router (e.g., in prefill-decode disaggregation with data parallelism).

### Problem

Previously, all `PrefixCacheAffinityRouter` instances shared a single detached actor named `LlmPrefixTreeActor`. In multi-deployment scenarios like PD disaggregation with DP, this caused:

```
KeyError: Replica(id='bzw6m3yr', deployment='Decode:deepseek', app='deepseek-pd-nccl')
```

This happened because Prefill and Decode deployments (each with 16 DP replicas) were all tracked in the same prefix tree, causing replica ID collisions when the router tried to route requests.

### Solution

Modified `PrefixCacheAffinityRouter.initialize_state()` to create deployment-specific prefix tree actors using **namespaces** derived from `SERVE_NAMESPACE`, app name, and deployment name:

- Single deployment: `serve::LlmPrefixTreeActor`
- PD scenario: `serve::deepseek-pd-nccl::Prefill:deepseek::LlmPrefixTreeActor` and `serve::deepseek-pd-nccl::Decode:deepseek::LlmPrefixTreeActor`

Each deployment now maintains its own isolated prefix tree state, preventing replica ID conflicts.
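As a rough illustration of the namespacing idea, here is a minimal pure-Python sketch. It is not the code from `prefix_aware_router.py`: `build_tree_namespace` and `ToyRegistry` are hypothetical names, `SERVE_NAMESPACE = "serve"` is assumed from the `serve::...` examples above, and the registry is only a stand-in for named-actor lookup. Skipping missing parts for the single-deployment case is also an assumption.

```python
from typing import Dict, Optional, Set, Tuple

# Assumed value, inferred from the `serve::...` examples in this PR; the real
# constant is imported from ray.serve._private.constants.
SERVE_NAMESPACE = "serve"
ACTOR_NAME = "LlmPrefixTreeActor"


def build_tree_namespace(
    app_name: Optional[str] = None, deployment_name: Optional[str] = None
) -> str:
    """Hypothetical helper: join the non-empty parts with '::'."""
    parts = [SERVE_NAMESPACE, app_name, deployment_name]
    return "::".join(p for p in parts if p)


class ToyRegistry:
    """Toy stand-in for named-actor lookup: one object per (namespace, name)."""

    def __init__(self) -> None:
        self._actors: Dict[Tuple[str, str], Set[str]] = {}

    def get_or_create(self, namespace: str, name: str) -> Set[str]:
        # Same (namespace, name) returns the same object; a different
        # namespace yields a separate, isolated object.
        return self._actors.setdefault((namespace, name), set())


registry = ToyRegistry()
prefill_ns = build_tree_namespace("deepseek-pd-nccl", "Prefill:deepseek")
decode_ns = build_tree_namespace("deepseek-pd-nccl", "Decode:deepseek")

# Each "tree" here is just a set of replica IDs tracked for one deployment.
prefill_tree = registry.get_or_create(prefill_ns, ACTOR_NAME)
decode_tree = registry.get_or_create(decode_ns, ACTOR_NAME)
prefill_tree.add("prefill-replica-0")
decode_tree.add("decode-replica-0")
```

With the same actor name under two different namespaces, the Prefill tree never sees Decode replica IDs, which is the property the fix relies on.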
## Changes

- `python/ray/llm/_internal/serve/routing_policies/prefix_aware/prefix_aware_router.py`
  - Imports `SERVE_NAMESPACE` from `ray.serve._private.constants`
  - Builds a namespace from `SERVE_NAMESPACE`, `app_name`, and `deployment_name` (e.g., `serve::app::deployment`)
  - Creates the actor with this deployment-specific namespace

## Testing

- Validated manually with PD + DP deployments using DeepSeek-V2-Lite

## Impact

- Enables `PrefixCacheAffinityRouter` to work correctly with PD disaggregation + DP
- No breaking changes for single deployment scenarios (backward compatible)
- Users can now use prefix-aware routing in complex multi-deployment scenarios

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>