[serve][llm] Isolate prefix trees among deployments #58835

Merged
ruisearch42 merged 5 commits into ray-project:master from eicherseiji:fix-prefix-router-pd-dp on Dec 3, 2025
Conversation

@eicherseiji (Contributor) commented Nov 19, 2025

Description

This PR fixes PrefixCacheAffinityRouter to use deployment-specific prefix tree actors instead of a single shared global actor. This resolves replica ID conflicts that occur when multiple deployments use the router (e.g., in prefill-decode disaggregation with data parallelism).

Problem

Previously, all PrefixCacheAffinityRouter instances shared a single detached actor named LlmPrefixTreeActor. In multi-deployment scenarios like PD disaggregation with DP, this caused:

KeyError: Replica(id='bzw6m3yr', deployment='Decode:deepseek', app='deepseek-pd-nccl')

This happened because Prefill and Decode deployments (each with 16 DP replicas) were all tracked in the same prefix tree, causing replica ID collisions when the router tried to route requests.
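
To make the failure mode concrete, here is a toy model, not the actual router code. It assumes each router keeps a map of only its own replicas and indexes into it with whatever replica ID the shared tree returns. `SharedPrefixTree`, `ToyRouter`, and the replica ID `pf-0001` are hypothetical stand-ins; only the Decode replica ID is taken from the error above.

```python
class SharedPrefixTree:
    """Toy stand-in for a single global prefix tree (hypothetical)."""

    def __init__(self):
        self._owner_by_prefix = {}  # prefix text -> replica ID that cached it

    def insert(self, prefix, replica_id):
        self._owner_by_prefix[prefix] = replica_id

    def best_match(self, text):
        # Return the replica holding the longest cached prefix of `text`.
        matches = [p for p in self._owner_by_prefix if text.startswith(p)]
        if not matches:
            return None
        return self._owner_by_prefix[max(matches, key=len)]


class ToyRouter:
    """Toy router that only knows the replicas of its own deployment."""

    def __init__(self, tree, replicas):
        self.tree = tree
        self.replicas = replicas  # replica ID -> handle

    def route(self, text):
        replica_id = self.tree.best_match(text)
        # Raises KeyError if the tree returns another deployment's replica.
        return self.replicas[replica_id]


shared = SharedPrefixTree()
prefill_router = ToyRouter(shared, {"pf-0001": "prefill-handle"})
decode_router = ToyRouter(shared, {"bzw6m3yr": "decode-handle"})

# The Decode replica records a prefix in the tree both routers share.
shared.insert("You are a helpful", "bzw6m3yr")

try:
    prefill_router.route("You are a helpful assistant.")
    outcome = "routed"
except KeyError as err:
    outcome = f"KeyError: {err}"
```

In this sketch the Prefill router is handed a Decode replica ID it has never registered, so the lookup fails. The real code path may differ in detail, but the shared-state hazard is the same.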

Solution

Modified PrefixCacheAffinityRouter.initialize_state() to create deployment-specific prefix tree actors using namespaces derived from SERVE_NAMESPACE, app name, and deployment name:

  • Single deployment: serve::LlmPrefixTreeActor
  • PD scenario: serve::deepseek-pd-nccl::Prefill:deepseek::LlmPrefixTreeActor and serve::deepseek-pd-nccl::Decode:deepseek::LlmPrefixTreeActor

Each deployment now maintains its own isolated prefix tree state, preventing replica ID conflicts.
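
The naming scheme above can be sketched as a small helper. This is an illustration only: `build_tree_actor_id` is a hypothetical name, the `"serve"` value for `SERVE_NAMESPACE` is inferred from the examples in this description, and treating a missing app or deployment name as "skip that part" is an assumption made to reproduce the single-deployment example.

```python
# Assumed value; the PR imports the constant from ray.serve._private.constants.
SERVE_NAMESPACE = "serve"
ACTOR_BASE_NAME = "LlmPrefixTreeActor"


def build_tree_actor_id(app_name=None, deployment_name=None):
    """Join the non-empty scope parts with '::' (hypothetical helper)."""
    parts = [SERVE_NAMESPACE, app_name, deployment_name, ACTOR_BASE_NAME]
    return "::".join(part for part in parts if part)


single = build_tree_actor_id()
prefill = build_tree_actor_id("deepseek-pd-nccl", "Prefill:deepseek")
decode = build_tree_actor_id("deepseek-pd-nccl", "Decode:deepseek")
```

Because deployment names such as `Prefill:deepseek` contain a single colon, the double-colon separator keeps the scope parts distinguishable.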

Changes

  • python/ray/llm/_internal/serve/routing_policies/prefix_aware/prefix_aware_router.py
    • Imports SERVE_NAMESPACE from ray.serve._private.constants
    • Builds a namespace from SERVE_NAMESPACE, app_name, and deployment_name (e.g., serve::app::deployment)
    • Creates the actor with this deployment-specific namespace
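
The effect of keying the actor by a deployment-specific namespace can be sketched without Ray. The dictionary below plays the role of named-actor lookup (presumably a get-or-create by name within a namespace in the real code); `ToyPrefixTree` and `get_or_create_tree` are hypothetical stand-ins, not the PR's API.

```python
class ToyPrefixTree:
    """Hypothetical stand-in for the per-deployment tree state."""

    def __init__(self):
        self.replica_ids = set()

    def add_replica(self, replica_id):
        self.replica_ids.add(replica_id)


_TREES = {}  # namespace -> tree; stands in for named-actor lookup


def get_or_create_tree(namespace):
    # Return the existing tree for this namespace, or create it on first use.
    if namespace not in _TREES:
        _TREES[namespace] = ToyPrefixTree()
    return _TREES[namespace]


prefill_tree = get_or_create_tree("serve::deepseek-pd-nccl::Prefill:deepseek")
decode_tree = get_or_create_tree("serve::deepseek-pd-nccl::Decode:deepseek")

# A Decode replica registered in its own tree is invisible to Prefill.
decode_tree.add_replica("bzw6m3yr")
```

Routers within the same deployment still share one tree, since repeated lookups with the same namespace return the same object.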

Testing

  • Validated manually with PD + DP deployments using DeepSeek-V2-Lite

Impact

  • Enables PrefixCacheAffinityRouter to work correctly with PD disaggregation + DP
  • No breaking changes for single deployment scenarios (backward compatible)
  • Users can now use prefix-aware routing in complex multi-deployment scenarios

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji added the "go" label (add ONLY when ready to merge, run all tests) Nov 19, 2025
@eicherseiji eicherseiji marked this pull request as ready for review November 24, 2025 20:36
@eicherseiji eicherseiji requested a review from a team as a code owner November 24, 2025 20:36
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@ray-gardener ray-gardener bot added the "serve" (Ray Serve Related Issue) and "llm" labels Nov 25, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
…iji/ray into fix-prefix-router-pd-dp

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji requested a review from nrghosh December 2, 2025 22:53
@nrghosh (Contributor) commented Dec 3, 2025

/gemini review

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly addresses an issue with PrefixCacheAffinityRouter using a shared global prefix tree actor, which caused replica ID conflicts in multi-deployment scenarios. By introducing deployment-specific namespaces for the LlmPrefixTreeActor, each deployment now gets an isolated prefix tree, resolving the conflicts. The implementation is sound, and the new tests in TestMultiDeploymentIsolation thoroughly validate the fix by ensuring that prefix trees for different deployments are indeed isolated. I have one minor suggestion to simplify the namespace construction logic for improved conciseness.

@ruisearch42 ruisearch42 merged commit 5a0ce23 into ray-project:master Dec 3, 2025
6 checks passed
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>