Skip to content

[Serve] Add generic actor registration API for shutdown cleanup#60067

Merged
abrarsheikh merged 27 commits intoray-project:masterfrom
eicherseiji:prefix-tree-cleanup
Feb 3, 2026
Merged

[Serve] Add generic actor registration API for shutdown cleanup#60067
abrarsheikh merged 27 commits intoray-project:masterfrom
eicherseiji:prefix-tree-cleanup

Conversation

@eicherseiji
Copy link
Copy Markdown
Contributor

@eicherseiji eicherseiji commented Jan 12, 2026

Summary

Adds a register_shutdown_cleanup_actor() method to ServeController that allows deployments to register auxiliary actors for cleanup on serve.shutdown().

The LLM prefix-aware router uses this API to register its PrefixTreeActor, eliminating the Serve → LLM import dependency.

Changes

  • Add _registered_cleanup_actors list and register_shutdown_cleanup_actor() to controller
  • Add _kill_registered_cleanup_actors() called during shutdown
  • Update PrefixCacheAffinityRouter to register tree actor with controller
  • Replace old test with integration test for the new API

Limitations (stopgap solution)

Checks

  • I've signed off every commit
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've added any new APIs to the API Reference.
  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Integration test

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an issue where detached actors in Serve sub-namespaces were not cleaned up during shutdown. The fix introduces a new function, _kill_actors_in_serve_subnamespaces, which is called during the shutdown process to find and kill these actors. The implementation is sound and is accompanied by a good regression test. My review includes a suggestion to improve the efficiency and robustness of the new cleanup function by leveraging more advanced filtering capabilities of the list_actors API and ensuring safe access to actor data.

@eicherseiji eicherseiji force-pushed the prefix-tree-cleanup branch 2 times, most recently from 7aaf127 to 5db0cff Compare January 12, 2026 22:52
@eicherseiji
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a necessary cleanup mechanism for detached actors in serve sub-namespaces during shutdown, preventing resource leaks and stale state across restarts. The implementation is clean and includes a new method, _kill_actors_in_serve_subnamespaces, which is appropriately called during the shutdown sequence. The accompanying regression test is thorough and effectively validates the new functionality. I have one suggestion to make the actor cleanup logic more precise.

@eicherseiji eicherseiji marked this pull request as ready for review January 12, 2026 23:39
@eicherseiji eicherseiji requested a review from a team as a code owner January 12, 2026 23:40
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@eicherseiji eicherseiji added the go add ONLY when ready to merge, run all tests label Jan 12, 2026
@eicherseiji
Copy link
Copy Markdown
Contributor Author

Addressed bot review comments:

  • Fixed invalid like predicate - changed to Python filtering with startswith()
  • Added limit=10000 to handle many actors
  • Removed 'detached' from docstring since we kill all actors in sub-namespaces

@eicherseiji eicherseiji changed the title [serve] Clean up detached actors in serve sub-namespaces on shutdown [serve] Clean up actors in serve sub-namespaces on shutdown Jan 13, 2026
@ray-gardener ray-gardener bot added the serve Ray Serve Related Issue label Jan 13, 2026
Previously, actors in namespaces like 'serve::app::deployment'
(e.g., PrefixTreeActor) would survive serve.shutdown() because the cleanup
only targeted the main 'serve' namespace.

This adds _kill_actors_in_serve_subnamespaces() to the controller's shutdown
method to clean up any actors in namespaces starting with 'serve::'
before the controller exits.

The test verifies that:
1. Actors in sub-namespaces are killed on shutdown
2. After restart, a fresh tree is created with no stale state

Fixes stale replica accumulation in PrefixTreeActor after serve restart.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Copy link
Copy Markdown
Contributor

@abrarsheikh abrarsheikh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is special about PrefixTreeActor compared to other deployments that serve manages? Why are those actors getting killed and not PrefixTreeActor


def _kill_actors_in_serve_subnamespaces(self) -> None:
"""Kill all actors in namespaces that start with 'serve::'."""
from ray.util.state import list_actors
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move imports to top of file

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to avoid a API docs CI failure: https://buildkite.com/ray-project/premerge/builds/57532#019bb8eb-400f-4eea-9c63-5ea269d4a85c

I will add a comment

@eicherseiji
Copy link
Copy Markdown
Contributor Author

@abrarsheikh PrefixTreeActors are created in a sub-namespace of Serve to isolate the trees by deployment. Ref:

I implemented it this way assuming that all namespaces under serve would be cleaned up with serve.shutdown. I think using subnamespaces is useful, since it reuses the get_or_create boilerplate from Ray Core for accessing detached actors by name (as opposed to creating a thread-safe PrefixTreeActorManager, for example).

Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
@eicherseiji eicherseiji requested a review from a team as a code owner January 20, 2026 17:56
@eicherseiji eicherseiji changed the title [serve] Clean up actors in serve sub-namespaces on shutdown [serve] Use unique actor names for PrefixTreeActor instead of subnamespaces Jan 20, 2026
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Instead of creating PrefixTreeActor in a subnamespace (serve::app::deployment)
and needing special cleanup logic, use a unique actor name
(LlmPrefixTreeActor_app_deployment) in the main serve namespace.

This allows the actor to be cleaned up through normal Serve cleanup mechanisms,
removing the need for _kill_actors_in_serve_subnamespaces().

The test verifies that actors in the serve namespace with unique names are
properly cleaned up on shutdown, preventing stale tenant state after restart.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
eicherseiji and others added 3 commits January 26, 2026 23:24
Move LLM_PREFIX_TREE_ACTOR_NAME_PREFIX import inside _kill_prefix_tree_actors()
to avoid triggering ray.llm observability setup at module load time, which
breaks Ray Client mode. Also switch from list_actors (State API) to
list_named_actors which has proper Ray Client support.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Copy link
Copy Markdown
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @abrarsheikh to chime in.

"""
return self.kv_store.get(CONFIG_CHECKPOINT_KEY) is None

def _kill_prefix_tree_actors(self) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very iffy on this hack. because this is making serve a dependency between serve llm and serve in the opposite way. Not sure how @abrarsheikh feels about it :)

Is there any other proposal that is less hacky that can implemented quickly?

Replace the hacky LLM-specific _kill_prefix_tree_actors() cleanup with a
generic register_shutdown_cleanup_actor() API that allows any deployment
to register auxiliary actors for cleanup on serve.shutdown().

Changes:
- Add _registered_cleanup_actors list and register_shutdown_cleanup_actor()
  to ServeController
- Add _kill_registered_cleanup_actors() called during shutdown
- Update PrefixCacheAffinityRouter to register its tree actor with controller
- Remove LLM_PREFIX_TREE_ACTOR_NAME_PREFIX constant (no longer needed externally)
- Update tests to use injected tree actors where serve controller isn't running

This eliminates the Serve → LLM import dependency while providing a
general-purpose mechanism for deployment-scoped actor cleanup.

Limitations (acceptable for stopgap solution):
- Actors only cleaned on serve.shutdown(), not deployment deletion
- Registration not persisted across controller restarts
- For full lifecycle management: see ray-project#60359

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji changed the title [serve] Clean up PrefixTreeActor on serve.shutdown() to prevent stale state [Serve] Add generic actor registration API for shutdown cleanup Jan 30, 2026
Resolve conflicts:
- controller.py: Keep both _registered_cleanup_actors and _health_metrics_tracker
- test_standalone.py: Keep new test and apply upstream's ray_shutdown param to test_deployment

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Change _registered_cleanup_actors from List to Dict keyed by actor ID.
This prevents the same actor from being registered multiple times when
multiple router instances share a tree actor via get_if_exists=True.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Copy link
Copy Markdown
Contributor

@kouroshHakha kouroshHakha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implementation and requirement make sense. Leaving the nits / full review to @abrarsheikh . Thanks @eicherseiji


This allows deployments to register auxiliary actors (like caches,
coordinators, etc.) that should be cleaned up when Serve shuts down.
The actors must use lifetime="detached" to survive replica restarts,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why must the actor be "detached"? will it cause any implications if it's not detached?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I understand how we want it to survive the replica restarts, but is there a layer in-between replica and serve shutdown? like deployment shutdown? I don't know if my question makes sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not detached, it will be killed when the replica that created it dies. I.e. autoscaling will break the router. And agree, deployment-scoped would be better. I think we can add that with the full fix in #60359

Comment on lines 119 to 124
self._tree_actor = PrefixTreeActor.options(
name=actor_name,
namespace=actor_namespace,
get_if_exists=True,
lifetime="detached",
).remote()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does core guarantee only one actor is created if this call happens concurrently ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the potential race is handled explicitly in the Core code path:

# Attempt to create it (may race with other attempts).


Note: Registered actors are NOT persisted across controller restarts.
For full persistence, use controller-managed deployment-scoped actors
(see https://github.com/ray-project/ray/issues/60359).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think we should really do what's being proposed in the issue. Make it available through both declarative and imperative serve APIs. We already have a few usecases for this

  1. Queue Monitor Actor for async inferance - deployment scoped
  2. Global singleton priority request queue - deployment scoped

"""
return self.kv_store.get(CONFIG_CHECKPOINT_KEY) is None

def register_shutdown_cleanup_actor(self, actor_handle: ActorHandle) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suggest keeping this private. only available to LLM folks, since we are friends. We can do a robust implementation as proposed in #60359 and then open it up to public.

Suggested change
def register_shutdown_cleanup_actor(self, actor_handle: ActorHandle) -> None:
def _register_shutdown_cleanup_actor(self, actor_handle: ActorHandle) -> None:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) Done

Signed-off-by: Seiji Eicher <seiji@anyscale.com>

# Conflicts:
#	python/ray/serve/tests/test_standalone.py
Rename to _register_shutdown_cleanup_actor since this is an internal API
for LLM folks only. A more robust public API will be implemented later
per issue ray-project#60359.

Addresses review feedback from Abrar on ray-project#60067.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji
Copy link
Copy Markdown
Contributor Author

eicherseiji commented Feb 3, 2026

Investigating test_get_serve_instance_details_json_serializable test failure

Passed on retry, port cleanup issue unrelated to this change

@abrarsheikh abrarsheikh merged commit e1c4ddb into ray-project:master Feb 3, 2026
6 checks passed
rayhhome pushed a commit to rayhhome/ray that referenced this pull request Feb 4, 2026
…project#60067)

## Summary
Adds a `register_shutdown_cleanup_actor()` method to ServeController
that allows deployments to register auxiliary actors for cleanup on
`serve.shutdown()`.

The LLM prefix-aware router uses this API to register its
PrefixTreeActor, eliminating the Serve → LLM import dependency.

## Changes
- Add `_registered_cleanup_actors` list and
`register_shutdown_cleanup_actor()` to controller
- Add `_kill_registered_cleanup_actors()` called during shutdown
- Update PrefixCacheAffinityRouter to register tree actor with
controller
- Replace old test with integration test for the new API

## Limitations (stopgap solution)
- Actors only cleaned on `serve.shutdown()`, not deployment deletion
- Registration not persisted across controller restarts
- Full lifecycle management: see ray-project#60359

## Checks

- [x] I've signed off every commit
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference.
- [x] I've made sure the tests are passing.
- Testing Strategy
   - [x] Unit tests
   - [x] Integration test

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
Sparks0219 pushed a commit to Sparks0219/ray that referenced this pull request Feb 9, 2026
…project#60067)

## Summary
Adds a `register_shutdown_cleanup_actor()` method to ServeController
that allows deployments to register auxiliary actors for cleanup on
`serve.shutdown()`.

The LLM prefix-aware router uses this API to register its
PrefixTreeActor, eliminating the Serve → LLM import dependency.

## Changes
- Add `_registered_cleanup_actors` list and
`register_shutdown_cleanup_actor()` to controller
- Add `_kill_registered_cleanup_actors()` called during shutdown
- Update PrefixCacheAffinityRouter to register tree actor with
controller
- Replace old test with integration test for the new API

## Limitations (stopgap solution)
- Actors only cleaned on `serve.shutdown()`, not deployment deletion
- Registration not persisted across controller restarts
- Full lifecycle management: see ray-project#60359

## Checks

- [x] I've signed off every commit
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference.
- [x] I've made sure the tests are passing.
- Testing Strategy
   - [x] Unit tests
   - [x] Integration test

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
## Summary
Adds a `register_shutdown_cleanup_actor()` method to ServeController
that allows deployments to register auxiliary actors for cleanup on
`serve.shutdown()`.

The LLM prefix-aware router uses this API to register its
PrefixTreeActor, eliminating the Serve → LLM import dependency.

## Changes
- Add `_registered_cleanup_actors` list and
`register_shutdown_cleanup_actor()` to controller
- Add `_kill_registered_cleanup_actors()` called during shutdown
- Update PrefixCacheAffinityRouter to register tree actor with
controller
- Replace old test with integration test for the new API

## Limitations (stopgap solution)
- Actors only cleaned on `serve.shutdown()`, not deployment deletion
- Registration not persisted across controller restarts
- Full lifecycle management: see #60359

## Checks

- [x] I've signed off every commit
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference.
- [x] I've made sure the tests are passing.
- Testing Strategy
   - [x] Unit tests
   - [x] Integration test

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
## Summary
Adds a `register_shutdown_cleanup_actor()` method to ServeController
that allows deployments to register auxiliary actors for cleanup on
`serve.shutdown()`.

The LLM prefix-aware router uses this API to register its
PrefixTreeActor, eliminating the Serve → LLM import dependency.

## Changes
- Add `_registered_cleanup_actors` list and
`register_shutdown_cleanup_actor()` to controller
- Add `_kill_registered_cleanup_actors()` called during shutdown
- Update PrefixCacheAffinityRouter to register tree actor with
controller
- Replace old test with integration test for the new API

## Limitations (stopgap solution)
- Actors only cleaned on `serve.shutdown()`, not deployment deletion
- Registration not persisted across controller restarts
- Full lifecycle management: see #60359

## Checks

- [x] I've signed off every commit
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference.
- [x] I've made sure the tests are passing.
- Testing Strategy
   - [x] Unit tests
   - [x] Integration test

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
…project#60067)

## Summary
Adds a `register_shutdown_cleanup_actor()` method to ServeController
that allows deployments to register auxiliary actors for cleanup on
`serve.shutdown()`.

The LLM prefix-aware router uses this API to register its
PrefixTreeActor, eliminating the Serve → LLM import dependency.

## Changes
- Add `_registered_cleanup_actors` list and
`register_shutdown_cleanup_actor()` to controller
- Add `_kill_registered_cleanup_actors()` called during shutdown
- Update PrefixCacheAffinityRouter to register tree actor with
controller
- Replace old test with integration test for the new API

## Limitations (stopgap solution)
- Actors only cleaned on `serve.shutdown()`, not deployment deletion
- Registration not persisted across controller restarts
- Full lifecycle management: see ray-project#60359

## Checks

- [x] I've signed off every commit
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference.
- [x] I've made sure the tests are passing.
- Testing Strategy
   - [x] Unit tests
   - [x] Integration test

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…project#60067)

## Summary
Adds a `register_shutdown_cleanup_actor()` method to ServeController
that allows deployments to register auxiliary actors for cleanup on
`serve.shutdown()`.

The LLM prefix-aware router uses this API to register its
PrefixTreeActor, eliminating the Serve → LLM import dependency.

## Changes
- Add `_registered_cleanup_actors` list and
`register_shutdown_cleanup_actor()` to controller
- Add `_kill_registered_cleanup_actors()` called during shutdown
- Update PrefixCacheAffinityRouter to register tree actor with
controller
- Replace old test with integration test for the new API

## Limitations (stopgap solution)
- Actors only cleaned on `serve.shutdown()`, not deployment deletion
- Registration not persisted across controller restarts
- Full lifecycle management: see ray-project#60359

## Checks

- [x] I've signed off every commit
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference.
- [x] I've made sure the tests are passing.
- Testing Strategy
   - [x] Unit tests
   - [x] Integration test

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…project#60067)

## Summary
Adds a `register_shutdown_cleanup_actor()` method to ServeController
that allows deployments to register auxiliary actors for cleanup on
`serve.shutdown()`.

The LLM prefix-aware router uses this API to register its
PrefixTreeActor, eliminating the Serve → LLM import dependency.

## Changes
- Add `_registered_cleanup_actors` list and
`register_shutdown_cleanup_actor()` to controller
- Add `_kill_registered_cleanup_actors()` called during shutdown
- Update PrefixCacheAffinityRouter to register tree actor with
controller
- Replace old test with integration test for the new API

## Limitations (stopgap solution)
- Actors only cleaned on `serve.shutdown()`, not deployment deletion
- Registration not persisted across controller restarts
- Full lifecycle management: see ray-project#60359

## Checks

- [x] I've signed off every commit
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference.
- [x] I've made sure the tests are passing.
- Testing Strategy
   - [x] Unit tests
   - [x] Integration test

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants