[Serve][3/N] Add application-level autoscaling snapshot#59995
nadongjun wants to merge 19 commits into ray-project:master
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Code Review
This pull request extends autoscaling observability to the application level by introducing ApplicationSnapshot logs. The changes are well-structured, reusing existing patterns from deployment-level snapshots, and include comprehensive tests. I've identified a couple of areas for improvement to enhance code clarity and correctness. Overall, this is a solid addition to Ray Serve's observability features.
@nadongjun can you please resolve the merge conflicts?
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@harshit-anyscale Resolved and pushed. Thanks!
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
…name/deployment_name Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@nadongjun a test seems to be failing (link). Can you take a look?
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@nadongjun can you take a look at the failing anyscale docs builder step and the remaining unresolved comments? I think only a few remain. Let me know if you need any help; happy to jump in!
@harshit-anyscale Thanks for the follow-up. Regarding the errors field: it's not being populated yet, but I kept it in the data structure for future observability and error tracking. I think keeping it is better for future-proofing, but let me know if you'd rather have it removed for now. Also, the docs build error seems to be resolved in the latest CI. Ready for another look!
@nadongjun I think it's better to remove
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@harshit-anyscale Agreed. I've gone ahead and removed it as suggested.
Signed-off-by: Dongjun Na <kmu5544616@gmail.com> # Conflicts: # python/ray/serve/_private/common.py
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
ping
@nadongjun please fix the merge conflicts.
@harshit-anyscale I'm working on performance fixes in PR #61611. I'll resolve the conflicts and apply those improvements here once that PR is merged.
Description
Add application-level autoscaling snapshot support for observability.
This PR extends the existing deployment-level autoscaling snapshot feature (PR #56225) to support application-level autoscaling. When an app-level autoscaling policy is configured, the controller now emits ApplicationSnapshot logs containing aggregated metrics across all deployments in the application.
Related issues
Related to #55833
Additional information
% cat /tmp/ray/session_latest/logs/serve/autoscaling_snapshot_6668.log
{"asctime": "2026-01-09 13:56:19,481", "levelname": "INFO", "message": "{'snapshots': [{'snapshot_type': 'application', 'timestamp_str': '2026-01-09T04:56:19Z', 'app': 'app_snap_1767934578', 'num_deployments': 2, 'total_current_replicas': 0, 'total_target_replicas': 2, 'scaling_status': 'scaling up', 'policy_name': 'ray.serve.tests.test_controller.simple_app_policy_for_test', 'errors': []}]}", "filename": "controller.py", "lineno": 511, "process": 6668, "timestamp_ns": 1767934579481838000}
{"asctime": "2026-01-09 13:56:19,999", "levelname": "INFO", "message": "{'snapshots': [{'snapshot_type': 'application', 'timestamp_str': '2026-01-09T04:56:19Z', 'app': 'app_snap_1767934578', 'num_deployments': 2, 'total_current_replicas': 2, 'total_target_replicas': 2, 'scaling_status': 'stable', 'policy_name': 'ray.serve.tests.test_controller.simple_app_policy_for_test', 'errors': []}]}", "filename": "controller.py", "lineno": 511, "process": 6668, "timestamp_ns": 1767934579999085000}
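For anyone who wants to consume these logs, here is a minimal sketch of reading one line of the snapshot file. It assumes the format stays as in the sample above: each line is a JSON log record whose "message" field holds a Python dict repr (single-quoted), not nested JSON. The helper name parse_snapshot_line is hypothetical and not part of Ray Serve, and the field names are taken only from the sample output.

```python
import ast
import json


def parse_snapshot_line(line: str) -> list[dict]:
    """Return the application-level snapshots found in one log line.

    Hypothetical helper for illustration; not a Ray Serve API.
    """
    record = json.loads(line)  # outer layer: one JSON log record per line
    # Inner layer: in the sample, "message" is a Python dict repr with single
    # quotes, which json.loads would reject. ast.literal_eval parses Python
    # literals without executing arbitrary code.
    payload = ast.literal_eval(record["message"])
    return [
        snap
        for snap in payload.get("snapshots", [])
        if snap.get("snapshot_type") == "application"
    ]


# Build a line shaped like the sample (subset of fields) instead of
# hand-escaping the nested quotes.
inner = {
    "snapshot_type": "application",
    "app": "app_snap_1767934578",
    "num_deployments": 2,
    "total_current_replicas": 0,
    "total_target_replicas": 2,
    "scaling_status": "scaling up",
}
sample_line = json.dumps({"levelname": "INFO", "message": str({"snapshots": [inner]})})

snapshots = parse_snapshot_line(sample_line)
for snap in snapshots:
    print(
        snap["app"],
        snap["scaling_status"],
        f'{snap["total_current_replicas"]}/{snap["total_target_replicas"]}',
    )
```

If the message format later changes to nested JSON, the literal_eval step would need to be swapped for a second json.loads call.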