[Draft] [serve] Prometheus metrics for AutoscalingConfig#56851
[Draft] [serve] Prometheus metrics for AutoscalingConfig#56851arcyleung wants to merge 57 commits intoray-project:masterfrom
Conversation
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request adds support for collecting Prometheus metrics for autoscaling decisions in Ray Serve. The changes include updates to the configuration and protobuf definitions, and new logic in the replica to fetch metrics from a Prometheus endpoint. While the feature is a valuable addition, I've found a few critical issues in the implementation. The logic for fetching metrics is incorrect for labeled metrics (like histograms), which will lead to unexpected behavior. There's also a potential TypeError if Prometheus metrics are not configured. Additionally, the tests for this new feature could be strengthened to catch these kinds of issues. My review includes specific suggestions to address these points.
| def check_autoscaling_metrics_include_prometheus( | ||
| client, deployment_id: DeploymentID, expected_metrics: List[str] | ||
| ) -> bool: | ||
| """Check that autoscaling metrics include the expected prometheus metrics.""" | ||
|
|
||
| try: | ||
| metrics = get_autoscaling_metrics_from_controller(client, deployment_id) | ||
| # The metrics should include both ongoing requests and prometheus custom metrics | ||
| if not metrics: | ||
| print("No metrics returned from controller!") | ||
| return False | ||
|
|
||
| # For prometheus custom metrics, we expect the keys to be present in the dict | ||
| for expected_metric in expected_metrics: | ||
| if expected_metric not in metrics: | ||
| print(f"Expected metric {expected_metric} not found") | ||
| return False | ||
| return True | ||
| except Exception as e: | ||
| print(f"Error checking metrics: {e}") | ||
| return False |
There was a problem hiding this comment.
The test helper check_autoscaling_metrics_include_prometheus and the associated tests are not strong enough. They only check for the presence of the metric key in the results, not the correctness of its value. Because the implementation falls back to 0.0 on any error (including metric not found), the tests will pass even if the metric fetching logic is broken.
To make the tests more robust, they should be updated to:
- Use a simple, predictable metric for testing (e.g., a gauge that you can set manually).
- Assert that the fetched metric value is correct, not just that the key exists.
This will help catch bugs like the incorrect handling of histogram metrics.
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung+github@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
| f"Fetching prometheus metric {metric_name} for deployment {self._deployment_id}" | ||
| ) | ||
| # Call the prometheus query function with the correct signature | ||
| result = self._prometheus_query_func("localhost:9090", metric_name) |
There was a problem hiding this comment.
Bug: Hardcoded Prometheus Host; needs Configurable Override
The method _get_prometheus_metrics hardcodes "localhost:9090" as the Prometheus server address when calling self._prometheus_query_func("localhost:9090", metric_name). This is inconsistent with how the replica code handles this in replica.py (line 462), which uses the configurable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST environment variable. The hardcoded value should be replaced with a configurable host parameter or environment variable to ensure consistency and allow for flexible Prometheus server configurations.
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
Signed-off-by: Arthur Leung <arcyleung+github@gmail.com>
python/ray/serve/_private/replica.py
Outdated
| # Add label selector for this replica | ||
| query = f'{metric}{{replica="{self._replica_id.unique_id}"}}' | ||
| logger.info(f"Querying prometheus with: {query}") | ||
| response = prometheus_handler( |
There was a problem hiding this comment.
Bug: Broken Dependency Injection Prevents Mocking
The _fetch_prometheus_metrics method calls the module-level prometheus_handler function directly instead of using self._prometheus_handler. This breaks the dependency injection pattern established in the constructor and the set_prometheus_handler method, preventing test code from mocking the Prometheus handler as intended. The call should use self._prometheus_handler(...) instead of prometheus_handler(...).
| prom_addr, | ||
| query, | ||
| timeout=RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S, | ||
| ) |
There was a problem hiding this comment.
Bug: Mocked Handler Ignored, Breaking Tests
The _fetch_prometheus_metrics method calls prometheus_handler instead of self._prometheus_handler. This bypasses the instance variable that can be set via set_prometheus_handler for testing, causing the method to always use the module-level import rather than the potentially mocked handler. This breaks the testing infrastructure where set_prometheus_handler is used to inject mock implementations.
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
| event_loop=self._event_loop, | ||
| autoscaling_config=self._deployment_config.autoscaling_config, | ||
| ingress=ingress, | ||
| prometheus_handler=prometheus_handler, |
There was a problem hiding this comment.
Bug: Replica init fails: Unhandled parameter.
The ReplicaBase.__init__ method doesn't accept a prometheus_handler parameter, but line 647 attempts to pass it to create_replica_metrics_manager. When create_replica_impl(**kwargs) is called with prometheus_handler in kwargs, it will fail because ReplicaBase.__init__ doesn't accept this parameter. The prometheus_handler parameter needs to be added to ReplicaBase.__init__'s signature and stored so it can be passed to create_replica_metrics_manager.
python/ray/serve/_private/replica.py
Outdated
| # Add label selector for this replica | ||
| query = f'{metric}{{replica="{self._replica_id.unique_id}"}}' | ||
| logger.info(f"Querying prometheus with: {query}") | ||
| response = prometheus_handler( |
There was a problem hiding this comment.
Bug: Prometheus Handler Bypass Breaks Testability
The _fetch_prometheus_metrics method calls prometheus_handler directly instead of self._prometheus_handler. This bypasses the instance variable that can be set via set_prometheus_handler for testing purposes, making it impossible to inject mock handlers. The call should use self._prometheus_handler to respect the configured handler.
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
| prom_addr, | ||
| query, | ||
| timeout=RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S, | ||
| ) |
There was a problem hiding this comment.
Bug: Prometheus Tests Hit Real Server
The _fetch_prometheus_metrics method calls prometheus_handler instead of self._prometheus_handler, which prevents the injected test handler from being used. This breaks the testing infrastructure where set_prometheus_handler is used to inject mock handlers, causing tests to call the actual Prometheus server instead of the mock.
python/ray/serve/_private/replica.py
Outdated
| # Add label selector for this replica | ||
| query = f'{metric}{{replica="{self._replica_id.unique_id}"}}' | ||
| logger.info(f"Querying prometheus with: {query}") | ||
| response = prometheus_handler( |
There was a problem hiding this comment.
Bug: Prometheus Handler: Mocked Logic Bypassed
The _fetch_prometheus_metrics method uses prometheus_handler (which is fetch_from_prom_server) but passes self._prometheus_handler as a parameter during initialization. However, when calling it on line 474, the code uses the global prometheus_handler directly instead of self._prometheus_handler. This means the injected mock handler for testing won't be used, breaking the testing infrastructure. The call should use self._prometheus_handler instead of the global prometheus_handler.
| f"Fetching prometheus metric {metric_name} for deployment {self._deployment_id}" | ||
| ) | ||
| # Call the prometheus query function with the correct signature | ||
| result = self._prometheus_query_func("localhost:9090", metric_name) |
There was a problem hiding this comment.
Bug: Prometheus Address Inconsistency Breaks Metrics
The Prometheus server address is hardcoded to "localhost:9090" in _get_prometheus_metrics, but replicas use the configurable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST environment variable. This inconsistency means the controller and replicas may query different Prometheus servers, causing metrics collection to fail or return incorrect data. The controller should use the same configurable address as replicas.
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
…into prometheus-metrics
|
|
||
|
|
||
| # Patch the prometheus_handler at module level | ||
| ray.serve._private.replica.prometheus_handler = mock_prometheus_handler |
There was a problem hiding this comment.
Bug: Prometheus Handler: Module Variable Attribute Error
The test attempts to set ray.serve._private.replica.prometheus_handler but this module-level variable doesn't exist in replica.py. The module only imports default_prometheus_handler from ray._common.prometheus_utils and uses it as a default parameter. This will cause an AttributeError or create a new module attribute that won't affect the actual code behavior since the default parameter value is evaluated at function definition time, not at call time.
| "RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE": "0", | ||
| "RAY_SERVE_REPLICA_AUTOSCALING_METRIC_RECORD_INTERVAL_S": "0.5", | ||
| "RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S": "3", | ||
| "RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST": "http://localhost:9090", |
There was a problem hiding this comment.
Bug: Redundant HTTP Prefix Malforms Prometheus URL
The environment variable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST is set to "http://localhost:9090" which includes the http:// prefix. However, fetch_from_prom_server in prometheus_utils.py expects just the hostname:port and prepends http:// itself, resulting in a malformed URL http://http://localhost:9090/api/v1/query that will cause requests to fail.
| "RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE": "0", | ||
| "RAY_SERVE_REPLICA_AUTOSCALING_METRIC_RECORD_INTERVAL_S": "0.5", | ||
| "RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S": "3", | ||
| "RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST": "http://localhost:9090", |
There was a problem hiding this comment.
Bug: URL Prefix Duplication Breaks Prometheus Calls
The environment variable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST is set to "http://localhost:9090" which includes the http:// prefix. However, fetch_from_prom_server in prometheus_utils.py expects just the hostname:port and prepends http:// itself, resulting in a malformed URL http://http://localhost:9090/api/v1/query that will cause requests to fail.
Signed-off-by: Arthur Leung <arcyleung+github@gmail.com>
| "RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE": "0", | ||
| "RAY_SERVE_REPLICA_AUTOSCALING_METRIC_RECORD_INTERVAL_S": "0.5", | ||
| "RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S": "3", | ||
| "RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST": "http://localhost:9090", |
There was a problem hiding this comment.
Bug: Double HTTP prefix in Prometheus URL construction
The environment variable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST is set to "http://localhost:9090" in the test configuration, but the fetch_from_prom_server function in prometheus_utils.py already prepends "http://" to the address parameter when constructing the URL. This results in a malformed URL like "http://http://localhost:9090/api/v1/query" with a double HTTP prefix, which will cause Prometheus queries to fail. The environment variable should be set to just "localhost:9090" without the protocol prefix.
Implementation of AutoscalingConfig prometheus_custom_metrics:

Why are these changes needed?
Support collection of Prometheus exporter metrics at each replica, and report them to the controller, if the serve config AutoscalingConfig specifies.
Related issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.Note
Adds optional per-replica Prometheus metric collection for autoscaling, refactors Prometheus helpers into a shared module, and updates proto/config/constants/replica logic and tests.
AutoscalingConfig.prometheus_metricsis set; fetched atRAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOSTand merged into autoscaling reports.replicalabel and include alongside ongoing-requests/custom metrics.prometheus_metricstoAutoscalingConfig(Python) and new proto messagesPrometheusCustomMetrics/PrometheusMetric; map list ↔ proto on (de)serialization.RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOSTenv var for exporter host.python/ray/_private/prometheus_utils.pywith fetch/parse/filter helpers and async fetch; remove duplicated logic fromtest_utilsand update imports across tests/benchmarks.test_prometheus_autoscaling_metrics.py; adjust existing tests and BUILD to include env and new imports; minor test stability tweaks.Written by Cursor Bugbot for commit 02a540c. This will update automatically on new commits. Configure here.