Skip to content

[Draft] [serve] Prometheus metrics for AutoscalingConfig#56851

Open
arcyleung wants to merge 57 commits intoray-project:masterfrom
arcyleung:prometheus-metrics
Open

[Draft] [serve] Prometheus metrics for AutoscalingConfig#56851
arcyleung wants to merge 57 commits intoray-project:masterfrom
arcyleung:prometheus-metrics

Conversation

@arcyleung
Copy link
Copy Markdown
Contributor

@arcyleung arcyleung commented Sep 23, 2025

Implementation of AutoscalingConfig prometheus_custom_metrics:
image

Why are these changes needed?

Support collection of Prometheus exporter metrics at each replica, and report them to the controller, if the serve config AutoscalingConfig specifies.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Note

Adds optional per-replica Prometheus metric collection for autoscaling, refactors Prometheus helpers into a shared module, and updates proto/config/constants/replica logic and tests.

  • Serve autoscaling (backend):
    • Enable collecting custom Prometheus metrics per replica when AutoscalingConfig.prometheus_metrics is set; fetched at RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST and merged into autoscaling reports.
    • Extend replica metrics manager to asynchronously fetch/filter metrics by replica label and include alongside ongoing-requests/custom metrics.
  • Config/Proto:
    • Add prometheus_metrics to AutoscalingConfig (Python) and new proto messages PrometheusCustomMetrics/PrometheusMetric; map list ↔ proto on (de)serialization.
    • Update JSON surfaces and defaults accordingly.
  • Constants:
    • Introduce RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST env var for exporter host.
  • Prometheus utilities:
    • New python/ray/_private/prometheus_utils.py with fetch/parse/filter helpers and async fetch; remove duplicated logic from test_utils and update imports across tests/benchmarks.
  • Tests/BUILD:
    • Add test_prometheus_autoscaling_metrics.py; adjust existing tests and BUILD to include env and new imports; minor test stability tweaks.

Written by Cursor Bugbot for commit 02a540c. This will update automatically on new commits. Configure here.

Arthur Leung added 23 commits September 15, 2025 21:15
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
@arcyleung arcyleung requested a review from a team as a code owner September 23, 2025 21:40
@arcyleung arcyleung changed the title [serve] Prometheus metrics [serve] Prometheus metrics for AutoscalingConfig Sep 23, 2025
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for collecting Prometheus metrics for autoscaling decisions in Ray Serve. The changes include updates to the configuration and protobuf definitions, and new logic in the replica to fetch metrics from a Prometheus endpoint. While the feature is a valuable addition, I've found a few critical issues in the implementation. The logic for fetching metrics is incorrect for labeled metrics (like histograms), which will lead to unexpected behavior. There's also a potential TypeError if Prometheus metrics are not configured. Additionally, the tests for this new feature could be strengthened to catch these kinds of issues. My review includes specific suggestions to address these points.

Comment on lines +22 to +42
def check_autoscaling_metrics_include_prometheus(
client, deployment_id: DeploymentID, expected_metrics: List[str]
) -> bool:
"""Check that autoscaling metrics include the expected prometheus metrics."""

try:
metrics = get_autoscaling_metrics_from_controller(client, deployment_id)
# The metrics should include both ongoing requests and prometheus custom metrics
if not metrics:
print("No metrics returned from controller!")
return False

# For prometheus custom metrics, we expect the keys to be present in the dict
for expected_metric in expected_metrics:
if expected_metric not in metrics:
print(f"Expected metric {expected_metric} not found")
return False
return True
except Exception as e:
print(f"Error checking metrics: {e}")
return False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test helper check_autoscaling_metrics_include_prometheus and the associated tests are not strong enough. They only check for the presence of the metric key in the results, not the correctness of its value. Because the implementation falls back to 0.0 on any error (including metric not found), the tests will pass even if the metric fetching logic is broken.

To make the tests more robust, they should be updated to:

  1. Use a simple, predictable metric for testing (e.g., a gauge that you can set manually).
  2. Assert that the fetched metric value is correct, not just that the key exists.

This will help catch bugs like the incorrect handling of histogram metrics.

cursor[bot]

This comment was marked as outdated.

Signed-off-by: Arthur Leung <arcyleung@gmail.com>
@ray-gardener ray-gardener bot added serve Ray Serve Related Issue docs An issue or change related to documentation labels Sep 24, 2025
@arcyleung arcyleung changed the title [serve] Prometheus metrics for AutoscalingConfig [Draft] [serve] Prometheus metrics for AutoscalingConfig Oct 20, 2025
Signed-off-by: Arthur Leung <arcyleung@gmail.com>
cursor[bot]

This comment was marked as outdated.

Arthur Leung added 2 commits October 30, 2025 00:16
cursor[bot]

This comment was marked as outdated.

Signed-off-by: Arthur Leung <arcyleung+github@gmail.com>
cursor[bot]

This comment was marked as outdated.

Signed-off-by: Arthur Leung <arcyleung@gmail.com>
f"Fetching prometheus metric {metric_name} for deployment {self._deployment_id}"
)
# Call the prometheus query function with the correct signature
result = self._prometheus_query_func("localhost:9090", metric_name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Hardcoded Prometheus Host; needs Configurable Override

The method _get_prometheus_metrics hardcodes "localhost:9090" as the Prometheus server address when calling self._prometheus_query_func("localhost:9090", metric_name). This is inconsistent with how the replica code handles this in replica.py (line 462), which uses the configurable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST environment variable. The hardcoded value should be replaced with a configurable host parameter or environment variable to ensure consistency and allow for flexible Prometheus server configurations.

Fix in Cursor Fix in Web

@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Nov 15, 2025
Signed-off-by: Arthur Leung <arcyleung+github@gmail.com>
# Add label selector for this replica
query = f'{metric}{{replica="{self._replica_id.unique_id}"}}'
logger.info(f"Querying prometheus with: {query}")
response = prometheus_handler(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Broken Dependency Injection Prevents Mocking

The _fetch_prometheus_metrics method calls the module-level prometheus_handler function directly instead of using self._prometheus_handler. This breaks the dependency injection pattern established in the constructor and the set_prometheus_handler method, preventing test code from mocking the Prometheus handler as intended. The call should use self._prometheus_handler(...) instead of prometheus_handler(...).

Fix in Cursor Fix in Web

Signed-off-by: Arthur Leung <arcyleung@gmail.com>
prom_addr,
query,
timeout=RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Mocked Handler Ignored, Breaking Tests

The _fetch_prometheus_metrics method calls prometheus_handler instead of self._prometheus_handler. This bypasses the instance variable that can be set via set_prometheus_handler for testing, causing the method to always use the module-level import rather than the potentially mocked handler. This breaks the testing infrastructure where set_prometheus_handler is used to inject mock implementations.

Fix in Cursor Fix in Web

Signed-off-by: Arthur Leung <arcyleung@gmail.com>
event_loop=self._event_loop,
autoscaling_config=self._deployment_config.autoscaling_config,
ingress=ingress,
prometheus_handler=prometheus_handler,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Replica init fails: Unhandled parameter.

The ReplicaBase.__init__ method doesn't accept a prometheus_handler parameter, but line 647 attempts to pass it to create_replica_metrics_manager. When create_replica_impl(**kwargs) is called with prometheus_handler in kwargs, it will fail because ReplicaBase.__init__ doesn't accept this parameter. The prometheus_handler parameter needs to be added to ReplicaBase.__init__'s signature and stored so it can be passed to create_replica_metrics_manager.

Fix in Cursor Fix in Web

# Add label selector for this replica
query = f'{metric}{{replica="{self._replica_id.unique_id}"}}'
logger.info(f"Querying prometheus with: {query}")
response = prometheus_handler(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Prometheus Handler Bypass Breaks Testability

The _fetch_prometheus_metrics method calls prometheus_handler directly instead of self._prometheus_handler. This bypasses the instance variable that can be set via set_prometheus_handler for testing purposes, making it impossible to inject mock handlers. The call should use self._prometheus_handler to respect the configured handler.

Fix in Cursor Fix in Web

Signed-off-by: Arthur Leung <arcyleung@gmail.com>
prom_addr,
query,
timeout=RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Prometheus Tests Hit Real Server

The _fetch_prometheus_metrics method calls prometheus_handler instead of self._prometheus_handler, which prevents the injected test handler from being used. This breaks the testing infrastructure where set_prometheus_handler is used to inject mock handlers, causing tests to call the actual Prometheus server instead of the mock.

Fix in Cursor Fix in Web

# Add label selector for this replica
query = f'{metric}{{replica="{self._replica_id.unique_id}"}}'
logger.info(f"Querying prometheus with: {query}")
response = prometheus_handler(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Prometheus Handler: Mocked Logic Bypassed

The _fetch_prometheus_metrics method uses prometheus_handler (which is fetch_from_prom_server) but passes self._prometheus_handler as a parameter during initialization. However, when calling it on line 474, the code uses the global prometheus_handler directly instead of self._prometheus_handler. This means the injected mock handler for testing won't be used, breaking the testing infrastructure. The call should use self._prometheus_handler instead of the global prometheus_handler.

Fix in Cursor Fix in Web

f"Fetching prometheus metric {metric_name} for deployment {self._deployment_id}"
)
# Call the prometheus query function with the correct signature
result = self._prometheus_query_func("localhost:9090", metric_name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Prometheus Address Inconsistency Breaks Metrics

The Prometheus server address is hardcoded to "localhost:9090" in _get_prometheus_metrics, but replicas use the configurable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST environment variable. This inconsistency means the controller and replicas may query different Prometheus servers, causing metrics collection to fail or return incorrect data. The controller should use the same configurable address as replicas.

Fix in Cursor Fix in Web

Arthur Leung added 2 commits November 18, 2025 17:42
Signed-off-by: Arthur Leung <arcyleung@gmail.com>


# Patch the prometheus_handler at module level
ray.serve._private.replica.prometheus_handler = mock_prometheus_handler
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Prometheus Handler: Module Variable Attribute Error

The test attempts to set ray.serve._private.replica.prometheus_handler but this module-level variable doesn't exist in replica.py. The module only imports default_prometheus_handler from ray._common.prometheus_utils and uses it as a default parameter. This will cause an AttributeError or create a new module attribute that won't affect the actual code behavior since the default parameter value is evaluated at function definition time, not at call time.

Fix in Cursor Fix in Web

"RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE": "0",
"RAY_SERVE_REPLICA_AUTOSCALING_METRIC_RECORD_INTERVAL_S": "0.5",
"RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S": "3",
"RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST": "http://localhost:9090",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Redundant HTTP Prefix Malforms Prometheus URL

The environment variable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST is set to "http://localhost:9090" which includes the http:// prefix. However, fetch_from_prom_server in prometheus_utils.py expects just the hostname:port and prepends http:// itself, resulting in a malformed URL http://http://localhost:9090/api/v1/query that will cause requests to fail.

Fix in Cursor Fix in Web

"RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE": "0",
"RAY_SERVE_REPLICA_AUTOSCALING_METRIC_RECORD_INTERVAL_S": "0.5",
"RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S": "3",
"RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST": "http://localhost:9090",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: URL Prefix Duplication Breaks Prometheus Calls

The environment variable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST is set to "http://localhost:9090" which includes the http:// prefix. However, fetch_from_prom_server in prometheus_utils.py expects just the hostname:port and prepends http:// itself, resulting in a malformed URL http://http://localhost:9090/api/v1/query that will cause requests to fail.

Fix in Cursor Fix in Web

@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Nov 19, 2025
Signed-off-by: Arthur Leung <arcyleung+github@gmail.com>
"RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE": "0",
"RAY_SERVE_REPLICA_AUTOSCALING_METRIC_RECORD_INTERVAL_S": "0.5",
"RAY_SERVE_RECORD_AUTOSCALING_STATS_TIMEOUT_S": "3",
"RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST": "http://localhost:9090",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Double HTTP prefix in Prometheus URL construction

The environment variable RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PROMETHEUS_HOST is set to "http://localhost:9090" in the test configuration, but the fetch_from_prom_server function in prometheus_utils.py already prepends "http://" to the address parameter when constructing the URL. This results in a malformed URL like "http://http://localhost:9090/api/v1/query" with a double HTTP prefix, which will cause Prometheus queries to fail. The environment variable should be set to just "localhost:9090" without the protocol prefix.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community docs An issue or change related to documentation go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants