Description
The DefaultClusterAutoscalerV2 class has several tunable parameters (utilization threshold, scaling delta, timing intervals) that are currently only configurable via constructor arguments. These should also be configurable via environment variables using env_integer/env_float, following the established pattern in Ray Data.
Background
Ray Data's DefaultClusterAutoscalerV2 is the cluster autoscaler that monitors resource utilization and requests additional nodes from Ray's autoscaler when utilization exceeds a threshold.
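The monitoring behavior described above can be pictured with a small sketch. This is only an illustration of averaging utilization over a time window and comparing it to a threshold; the names UtilizationWindow and should_scale_up are hypothetical and are not taken from the Ray codebase, and the real class may compute utilization differently.

```python
from collections import deque
from typing import Deque, Tuple


class UtilizationWindow:
    """Hypothetical helper: keeps utilization samples from the last `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        self._window_s = window_s
        self._samples: Deque[Tuple[float, float]] = deque()

    def report(self, now: float, utilization: float) -> None:
        """Record a sample taken at time `now` (seconds) and evict stale ones."""
        self._samples.append((now, utilization))
        # Drop samples that have fallen out of the averaging window.
        while self._samples and now - self._samples[0][0] > self._window_s:
            self._samples.popleft()

    def average(self) -> float:
        """Mean utilization over the samples still inside the window."""
        if not self._samples:
            return 0.0
        return sum(u for _, u in self._samples) / len(self._samples)


def should_scale_up(window: UtilizationWindow, threshold: float) -> bool:
    """True when the windowed average utilization exceeds the threshold."""
    return window.average() > threshold


window = UtilizationWindow(window_s=10)
window.report(now=0.0, utilization=0.5)   # evicted once now reaches 12.0
window.report(now=5.0, utilization=0.75)
window.report(now=12.0, utilization=1.0)
scale_up = should_scale_up(window, threshold=0.75)
```

Averaging over a window rather than reacting to a single sample keeps a brief spike from triggering a scale-up request, which is presumably why the window length is one of the tunable knobs.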
The class has these configurable knobs defined as class constants (lines 72-85):
# Default cluster utilization threshold to trigger scaling up.
DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD: float = 0.75
# Default interval in seconds to check cluster utilization.
DEFAULT_CLUSTER_UTIL_CHECK_INTERVAL_S: float = 0.25
# Default time window in seconds to calculate the average of cluster utilization.
DEFAULT_CLUSTER_UTIL_AVG_WINDOW_S: int = 10
# Default number of nodes to add per node type.
DEFAULT_CLUSTER_SCALING_UP_DELTA: int = 1
# Min number of seconds between two autoscaling requests.
MIN_GAP_BETWEEN_AUTOSCALING_REQUESTS = 10
# The time in seconds after which an autoscaling request will expire.
AUTOSCALING_REQUEST_EXPIRE_TIME_S = 180
Ray Data already has a well-established pattern for environment variable configuration using helper functions. See DefaultAutoscalingCoordinator for a reference:
from ray.autoscaler._private.constants import env_integer

class DefaultAutoscalingCoordinator(AutoscalingCoordinator):
    AUTOSCALING_REQUEST_GET_TIMEOUT_S = env_integer(
        "RAY_DATA_AUTOSCALING_COORDINATOR_REQUEST_GET_TIMEOUT_S", 5
    )
    MAX_CONSECUTIVE_FAILURES = env_integer(
        "RAY_DATA_AUTOSCALING_COORDINATOR_MAX_CONSECUTIVE_FAILURES", 10
    )
Implementation Boundaries & Constraints
- Target File: python/ray/data/_internal/cluster_autoscaler/default_cluster_autoscaler_v2.py
- Do Not Touch: The __init__ signature should remain backward compatible (existing constructor parameters should still work). The instantiation logic in python/ray/data/_internal/cluster_autoscaler/__init__.py does not need modification.
- Mandatory Pattern: Import env_integer and env_float from ray.autoscaler._private.constants (as done in default_autoscaling_coordinator.py), then replace hardcoded class constants with environment variable-aware versions:
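A minimal sketch of what the mandatory pattern could look like follows. It is an illustration under stated assumptions, not the final implementation: the environment variable names, the class name ClusterAutoscalerSketch, and the constructor parameter names are all hypothetical, and env_integer/env_float are defined locally as simplified stand-ins so the snippet runs without Ray installed. In the target file they would be imported as the bullet above describes.

```python
import os


def env_integer(key: str, default: int) -> int:
    """Simplified stand-in for Ray's helper: read an int from the environment."""
    value = os.environ.get(key)
    return int(value) if value is not None else default


def env_float(key: str, default: float) -> float:
    """Simplified stand-in for Ray's helper: read a float from the environment."""
    value = os.environ.get(key)
    return float(value) if value is not None else default


class ClusterAutoscalerSketch:
    # Hypothetical environment variable names; the defaults match the issue text.
    DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD: float = env_float(
        "RAY_DATA_CLUSTER_SCALING_UP_UTIL_THRESHOLD", 0.75
    )
    DEFAULT_CLUSTER_UTIL_CHECK_INTERVAL_S: float = env_float(
        "RAY_DATA_CLUSTER_UTIL_CHECK_INTERVAL_S", 0.25
    )
    DEFAULT_CLUSTER_UTIL_AVG_WINDOW_S: int = env_integer(
        "RAY_DATA_CLUSTER_UTIL_AVG_WINDOW_S", 10
    )
    DEFAULT_CLUSTER_SCALING_UP_DELTA: int = env_integer(
        "RAY_DATA_CLUSTER_SCALING_UP_DELTA", 1
    )
    MIN_GAP_BETWEEN_AUTOSCALING_REQUESTS: int = env_integer(
        "RAY_DATA_MIN_GAP_BETWEEN_AUTOSCALING_REQUESTS", 10
    )
    AUTOSCALING_REQUEST_EXPIRE_TIME_S: int = env_integer(
        "RAY_DATA_AUTOSCALING_REQUEST_EXPIRE_TIME_S", 180
    )

    def __init__(
        self,
        # Hypothetical parameter names. Explicit arguments still override the
        # environment-derived defaults, so existing callers keep working.
        scaling_up_util_threshold: float = DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD,
        scaling_up_delta: int = DEFAULT_CLUSTER_SCALING_UP_DELTA,
    ) -> None:
        self.scaling_up_util_threshold = scaling_up_util_threshold
        self.scaling_up_delta = scaling_up_delta


default_autoscaler = ClusterAutoscalerSketch()
custom_autoscaler = ClusterAutoscalerSketch(scaling_up_util_threshold=0.9)
```

Because the class attributes are evaluated once, when the class body runs, an environment variable only takes effect if it is set before the module is imported. Values passed to the constructor take precedence over both the environment and the hardcoded defaults.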
Contributing expectations
Please refer to the Ray Data Contributing Guide for development setup and contribution workflow.