[Data] Make DefaultClusterAutoscalerV2 knobs configurable via environment variables #60004

@bveeramani

Description

The DefaultClusterAutoscalerV2 class has several tunable parameters (utilization threshold, scaling delta, timing intervals) that are currently only configurable via constructor arguments. These should also be configurable via environment variables using env_integer/env_float, following the established pattern in Ray Data.

Background

Ray Data's DefaultClusterAutoscalerV2 is the cluster autoscaler that monitors resource utilization and requests additional nodes from Ray's autoscaler when utilization exceeds a threshold.

The class has these configurable knobs defined as class constants (lines 72-85):

# Default cluster utilization threshold to trigger scaling up.
DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD: float = 0.75
# Default interval in seconds to check cluster utilization.
DEFAULT_CLUSTER_UTIL_CHECK_INTERVAL_S: float = 0.25
# Default time window in seconds to calculate the average of cluster utilization.
DEFAULT_CLUSTER_UTIL_AVG_WINDOW_S: int = 10
# Default number of nodes to add per node type.
DEFAULT_CLUSTER_SCALING_UP_DELTA: int = 1

# Min number of seconds between two autoscaling requests.
MIN_GAP_BETWEEN_AUTOSCALING_REQUESTS = 10
# The time in seconds after which an autoscaling request will expire.
AUTOSCALING_REQUEST_EXPIRE_TIME_S = 180

Ray Data already has a well-established pattern for environment variable configuration using helper functions. See DefaultAutoscalingCoordinator for a reference:

from ray.autoscaler._private.constants import env_integer

class DefaultAutoscalingCoordinator(AutoscalingCoordinator):
    AUTOSCALING_REQUEST_GET_TIMEOUT_S = env_integer(
        "RAY_DATA_AUTOSCALING_COORDINATOR_REQUEST_GET_TIMEOUT_S", 5
    )
    MAX_CONSECUTIVE_FAILURES = env_integer(
        "RAY_DATA_AUTOSCALING_COORDINATOR_MAX_CONSECUTIVE_FAILURES", 10
    )
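One consequence of this pattern is worth keeping in mind: a class-level constant is evaluated once, when the class body runs (in practice, when the module is first imported), so the environment variable has to be set before that point. The sketch below illustrates the timing using a simplified stand-in for env_integer (a plain `int(os.environ.get(...))`), not Ray's actual helper; `SketchCoordinator` and `define_coordinator` are hypothetical names used only for this illustration.

```python
import os

# Variable name taken from the DefaultAutoscalingCoordinator snippet above.
KEY = "RAY_DATA_AUTOSCALING_COORDINATOR_REQUEST_GET_TIMEOUT_S"


def define_coordinator():
    """Define a class whose constant is read while the class body executes."""

    class SketchCoordinator:
        # Simplified stand-in for env_integer(KEY, 5); not Ray's helper.
        AUTOSCALING_REQUEST_GET_TIMEOUT_S = int(os.environ.get(KEY, 5))

    return SketchCoordinator


saved = os.environ.pop(KEY, None)  # start from a clean environment
try:
    before = define_coordinator()  # variable unset: falls back to 5
    os.environ[KEY] = "30"
    after = define_coordinator()  # variable set before definition: reads 30
finally:
    # Restore whatever the environment held originally.
    if saved is None:
        os.environ.pop(KEY, None)
    else:
        os.environ[KEY] = saved

# The first class keeps its value even though the variable changed afterwards.
print(before.AUTOSCALING_REQUEST_GET_TIMEOUT_S, after.AUTOSCALING_REQUEST_GET_TIMEOUT_S)
# prints: 5 30
```

Changing the variable after the class exists has no effect on it, which is why these knobs are typically exported in the shell or job environment before the driver starts.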

Implementation Boundaries & Constraints

  • Target File: python/ray/data/_internal/cluster_autoscaler/default_cluster_autoscaler_v2.py

  • Do Not Touch: The __init__ signature should remain backward compatible (existing constructor parameters should still work). The instantiation logic in python/ray/data/_internal/cluster_autoscaler/__init__.py does not need modification.

  • Mandatory Pattern: Import env_integer and env_float from ray.autoscaler._private.constants (as done in default_autoscaling_coordinator.py), then replace the hardcoded class constants with versions that read their values from environment variables and fall back to the current defaults.
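A minimal sketch of what the constraints above could look like when applied to the six constants listed in the Background section. Several things here are assumptions rather than part of the issue: the environment variable names are hypothetical (the issue does not specify them), the two helpers are simplified local stand-ins so the example runs without Ray (the real change would import them from ray.autoscaler._private.constants), and `SketchClusterAutoscalerV2` with its two constructor parameters is an illustrative stand-in, not the actual __init__ signature.

```python
import os


def env_integer(key: str, default: int) -> int:
    """Simplified stand-in: return int(os.environ[key]) if set, else default."""
    return int(os.environ[key]) if key in os.environ else default


def env_float(key: str, default: float) -> float:
    """Simplified stand-in: return float(os.environ[key]) if set, else default."""
    return float(os.environ[key]) if key in os.environ else default


class SketchClusterAutoscalerV2:
    """Illustrative stand-in, not the real DefaultClusterAutoscalerV2."""

    # All environment variable names below are hypothetical.
    DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD: float = env_float(
        "RAY_DATA_CLUSTER_SCALING_UP_UTIL_THRESHOLD", 0.75
    )
    DEFAULT_CLUSTER_UTIL_CHECK_INTERVAL_S: float = env_float(
        "RAY_DATA_CLUSTER_UTIL_CHECK_INTERVAL_S", 0.25
    )
    DEFAULT_CLUSTER_UTIL_AVG_WINDOW_S: int = env_integer(
        "RAY_DATA_CLUSTER_UTIL_AVG_WINDOW_S", 10
    )
    DEFAULT_CLUSTER_SCALING_UP_DELTA: int = env_integer(
        "RAY_DATA_CLUSTER_SCALING_UP_DELTA", 1
    )
    MIN_GAP_BETWEEN_AUTOSCALING_REQUESTS: int = env_integer(
        "RAY_DATA_MIN_GAP_BETWEEN_AUTOSCALING_REQUESTS", 10
    )
    AUTOSCALING_REQUEST_EXPIRE_TIME_S: int = env_integer(
        "RAY_DATA_AUTOSCALING_REQUEST_EXPIRE_TIME_S", 180
    )

    def __init__(
        self,
        # Hypothetical parameters: defaults come from the class constants, so
        # callers that pass explicit values keep working unchanged.
        cluster_scaling_up_util_threshold: float = DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD,
        cluster_scaling_up_delta: int = DEFAULT_CLUSTER_SCALING_UP_DELTA,
    ):
        self.threshold = cluster_scaling_up_util_threshold
        self.delta = cluster_scaling_up_delta


# Explicit constructor arguments take precedence over environment-derived defaults.
default_scaler = SketchClusterAutoscalerV2()
custom_scaler = SketchClusterAutoscalerV2(cluster_scaling_up_util_threshold=0.9)
```

With this shape, precedence runs from explicit constructor argument, to environment variable, to hardcoded default, and the constructor signature itself does not change.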

Contributing expectations

Please refer to the Ray Data Contributing Guide for development setup and contribution workflow.

Metadata

Labels

data (Ray Data-related issues), good-first-issue (Great starter issue for someone just starting to contribute to Ray)