[Dashboard/Core] Resource list in Cluster Dashboard tab should show only logical GPUs #53641
Labels
P1: issue that should be fixed within a few weeks
core: issues that should be addressed in Ray Core
dashboard: issues specific to the Ray Dashboard
enhancement: request for new feature and/or capability
usability
Description
What happened + What you expected to happen
Bug: on the Cluster tab, the head pod's resources show 8 GPUs, even though the head's ray start params set 0 GPUs and the worker group requests only 4.
Expected: the head node shows 0 GPUs on the Cluster page, matching the ray status output and the ray start params.
(base) ray@ray-serve-llm-raycluster-mzkv9-head-lqnkv:~$ ray status
======== Autoscaler status: 2025-06-07 17:45:06.206068 ========
Node status
---------------------------------------------------------------
Active:
1 node_9928b5bd637a26e42b8068817fb5d853febda9ec8424f4082618d855
1 node_95c758accfddf1542e60fdccb17773d8c866bcd12b1c812c7ea70a80
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Total Usage:
3.0/32.0 CPU (1.0 used of 1.0 reserved in placement groups)
1.0/4.0 GPU (1.0 used of 1.0 reserved in placement groups)
0B/36.00GiB memory
109.95KiB/10.36GiB object_store_memory
Total Constraints:
(no request_resources() constraints)
Total Demands:
(no resource demands)
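The 0-GPU expectation for the head can be checked against the logical resources Ray itself tracks, which is what `ray status` sums. A minimal sketch of that aggregation, assuming `ray.nodes()`-shaped dicts; the node data below is hard-coded to mirror this cluster (values taken from the status output above, not from a live run):

```python
# Sketch: sum per-node logical GPU resources, as `ray status` does, instead
# of counting the GPUs physically visible on the host (what the dashboard
# appears to show). Node dicts are hypothetical stand-ins for ray.nodes().
sample_nodes = [
    # head pod, started with --num-cpus=0 --num-gpus=0
    {"NodeID": "node_9928b5bd637a26e42b8068817fb5d853febda9ec8424f4082618d855",
     "Resources": {"CPU": 0.0, "GPU": 0.0}},
    # worker pod, started with --num-gpus=4
    {"NodeID": "node_95c758accfddf1542e60fdccb17773d8c866bcd12b1c812c7ea70a80",
     "Resources": {"CPU": 32.0, "GPU": 4.0}},
]

def logical_gpus(nodes):
    """Total logical GPUs across nodes from their Resources maps."""
    return sum(node["Resources"].get("GPU", 0.0) for node in nodes)

print(logical_gpus(sample_nodes))  # → 4.0, matching the 1.0/4.0 GPU line above
```

Per-node, the same `Resources` map gives the head 0.0 GPUs, which is the number the Cluster tab should display for it.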
Versions / Dependencies
Ray 2.46, Python 3.11.11, KubeRay 1.3.0
Cluster: AWS EKS 4x m5.4xlarge, one 8xA100 node (p4d.24xlarge)
Reproduction script
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: ray-serve-llm
spec:
  serveConfigV2: |
    applications:
      - name: llms
        import_path: ray.serve.llm:build_openai_app
        route_prefix: "/"
        args:
          llm_configs:
            - model_loading_config:
                model_id: qwen2.5-7b-instruct
                model_source: Qwen/Qwen2.5-7B-Instruct
              engine_kwargs:
                dtype: bfloat16
                max_model_len: 1024
                device: auto
                gpu_memory_utilization: 0.75
              deployment_config:
                autoscaling_config:
                  min_replicas: 1
                  max_replicas: 4
                  target_ongoing_requests: 64
                max_ongoing_requests: 128
  rayClusterConfig:
    rayVersion: "2.46.0"
    headGroupSpec:
      rayStartParams:
        num-cpus: "0"
        num-gpus: "0"
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray-llm:2.46.0-py311-cu124
              ports:
                - containerPort: 8000
                  name: serve
                  protocol: TCP
                - containerPort: 8080
                  name: metrics
                  protocol: TCP
                - containerPort: 6379
                  name: gcs
                  protocol: TCP
                - containerPort: 8265
                  name: dashboard
                  protocol: TCP
                - containerPort: 10001
                  name: client
                  protocol: TCP
              resources:
                limits:
                  cpu: 2
                  memory: 4Gi
                requests:
                  cpu: 2
                  memory: 4Gi
              env:
                - name: HUGGING_FACE_HUB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: hf-token
                      key: hf_token
    workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 1
        numOfHosts: 1
        groupName: gpu-group
        rayStartParams:
          num-gpus: "4"
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray-llm:2.46.0-py311-cu124
                env:
                  - name: HUGGING_FACE_HUB_TOKEN
                    valueFrom:
                      secretKeyRef:
                        name: hf-token
                        key: hf_token
                resources:
                  limits:
                    cpu: 32
                    memory: 32Gi
                    nvidia.com/gpu: "4"
                  requests:
                    cpu: 32
                    memory: 32Gi
                    nvidia.com/gpu: "4"
---
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
type: Opaque
stringData:
  hf_token: <your-hf-access-token-value>
Issue Severity
Low: It annoys or frustrates me.