Skip to content

[tune] GPU utilization getting checked for cluster with no gpu #7349

@waldroje

Description

@waldroje

What is the problem?

Trivial issue, which does not actually impact performance as far as I can tell, but it seems if you launch a cluster with a configuration that sets gpus = 0, then there might be a check which avoids this error message...

(pid=4628) Exception in thread Thread-2: 
(pid=4628) Traceback (most recent call last):
(pid=4628)   File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=4628)     self.run()
(pid=4628)   File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/tune/utils/util.py", line 89, in run
(pid=4628)     self._read_utilization()
(pid=4628)   File "/home/ubuntu/algo/lib/python3.6/site-packages/ray/tune/utils/util.py", line 65, in _read_utilization
(pid=4628)     for gpu in GPUtil.getGPUs():
(pid=4628)   File "/home/ubuntu/algo/lib/python3.6/site-packages/GPUtil/GPUtil.py", line 102, in getGPUs
(pid=4628)     deviceIds = int(vals[i])
(pid=4628) ValueError: invalid literal for int() with base 10: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make  sure that the latest NVIDIA driver is installed and running."

Ray version and other system information (Python version, TensorFlow version, OS):
Ray 0.8.1, TF 2.1, RHEL 7.7

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

If we cannot run your script, we cannot fix your issue.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething that is supposed to be working; but isn'ttuneTune-related issues

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions