[tune] fix gpu check#13825
Conversation
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
python/ray/tune/utils/util.py
Outdated
| within GPUtil.getGPUs(). If none, resorts to | ||
| the first item returned from `ray.get_gpu_ids()`. | ||
| gpu_memory_limit (float): If memory usage is below | ||
| gpu_memory_limit (float): If fractional memory usage is below |
There was a problem hiding this comment.
I think the original docstring was correct here. We're comparing absolute memory usage (memoryUsed) vs. the absolute amount we need to have available (gpu_memory_limit). The fact that the default value here is 0.1 might be confusing. Shouldn't we just require passing a GPU limit here? And maybe state that this is memory usage in bytes (if this is in bytes, which I think it is?)
There was a problem hiding this comment.
Can you keep it as a fraction and compare against memoryUtil instead of memoryUsed?
https://github.com/anderskm/gputil/blob/master/GPUtil/GPUtil.py#L50
This is more intuitive for us, particularly since in most cases you just want to say "wait until all the gpu memory is free" or wait_for_gpu(gpu, 1.0).
There was a problem hiding this comment.
so now, the usage is:
wait_for_gpu(target_util=0.1) or wait_for_gpu(target_util=0) (for full blocking)
python/ray/tune/utils/util.py
Outdated
| gpu_id = gpu_id_list[0] | ||
| gpu_object = GPUtil.getGPUs()[gpu_id] | ||
| for i in range(int(retry)): | ||
| gpu_object = GPUtil.getGPUs()[gpu_id] |
There was a problem hiding this comment.
gpu_id is a string, but getGPUs() returns a list. You probably need to compare against the GPU id (which is also an int) returned in the list.
krfricke
left a comment
There was a problem hiding this comment.
Looks good! Just one tiny nit
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
This reverts commit de2a997.
Why are these changes needed?
closes #13486
Related issue number
Checks
scripts/format.shto lint the changes in this PR.