Describe the bug
The goal is to cross-compile on a VM that has no GPUs for other VMs that have GPUs (preparing docker images for CIs). Currently the prebuilding fails.
To Reproduce
Steps to reproduce the behavior:
Normally, when GPUs are available, everything works. Now let's emulate a machine with no GPUs installed by setting CUDA_VISIBLE_DEVICES= and repeating the same prebuild command, which then fails:
$ CUDA_VISIBLE_DEVICES= TORCH_CUDA_ARCH_LIST="6.1;8.0;8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
WARNING: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
Using pip 22.0.4 from /home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/pip (python 3.8)
Obtaining file:///mnt/nvme0/code/github/00optimize/deepspeed
Preparing metadata (setup.py): started
Running command python setup.py egg_info
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/mnt/nvme0/code/github/00optimize/deepspeed/setup.py", line 238, in <module>
bf16_support = torch.cuda.is_bf16_supported()
File "/home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/cuda/__init__.py", line 92, in is_bf16_supported
return torch.cuda.get_device_properties(torch.cuda.current_device()).major >= 8 and cuda_maj_decide
File "/home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/cuda/__init__.py", line 481, in current_device
_lazy_init()
File "/home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.6'
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
DS_BUILD_OPS=0
Installed CUDA version 11.6 does not match the version torch was compiled with 11.5 but since the APIs are compatible, accepting this combination
Install Ops={'cpu_adam': 1, 'cpu_adagrad': False, 'fused_adam': False, 'fused_lamb': False, 'sparse_attn': False, 'transformer': False, 'stochastic_transformer': False, 'async_io': 1, 'utils': 1, 'quantizer': False, 'transformer_inference': False}
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
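Note that the log above already shows the build knows how to pick targets without a device: when torch can't see a GPU it falls back to default compute capabilities (6.0, 6.1, 6.2). A minimal sketch of deriving the nvcc target codes purely from TORCH_CUDA_ARCH_LIST, never touching torch.cuda (this is a hypothetical helper for illustration, not DeepSpeed's actual code):

```python
import os

def cross_compile_capabilities(default="6.0;6.1;6.2"):
    """Derive nvcc-style compute-capability codes from TORCH_CUDA_ARCH_LIST.

    Hypothetical helper: reads only the environment, so it works on a
    build host with no GPUs. Falls back to `default` when the variable
    is unset, mirroring the warning in the build log above.
    """
    arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", default)
    # e.g. "6.1;8.0;8.6" -> ["61", "80", "86"]
    return [cap.strip().replace(".", "")
            for cap in arch_list.split(";") if cap.strip()]
```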
Expected behavior
I wonder if we need a new flag that tells the prebuild not to check whether an actual GPU is installed.
We are already using TORCH_CUDA_ARCH_LIST to cross-compile for the target GPUs.
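The traceback shows the failure comes from calling torch.cuda.is_bf16_supported() unconditionally in setup.py, which initializes CUDA and raises on a GPU-less host. One possible shape of a fix is a guarded probe that returns False instead of raising; a minimal sketch (the helper names are my own, not DeepSpeed's):

```python
import os

def cuda_devices_visible() -> bool:
    """Return False when CUDA_VISIBLE_DEVICES explicitly hides all GPUs.

    Unset means "no restriction"; an empty string or "-1" hides all devices.
    """
    env = os.environ.get("CUDA_VISIBLE_DEVICES")
    return env is None or env.strip() not in ("", "-1")

def detect_bf16_support() -> bool:
    """Best-effort bf16 probe that never raises on a GPU-less build host."""
    if not cuda_devices_visible():
        return False
    try:
        import torch
        return torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    except Exception:
        # e.g. RuntimeError: No CUDA GPUs are available
        return False
```

With such a guard, the prebuild could still honor TORCH_CUDA_ARCH_LIST for cross-compilation while skipping any device query that would fail in a CI image-building VM.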
ds_report output
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/stas/anaconda3/envs/py38-pt111/lib/python3.8/site-packages/torch']
torch version .................... 1.11.0+cu115
torch cuda version ............... 11.5
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/mnt/nvme0/code/github/00optimize/deepspeed/deepspeed']
deepspeed info ................... 0.6.6+828ab718, 828ab718, master
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.5
System info (please complete the following information):
- OS: [e.g. Ubuntu 21.10]
- Python version 3.8
Thank you.
cc: @ydshieh, who originally reported this