Disabling ibverbs fails #5504

@wscullin

Description

I think #5264 may not go far enough. In environments like the Cray XC40/XC50, Intel OmniPath clusters, and some Cisco usNIC clusters, there is an IB fabric in place to carry file I/O traffic, but not MPI/IP/other IPC traffic. This means we want to be able to fully disable the use of IB verbs in these environments. We tried:

export WITH_GLOO_IBVERBS=0
export USE_IBVERBS=0
WITH_GLOO_IBVERBS=0 python setup.py build --verbose

and still wound up with:

-DWITH_GLOO_IBVERBS=1 and -DUSE_IBVERBS=1

throughout the build.

I can work around this by editing the function should_build_ib() in pytorch/tools/setup_helpers/dist_check.py to use check_env_flag to check for WITH_GLOO_IBVERBS or USE_IBVERBS in the environment, but I don't think I've been thorough enough in my local edits to handle the default case (finding IB when the user doesn't specify one way or another), and I'm not sure I've tracked down all possible side effects.
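To make the default case concrete, here is a minimal sketch of the kind of logic the workaround needs. It is not the actual contents of dist_check.py. The helper env_tristate, the detect_ib callback (standing in for whatever library/header detection the build already does), and the precedence between the two variables are all assumptions made for illustration.

```python
import os

# Accepted spellings are an assumption for this sketch.
_TRUE = {"1", "ON", "YES", "TRUE", "Y"}
_FALSE = {"0", "OFF", "NO", "FALSE", "N"}


def env_tristate(name, environ=None):
    """Hypothetical helper: True/False if `name` is set explicitly, else None."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip().upper()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    return None  # unset or unrecognized: let the caller decide


def should_build_ib(detect_ib, environ=None):
    """Sketch only: honor an explicit user choice, otherwise auto-detect.

    `detect_ib` is a hypothetical zero-argument callable wrapping the
    existing IB detection. The first variable that is set explicitly wins.
    """
    for name in ("WITH_GLOO_IBVERBS", "USE_IBVERBS"):
        choice = env_tristate(name, environ)
        if choice is not None:
            return choice
    return detect_ib()
```

With this shape, `USE_IBVERBS=0` turns IB off even when the libraries are present, while leaving both variables unset keeps the current auto-detection behavior.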

Environment Information

When submitting a bug report, please include the following information (where relevant):

  • OS: Linux (Cray Linux Environment 6, but also checked in Clear Linux 21060)
  • PyTorch version: Master 377d896
  • How you installed PyTorch (conda, pip, source): source
  • Python version: 2.7.14
  • CUDA/cuDNN version: N/A
  • GPU models and configuration: N/A
  • GCC version (if compiling from source): N/A. I'm building with the Intel 2018 compilers, which do necessitate a couple of minor patches to array initialization, but that's neither here nor there.
