[BUG] Building from source with PyTorch backend on AMD system fails #4008

@Slaedr

Description

Bug summary

The build fails to find libamdhip64.so even though ROCM_PATH and ROCM_ROOT are set. It looks for /opt/rocm/lib/libamdhip64.so, whereas ROCM_ROOT is /opt/rocm-6.0.0. I cannot create a symlink at /opt/rocm.

Additionally, the PyTorch installed by pip appears to ship its own libamdhip64.so. I am not sure whether that copy should be preferred.
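
To make the situation above easier to reproduce, here is a minimal diagnostic sketch that lists which of the candidate locations actually contain libamdhip64.so. It is not part of DeePMD-kit; the function name `candidate_hip_libs` and its `extra_dirs` parameter are hypothetical, and the assumption that the library sits in a `lib` subdirectory of each ROCm root is taken only from the path in the error message.

```python
import os
from pathlib import Path

LIB_NAME = "libamdhip64.so"
DEFAULT_ROOT = "/opt/rocm"  # the location the failing build looked in


def candidate_hip_libs(env=None, extra_dirs=()):
    """Return existing libamdhip64.so paths, in search order.

    Roots come from ROCM_PATH and ROCM_ROOT (if set), then /opt/rocm.
    extra_dirs are additional directories to check as-is, for example
    the directory holding the copy bundled with a pip-installed PyTorch
    (its exact location is an assumption the caller must verify).
    """
    env = os.environ if env is None else env
    roots = []
    for var in ("ROCM_PATH", "ROCM_ROOT"):
        value = env.get(var)
        if value and value not in roots:
            roots.append(value)
    if DEFAULT_ROOT not in roots:
        roots.append(DEFAULT_ROOT)

    # Assumed layout: <root>/lib/libamdhip64.so, as in the ninja error.
    dirs = [Path(root) / "lib" for root in roots]
    dirs += [Path(d) for d in extra_dirs]
    return [d / LIB_NAME for d in dirs if (d / LIB_NAME).exists()]


if __name__ == "__main__":
    for path in candidate_hip_libs():
        print(path)
```

On the system described here, one would expect this to print the copy under /opt/rocm-6.0.0 and nothing for /opt/rocm, which matches the path ninja reports as missing.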

DeePMD-kit Version

tag v3.0.0b1

Backend and its version

PyTorch 2.4.0+rocm6.0

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

pip install .
Processing /autofs/home1/akashi/sources/deepmd-kit
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting numpy (from deepmd-kit==3.0.0b1)
  Using cached numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting scipy (from deepmd-kit==3.0.0b1)
  Using cached scipy-1.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting pyyaml (from deepmd-kit==3.0.0b1)
  Using cached PyYAML-6.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting dargs>=0.4.7 (from deepmd-kit==3.0.0b1)
  Using cached dargs-0.4.8-py3-none-any.whl.metadata (11 kB)
Collecting h5py (from deepmd-kit==3.0.0b1)
  Using cached h5py-3.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.5 kB)
Collecting wcmatch (from deepmd-kit==3.0.0b1)
  Using cached wcmatch-8.5.2-py3-none-any.whl.metadata (4.8 kB)
Collecting packaging (from deepmd-kit==3.0.0b1)
  Using cached packaging-24.1-py3-none-any.whl.metadata (3.2 kB)
Collecting ml_dtypes (from deepmd-kit==3.0.0b1)
  Using cached ml_dtypes-0.4.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting mendeleev (from deepmd-kit==3.0.0b1)
  Using cached mendeleev-0.17.0-py3-none-any.whl.metadata (20 kB)
Collecting array-api-compat (from deepmd-kit==3.0.0b1)
  Using cached array_api_compat-1.7.1-py3-none-any.whl.metadata (1.5 kB)
Collecting typeguard>=4 (from dargs>=0.4.7->deepmd-kit==3.0.0b1)
  Using cached typeguard-4.3.0-py3-none-any.whl.metadata (3.7 kB)
Collecting Pygments<3.0.0,>=2.11.2 (from mendeleev->deepmd-kit==3.0.0b1)
  Using cached pygments-2.18.0-py3-none-any.whl.metadata (2.5 kB)
Collecting SQLAlchemy>=1.4.0 (from mendeleev->deepmd-kit==3.0.0b1)
  Using cached SQLAlchemy-2.0.31-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting colorama<0.5.0,>=0.4.6 (from mendeleev->deepmd-kit==3.0.0b1)
  Using cached colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting numpy (from deepmd-kit==3.0.0b1)
  Using cached numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting pandas<3.0,>=2.1 (from mendeleev->deepmd-kit==3.0.0b1)
  Using cached pandas-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Collecting pyfiglet<0.9,>=0.8.post1 (from mendeleev->deepmd-kit==3.0.0b1)
  Using cached pyfiglet-0.8.post1-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting bracex>=2.1.1 (from wcmatch->deepmd-kit==3.0.0b1)
  Using cached bracex-2.4-py3-none-any.whl.metadata (3.6 kB)
Collecting python-dateutil>=2.8.2 (from pandas<3.0,>=2.1->mendeleev->deepmd-kit==3.0.0b1)
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting pytz>=2020.1 (from pandas<3.0,>=2.1->mendeleev->deepmd-kit==3.0.0b1)
  Using cached pytz-2024.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas<3.0,>=2.1->mendeleev->deepmd-kit==3.0.0b1)
  Using cached tzdata-2024.1-py2.py3-none-any.whl.metadata (1.4 kB)
Requirement already satisfied: typing-extensions>=4.6.0 in /lustre/world-share/stf218/akashi/miniconda3/envs/uq4mat_pt/lib/python3.12/site-packages (from SQLAlchemy>=1.4.0->mendeleev->deepmd-kit==3.0.0b1) (4.9.0)
Collecting greenlet!=0.4.17 (from SQLAlchemy>=1.4.0->mendeleev->deepmd-kit==3.0.0b1)
  Using cached greenlet-3.0.3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (3.8 kB)
Collecting typing-extensions>=4.6.0 (from SQLAlchemy>=1.4.0->mendeleev->deepmd-kit==3.0.0b1)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas<3.0,>=2.1->mendeleev->deepmd-kit==3.0.0b1)
  Using cached six-1.16.0-py2.py3-none-any.whl.metadata (1.8 kB)
Using cached dargs-0.4.8-py3-none-any.whl (26 kB)
Using cached array_api_compat-1.7.1-py3-none-any.whl (37 kB)
Using cached h5py-3.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB)
Using cached mendeleev-0.17.0-py3-none-any.whl (367 kB)
Using cached numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.0 MB)
Using cached ml_dtypes-0.4.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)
Using cached packaging-24.1-py3-none-any.whl (53 kB)
Using cached PyYAML-6.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (724 kB)
Using cached scipy-1.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.8 MB)
Using cached wcmatch-8.5.2-py3-none-any.whl (39 kB)
Using cached bracex-2.4-py3-none-any.whl (11 kB)
Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Using cached pandas-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)
Using cached pyfiglet-0.8.post1-py2.py3-none-any.whl (865 kB)
Using cached pygments-2.18.0-py3-none-any.whl (1.2 MB)
Using cached SQLAlchemy-2.0.31-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Using cached typeguard-4.3.0-py3-none-any.whl (35 kB)
Using cached greenlet-3.0.3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (625 kB)
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Using cached pytz-2024.1-py2.py3-none-any.whl (505 kB)
Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Using cached tzdata-2024.1-py2.py3-none-any.whl (345 kB)
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Building wheels for collected packages: deepmd-kit
  Building wheel for deepmd-kit (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for deepmd-kit (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [74 lines of output]
      *** scikit-build-core 0.8.2 using CMake 3.30.1 (wheel)
      *** Configuring CMake...
      2024-07-23 12:36:44,357 - scikit_build_core - WARNING - libdir/ldlibrary: /lustre/world-share/stf218/akashi/miniconda3/envs/uq4mat_pt/lib/libpython3.12.a is not a real file!
      2024-07-23 12:36:44,357 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/lustre/world-share/stf218/akashi/miniconda3/envs/uq4mat_pt/lib, ldlibrary=libpython3.12.a, multiarch=x86_64-linux-gnu, masd=None
      loading initial cache file build/py37-none-manylinux_2_31_x86_64/CMakeInit.txt
      -- Cray Programming Environment 2.7.31 C
      -- Cray Programming Environment 2.7.31 CXX
      -- Supported model version: 1.1
      -- Will not build nv GPU support
      -- The HIP compiler identification is Clang 17.0.0
      -- Found ROCM in /opt/rocm-6.0.0, build AMD GPU support
      /opt/rocm-6.0.0/bin/rocm_agent_enumerator:95: SyntaxWarning: invalid escape sequence '\w'
        @staticVars(search_name=re.compile("gfx[0-9a-fA-F]+(:[-+:\w]+)?"))
      /opt/rocm-6.0.0/bin/rocm_agent_enumerator:152: SyntaxWarning: invalid escape sequence '\A'
        line_search_term = re.compile("\A\s+Name:\s+(amdgcn-amd-amdhsa--gfx\d+)")
      /opt/rocm-6.0.0/bin/rocm_agent_enumerator:154: SyntaxWarning: invalid escape sequence '\A'
        line_search_term = re.compile("\A\s+Name:\s+(gfx\d+)")
      /opt/rocm-6.0.0/bin/rocm_agent_enumerator:175: SyntaxWarning: invalid escape sequence '\w'
        target_search_term = re.compile("1002:\w+")
      Building PyTorch for GPU arch: gfx90a
      HIP VERSION: 6.0.32830-d62f6a171
      -- Caffe2: Header version is: 6.0.0
      
      ***** ROCm version from rocm_version.h ****
      
      ROCM_VERSION_DEV: 6.0.0
      ROCM_VERSION_DEV_MAJOR: 6
      ROCM_VERSION_DEV_MINOR: 0
      ROCM_VERSION_DEV_PATCH: 0
      ROCM_VERSION_DEV_INT:   60000
      HIP_VERSION_MAJOR: 6
      HIP_VERSION_MINOR: 0
      TORCH_HIP_VERSION: 600
      
      ***** Library versions from dpkg *****
      
      
      ***** Library versions from cmake find_package *****
      
      hip VERSION: 6.0.23494
      hsa-runtime64 VERSION: 1.12.60000
      amd_comgr VERSION: 2.6.0
      rocrand VERSION: 2.10.17
      hiprand VERSION: 2.10.16
      rocblas VERSION: 4.0.0
      hipblas VERSION: 2.0.0
      hipblaslt VERSION: 0.6.0
      miopen VERSION: 3.00.0
      hipfft VERSION: 1.0.12
      hipsparse VERSION: 3.0.0
      rccl VERSION: 2.18.3
      rocprim VERSION: 3.0.0
      hipcub VERSION: 3.0.0
      rocthrust VERSION: 3.0.0
      hipsolver VERSION: 2.0.0
      HIP is using new type enums
      CMake Warning at /lustre/world-share/stf218/akashi/miniconda3/envs/uq4mat_pt/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
        static library kineto_LIBRARY-NOTFOUND not found.
      Call Stack (most recent call first):
        /lustre/world-share/stf218/akashi/miniconda3/envs/uq4mat_pt/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
        CMakeLists.txt:189 (find_package)
      
      
      -- PyTorch CXX11 ABI: 0
      -- Enabled backends:
      -- - PyTorch
      -- HIP major version is 6
      -- Configuring done (3.8s)
      -- Generating done (0.1s)
      -- Build files have been written to: /home/akashi/sources/deepmd-kit/build/py37-none-manylinux_2_31_x86_64
      *** Building project with Ninja...
      ninja: error: '/opt/rocm/lib/libamdhip64.so', needed by 'op/pt/libdeepmd_op_pt.so', missing and no known rule to make it
      
      *** CMake build failed
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for deepmd-kit
Failed to build deepmd-kit
ERROR: Could not build wheels for deepmd-kit, which is required to install pyproject.toml-based projects

Steps to Reproduce

With ROCm installed anywhere other than /opt/rocm, attempt to build DeePMD-kit from source with the PyTorch backend on an AMD system, as detailed here: https://docs.deepmodeling.com/projects/deepmd/en/v3.0.0b1/install/install-from-source.html

Further Information, Files, and Links

No response
