hipTensor

Note

The published documentation is available at hipTensor in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the projects/hiptensor/docs folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see Contribute to ROCm documentation.

Welcome! hipTensor is AMD's C++ library for accelerating tensor primitives using GPU matrix cores on AMD's latest discrete GPUs.

Requirements

hipTensor currently supports the following AMDGPU architectures:

  • CDNA-class GPUs with matrix core support: gfx908, gfx90a, gfx942 and gfx950 (referred to collectively as 'gfx9').
  • RDNA-class GPUs with matrix core support: gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151, gfx1152, gfx1153, gfx1200 and gfx1201.

Note

Double-precision (FP64) data type support requires gfx90a, gfx942 or gfx950.
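
As a quick illustration of the FP64 note, the following sketch checks a target name against the three architectures listed there. The function name `supports_fp64` is hypothetical and is not part of hipTensor or ROCm; it only encodes the list given in this README.

```shell
# Hypothetical helper (not part of hipTensor): succeeds if the given gfx
# target is one of those this README lists as supporting FP64.
supports_fp64() {
    case "$1" in
        gfx90a|gfx942|gfx950) return 0 ;;
        *) return 1 ;;
    esac
}

# Report FP64 support for a few example targets.
for target in gfx908 gfx90a gfx1100; do
    if supports_fp64 "$target"; then
        echo "$target: FP64 supported"
    else
        echo "$target: FP64 not supported"
    fi
done
```

A check like this could be used in a build script to skip FP64 tests on targets that lack support.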

Dependencies:

  • ROCm: minimum supported version is 7.0.
  • cmake: minimum supported version is 3.14.
  • ROCm-cmake: minimum supported version is 0.8.0.
  • Composable Kernel: minimum supported version is composable_kernel 1.1.0 for ROCm 6.0.2 (or ROCm package composablekernel-dev).
  • HIP runtime: minimum supported version is 4.3.0 (or ROCm package hip-runtime-amd).
  • LLVM dev package: minimum supported version is 7.0 (available as ROCm package rocm-llvm-dev).

Optional:

  • doxygen (for building documentation)

Build with CMake

For more detailed information, please refer to the hipTensor installation guide.

Project options

| Option | Description | Default Value |
|---|---|---|
| GPU_TARGETS | Build code for specific GPU target(s) | gfx908;gfx90a;gfx942;gfx950;gfx11-generic;gfx12-generic |
| HIPTENSOR_BUILD_TESTS | Build the tests | ON |
| HIPTENSOR_BUILD_SAMPLES | Build the samples | ON |
| HIPTENSOR_BUILD_COMPRESSED_DBG | Enable compressed debug symbols | ON |
| HIPTENSOR_DEFAULT_STRIDES_COL_MAJOR | Set the hipTensor default data layout to column major | ON |
| HIPTENSOR_INLINE_UNARY_OPS | Inline all unary ops for best runtime performance (slower compilation) | OFF |
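
The options above are ordinary CMake cache variables, so they can be overridden with `-D` at configure time. The sketch below only assembles and prints such a command. The build directory name and the chosen targets are illustrative assumptions, not recommendations.

```shell
# Hypothetical values chosen for illustration; adjust to your setup.
build_dir="build"
gpu_targets="gfx90a;gfx942"   # semicolon-separated list, as in the default value

# Single quotes around the target list keep the shell from treating ';' as a
# command separator when the printed command is pasted into a terminal.
configure_cmd="cmake -B${build_dir} . -DGPU_TARGETS='${gpu_targets}' -DHIPTENSOR_BUILD_TESTS=OFF -DHIPTENSOR_BUILD_SAMPLES=OFF"

# Print the full command, including the compiler selection used in this README.
echo "CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ ${configure_cmd}"
```

Restricting `GPU_TARGETS` to the hardware you actually have and turning off tests and samples will generally shorten the build.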

Example configurations

By default, the project is configured in Release mode. Some example configurations:

| Configuration | Command |
|---|---|
| Basic | `CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> .` |
| Targeting gfx908 | `CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DGPU_TARGETS=gfx908` |
| Debug build | `CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DCMAKE_BUILD_TYPE=Debug` |

After configuration, build with cmake --build <build_dir> -- -j<nproc>.
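
The configure and build steps can be chained in a small script. The following is a minimal sketch, not an official build script: the `run` and `build_hiptensor` functions and the `DRY_RUN` variable are hypothetical names introduced here. By default it only prints the commands so they can be reviewed first.

```shell
# Print each command; execute it only when DRY_RUN=0 (default 1 just prints).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical wrapper: $1 = build directory, $2 = GPU target.
build_hiptensor() {
    build_dir="$1"
    gpu_target="$2"
    jobs="$(nproc 2>/dev/null || echo 4)"   # fall back to 4 jobs if nproc is missing

    # Configure with the ROCm compilers, then build in parallel.
    run env CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ \
        cmake -B"$build_dir" . -DGPU_TARGETS="$gpu_target"
    run cmake --build "$build_dir" -- -j"$jobs"
}

build_hiptensor build gfx908
```

Run the script with `DRY_RUN=0` in the environment, from the hipTensor source directory, to execute the commands instead of printing them.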

Documentation

More comprehensive documentation covers installation, samples and test contents, the API reference and the programmer's guide. To build it locally, run the following commands:

cd docs

pip3 install -r sphinx/requirements.txt

python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html

To view the generated HTML documentation, open docs/_build/html/index.html in your browser.

The latest official documentation for hipTensor is available at: https://rocm.docs.amd.com/projects/hipTensor/en/latest/index.html.

Contributing to the hipTensor Library

Community collaboration is encouraged! If you are considering contributing, please follow the hipTensor Contribution Guide to get started.