This README covers NVTX topics specific to C/C++. For general NVTX information, see the README in the NVTX repo root.
The NVTX API is written in C, and the NVTX C++ API is implemented as wrappers for parts of the C API. In C++, both the NVTX C and NVTX C++ APIs can be used.
The NVTX C and C++ header files include Doxygen comments to provide reference documentation. The API references were generated from these Doxygen comments.
This C++ example annotates some_function with a Push/Pop range using the function's name. This range begins at the top of the function body, and automatically ends when the function returns. The function performs a loop, sleeping for one second in each iteration. A local nvtx3::scoped_range annotates the scope of the loop body with a Push/Pop range. The loop iteration ranges are nested within the function range.
#include <chrono>
#include <thread>

#include <nvtx3/nvtx3.hpp>

void some_function()
{
    NVTX3_FUNC_RANGE(); // Range around the whole function
    for (int i = 0; i < 6; ++i) {
        nvtx3::scoped_range loop{"loop range"}; // Range for iteration
        // Make each iteration last for one second
        std::this_thread::sleep_for(std::chrono::seconds{1});
    }
}

Normally, this program waits for 6 seconds, and does nothing else.
Launch it from NVIDIA Nsight Systems, and you'll see this execution on a timeline:
The NVTX row shows the function's name "some_function" in the top-level range and the "loop range" message in the nested ranges. The loop iterations each last for the expected one second.
Using the NVTX C API, the following example would produce the same timeline:
#include <chrono>
#include <thread>

#include <nvtx3/nvToolsExt.h>

void some_function()
{
    nvtxRangePush(__func__); // Range around the whole function
    for (int i = 0; i < 6; ++i) {
        nvtxRangePush("loop range"); // Range for iteration
        // Make each iteration last for one second
        std::this_thread::sleep_for(std::chrono::seconds{1});
        nvtxRangePop(); // End the inner range
    }
    nvtxRangePop(); // End the outer range
}

If the function reaches a return or throw before the nvtxRangePop call, the range is left unclosed, and tool behavior is undefined in that case. Using the C++ API is safer, since the local scoped_range variable calls nvtxRangePop in its destructor.
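To illustrate why the destructor-based approach is safer, here is a minimal sketch of the RAII idea. It is not the actual nvtx3::scoped_range implementation. The names fake_range_push, fake_range_pop, range_guard, guarded_work, and manual_work are hypothetical; the two fake functions stand in for nvtxRangePush and nvtxRangePop and only track nesting depth, so the sketch compiles without the NVTX headers.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for nvtxRangePush/nvtxRangePop: they only track
// nesting depth so this sketch builds without the NVTX headers.
static int g_open_ranges = 0;
inline void fake_range_push(const char* /*message*/) { ++g_open_ranges; }
inline void fake_range_pop() { --g_open_ranges; }

// Illustrative guard: pushes in the constructor, pops in the destructor.
class range_guard {
public:
    explicit range_guard(const char* message) { fake_range_push(message); }
    ~range_guard() { fake_range_pop(); }
    range_guard(const range_guard&) = delete;
    range_guard& operator=(const range_guard&) = delete;
};

// Leaves early by return or by exception; the guard pops either way.
int guarded_work(int mode)
{
    range_guard guard{"guarded_work"};
    if (mode == 1) return -1;                            // early return
    if (mode == 2) throw std::runtime_error("failure");  // exception
    return 0;
}

// Same logic with manual push/pop: the early return skips the pop.
int manual_work(int mode)
{
    fake_range_push("manual_work");
    if (mode == 1) return -1;  // range is left open here
    fake_range_pop();
    return 0;
}
```

In this sketch, every exit path of guarded_work leaves the depth counter balanced, while manual_work leaves one range open whenever it takes the early return.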
C:
nvtxMark("This is a marker");

C++:
nvtx3::mark("This is a marker");

C:
nvtxRangePush("This is a push/pop range");
// Do something interesting in the range
nvtxRangePop(); // Pop must be on same thread as corresponding Push

C++:
{
    nvtx3::scoped_range range("This is a push/pop range");
    // Do something interesting in the range
    // Range is popped when scoped_range object goes out of scope
}

C:
// Somewhere in the code:
nvtxRangeId_t handle = nvtxRangeStart("This is a start/end range");
// Somewhere else in the code, not necessarily same thread as Start call:
nvtxRangeEnd(handle);

C++:
// Automatically start and end a range around an object's lifetime:
class SomeResource // movable, but not copyable
{
    // class members, methods, etc.
    // Range starts at construction, ends at destruction
    nvtx3::unique_range range{objectInstanceName};
};

The NVTX C API is used for all resource naming.
// Name the current CPU thread
nvtxNameOsThread(pthread_self(), "Network I/O");

// Name CUDA streams
cudaStream_t graphicsStream, aiStream;
cudaStreamCreate(&graphicsStream);
cudaStreamCreate(&aiStream);
nvtxNameCudaStreamA(graphicsStream, "Graphics");
nvtxNameCudaStreamA(aiStream, "AI");

NVTX is thread safe. All NVTX functions can be called concurrently, including initialization, both for C and C++.
Sanitizers may report conflicts in nvtxImplCore.h / nvtxInitDefs.h. This is due to optimizations to avoid memory barriers in the hot path. The implementation ensures that all race outcomes lead to the same result. The race condition is benign: the worst case is seeing an old initialization function pointer, which will simply trigger re-initialization that immediately detects completion.
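The self-replacing function pointer described above can be sketched as follows. This is only an illustration of the general pattern, not NVTX's actual code: every name here (mark_fn, g_mark_slot, init_stub, real_mark, mark) is hypothetical, and the sketch uses std::atomic with relaxed ordering on the dispatch slot to stay well-defined C++, which is an assumption made for the example rather than a description of the real headers.

```cpp
#include <atomic>
#include <thread>

// All names below are hypothetical; this is not NVTX's actual code.
using mark_fn = void (*)(const char*);

void init_stub(const char* message);  // defined below

std::atomic<int> g_init_runs{0};  // times the one-time setup body ran
std::atomic<int> g_marks{0};      // times the real function ran
std::atomic<int> g_state{0};      // 0 = fresh, 1 = initializing, 2 = done

// The dispatch slot starts out pointing at the initialization stub.
std::atomic<mark_fn> g_mark_slot{&init_stub};

void real_mark(const char* /*message*/)
{
    g_marks.fetch_add(1, std::memory_order_relaxed);
}

// Reached on the first call, and by any thread that read a stale slot.
void init_stub(const char* message)
{
    int expected = 0;
    if (g_state.compare_exchange_strong(expected, 1)) {
        // This thread won the race: do the setup and publish the result.
        g_init_runs.fetch_add(1, std::memory_order_relaxed);
        g_mark_slot.store(&real_mark, std::memory_order_relaxed);
        g_state.store(2, std::memory_order_release);
    } else {
        // Setup is running or already finished: wait until it is done.
        while (g_state.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
    }
    real_mark(message);  // forward the call that triggered initialization
}

// Hot path: a single relaxed load plus an indirect call.
void mark(const char* message)
{
    g_mark_slot.load(std::memory_order_relaxed)(message);
}
```

A caller that still holds the old stub pointer simply re-enters init_stub, sees that setup has completed, and forwards to the real function, so every race outcome ends in the same state.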
Tools should be implemented in a thread-safe way. They should assume that any function may be called concurrently. Setting callback function pointers should be done atomically, but relaxed memory consistency is sufficient for correctness.
For C and C++, NVTX is a header-only library with no dependencies. Simply #include the header(s) you want to use, and call NVTX functions! NVTX initializes automatically during the first call to any NVTX function.
It is not necessary to link against a binary library or add any link-time parameters. On older POSIX platforms with glibc versions prior to 2.34, adding the -ldl option to the linker command is required.
NOTE: Older versions of NVTX did require linking against a dynamic library. NVTX version 3 provides the same API, but removes the need to link with any library. Ensure you are including NVTX v3 by using the nvtx3 directory as a prefix in your #includes:
C:
#include <nvtx3/nvToolsExt.h>

C++:
#include <nvtx3/nvtx3.hpp>

Since the C and C++ APIs are header-only, dependency-free, and don't require explicit initialization, they are suitable for annotating other header-only libraries.
For projects that use CMake, the included CMakeLists.txt provides targets nvtx3-c and nvtx3-cpp that set the include search paths (and add the -ldl linker option if applicable).
Suppose your project layout looks like the following:
CMakeLists.txt
imports/
    CMakeLists.txt
    Other 3rd party libraries here...
    NVTX/ (a copy of this directory from github)
        CMakeLists.txt
        include/
            nvtx3/
                (all NVTX v3 headers here)
source/
    CMakeLists.txt
    main.cpp
The root CMakeLists.txt file contains:
add_subdirectory(imports)
add_subdirectory(source)

The imports/CMakeLists.txt file contains:
add_subdirectory(NVTX)
add_subdirectory(...) # Other imported libraries

The source/CMakeLists.txt file can now use CMake targets defined by NVTX:
add_executable(my_program main.cpp)
target_link_libraries(my_program PRIVATE nvtx3-cpp)

CMake Package Manager (CPM) is a utility that automatically downloads dependencies when CMake first runs on a project. Since NVTX v3 is just a few headers, the download will be fast. The downloaded files can be stored in an external cache directory to avoid redownloading during clean builds, and to enable offline builds. First, download CPM.cmake from CPM's repo and save it in your project. Then you can fetch NVTX directly from GitHub with CMake code like this (CMake 3.14 or greater is required):
include(path/to/CPM.cmake)
CPMAddPackage(
    NAME NVTX
    GITHUB_REPOSITORY NVIDIA/NVTX
    GIT_TAG release-v3
    SOURCE_SUBDIR c
)
add_executable(some_c_program main.c)
target_link_libraries(some_c_program PRIVATE nvtx3-c)
add_executable(some_cpp_program main.cpp)
target_link_libraries(some_cpp_program PRIVATE nvtx3-cpp)

The NVTX C API is a header-only library, implemented using standard C89. The headers can be compiled with -std=gnu90 or newer using many common compilers. Tested compilers include:
- GNU gcc
- clang
- Microsoft Visual C++
- NVIDIA nvcc
C89 support in these compilers has not changed in many years, so even very old compiler versions should work.
Using different versions of the NVTX headers in the same translation unit or different translation units is supported, as long as best practices are followed.
The convenience macros in nvToolsExtPayloadHelper.h (NVTX_DEFINE_SCHEMA_FOR_STRUCT, NVTX_DEFINE_STRUCT_WITH_SCHEMA, NVTX_DEFINE_STRUCT_WITH_SCHEMA_AND_REGISTER, NVTX_DEFINE_SCHEMA_FOR_STRUCT_AND_REGISTER, and NVTX_DEFINE_STRUCT) rely on variadic macro argument counting, which requires a standards-conforming preprocessor. Microsoft Visual C++'s traditional preprocessor does not expand __VA_ARGS__ correctly for these patterns.
To use these macros with MSVC, enable the conforming preprocessor:
- Visual Studio 2019 and newer: /Zc:preprocessor
- Visual Studio 2017 (v15.5+): /experimental:preprocessor
Visual Studio versions older than 2017 do not support the conforming preprocessor and cannot use these macros. GCC, Clang, and other compilers with conforming preprocessors work without any additional flags.
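For readers unfamiliar with variadic macro argument counting, the following is a minimal sketch of the general technique. The DEMO_ macros are hypothetical and are not the macros defined in nvToolsExtPayloadHelper.h; they only show the kind of __VA_ARGS__ forwarding that depends on a conforming preprocessor. This sketch handles one to nine arguments.

```cpp
// Hypothetical helper macros (not NVTX names) showing argument counting.
// The caller's arguments shift the descending number list to the right,
// so the value that lands in slot N equals the argument count.
#define DEMO_PICK_10TH(_1, _2, _3, _4, _5, _6, _7, _8, _9, N, ...) N
#define DEMO_COUNT_ARGS(...) \
    DEMO_PICK_10TH(__VA_ARGS__, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

// Two-step concatenation so the count is expanded before pasting.
#define DEMO_CAT_(a, b) a##b
#define DEMO_CAT(a, b) DEMO_CAT_(a, b)

// Dispatch to a fixed-arity macro chosen by the argument count.
#define DEMO_SUM_1(a) (a)
#define DEMO_SUM_2(a, b) ((a) + (b))
#define DEMO_SUM_3(a, b, c) ((a) + (b) + (c))
#define DEMO_SUM(...) \
    DEMO_CAT(DEMO_SUM_, DEMO_COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)

static_assert(DEMO_COUNT_ARGS(x) == 1, "one argument");
static_assert(DEMO_COUNT_ARGS(x, y, z) == 3, "three arguments");

constexpr int demo_total = DEMO_SUM(1, 2, 3);  // expands to DEMO_SUM_3(1, 2, 3)
```

A preprocessor that forwards __VA_ARGS__ to the inner macro as a single argument breaks the shifting step, which is why the conforming preprocessor options listed above are needed for macros built this way.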
The NVTX C++ API is a header-only library, implemented as a wrapper over the NVTX C API, using standard C++11. The C++ headers are provided alongside the C headers. NVTX C++ can be compiled with -std=c++11 or newer using many common compilers. Tested compilers include:
- GNU g++ (4.8.5 to 11.1)
- clang (3.5.2 to 12.0)
- Microsoft Visual C++ (VS 2015 to VS 2022)
  - On VS 2017.7 and newer, NVTX enables better error message output
- NVIDIA nvcc (CUDA 7.0 and newer)
Minor versions of NVTX releases may introduce new features into the nvtx3::v1 namespace.
To use these features, ensure that within each compilation unit, the first inclusion of nvtx3.hpp is based at least on the latest required release.
If an older version is included first, the new features will not be available.
Linking together objects that were built against different minor versions of NVTX is supported.
For maximum compatibility in header-only libraries or other scenarios with complex NVTX dependencies, use symbols of a specific major version, e.g. nvtx3::v1::domain.
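The difference between a version-qualified and an unqualified name can be sketched with a plain C++ inline namespace. This is an illustration under the assumption that the versioned namespace behaves roughly like an inline namespace for the first-included header; the demo namespace and its contents are hypothetical and stand in for nvtx3.

```cpp
#include <type_traits>

namespace demo {       // hypothetical library, stands in for nvtx3
inline namespace v1 {  // symbols here are also reachable as demo::name
struct domain {
    static constexpr int api_version() { return 1; }
};
}  // namespace v1
}  // namespace demo

// Unversioned name: resolves through whichever namespace is inline.
using default_domain = demo::domain;

// Pinned name: always refers to the v1 symbols, regardless of what
// a differently versioned header might make the default.
using pinned_domain = demo::v1::domain;

static_assert(std::is_same<default_domain, pinned_domain>::value,
              "with only v1 present, both names denote the same type");
```

Code that spells out the version, as pinned_domain does, keeps referring to the same symbols no matter which header version is included first in a given compilation unit.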
- v3.3: Add payload_data wrapper for nvtxPayloadData_t in support of extended payloads.
