Description
Hi, I have questions about nvbit_get_related_functions().
Example app used
The vectoradd code provided with the release, modified to add another kernel and a device function called from that kernel, as shown below:
__device__ __noinline__ void vecAddMul2_another(double *c, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) c[id] *= 2;
}

__global__ void vecAddMul2(double *a, double *b, double *c, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) c[id] = 2 * (a[id] + b[id]);
    vecAddMul2_another(c, n);
}
Issue
I expected nvbit_get_related_functions() to return the function vecAddMul2_another for the kernel vecAddMul2, but it did not.
In instr_count_bb/instr_count.cu, I printed related_functions.size() after calling nvbit_get_related_functions() and inspected the basic blocks. The size of the related functions vector is 0, and the basic blocks of vecAddMul2_another appear inside the basic block list for vecAddMul2. At first I thought the compiler had inlined the function even though I used the __noinline__ keyword. However, I confirmed with cuobjdump and nvdisasm that it was not inlined.
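For reference, the check described above can be sketched roughly as follows. This is a reconstruction for illustration, not the exact code that was run: the helper name dump_related_functions is hypothetical, and it is assumed to be called from the tool's instrumentation path with the context and the launched kernel. It only builds as part of an NVBit tool, since it depends on nvbit.h.

```cpp
#include <cstdio>
#include <vector>

#include "nvbit.h"

// Hypothetical helper (not part of NVBit): prints what NVBit reports as the
// related functions and the basic blocks of the given kernel.
static void dump_related_functions(CUcontext ctx, CUfunction func) {
    // Device functions NVBit considers reachable from this kernel.
    std::vector<CUfunction> related_functions =
        nvbit_get_related_functions(ctx, func);

    printf("%s: %zu related function(s)\n",
           nvbit_get_func_name(ctx, func), related_functions.size());
    for (CUfunction f : related_functions) {
        printf("  related: %s\n", nvbit_get_func_name(ctx, f));
    }

    // Basic blocks NVBit attributes to the kernel itself.
    const CFG_t &cfg = nvbit_get_CFG(ctx, func);
    printf("  %zu basic block(s)\n", cfg.bbs.size());
    for (auto &bb : cfg.bbs) {
        printf("    block with %zu instruction(s)\n", bb->instrs.size());
    }
}
```

With the modified vectoradd above, a check of this kind is what reports 0 related functions for vecAddMul2.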
Setup
Driver Version: 555.42.02
CUDA Version: 12.5
Compute Capability: 8.0
Could you please let me know the potential reasons behind this and how I should fix it, or whether I am misunderstanding the usage of nvbit_get_related_functions()?
Thanks!