RawKernel is very useful, but there currently appears to be no interface for setting CUDA function attributes.
This is necessary to use more than 48 KB of dynamic shared memory per block on newer GPUs, or to specify the ratio of shared memory to L1 cache (i.e., the shared memory "carveout").
Overview of shared memory configuration:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory-7-x
cudaFuncSetAttribute documentation:
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__HIGHLEVEL.html#group__CUDART__HIGHLEVEL_1g422642bfa0c035a590e4c43ff7c11f8d
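
For context, here is a minimal sketch of the two runtime calls that such an interface would presumably need to expose. It is plain CUDA C++ rather than CuPy code. The kernel `fill_shared`, the `CHECK` macro, and the 64 KB size are made up for illustration, and it assumes a device of compute capability 7.0 or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: fills a dynamically sized shared buffer, then copies it out.
__global__ void fill_shared(float *out, int n) {
    extern __shared__ float buf[];
    for (int i = threadIdx.x; i < n; i += blockDim.x) buf[i] = 1.0f;
    __syncthreads();
    for (int i = threadIdx.x; i < n; i += blockDim.x) out[i] = buf[i];
}

// Illustrative error check: print the failing call and exit main with status 1.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    // Assumed request: 64 KB, which is above the default 48 KB per-block limit.
    const int shmem_bytes = 64 * 1024;
    const int n = shmem_bytes / static_cast<int>(sizeof(float));

    // Opt in to more than 48 KB of dynamic shared memory for this kernel.
    CHECK(cudaFuncSetAttribute(fill_shared,
                               cudaFuncAttributeMaxDynamicSharedMemorySize,
                               shmem_bytes));

    // Hint that the shared memory / L1 split should favor shared memory.
    // The value is a percentage of the maximum; the driver may pick another split.
    CHECK(cudaFuncSetAttribute(fill_shared,
                               cudaFuncAttributePreferredSharedMemoryCarveout,
                               cudaSharedmemCarveoutMaxShared));

    float *out = nullptr;
    CHECK(cudaMalloc(&out, shmem_bytes));

    // The third launch parameter is the dynamic shared memory size in bytes.
    fill_shared<<<1, 256, shmem_bytes>>>(out, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(out));
    return 0;
}
```

Without the first `cudaFuncSetAttribute` call, a launch requesting more than 48 KB of dynamic shared memory is expected to fail with an invalid-value error, which is the limitation RawKernel users currently run into.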