Bug Report
Hi,
Recently, I encountered an issue where cuMemCreate fails when running on an NVIDIA B300 GPU using NvlinkTransport.
I am using Mooncake with its NVLink transport enabled. The project is compiled with the flag -DUSE_MNNVL=ON, and the environment variable MC_FORCE_MNNVL=0 is set. During execution, the program fails at the following code path:
https://github.com/kvcache-ai/Mooncake/blob/main/mooncake-transfer-engine/src/transport/nvlink_transport/nvlink_transport.cpp#L611-L617
The error message is:
NvlinkTransport: cuMemCreate failed: 800
Error code 800 corresponds to CUDA_ERROR_NOT_PERMITTED.
I have checked previous issues such as #965, but they did not resolve my problem. Does anyone have insight into what might cause this error?
I have tried CUDA versions 12.9 and 13.1, but the issue persists in both cases.