[Bug]: Mooncake NvlinkTransport cuMemCreate failed #1297

@lyh552506

Bug Report

Hi,

Recently, I encountered an issue where cuMemCreate fails when running on an NVIDIA B300 GPU using NvlinkTransport.

I am using Mooncake with its NVLink transport enabled. The project is compiled with the flag -DUSE_MNNVL=ON, and the environment variable MC_FORCE_MNNVL=0 is set. During execution, the program fails at the following code path:
https://github.com/kvcache-ai/Mooncake/blob/main/mooncake-transfer-engine/src/transport/nvlink_transport/nvlink_transport.cpp#L611-L617

The error message is:

NvlinkTransport: cuMemCreate failed: 800

Error code 800 corresponds to CUDA_ERROR_NOT_PERMITTED.
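For anyone triaging similar logs, here is a minimal sketch of how the numeric code in the message can be turned back into its symbolic name. It is not part of Mooncake: the helper names (`curesult_name`, `explain`), the log pattern, and the library name `libcuda.so.1` are assumptions for illustration. It asks the CUDA driver via `cuGetErrorName` when the driver library can be loaded, and otherwise falls back to a small, deliberately incomplete table of CUresult values.

```python
import ctypes
import re

# Fallback table with a few CUresult values from the CUDA driver API.
# Deliberately incomplete; it is only used when libcuda cannot be loaded.
KNOWN_CURESULTS = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    800: "CUDA_ERROR_NOT_PERMITTED",
    801: "CUDA_ERROR_NOT_SUPPORTED",
}

# Assumed log shape, taken from the single message quoted in this report.
LOG_PATTERN = re.compile(r"NvlinkTransport: (\w+) failed: (\d+)")


def curesult_name(code):
    """Return the symbolic name of a CUresult code."""
    try:
        # Assumption: Linux driver library name; adjust for other platforms.
        libcuda = ctypes.CDLL("libcuda.so.1")
        name = ctypes.c_char_p()
        status = libcuda.cuGetErrorName(ctypes.c_int(code), ctypes.byref(name))
        if status == 0 and name.value:
            return name.value.decode()
    except (OSError, AttributeError):
        # Driver library missing or symbol unavailable: use the table below.
        pass
    return KNOWN_CURESULTS.get(code, f"unknown CUresult {code}")


def explain(line):
    """Extract the failing call and decoded error name from a log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    call, code = match.group(1), int(match.group(2))
    return f"{call} -> {curesult_name(code)}"


if __name__ == "__main__":
    print(explain("NvlinkTransport: cuMemCreate failed: 800"))
```

This only decodes the code; it does not say why the driver refused the allocation.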

I have checked previous issues such as #965, but they did not resolve my problem. Are there any insights into what causes this error?

I have tried CUDA versions 12.9 and 13.1, but the issue persists in both cases.

Before submitting...

  • Ensure you searched for relevant issues and read the [documentation]

Labels: bug
