third-party: Improvements to NVSHMEM Integration by seth-howell · Pull Request #295 · deepseek-ai/DeepEP

seth-howell · 2025-07-12T00:41:01Z

Use upstream NVSHMEM binaries when building DeepEP.
Add back support for CPU-Assisted IBGDA.
Remove the nvshmem host-side patch.

Skylion007 · 2025-07-12T18:12:56Z

setup.py

-        nvcc_dlink.extend(['-dlink', f'-L{nvshmem_dir}/lib', '-lnvshmem'])
-        extra_link_args.extend(['-l:libnvshmem.a', '-l:nvshmem_bootstrap_uid.so', f'-Wl,-rpath,{nvshmem_dir}/lib'])
+        nvcc_dlink.extend(['-dlink', f'-L{nvshmem_dir}/lib', '-lnvshmem_device'])
+        extra_link_args.extend(['-l:libnvshmem_host.so', '-l:libnvshmem_device.a', f'-Wl,-rpath,{nvshmem_dir}/lib'])
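The diff above drops the combined libnvshmem.a and nvshmem_bootstrap_uid.so in favor of the upstream split: a host shared library (libnvshmem_host.so) and a device static library (libnvshmem_device.a). The following is a minimal sketch of assembling those flags, not DeepEP's actual setup.py. The helper name nvshmem_link_flags is hypothetical. It also assumes that a wheel install may ship the host library only under a versioned file name such as libnvshmem_host.so.3.

```python
import glob
import os
from typing import List, Tuple


def nvshmem_link_flags(nvshmem_dir: str) -> Tuple[List[str], List[str]]:
    """Build (nvcc_dlink, extra_link_args) in the style of the diff above.

    Hypothetical helper. Assumes <nvshmem_dir>/lib contains libnvshmem_device.a
    and libnvshmem_host.so, where the host library may exist only under a
    versioned name such as libnvshmem_host.so.3 (assumed for wheel installs).
    """
    lib_dir = os.path.join(nvshmem_dir, "lib")
    # Sorted so an unversioned libnvshmem_host.so wins over libnvshmem_host.so.N.
    host_candidates = sorted(
        glob.glob(os.path.join(lib_dir, "libnvshmem_host.so"))
        + glob.glob(os.path.join(lib_dir, "libnvshmem_host.so.*"))
    )
    if not host_candidates:
        raise FileNotFoundError(f"no libnvshmem_host.so* under {lib_dir}")
    if not os.path.isfile(os.path.join(lib_dir, "libnvshmem_device.a")):
        raise FileNotFoundError(f"no libnvshmem_device.a under {lib_dir}")
    host_lib = os.path.basename(host_candidates[0])

    # Device link step: pull device-side NVSHMEM code from the static device library.
    nvcc_dlink = ["-dlink", f"-L{lib_dir}", "-lnvshmem_device"]
    # Host link step: shared host library plus static device library; the rpath
    # lets the built extension locate the shared library at import time.
    extra_link_args = [f"-l:{host_lib}", "-l:libnvshmem_device.a", f"-Wl,-rpath,{lib_dir}"]
    return nvcc_dlink, extra_link_args
```

Checking for both files up front turns a misconfigured NVSHMEM directory into a clear build-time error instead of an opaque linker failure.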


Quality-of-life suggestion: since we now use unmodified NVSHMEM binaries, why don't we have setup.py try to find the NVSHMEM dir from the nvshmem wheel (via import nvshmem; nvshmem.__file__) when it isn't specified? That would simplify compilation a lot.

I was validating this on my local system, and it seems that some NVIDIA wheels that only include C++ binaries (NVSHMEM, NCCL, etc.) don't have __init__.py files, so this is impossible right now (nvshmem.__file__ is None). I've raised a ticket internally to fix that, and as soon as it's resolved I can push another change to update setup.py.

Disregard that; I got some extra guidance on how we're expected to do this with namespace packages.
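A namespace package has no __init__.py, so its __file__ is None. However, importlib still reports the directories that make up the package. The following is a minimal sketch of wheel-based discovery. It assumes that an NVSHMEM_DIR environment variable takes priority and that the wheel's import path is nvidia.nvshmem; neither detail is confirmed in this thread. The function find_nvshmem_dir is hypothetical.

```python
import importlib.util
import os
from typing import Optional


def find_nvshmem_dir(module_name: str = "nvidia.nvshmem") -> Optional[str]:
    """Return an NVSHMEM install root, or None if none can be found.

    Hypothetical helper. Assumptions: NVSHMEM_DIR overrides discovery, and the
    wheel installs a namespace package `module_name` whose directory has lib/.
    """
    env_dir = os.environ.get("NVSHMEM_DIR")
    if env_dir:
        return env_dir
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "nvidia") is not installed.
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    # For namespace packages spec.origin is None, but the search locations
    # still list every directory contributing to the package.
    for location in spec.submodule_search_locations:
        if os.path.isdir(os.path.join(location, "lib")):
            return location
    return None
```

If this returns None, setup.py can still fall back to asking the user to set the directory explicitly.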

sphish · 2025-07-16T05:51:51Z

LGTM. Any suggestions, @youkaichao?

youkaichao · 2025-07-16T06:07:12Z

LGTM now, thanks!

ishandhanani · 2025-07-31T14:54:28Z

@youkaichao - we're having some trouble running SGLang + DeepEP after this version bump. Specifically, we see some "cuMemCreate failed" errors. Have you seen this since this PR?

Edit: Resolved by fixing the CUDA graph batch sizes (cuda graph bs) in SGLang.

youkaichao · 2025-08-07T09:24:31Z

Replying to ishandhanani's report of cuMemCreate failures: sorry, I have no idea.

alpha-baby · 2025-08-08T04:13:25Z

Replying to ishandhanani's report of cuMemCreate failures: can you share the detailed log?

HPC4AI · 2025-10-28T13:33:03Z

Replying to ishandhanani ("Resolved by fixing cuda graph bs in sglang"): Hello, I'm also encountering the same issue. Could you share how you resolved it? Why do CUDA graphs cause this error?
Thanks

ishandhanani · 2025-11-29T18:45:33Z

Replying to HPC4AI: NVSHMEM allocates memory during initialization. The CUDA graphs were taking too much memory, which caused that allocation to fail.
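To make this failure mode concrete: if CUDA graphs hold enough device memory, the allocation NVSHMEM makes during initialization can no longer be satisfied. The following back-of-the-envelope check is a sketch only. The helper names, the numbers, and the rounding of the request up to an allocation granularity are illustrative assumptions, not values from this thread. The rounding reflects the general rule that cuMemCreate sizes must be granularity-aligned.

```python
MiB = 1 << 20
GiB = 1 << 30


def round_up(size: int, granularity: int) -> int:
    """Round size up to a multiple of granularity."""
    return -(-size // granularity) * granularity


def nvshmem_init_fits(free_bytes: int, cuda_graph_bytes: int,
                      nvshmem_heap_bytes: int, granularity: int) -> bool:
    """Hypothetical budget check: does NVSHMEM's init-time allocation still fit
    once CUDA graphs have taken their share of device memory?"""
    remaining = free_bytes - cuda_graph_bytes
    # Assumed: the request is rounded up to the allocation granularity.
    return remaining >= round_up(nvshmem_heap_bytes, granularity)


# Illustrative numbers only: a 600 MiB request rounds up to 1 GiB.
print(nvshmem_init_fits(8 * GiB, 7 * GiB, 600 * MiB, 512 * MiB))      # True
print(nvshmem_init_fits(8 * GiB, 7 * GiB + 1, 600 * MiB, 512 * MiB))  # False
```

In these terms, reducing the CUDA graph batch sizes in SGLang lowers cuda_graph_bytes. That is consistent with the fix reported earlier in the thread.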

seth-howell added 3 commits July 11, 2025 17:30

third-party: Update tests to use upstream NVSHMEM

441833d

NVSHMEM 3.3 and above support the host-side features in the patch. Note: Removed recv queue support Signed-off-by: Seth Howell <sethh@nvidia.com>
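This commit rests on a version threshold: from NVSHMEM 3.3 onward, the host-side patch is unnecessary. A build script could check that threshold explicitly. The following is a hypothetical sketch that leaves open how the version string is obtained. It compares parsed integer tuples to avoid the string-comparison trap where "3.10" sorts before "3.3".

```python
import re
from typing import Tuple

# Per the commit message, NVSHMEM 3.3 and above ship the host-side features
# that the patch used to add.
MIN_UPSTREAM = (3, 3)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.3.9' into (3, 3, 9), keeping each component's leading digits
    and stopping at the first component that has none."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_host_patch(version: str) -> bool:
    """Hypothetical gate: True when the NVSHMEM version predates 3.3."""
    return parse_version(version)[:2] < MIN_UPSTREAM


for v in ("3.2.5", "3.3.9", "3.10.0"):
    print(v, needs_host_patch(v))
# 3.2.5 True
# 3.3.9 False
# 3.10.0 False
```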

third-party: Add CPU-assisted IBGDA support

aa3187e

This allows users to use NVSHMEM without setting the driver regkey. Signed-off-by: Seth Howell <sethh@nvidia.com>

third-party: Update readme to reflect new features.

69f9dfe

Signed-off-by: Seth Howell <sethh@nvidia.com>

youkaichao reviewed Jul 12, 2025

third-party/nvshmem.patch (resolved)

youkaichao reviewed Jul 12, 2025

third-party/README.md (resolved)

youkaichao reviewed Jul 12, 2025

third-party/README.md (resolved)

Skylion007 reviewed Jul 12, 2025

sphish reviewed Jul 14, 2025

setup.py (outdated, resolved)

setup.py (outdated, resolved)

seth-howell mentioned this pull request Jul 14, 2025

Add NVSHMEM to PYTORCH_EXTRA_INSTALL_REQUIREMENTS pytorch/pytorch#154568

Closed

seth-howell force-pushed the main branch from 6886dd2 to 0d1441e July 15, 2025 03:28

seth-howell added 5 commits July 14, 2025 23:42

third-party: Add back nvshmem.patch.

b79ca2b

This will give consumers an opportunity to update their builds. Signed-off-by: Seth Howell <sethh@nvidia.com>

setup.py: Clean up some extra prints.

903711c

Signed-off-by: Seth Howell <sethh@nvidia.com>

setup.py: Add logic for detecting library locations from NVSHMEM wheels.

2a87339

Signed-off-by: Seth Howell <sethh@nvidia.com>

setup.py: Remove nvcc_dlink specific gencode

35e1cd1

Responding to review comments. Signed-off-by: Seth Howell <sethh@nvidia.com>

buffer.py: Do not force the NIC handler to GPU.

e6b4f52

This enables the CPU-Assisted data path. Signed-off-by: Seth Howell <sethh@nvidia.com>
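This commit stops overriding the NIC-handler choice, so NVSHMEM can take the CPU-assisted path. The following is a sketch of the difference. It assumes the setting is the NVSHMEM_IBGDA_NIC_HANDLER environment variable with values such as "gpu" and "cpu"; that name comes from NVSHMEM's environment-variable documentation, not from this thread. The helper apply_nic_handler is hypothetical.

```python
from typing import MutableMapping, Optional

NIC_HANDLER_VAR = "NVSHMEM_IBGDA_NIC_HANDLER"  # assumed variable name, see above


def apply_nic_handler(env: MutableMapping[str, str], force_gpu: bool) -> Optional[str]:
    """Return the NIC-handler value NVSHMEM would see after buffer setup.

    force_gpu=True mimics pinning the handler to the GPU, which silently
    overrides a user's choice. force_gpu=False leaves the variable alone, so a
    user-selected CPU-assisted path (or NVSHMEM's default when unset) survives.
    """
    if force_gpu:
        env[NIC_HANDLER_VAR] = "gpu"
    return env.get(NIC_HANDLER_VAR)


user_env = {NIC_HANDLER_VAR: "cpu"}
print(apply_nic_handler(dict(user_env), force_gpu=True))   # gpu
print(apply_nic_handler(dict(user_env), force_gpu=False))  # cpu
```

Plain dicts are used here instead of os.environ so that the example has no side effects on the running process.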

seth-howell force-pushed the main branch from 306eb5c to e6b4f52 July 15, 2025 06:42

third-party: Add link to blog post on CPU-Assisted IBGDA.

c5d2202

Signed-off-by: Seth Howell <sethh@nvidia.com>

sphish approved these changes Jul 16, 2025

sphish merged commit b6ce310 into deepseek-ai:main Jul 16, 2025

7 participants