Commit 3d3428e
Use cudaRuntimeGetVersion for cudaMemcpyBatchAsync ABI dispatch
Addresses the review on #23172:
#23172 (comment)
cudaMemcpyBatchAsync is a libcudart (runtime) symbol; the ABI of the
function dlsym'd into this process is owned by the libcudart that's
actually loaded, not by the host's kernel driver. Dispatching on
cudaDriverGetVersion() breaks in the common container case where a
cu12 runtime is paired with a cu13-capable host driver: driver=13000
steers us to the 8-param v13 call, but the symbol resolves to v12
(9 params with failIdx), so the stream argument lands in the slot the
callee reads as failIdx and we segfault: the exact crash this fix was
supposed to prevent.
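To make the mismatch concrete, here is a sketch of the two call shapes as C++ function-pointer types. It follows the parameter lists described above: 9 parameters with failIdx for the 12.x runtime, and 8 without it for 13.x. The CUDA types are replaced by minimal stand-ins so the snippet compiles without cuda_runtime.h. The names MemcpyBatchAsyncV12, MemcpyBatchAsyncV13 and ParamCount are hypothetical, and the pointer const-qualification in the real headers may differ.

```cpp
#include <cstddef>

// Stand-ins for CUDA runtime types so this compiles without cuda_runtime.h.
// In the real headers cudaError_t is an enum, cudaStream_t is a pointer to an
// opaque struct, and cudaMemcpyAttributes is a struct passed by pointer.
using cudaError_t = int;
struct CUstream_st;
using cudaStream_t = CUstream_st*;
struct cudaMemcpyAttributes;

// 12.x libcudart: failIdx sits between numAttrs and the stream (9 params).
using MemcpyBatchAsyncV12 = cudaError_t (*)(
    void** dsts, void** srcs, std::size_t* sizes, std::size_t count,
    cudaMemcpyAttributes* attrs, std::size_t* attrsIdxs, std::size_t numAttrs,
    std::size_t* failIdx, cudaStream_t stream);

// 13.x libcudart: failIdx removed, so the stream moves up one slot (8 params).
using MemcpyBatchAsyncV13 = cudaError_t (*)(
    void** dsts, void** srcs, std::size_t* sizes, std::size_t count,
    cudaMemcpyAttributes* attrs, std::size_t* attrsIdxs, std::size_t numAttrs,
    cudaStream_t stream);

// Counts the parameters of a plain function-pointer type.
template <typename F>
struct ParamCount;

template <typename R, typename... Args>
struct ParamCount<R (*)(Args...)> {
  static constexpr std::size_t value = sizeof...(Args);
};

// Calling a 9-parameter v12 symbol through the 8-parameter v13 type puts the
// stream (8th argument) where the callee expects failIdx, and the callee's
// 9th slot (its stream) is never written by the caller.
static_assert(ParamCount<MemcpyBatchAsyncV12>::value == 9, "v12 has failIdx");
static_assert(ParamCount<MemcpyBatchAsyncV13>::value == 8, "v13 dropped failIdx");
```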
Reproduced on ion-user-9 with lmsysorg/sglang:dev (cu12.9 runtime):
cudaDriverGetVersion() = 13000
cudaRuntimeGetVersion() = 12090
v12 dispatch of dlsym'd symbol: cudaSuccess, exit 0
v13 dispatch of dlsym'd symbol: Segmentation fault (core dumped)
Switching the signature-selection to cudaRuntimeGetVersion makes the
choice follow the loaded libcudart, which is what actually determines
the ABI. The existing cudaDriverGetVersion guard above is kept; it
remains the right knob for the capability check, since
cudaMemcpyBatchAsync requires a 12.8+ driver regardless of the runtime
version.
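The following is a minimal sketch of the resulting two-step decision. It assumes CUDA's `1000 * major + 10 * minor` version encoding, so 12.8 is 12080, consistent with the 12090 and 13000 values above. The names BatchCopyAbi and select_batch_copy_abi, and the threshold constants, are hypothetical and are not taken from the patch. The inputs are the integers that cudaDriverGetVersion and cudaRuntimeGetVersion write through their int* out-parameters.

```cpp
// Hypothetical helper; the real patch's names and structure may differ.
enum class BatchCopyAbi {
  kUnsupported,     // driver too old for cudaMemcpyBatchAsync
  kV12WithFailIdx,  // loaded libcudart is 12.x: 9-param signature
  kV13NoFailIdx,    // loaded libcudart is 13.x: 8-param signature
};

// CUDA encodes versions as 1000 * major + 10 * minor.
constexpr int kMinDriverVersion = 12080;            // 12.8: batch copies available
constexpr int kFirstRuntimeWithoutFailIdx = 13000;  // 13.0: failIdx removed

// driver_version: from cudaDriverGetVersion, used only for the capability guard.
// runtime_version: from cudaRuntimeGetVersion, which reflects the loaded
// libcudart and therefore decides which call signature to use.
constexpr BatchCopyAbi select_batch_copy_abi(int driver_version,
                                             int runtime_version) {
  return driver_version < kMinDriverVersion ? BatchCopyAbi::kUnsupported
         : runtime_version >= kFirstRuntimeWithoutFailIdx
             ? BatchCopyAbi::kV13NoFailIdx
             : BatchCopyAbi::kV12WithFailIdx;
}
```

With the versions reported on ion-user-9 (driver 13000, runtime 12090), this selects the 9-parameter v12 call, which is the dispatch the repro shows succeeding. Keying on the driver version alone would have picked v13.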
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Parent: eba2695
1 file changed, 13 additions, 3 deletions
Diff (file name and code lines not recoverable): the hunk replaces original lines 815-817 with new lines 815-827; context lines 812-814 and 818-820 (828-830 after the change) are unchanged.