Skip to content

Enable lazy loading for cuStreamWriteValue32#4196

Merged
wujingyue merged 5 commits intomainfrom
bug3907
Apr 10, 2025
Merged

Enable lazy loading for cuStreamWriteValue32#4196
wujingyue merged 5 commits intomainfrom
bug3907

Conversation

@wujingyue
Copy link
Collaborator

@wujingyue wujingyue commented Apr 4, 2025

Fixes #3907

cc @samnordmann

@wujingyue wujingyue linked an issue Apr 4, 2025 that may be closed by this pull request
@wujingyue
Copy link
Collaborator Author

!test

@github-actions
Copy link

github-actions bot commented Apr 4, 2025

Review updated until commit 0ab8092

Description

  • Added test for cuStreamWriteValue32

  • Updated driver_api.h to include cuStreamWriteValue32_v2

  • Included new test file in CMakeLists.txt


Changes walkthrough 📝

Relevant files
Tests
test_driver_api.cpp
Added test for cuStreamWriteValue32                                           

tests/cpp/test_driver_api.cpp

  • Added a new test case for cuStreamWriteValue32
  • Included necessary headers and setup for CUDA testing
  • +39/-0   
    Enhancement
    driver_api.h
    Updated driver API wrapper                                                             

    csrc/driver_api.h

    • Added cuStreamWriteValue32_v2 to the list of wrapped functions
    +1/-0     
    Configuration changes
    CMakeLists.txt
    Added new test file to CMake                                                         

    CMakeLists.txt

    • Included new test file test_driver_api.cpp in the build list
    +1/-0     

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review

    API Version

    The PR introduces cuStreamWriteValue32_v2 but the test uses cuStreamWriteValue32. Ensure that the correct API version is being used in the test.

    fn(cuStreamWriteValue32_v2);       \
    Data Type Mismatch

    The test uses sizeof(int32_t) in cudaMemcpyAsync instead of sizeof(uint32_t). This could lead to undefined behavior.

    &value_received, d_ptr, sizeof(int32_t), cudaMemcpyDeviceToHost, stream));

    @wujingyue wujingyue requested a review from zasdfgbnm April 4, 2025 22:56
    @wujingyue
    Copy link
    Collaborator Author

    !test

    Copy link
    Collaborator

    @samnordmann samnordmann left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    can we also make sure that cuMemGetAddressRange and cuStreamWaitValue32 work properly?

    #4197 (comment)

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue wujingyue merged commit 640e5a4 into main Apr 10, 2025
    45 of 46 checks passed
    @wujingyue wujingyue deleted the bug3907 branch April 10, 2025 00:54
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    Error with driver API's lazy load of cuStream ops

    3 participants