Check error status after hipLaunchKernelGGL calls#686
Check error status after hipLaunchKernelGGL calls#686kiritigowda merged 13 commits intoROCm:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 82 out of 83 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…zacharyvincze/rpp into zv/chore/check-hip-kernel-launch-status
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 83 out of 84 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| kernelSize, | ||
| roiTensorPtrSrc, | ||
| roiType, | ||
| handle); | ||
| HIP_CHECK_LAUNCH_RETURN(); | ||
| } |
There was a problem hiding this comment.
HIP_CHECK_LAUNCH_RETURN() can return early from this function, which will bypass the hipFree(tempPtr) cleanup later in the HIP path when tempPtr was allocated for RGB->greyscale conversion. Consider restructuring so kernel-launch error handling still frees tempPtr (e.g., store status then jump to a common cleanup block).
There was a problem hiding this comment.
There are other cases where tempPtr will not be freed in the case of an early return. For example, if the else branch is taken, RPP_ERROR_NOT_IMPLEMENTED will be returned without freeing potentially allocated memory.
This issue is outside the scope of this PR. But tempPtr will need proper RAII wrappers in another PR.
There was a problem hiding this comment.
@zacharyvincze can you file a separate issue to capture this and we can merge this PR
There was a problem hiding this comment.
@kiritigowda I've filed an issue regarding the memory management issues here: #687. Also, moved the kernel launch checks to sobbel_filter.cpp where the kernel is actually called, rather than the call site in rppt_tensor_filter_augmentations.cpp.
|
Version update required |
Description
RPP_ERROR_GPUto signify an internal HIP error during kernel launches.rppdefs.h,HIP_CHECK_LAUNCH_RETURN()which can be added after kernel launches to check the last error and returnRppStatus::RPP_ERROR_GPUif a failure has occurred. This macro is defined conditionally, only if the HIP backend is enabled.Notes
CHECK_RETURN_STATUSbeing used in RPP kernel launches. This macro is used to check HIP runtime calls such as stream synchronization, memory allocation, etc, but will immediately callexit(-1)and halt the application instead of propagating the error viaRppStatusand letting the end user handle/recover from it.