Skip to content

Check error status after hipLaunchKernelGGL calls#686

Merged
kiritigowda merged 13 commits intoROCm:developfrom
zacharyvincze:zv/chore/check-hip-kernel-launch-status
Mar 18, 2026
Merged

Check error status after hipLaunchKernelGGL calls#686
kiritigowda merged 13 commits intoROCm:developfrom
zacharyvincze:zv/chore/check-hip-kernel-launch-status

Conversation

@zacharyvincze
Copy link
Copy Markdown
Contributor

Description

  • Addresses [Issue]: HIP Backend - Gated HIP Launch #618
  • Adds error checking after HIP kernel launches to propagate errors via RppStatus and halt the operator.
  • New RppStatus type added: RPP_ERROR_GPU to signify an internal HIP error during kernel launches.
  • Added macro to rppdefs.h, HIP_CHECK_LAUNCH_RETURN() which can be added after kernel launches to check the last error and return RppStatus::RPP_ERROR_GPU if a failure has occurred. This macro is defined conditionally, only if the HIP backend is enabled.

Notes

  • Concerns about use of CHECK_RETURN_STATUS being used in RPP kernel launches. This macro is used to check HIP runtime calls such as stream synchronization, memory allocation, etc, but will immediately call exit(-1) and halt the application instead of propagating the error via RppStatus and letting the end user handle/recover from it.
    • Outside of the scope of this PR, as this PR modifies enough files as is.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@zacharyvincze zacharyvincze requested a review from Copilot March 5, 2026 18:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 95.16484% with 22 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/modules/tensor/hip/kernel/slice.cpp 41.67% 7 Missing ⚠️
api/rppdefs.h 50.00% 4 Missing ⚠️
src/modules/tensor/hip/kernel/to_decibels.cpp 40.00% 3 Missing ⚠️
src/modules/tensor/hip/kernel/flip_voxel.cpp 50.00% 2 Missing ⚠️
src/modules/tensor/hip/kernel/noise_gaussian.cpp 75.00% 2 Missing ⚠️
src/include/common/hip/rpp_hip_roi_conversion.hpp 50.00% 1 Missing ⚠️
src/modules/tensor/hip/kernel/log.cpp 75.00% 1 Missing ⚠️
src/modules/tensor/hip/kernel/log1p.cpp 66.67% 1 Missing ⚠️
src/modules/tensor/hip/kernel/resample.cpp 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #686      +/-   ##
===========================================
+ Coverage    92.47%   92.49%   +0.02%     
===========================================
  Files          202      202              
  Lines        91272    91730     +458     
===========================================
+ Hits         84396    84840     +444     
- Misses        6876     6890      +14     
Files with missing lines Coverage Δ
src/modules/tensor/hip/kernel/add_scalar.cpp 100.00% <100.00%> (ø)
src/modules/tensor/hip/kernel/bitwise_and.cpp 98.63% <100.00%> (+0.08%) ⬆️
src/modules/tensor/hip/kernel/bitwise_not.cpp 98.53% <100.00%> (+0.09%) ⬆️
src/modules/tensor/hip/kernel/bitwise_or.cpp 98.63% <100.00%> (+0.08%) ⬆️
src/modules/tensor/hip/kernel/bitwise_xor.cpp 98.61% <100.00%> (+0.08%) ⬆️
src/modules/tensor/hip/kernel/blend.cpp 98.70% <100.00%> (+0.07%) ⬆️
src/modules/tensor/hip/kernel/box_filter.cpp 99.66% <100.00%> (+0.02%) ⬆️
src/modules/tensor/hip/kernel/brightness.cpp 98.70% <100.00%> (+0.07%) ⬆️
src/modules/tensor/hip/kernel/channel_dropout.cpp 98.57% <100.00%> (+0.11%) ⬆️
src/modules/tensor/hip/kernel/channel_permute.cpp 100.00% <100.00%> (ø)
... and 74 more

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kiritigowda kiritigowda self-assigned this Mar 10, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 82 out of 83 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 83 out of 84 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines 592 to 597
kernelSize,
roiTensorPtrSrc,
roiType,
handle);
HIP_CHECK_LAUNCH_RETURN();
}
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIP_CHECK_LAUNCH_RETURN() can return early from this function, which will bypass the hipFree(tempPtr) cleanup later in the HIP path when tempPtr was allocated for RGB->greyscale conversion. Consider restructuring so kernel-launch error handling still frees tempPtr (e.g., store status then jump to a common cleanup block).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other cases where tempPtr will not be freed in the case of an early return. For example, if the else branch is taken, RPP_ERROR_NOT_IMPLEMENTED will be returned without freeing potentially allocated memory.

This issue is outside the scope of this PR. But tempPtr will need proper RAII wrappers in another PR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zacharyvincze can you file a separate issue to capture this and we can merge this PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiritigowda I've filed an issue regarding the memory management issues here: #687. Also, moved the kernel launch checks to sobbel_filter.cpp where the kernel is actually called, rather than the call site in rppt_tensor_filter_augmentations.cpp.

@kiritigowda
Copy link
Copy Markdown
Collaborator

Version update required

@zacharyvincze zacharyvincze requested a review from a team as a code owner March 17, 2026 21:07
@kiritigowda kiritigowda merged commit 3ab5ec0 into ROCm:develop Mar 18, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants