Sobel Filter on HOST and HIP#673
Conversation
r-abishek
commented
Feb 18, 2026
- Adds Sobel Filter for tensor API on HOST and HIP
- Adds support for U8/FF16/F32/I8 and NHWC/NCHW variants
- Adds relevant unit / performance / qa tests
…ferent conditions
Sobel filter Kernel implementation
There was a problem hiding this comment.
Pull request overview
This PR adds Sobel Filter support for tensor API on both HOST and HIP backends. The implementation supports multiple data types (U8/F16/F32/I8) and layouts (NHWC/NCHW) with three kernel sizes (3x3, 5x5, 7x7) and three gradient types (X, Y, XY).
Changes:
- Added
rppt_sobel_filterAPI function with comprehensive parameter validation - Implemented CPU and HIP kernels for Sobel filter with AVX2 optimizations for HOST
- Added test infrastructure including unit tests, performance tests, and QA tests
Reviewed changes
Copilot reviewed 12 out of 39 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| api/rppt_tensor_filter_augmentations.h | Added API declaration and documentation for rppt_sobel_filter |
| src/modules/tensor/rppt_tensor_filter_augmentations.cpp | Implemented main function with RGB to grayscale conversion support |
| src/modules/tensor/cpu/kernel/sobel_filter.cpp | CPU kernel implementation with AVX2 SIMD optimizations |
| src/include/tensor/host_tensor_executors.hpp | Added host executor function declaration |
| src/include/tensor/hip_tensor_executors.hpp | Added HIP executor function declaration |
| utilities/test_suite/rpp_test_suite_image.h | Added test infrastructure support and helper functions |
| utilities/test_suite/common.py | Added sobel_filter to augmentation mappings |
| utilities/test_suite/HOST/runImageTests.py | Added test cases for HOST backend |
| utilities/test_suite/HIP/runImageTests.py | Added test cases for HIP backend |
| utilities/test_suite/HOST/Tensor_image_host.cpp | Added HOST test executable support |
| utilities/test_suite/HIP/Tensor_image_hip.cpp | Added HIP test executable support |
| docs/data/doxygenOutputs/*.png | Added documentation images for all gradient types and kernel sizes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| { | ||
| Rpp32f *rowTemp, *rowTempRef, *outVal, *outRefVal, *outputTemp, *outputTempRef, *outputTempChn, *outputTempRefChn; | ||
| Rpp32f cutoff = (testCase == LENS_CORRECTION) ? 1e-4 : 1e-5; | ||
| Rpp32f cutoff = ((testCase == LENS_CORRECTION)|| (testCase == SOBEL_FILTER)) ? 1e-4 : 1e-5; |
There was a problem hiding this comment.
Inconsistent spacing around '||' operator. The convention in this file shows spacing on both sides of boolean operators (see line 1103). Add a space after ')' and before '||' for consistency.
| } | ||
| } | ||
|
|
||
| // returns the interpolation type used for image resizing or scaling operations. |
There was a problem hiding this comment.
The comment "returns the interpolation type used for image resizing or scaling operations." is incorrect for this function. This function actually returns a string combining kernel size and gradient type for the Sobel filter, not interpolation types. The comment should be updated to accurately describe the function's purpose, such as "returns the kernel size and gradient type for sobel filter operations."
| } | ||
| #ifdef GPU_SUPPORT | ||
| else if ((handleBackend == RppBackend::RPP_HIP_BACKEND) && (executionBackend == RppBackend::RPP_HIP_BACKEND)) | ||
| { |
There was a problem hiding this comment.
Missing offset validation check for HIP backend. Similar filter operations (like median_filter and gaussian_filter) validate that srcDescPtr->offsetInBytes is sufficient for the kernel size with a check like: if (srcDescPtr->offsetInBytes < 12 * (kernelSize / 2)) return RPP_ERROR_LOW_OFFSET;. This validation should be added after line 558 to ensure the offset can accommodate the kernel padding requirements.
| { | |
| { | |
| if (srcDescPtr->offsetInBytes < 12 * (kernelSize / 2)) | |
| return RPP_ERROR_LOW_OFFSET; |
| Rpp32u kernelSize, gradientType; | ||
| get_kernel_size_and_gradient_type(additionalParam, kernelSize, gradientType); | ||
|
|
||
| func += kernelSizeAndGradientName; |
There was a problem hiding this comment.
Trailing whitespace after the variable declaration. Remove the extra space after the semicolon.
| #ifdef GPU_SUPPORT | ||
| else if ((handleBackend == RppBackend::RPP_HIP_BACKEND) && (executionBackend == RppBackend::RPP_HIP_BACKEND)) | ||
| { | ||
| RpptDescPtr inputDesc = srcDescPtr; |
There was a problem hiding this comment.
Trailing whitespace after the declaration. Remove the extra space after the semicolon for consistency.
| RpptDescPtr inputDesc = srcDescPtr; | |
| RpptDescPtr inputDesc = srcDescPtr; |
| size_t dataSize = dstDescPtr->strides.nStride * dstDescPtr->n * elementSize; | ||
|
|
||
| CHECK_RETURN_STATUS(hipMalloc(&tempPtr, dataSize)); | ||
|
|
There was a problem hiding this comment.
Inconsistent formatting: Extra whitespace after the variable initialization. Align formatting with the rest of the codebase (compare to line 568 which has no trailing space).
| Rpp32s rowKernelLoopLimit = kernelSize; | ||
| get_kernel_loop_limit(i, rowKernelLoopLimit, padLength, unpaddedHeight); | ||
| RpptImageBorderEdge padVertical = i < padLength ? RpptImageBorderEdge::TOP_EDGE : RpptImageBorderEdge::BOTTOM_EDGE; | ||
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, padVertical); |
There was a problem hiding this comment.
Function parameter mismatch: process_left_border_columns_pln_pln is defined with 9 parameters (including filterXTensor and filterYTensor), but here it's being called with only 8 parameters (only filter). This will cause a compilation error. For non-bidirectional cases (X or Y gradient only), the function should be called with the same filter passed twice, or there should be a separate overload for single-filter cases. The calls at lines 321, 483, and 634 have this issue.
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, padVertical); | |
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, filter, padVertical); |
| Rpp32s rowKernelLoopLimit = kernelSize; | ||
| get_kernel_loop_limit(i, rowKernelLoopLimit, padLength, unpaddedHeight); | ||
| RpptImageBorderEdge padVertical = i < padLength ? RpptImageBorderEdge::TOP_EDGE : RpptImageBorderEdge::BOTTOM_EDGE; | ||
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, padVertical); |
There was a problem hiding this comment.
Same function parameter mismatch as line 321 - process_left_border_columns_pln_pln expects 9 parameters but receives only 8.
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, padVertical); | |
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, borderType, padVertical); |
| Rpp32s rowKernelLoopLimit = kernelSize; | ||
| get_kernel_loop_limit(i, rowKernelLoopLimit, padLength, unpaddedHeight); | ||
| RpptImageBorderEdge padVertical = i < padLength ? RpptImageBorderEdge::TOP_EDGE : RpptImageBorderEdge::BOTTOM_EDGE; | ||
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, padVertical); |
There was a problem hiding this comment.
Same function parameter mismatch as line 321 - process_left_border_columns_pln_pln expects 9 parameters but receives only 8.
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, padVertical); | |
| process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filter, 1, padVertical); |
| } | ||
|
|
||
| // returns the interpolation type used for image resizing or scaling operations. | ||
| inline std::string get_kernel_size_and_gradient_type(unsigned int val, Rpp32u &kernelSize, Rpp32u &gradientType) |
There was a problem hiding this comment.
Inconsistent spacing: there should be no space between 'Rpp32u' and '&kernelSize' in the parameter list. The ampersand should be adjacent to the parameter name, following the convention used throughout the codebase.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #673 +/- ##
===========================================
+ Coverage 92.44% 92.48% +0.04%
===========================================
Files 196 198 +2
Lines 88305 89139 +834
===========================================
+ Hits 81632 82437 +805
- Misses 6673 6702 +29
🚀 New features to boost your workflow:
|
msiddaiah
left a comment
There was a problem hiding this comment.
Please restest before merging, after addressing the comments by me and copilot.
| /* | ||
| MIT License | ||
|
|
||
| Copyright (c) 2019 - 2025 Advanced Micro Devices, Inc. |
| { | ||
| if ((kernelSize != 3) && (kernelSize != 5) && (kernelSize != 7)) | ||
| return RPP_ERROR_INVALID_ARGUMENTS; | ||
| if ((sobelType != 0) && (sobelType != 1) && (sobelType != 2)) |
There was a problem hiding this comment.
Use #defines for 0, 1, 2 for sobel type checking for code readability
| sum_f8x.f4[1] = FLOAT4_ZERO; | ||
| sum_f8y.f4[0] = FLOAT4_ZERO; | ||
| sum_f8y.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
| sum_f8x.f4[1] = FLOAT4_ZERO; | ||
| sum_f8y.f4[0] = FLOAT4_ZERO; | ||
| sum_f8y.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
| sum_f8x.f4[1] = FLOAT4_ZERO; | ||
| sum_f8y.f4[0] = FLOAT4_ZERO; | ||
| sum_f8y.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
| float *filter_row5 = &filter_row1[20]; | ||
| sum_f8.f4[0] = FLOAT4_ZERO; | ||
| sum_f8.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
| float *filter_row7 = &filter_row1[42]; | ||
| sum_f8.f4[0] = FLOAT4_ZERO; | ||
| sum_f8.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
| float *filter_row3 = &filter_row1[6]; | ||
| sum_f8.f4[0] = FLOAT4_ZERO; | ||
| sum_f8.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
| float *filter_row5 = &filter_row1[20]; | ||
| sum_f8.f4[0] = FLOAT4_ZERO; | ||
| sum_f8.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
| float *filter_row7 = &filter_row1[42]; | ||
| sum_f8.f4[0] = FLOAT4_ZERO; | ||
| sum_f8.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |