Skip to content

Emboss - resolved review comments#568

Merged
r-abishek merged 7 commits intor-abishek:ar/embossfrom
RooseweltMcW:apr/emboss
Feb 25, 2026
Merged

Emboss - resolved review comments#568
r-abishek merged 7 commits intor-abishek:ar/embossfrom
RooseweltMcW:apr/emboss

Conversation

@RooseweltMcW
Copy link
Copy Markdown

No description provided.

@r-abishek r-abishek changed the base branch from ar/emboss to master February 24, 2026 04:14
@r-abishek r-abishek changed the base branch from master to ar/emboss February 24, 2026 04:14
@r-abishek
Copy link
Copy Markdown
Owner

@RooseweltMcW Shows merge conflicts

@r-abishek r-abishek requested a review from Copilot February 24, 2026 04:23
@r-abishek r-abishek added the minor fix Minor implementation fixes label Feb 24, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes OpenCL + legacy (RPPI/BatchPD) surface area, unifies/updates the public tensor APIs to support runtime backend selection (HOST vs HIP), and refreshes build + packaging + docs accordingly.

Changes:

  • Removed OpenCL backend + BatchPD OpenCL/HIP headers, CMake finder, and associated docs.
  • Unified multiple RPPT tensor APIs into single entrypoints that take RppBackend executionBackend.
  • Updated build/packaging metadata (version bump, Debian packaging templates, Sphinx deps, changelog, Doxygen inputs).

Reviewed changes

Copilot reviewed 50 out of 700 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
src/include/batch_pd/hip/hipoc_program.hpp Removed BatchPD HIP program wrapper header.
src/include/batch_pd/hip/hipoc_kernel.hpp Removed BatchPD HIP kernel wrapper header.
src/include/batch_pd/cl/temp_file.hpp Removed BatchPD OpenCL temp-file helper.
src/include/batch_pd/cl/rpp_cl_common.hpp Removed BatchPD OpenCL common utilities.
src/include/batch_pd/cl/oclkernel.hpp Removed BatchPD OpenCL kernel wrapper.
src/include/batch_pd/cl/ocldeviceinfo.hpp Removed BatchPD OpenCL device info helpers.
src/include/batch_pd/cl/load_file.hpp Removed BatchPD OpenCL file loader helper.
src/include/batch_pd/cl/clhelper.hpp Removed BatchPD OpenCL program/kernel helpers.
docs/sphinx/requirements.txt Updated pinned doc toolchain versions.
docs/sphinx/requirements.in Bumped rocm-docs-core input requirement.
docs/install/rpp-verify-install.rst Removed OpenCL verification instructions.
docs/install/rpp-build-and-install.rst Removed OpenCL build instructions.
cmake/FindOpenCL.cmake Removed custom OpenCL finder module.
api/rppt_tensor_morphological_operations.h Added HOST variants + unified erode/dilate entrypoints.
api/rppt_tensor_filter_augmentations.h Unified filter APIs into backend-selectable entrypoints (plus doc updates).
api/rppt_tensor_data_exchange_operations.h Unified data-exchange APIs into backend-selectable entrypoints.
api/rppt_tensor_bitwise_operations.h Unified bitwise APIs into backend-selectable entrypoints.
api/rppt_tensor_audio_augmentations.h Unified audio APIs into backend-selectable entrypoints.
api/rppt_tensor_arithmetic_operations.h Unified arithmetic APIs into backend-selectable entrypoints.
api/rppt.h Removed inclusion of legacy augmentation header.
api/rppi_statistical_operations.h Removed deprecated RPPI statistical operations header.
api/rppi_morphological_transforms.h Removed deprecated RPPI morphological transforms header.
api/rppi_logical_operations.h Removed deprecated RPPI logical operations header.
api/rppi_legacy_augmentations.h Removed legacy augmentation header.
api/rppi_filter_operations.h Removed deprecated RPPI filter operations header.
api/rppi_color_model_conversions.h Removed deprecated RPPI color conversions header.
api/rppi_arithmetic_operations.h Removed deprecated RPPI arithmetic operations header.
api/rppi.h Removed deprecated umbrella RPPI header.
api/rppdefs.h Backend macro rename, removed OCL backend enum, added new status code + constants.
api/rpp_backend.h.in Removed OpenCL backend CMake define.
api/rpp.h Removed OpenCL queue typedefs + legacy include paths; clarified HIP/HOST-only handle docs.
README.md Removed OpenCL backend build section.
DEBIAN/copyright.in Switched to Debian copyright-format template content.
CMakeLists.txt Removed OpenCL + legacy toggles; added TheRock handling + version bump + packaging hooks.
CHANGELOG.md Documented API unification / backend selection + removal of legacy/OpenCL.
.Doxyfile Removed RPPI headers from Doxygen INPUT.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if(USING_THE_ROCK)
if(NOT DEFINED ENV{HIP_DEVICE_LIB_PATH})
set(ENV{HIP_DEVICE_LIB_PATH} ${ROCM_PATH}/lib/llvm/amdgcn/bitcode)
set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "HIP_DEVICE_LIB_PATH=${ROCM_PATH}/lib/llvm/amdgcn/bitcode")
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RULE_LAUNCH_COMPILE expects a launcher command, not an environment-variable assignment. Setting it to HIP_DEVICE_LIB_PATH=... will typically be treated as an executable name and can break compilation. Use set(ENV{HIP_DEVICE_LIB_PATH} ...) alone or set RULE_LAUNCH_COMPILE to a real launcher (e.g., ${CMAKE_COMMAND} -E env HIP_DEVICE_LIB_PATH=... <compiler>), depending on the intended behavior.

Suggested change
set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "HIP_DEVICE_LIB_PATH=${ROCM_PATH}/lib/llvm/amdgcn/bitcode")

Copilot uses AI. Check for mistakes.
@@ -1 +1 @@
rocm-docs-core[api_reference]==1.27.0
rocm-docs-core[api_reference]==1.31.3
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra name api_reference is inconsistent with the generated requirements.txt line that uses rocm-docs-core[api-reference]. Extras are normalized in some tooling, but not all; this mismatch can break reproducible lockfile generation (pip-compile) or dependency resolution. Align the extra name with the one actually provided by rocm-docs-core (and regenerate requirements.txt).

Suggested change
rocm-docs-core[api_reference]==1.31.3
rocm-docs-core[api-reference]==1.31.3

Copilot uses AI. Check for mistakes.
Comment on lines 68 to 69
#ifdef GPU_SUPPORT
/*! \brief Erode augmentation on HIP backend for a NCHW/NHWC layout tensor
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unified rppt_erode(..., RppBackend executionBackend) declaration is guarded by #ifdef GPU_SUPPORT, which prevents HOST-only builds from using the unified API even though executionBackend can select RPP_HOST_BACKEND. This is inconsistent with other unified APIs in this PR (e.g., box/median/gaussian filter) that are not GPU-guarded. Consider declaring rppt_erode/rppt_dilate unconditionally and having the implementation return an appropriate status when HIP is requested but not compiled in.

Copilot uses AI. Check for mistakes.
Comment on lines +90 to 91
RppStatus rppt_erode(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32u kernelSize, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle, RppBackend executionBackend);
#endif // GPU_SUPPORT
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unified rppt_erode(..., RppBackend executionBackend) declaration is guarded by #ifdef GPU_SUPPORT, which prevents HOST-only builds from using the unified API even though executionBackend can select RPP_HOST_BACKEND. This is inconsistent with other unified APIs in this PR (e.g., box/median/gaussian filter) that are not GPU-guarded. Consider declaring rppt_erode/rppt_dilate unconditionally and having the implementation return an appropriate status when HIP is requested but not compiled in.

Copilot uses AI. Check for mistakes.
* \param [in] roiTensorPtrSrc ROI data in HIP memory (for HIP backend) or HOST memory (for HOST backend), for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \param [in] rppHandle RPP HIP/HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation references rppCreateWithBatchSize(), but the public handle creation API shown elsewhere in this PR is rppCreate(...) (with a batch size parameter). If rppCreateWithBatchSize() is no longer part of the public API, this should be updated to prevent user confusion (similar references appear elsewhere, e.g., older rppCreateWithStreamAndBatchSize() wording).

Copilot uses AI. Check for mistakes.
*/
RppStatus rppt_bitwise_xor_gpu(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT
RppStatus rppt_bitwise_xor(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle, RppBackend executionBackend);
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation references rppCreateWithBatchSize(), but the public handle creation API shown elsewhere in this PR is rppCreate(...) (with a batch size parameter). If rppCreateWithBatchSize() is no longer part of the public API, this should be updated to prevent user confusion (similar references appear elsewhere, e.g., older rppCreateWithStreamAndBatchSize() wording).

Copilot uses AI. Check for mistakes.
Comment on lines +31 to +35
* "Pinned buffer API support for HOST and HIP

### Changed

* AMDClag++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin`
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the stray quote and missing closing quote/backtick in the 'Pinned buffer' bullet, and correct AMDClag++ to AMDClang++.

Suggested change
* "Pinned buffer API support for HOST and HIP
### Changed
* AMDClag++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin`
* Pinned buffer API support for HOST and HIP
### Changed
* AMDClang++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin`

Copilot uses AI. Check for mistakes.
Comment on lines +31 to +35
* "Pinned buffer API support for HOST and HIP

### Changed

* AMDClag++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin`
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the stray quote and missing closing quote/backtick in the 'Pinned buffer' bullet, and correct AMDClag++ to AMDClang++.

Suggested change
* "Pinned buffer API support for HOST and HIP
### Changed
* AMDClag++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin`
* Pinned buffer API support for HOST and HIP
### Changed
* AMDClang++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin`

Copilot uses AI. Check for mistakes.
* \details This function performs the fmadd operation on a batch of 4D tensors.
* It multiplies each element of the source tensor by a corresponding element in the 'mulTensor',
* adds a corresponding element from the 'addTensor', and stores the result in the destination tensor.
* Support added for f32 -> f32 dataype.
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in documentation: dataypedatatype.

Copilot uses AI. Check for mistakes.
Source: https://github.com/ROCm/@DEB_PACKAGE_NAME@
Files: *
License: @DEB_LICENSE@
Copyright: @DEB_COPYRIGHT_YEAR@ Advanced Micro Devices, Inc. All rights Reserved.
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is inconsistent with the MIT license text that follows (MIT is typically 'All rights reserved.' without the unusual capitalization, and Debian machine-readable copyright files usually separate Copyright: from License: blocks cleanly). Consider adjusting wording/capitalization and ensuring the machine-readable structure matches Debian's copyright-format expectations to avoid packaging lint issues.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sum_f8.f4[0] = FLOAT4_ZERO;
sum_f8.f4[1] = FLOAT4_ZERO;
if ((id_x_i >= roiBeginX) && ((id_x_i + 7 + padLength) < roiWidth) && (id_y_i >= roiBeginY) && (id_y_i < roiHeight))
if ((id_x_i >= 0) && ((id_x_i + 7 + padLength) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight))
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent boundary condition check. This line still uses padLength in the expression ((id_x_i + 7 + padLength) < roiWidth), while all other similar checks in this PR have been updated to use ((id_x_i + 7) < roiWidth). This inconsistency will cause incorrect behavior for the 7x7 emboss filter. The padLength should be removed from this condition.

Suggested change
if ((id_x_i >= 0) && ((id_x_i + 7 + padLength) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight))
if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight))

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified, Done.

sum_f8.f4[0] = FLOAT4_ZERO;
sum_f8.f4[1] = FLOAT4_ZERO;
if ((id_x_i >= roiBeginX) && ((id_x_i + 7 + padLength) < roiWidth) && (id_y_i >= roiBeginY) && (id_y_i < roiHeight))
if ((id_x_i >= 0) && ((id_x_i + 7) < 0) && (id_y_i >= roiBeginY) && (id_y_i < roiHeight))
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical bug in the boundary condition check. The condition ((id_x_i + 7) < 0) is incorrect and will always be false for valid coordinates. This should be ((id_x_i + 7) < roiWidth) to match the pattern used in other parts of the code. Additionally, id_y_i >= roiBeginY should be id_y_i >= 0 to be consistent with the coordinate system changes made throughout this PR.

Suggested change
if ((id_x_i >= 0) && ((id_x_i + 7) < 0) && (id_y_i >= roiBeginY) && (id_y_i < roiHeight))
if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight))

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified, Done.

@r-abishek r-abishek merged commit c4afb65 into r-abishek:ar/emboss Feb 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

minor fix Minor implementation fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants