Emboss - resolved review comments#568
Conversation
|
@RooseweltMcW Shows merge conflicts |
There was a problem hiding this comment.
Pull request overview
This PR removes OpenCL + legacy (RPPI/BatchPD) surface area, unifies/updates the public tensor APIs to support runtime backend selection (HOST vs HIP), and refreshes build + packaging + docs accordingly.
Changes:
- Removed OpenCL backend + BatchPD OpenCL/HIP headers, CMake finder, and associated docs.
- Unified multiple RPPT tensor APIs into single entrypoints that take
RppBackend executionBackend. - Updated build/packaging metadata (version bump, Debian packaging templates, Sphinx deps, changelog, Doxygen inputs).
Reviewed changes
Copilot reviewed 50 out of 700 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/include/batch_pd/hip/hipoc_program.hpp | Removed BatchPD HIP program wrapper header. |
| src/include/batch_pd/hip/hipoc_kernel.hpp | Removed BatchPD HIP kernel wrapper header. |
| src/include/batch_pd/cl/temp_file.hpp | Removed BatchPD OpenCL temp-file helper. |
| src/include/batch_pd/cl/rpp_cl_common.hpp | Removed BatchPD OpenCL common utilities. |
| src/include/batch_pd/cl/oclkernel.hpp | Removed BatchPD OpenCL kernel wrapper. |
| src/include/batch_pd/cl/ocldeviceinfo.hpp | Removed BatchPD OpenCL device info helpers. |
| src/include/batch_pd/cl/load_file.hpp | Removed BatchPD OpenCL file loader helper. |
| src/include/batch_pd/cl/clhelper.hpp | Removed BatchPD OpenCL program/kernel helpers. |
| docs/sphinx/requirements.txt | Updated pinned doc toolchain versions. |
| docs/sphinx/requirements.in | Bumped rocm-docs-core input requirement. |
| docs/install/rpp-verify-install.rst | Removed OpenCL verification instructions. |
| docs/install/rpp-build-and-install.rst | Removed OpenCL build instructions. |
| cmake/FindOpenCL.cmake | Removed custom OpenCL finder module. |
| api/rppt_tensor_morphological_operations.h | Added HOST variants + unified erode/dilate entrypoints. |
| api/rppt_tensor_filter_augmentations.h | Unified filter APIs into backend-selectable entrypoints (plus doc updates). |
| api/rppt_tensor_data_exchange_operations.h | Unified data-exchange APIs into backend-selectable entrypoints. |
| api/rppt_tensor_bitwise_operations.h | Unified bitwise APIs into backend-selectable entrypoints. |
| api/rppt_tensor_audio_augmentations.h | Unified audio APIs into backend-selectable entrypoints. |
| api/rppt_tensor_arithmetic_operations.h | Unified arithmetic APIs into backend-selectable entrypoints. |
| api/rppt.h | Removed inclusion of legacy augmentation header. |
| api/rppi_statistical_operations.h | Removed deprecated RPPI statistical operations header. |
| api/rppi_morphological_transforms.h | Removed deprecated RPPI morphological transforms header. |
| api/rppi_logical_operations.h | Removed deprecated RPPI logical operations header. |
| api/rppi_legacy_augmentations.h | Removed legacy augmentation header. |
| api/rppi_filter_operations.h | Removed deprecated RPPI filter operations header. |
| api/rppi_color_model_conversions.h | Removed deprecated RPPI color conversions header. |
| api/rppi_arithmetic_operations.h | Removed deprecated RPPI arithmetic operations header. |
| api/rppi.h | Removed deprecated umbrella RPPI header. |
| api/rppdefs.h | Backend macro rename, removed OCL backend enum, added new status code + constants. |
| api/rpp_backend.h.in | Removed OpenCL backend CMake define. |
| api/rpp.h | Removed OpenCL queue typedefs + legacy include paths; clarified HIP/HOST-only handle docs. |
| README.md | Removed OpenCL backend build section. |
| DEBIAN/copyright.in | Switched to Debian copyright-format template content. |
| CMakeLists.txt | Removed OpenCL + legacy toggles; added TheRock handling + version bump + packaging hooks. |
| CHANGELOG.md | Documented API unification / backend selection + removal of legacy/OpenCL. |
| .Doxyfile | Removed RPPI headers from Doxygen INPUT. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if(USING_THE_ROCK) | ||
| if(NOT DEFINED ENV{HIP_DEVICE_LIB_PATH}) | ||
| set(ENV{HIP_DEVICE_LIB_PATH} ${ROCM_PATH}/lib/llvm/amdgcn/bitcode) | ||
| set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "HIP_DEVICE_LIB_PATH=${ROCM_PATH}/lib/llvm/amdgcn/bitcode") |
There was a problem hiding this comment.
RULE_LAUNCH_COMPILE expects a launcher command, not an environment-variable assignment. Setting it to HIP_DEVICE_LIB_PATH=... will typically be treated as an executable name and can break compilation. Use set(ENV{HIP_DEVICE_LIB_PATH} ...) alone or set RULE_LAUNCH_COMPILE to a real launcher (e.g., ${CMAKE_COMMAND} -E env HIP_DEVICE_LIB_PATH=... <compiler>), depending on the intended behavior.
| set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "HIP_DEVICE_LIB_PATH=${ROCM_PATH}/lib/llvm/amdgcn/bitcode") |
| @@ -1 +1 @@ | |||
| rocm-docs-core[api_reference]==1.27.0 | |||
| rocm-docs-core[api_reference]==1.31.3 | |||
There was a problem hiding this comment.
The extra name api_reference is inconsistent with the generated requirements.txt line that uses rocm-docs-core[api-reference]. Extras are normalized in some tooling, but not all; this mismatch can break reproducible lockfile generation (pip-compile) or dependency resolution. Align the extra name with the one actually provided by rocm-docs-core (and regenerate requirements.txt).
| rocm-docs-core[api_reference]==1.31.3 | |
| rocm-docs-core[api-reference]==1.31.3 |
| #ifdef GPU_SUPPORT | ||
| /*! \brief Erode augmentation on HIP backend for a NCHW/NHWC layout tensor |
There was a problem hiding this comment.
The unified rppt_erode(..., RppBackend executionBackend) declaration is guarded by #ifdef GPU_SUPPORT, which prevents HOST-only builds from using the unified API even though executionBackend can select RPP_HOST_BACKEND. This is inconsistent with other unified APIs in this PR (e.g., box/median/gaussian filter) that are not GPU-guarded. Consider declaring rppt_erode/rppt_dilate unconditionally and having the implementation return an appropriate status when HIP is requested but not compiled in.
| RppStatus rppt_erode(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32u kernelSize, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle, RppBackend executionBackend); | ||
| #endif // GPU_SUPPORT |
There was a problem hiding this comment.
The unified rppt_erode(..., RppBackend executionBackend) declaration is guarded by #ifdef GPU_SUPPORT, which prevents HOST-only builds from using the unified API even though executionBackend can select RPP_HOST_BACKEND. This is inconsistent with other unified APIs in this PR (e.g., box/median/gaussian filter) that are not GPU-guarded. Consider declaring rppt_erode/rppt_dilate unconditionally and having the implementation return an appropriate status when HIP is requested but not compiled in.
| * \param [in] roiTensorPtrSrc ROI data in HIP memory (for HIP backend) or HOST memory (for HOST backend), for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y)) | ||
| * \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB) | ||
| * \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt> | ||
| * \param [in] rppHandle RPP HIP/HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt> |
There was a problem hiding this comment.
The documentation references rppCreateWithBatchSize(), but the public handle creation API shown elsewhere in this PR is rppCreate(...) (with a batch size parameter). If rppCreateWithBatchSize() is no longer part of the public API, this should be updated to prevent user confusion (similar references appear elsewhere, e.g., older rppCreateWithStreamAndBatchSize() wording).
| */ | ||
| RppStatus rppt_bitwise_xor_gpu(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle); | ||
| #endif // GPU_SUPPORT | ||
| RppStatus rppt_bitwise_xor(RppPtr_t srcPtr1, RppPtr_t srcPtr2, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle, RppBackend executionBackend); |
There was a problem hiding this comment.
The documentation references rppCreateWithBatchSize(), but the public handle creation API shown elsewhere in this PR is rppCreate(...) (with a batch size parameter). If rppCreateWithBatchSize() is no longer part of the public API, this should be updated to prevent user confusion (similar references appear elsewhere, e.g., older rppCreateWithStreamAndBatchSize() wording).
| * "Pinned buffer API support for HOST and HIP | ||
|
|
||
| ### Changed | ||
|
|
||
| * AMDClag++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin` |
There was a problem hiding this comment.
Fix the stray quote and missing closing quote/backtick in the 'Pinned buffer' bullet, and correct AMDClag++ to AMDClang++.
| * "Pinned buffer API support for HOST and HIP | |
| ### Changed | |
| * AMDClag++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin` | |
| * Pinned buffer API support for HOST and HIP | |
| ### Changed | |
| * AMDClang++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin` |
| * "Pinned buffer API support for HOST and HIP | ||
|
|
||
| ### Changed | ||
|
|
||
| * AMDClag++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin` |
There was a problem hiding this comment.
Fix the stray quote and missing closing quote/backtick in the 'Pinned buffer' bullet, and correct AMDClag++ to AMDClang++.
| * "Pinned buffer API support for HOST and HIP | |
| ### Changed | |
| * AMDClag++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin` | |
| * Pinned buffer API support for HOST and HIP | |
| ### Changed | |
| * AMDClang++ compiler has moved to `${ROCM_PATH}/lib/llvm/bin` |
| * \details This function performs the fmadd operation on a batch of 4D tensors. | ||
| * It multiplies each element of the source tensor by a corresponding element in the 'mulTensor', | ||
| * adds a corresponding element from the 'addTensor', and stores the result in the destination tensor. | ||
| * Support added for f32 -> f32 dataype. |
There was a problem hiding this comment.
Typo in documentation: dataype → datatype.
| Source: https://github.com/ROCm/@DEB_PACKAGE_NAME@ | ||
| Files: * | ||
| License: @DEB_LICENSE@ | ||
| Copyright: @DEB_COPYRIGHT_YEAR@ Advanced Micro Devices, Inc. All rights Reserved. |
There was a problem hiding this comment.
This line is inconsistent with the MIT license text that follows (MIT is typically 'All rights reserved.' without the unusual capitalization, and Debian machine-readable copyright files usually separate Copyright: from License: blocks cleanly). Consider adjusting wording/capitalization and ensuring the machine-readable structure matches Debian's copyright-format expectations to avoid packaging lint issues.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sum_f8.f4[0] = FLOAT4_ZERO; | ||
| sum_f8.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= roiBeginX) && ((id_x_i + 7 + padLength) < roiWidth) && (id_y_i >= roiBeginY) && (id_y_i < roiHeight)) | ||
| if ((id_x_i >= 0) && ((id_x_i + 7 + padLength) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
There was a problem hiding this comment.
Inconsistent boundary condition check. This line still uses padLength in the expression ((id_x_i + 7 + padLength) < roiWidth), while all other similar checks in this PR have been updated to use ((id_x_i + 7) < roiWidth). This inconsistency will cause incorrect behavior for the 7x7 emboss filter. The padLength should be removed from this condition.
| if ((id_x_i >= 0) && ((id_x_i + 7 + padLength) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) | |
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
| sum_f8.f4[0] = FLOAT4_ZERO; | ||
| sum_f8.f4[1] = FLOAT4_ZERO; | ||
| if ((id_x_i >= roiBeginX) && ((id_x_i + 7 + padLength) < roiWidth) && (id_y_i >= roiBeginY) && (id_y_i < roiHeight)) | ||
| if ((id_x_i >= 0) && ((id_x_i + 7) < 0) && (id_y_i >= roiBeginY) && (id_y_i < roiHeight)) |
There was a problem hiding this comment.
Critical bug in the boundary condition check. The condition ((id_x_i + 7) < 0) is incorrect and will always be false for valid coordinates. This should be ((id_x_i + 7) < roiWidth) to match the pattern used in other parts of the code. Additionally, id_y_i >= roiBeginY should be id_y_i >= 0 to be consistent with the coordinate system changes made throughout this PR.
| if ((id_x_i >= 0) && ((id_x_i + 7) < 0) && (id_y_i >= roiBeginY) && (id_y_i < roiHeight)) | |
| if ((id_x_i >= 0) && ((id_x_i + 7) < roiWidth) && (id_y_i >= 0) && (id_y_i < roiHeight)) |
No description provided.