Tensor Arithmetic Operations - Updated Branch by Srihari-mcw · Pull Request #491 · r-abishek/rpp

Srihari-mcw · 2025-09-17T03:24:05Z

No description provided.

* Sobel filter implementation for HOST with QA for U8 and F32 * Fixed HIP F32 image generation and QA passed for 1e-4 * Revert test suite image file * Updated HOST backend implementation for sobel filter * Updated documentation and added images * Resolved review comments * Resolved review comments and modified error status to set for two different conditions * updated unified api for sobel filter * cleanup sobel filter api * Resolve copilot review comments * resolve review comments --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

* Optimized version of channel dropout HIP backend and working code for HOST AVX, SSE * Modified name for dropout compute function * Modified way of AVX and SSE version channel dropout to avoid if statements * Modified Channel Dropout with generic compute code reused * Parameters and name change for channel dropout * Modified HIP for better performance * Modified the code and made the channel dropout templated version for all the bitdepths * Modified Erase kernel for grid and cutout dropout version for better performance * Modified the .h file to have the dropout to effects and added output images in the docs * Added output image and modified the .h file to effects for channel dropout * Removed space * added space * Moved grid dropout .h to effects * Modified channel dropout for I8 variant HOST side * Resolved all review comments and modified code to produce results for i8 variant * Removed empty line * Resolved review comments * Modified HOST after merge * Made changes after merging and QA passed for dropout * Channel dropout make_float 4 macro changes * Updated QA with random generator and updated BIN files * Modified QA name changes * Modified RandomSeed value passed as parameter to the function call * Update rppt_tensor_effects_augmentations.cpp indentation modified * Initial modified HIP backend Grid dropout with better performance * Removed space and review comments resolved * Added I8 support for grid dropout * Updated and modified indentation * Modified HIP backend test suite changes * Host side modification for dropout to use init function in API level and have a separate file for Grid dropout HOST backend * Modified color buffer to use scratch Buffer Host * colon removed * Fixed linker issue and HIP backend passed * Added random erase dropout functionality and modified the test suite * channel dropout implementation * Resolved all the review comments and modified the magic number to set as constant for better understanding, added required comments * 
Removed other variants of dropout * Removed rd inside kernel * Update kernel and removed unwanted functions * HOST modifications for randomErase * Modified randomization in test suite * Removed unwanted files and headers * Resolved review comments * Updated documentation and resolved comments * updated random noise generation in test suite HOST implementation * Updated kernel files for random erase to remove noise generation logic in test suite and pass buffer for random noise * updated init dropout function * Added break statement after merge conflicts * ROI fixes - Box filter and Median filter (ROCm#652) * Add Box and Median Filter ROI fixes after minor corrections * Fix source index computation --------- Co-authored-by: Mukesh <mukesh.jayakodi@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> * Removed numBoxTensor parameter and resolved review comments * Added early condition to check and return the invalid ROI region * Updated kernel to use anchorBox info for random erase kernel * Add unified api for random_erase * Resolved review comments * Updated param name for batchSize * Resolved review comments * Reverted modified changes --------- Co-authored-by: sampath117 <snehaa@multicorewareinc.com> Co-authored-by: RooseweltMcW <austin.roosewelt@multicorewareinc.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Mukesh <mukesh.jayakodi@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com>

Emboss kernel --------- Co-authored-by: RooseweltMcW <austin.roosewelt@multicorewareinc.com> Co-authored-by: root <root@ixt-sjc2-52.local.lan> Co-authored-by: sampath117 <snehaa@multicorewareinc.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com> Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com>

* Add HIP kernel launch error checking * Rename RPP_ERROR_GPU -> RPP_ERROR_HIP_LAUNCH * Address PR comments * Move melScalePtr deletion to before the HIP launch * Add checks for HIP sobel filter * Fix docstring * Move HIP_CHECK_LAUNCH_RETURN to sobel_filter.cpp * Bump version and update changelog * Address PR comments --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

Dropout (Grid and Cutout) on HIP and HOST --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: sampath117 <snehaa@multicorewareinc.com> Co-authored-by: RooseweltMcW <austin.roosewelt@multicorewareinc.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: arvindcheru <90783369+arvindcheru@users.noreply.github.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Maddisetty <hmaddise@ctr2-alola-login-01.amd.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com> Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>

…ocs/sphinx (ROCm#692) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.32.0 to 1.33.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.33.1/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.32.0...v1.33.1) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-version: 1.33.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

Dropout Kernels --------- Co-authored-by: sampath117 <snehaa@multicorewareinc.com> Co-authored-by: RooseweltMcW <austin.roosewelt@multicorewareinc.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com> Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>

* Doxygen update * Doxygen updates

* add yuv_To_rgb kernel, test case and 3 test images and their outputs * clean up kernel parameters * test suite upgrade * fix kernel to support full and studio range * change copyright to 2026 * add enums for color standard and range * update version for adding this new kernel * add hip platform to CmakeLists * fix merge conflict errors

* Updates for tensor bitwise operations to have two inputs * Initial addition across pipeline * Updates for testing and fix issues * Testing i8 * Tensor Binary Bitwise Operations extended for short and int datatypes * Changes in test suite to allocate memory for int32 and uint32 datatypes * Updates for templating bitwise binary operations * Initial updates for a HIP Version * Fixes for U8 HIP Implementation and Running * Initial updates for 2D bitwise or * Update the tensor implementations to fix accuracy * Updates to not replace the pointers * Updates for 4D or higher * Fix compilation issues * Updates to fix output for 4D case * Fixes for 1D case * Modifications in outer API and support other bit depths also * Code addition to vectorized test of uchar type * Updates to test U8/F32 vector versions * Updates for fixing tensor bitwise operations vectorization with integer * Updates to test performance * Fix memory related bugs with test suite * Add Strides definition for individual samples * Updates to use newer strides and ROI for computation * Integrated broadcast support and templated all tensor bitwise operations * Added test suite support * updated the incompatible dimensions check * updated mem allocs * Updates to check overall dims to avoid memory errors * Fix issues found while testing different ndims * Rename variables and remove print statements * Allocate lesser memory * Remove unnecessary files * Remove dependency of broadcast.hpp * Updates for CMakeLists.txt * Remove unused function * Remove line * Cleanup rppt_tensor_bitwise_operations.cpp file * Add documentation for external API functions and remove load/store functions * Further cleanup of files * Rename hip kernel file * Update comments * Initial cleanup of HIP Kernels * Add EOF line in common.py * Further cleanup of kernel files * Add changes for separating broadcasting on HOST * Update the code for HIP for broadcastMode * Fix compilation issues * Update vectorized implementation for bitwise OR * 
Update code for 3D vectorization * Update code for 4D vectorization * Template implementations * Separate broadcast and non broadcast codes into two paths * Pass only the destination strides * broadcastMode introduced as parameter * Add comments * Test Suite Fixes for HOST * Further changes to rpp_test_suite_misc.h * Add more fixes * Fixes for HIP Side * Fix issues with test suite * Update the compare_output calls * Minor cleanup fixes * Updates for consistency * Check for nullptr * Further cleanup updates * Cleanup fixes * Further cleanup files * Complete remaining cleanup * Fix test suite compilation issues * Fix QA issues with tensor bitwise with broadcasting * Fix QA issues with tensor and * Update vectorized versions for 1D, 2D and 3D cases of broadcasting * Compile fixes with vectorized version * Add Bitwise Operations, Function Templates Struct and Bin File Consolidation for both HOST and HIP. * Fix rename to bitDepthTestMode, Datatype format and remove other bin files * Fix Broadcast condition for HIP & HOST and add roi broadcast condition * Fix the renamed of BitdepthTestmode and resolved test suite error * Fix Comments correction, Malloc Corrections and I8 datatype conversion. * update unified API for tensor bitwise ops * fix HOST QA issues * revert unnecessary changes * fix normalize issues and cleanup * revert error code capture code * resolved copilot review comments * fix build error * resolve review comments * resolve review comments --------- Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Mukesh <mukesh.jayakodi@multicorewareinc.com> Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com> Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com>

* added support for tensor add tensor and tensor mul scalar * Add golden outputs * update RPP version * resolve review comments * rename new audio function names * rename new hip audio functions --------- Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>

* GPU support for Histogram Equalize * HOST and HIP version of Histogram Equalise with QA for U8 * Updated build LUT function with AVX version * Updated docs image and updated AVX logic * Optimize Hist Equalize * Updated Helper function * Updated helpers * Cleanup and clamp condition * Clean up for histogram equalize HOST * Cleanup * Code cleanup * Taking memory from scratch buffer * Changed C style casts * updated unified api for histogram equalize * updated the documentation * Resolved review comments * Updated case list for histogram equalize * Updated parameter names * modified aligned length * modified mem allocations * resolve review comments * resolve review comments * resolve review comments --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: ManasaDattaT <tammisetti.manasadatta@multicorewareinc.com> Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com> Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>

* optimized Median Filter 3 and 5 kernels * optimizations for kernel size 7 and 9 * cleanup * optimize HIP code and cleanup * gaussian filter optimization * resolve review comments * fix build error * resolve review comments * resolve review comments * resolved review comments * resolved review comments --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com> Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>

* update images in readme * Add voxel gifs , modify the images for 2D augmentations * update testsuite readme * updated original images * updated augmentation table and added slice image --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com>

* Suppress unused functions (roi_conversion) * Add extra braces to hip load helpers in rpp_hip_load_store.hpp * Remove unused variables in rpp_cpu_simd_load_store.hpp * Check return status in random_erase.cpp * Resolve uninitialized variables and fix bug in tensor_mean.cpp/jitter.cpp * Suppress any warnings that haven't been addressed in phase 1 * Fix typo in function name * Fix typo in function name * Fix typos --------- Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>

Srihari-mcw added 30 commits April 24, 2025 08:14

Add initial implementation of tensor add tensor host

812c872

Update both sources with respective strides

1d2f077

Initial working version of tensor add tensor host without broadcasting

401c362

Add broadcast support file and make modifications to tensor_add_tenso…

36eb127

…r.cpp file

Current test patch here

7dc9f74

Cleanup tensor_add_tensor and changes to accommodate shift of source p…

01603a8

…ointer at beginning

Make tensor_add_tensor_recursive templated

dc52dca

Add initial implementation for F16 implementation for tensor_add_tensor

7e50fd2

Changes currently in test suite for testing tensor operations

093e4b6

Tensor operations for arithmetic

57db8a4

Add support for subtraction, division and multiply across HOST

116ea92

Fix linker errors and other issues

6d7bedb

Make changes for F16 test invocation for tensor arithmetic operations

be7342e

Add declarations for tensor operations F16 bit depth in host_tensor_e…

496f8c0

…xecutors.hpp

Fix compilation issues

12ca8b8

Add AVX implementation for broadcastNDim = 1

b571f29

Make changes to have code for else case inside broadcastNDim = 1

06dcbef

Updates for src2shape == 1 for broadcastNDim = 1

146d371

Add code for src1shape == 1

d194ff4

Add code for broadcastNDim = 2

b2bb242

Add updates for numDims=3

f387e54

Fix issues with broadcastNDim = 3

9194f18

Add vectorized versions of divide, multiply and sub operations for va…

d789ce4

…rious dims

Fix compilation with subtract, multiply and divide

020d8c0

Update the leftover part processed by raw C

b656b57

Bug fixes for src1shape == 1 and broadcastNDim = 2

a06c1a7

Add AVX Version for F16 version of tensor_multiply_tensor

d248f2d

Update the store function

2f73e65

Update Rpp32f* to Rpp16f*

113b9b3

Modify to cast the pointer

c210d6c

HazarathKumarM and others added 30 commits February 27, 2026 04:19

resolve review comments

daa9b5c

Merge branch 'develop' into tensor_arithmetic_float_divide

5af06fb

resolve review comments

777c845

Merge branch 'develop' into tensor_arithmetic_float_divide

cb8d5e6

Merge branch 'develop' into tensor_arithmetic_float_divide

a4f4d2e

Merge branch 'develop' into tensor_arithmetic_float_divide

146098d

Merge branch 'develop' into tensor_arithmetic_float_divide

3532e23

Add copyright

1f21475

resolve review comments

ac562a1

add missing bitdepths

ddfb3b3

Docs - update doxygen headers to correct image names (ROCm#695)

57e2b40

* Doxygen update * Doxygen updates

Merge remote-tracking branch 'develop' into tensor_arithmetic_float_d…

8b25f9e

…ivide

resolve review comments

af75376

Fix dropout issues

e9e7fc8

Merge branch 'fix_tot_issues' into tensor_arithmetic_float_divide

eda310b

Merge branch 'develop' into tensor_arithmetic_float_divide

d510a89

Srihari-mcw wants to merge 131 commits into r-abishek:ar/broadcasting_tensor_arithmetic from Srihari-mcw:tensor_arithmetic_float_divide

Srihari-mcw commented Sep 17, 2025

8 participants
