Histogram Equalize kernel Implementation - HOST / HIP by RooseweltMcW · Pull Request #533 · r-abishek/rpp

RooseweltMcW · 2025-11-26T13:27:55Z

Histogram Equalize kernel implementation for both HOST and HIP backend

r-abishek

Please address minor comment.
This funciton only has U8.
Please confirm U8 QA passing for this?

r-abishek · 2026-02-20T02:47:16Z

src/modules/tensor/rppt_tensor_color_augmentations.cpp

+
+        if ((srcDescPtr->dataType == RpptDataType::U8) && (dstDescPtr->dataType == RpptDataType::U8))
+        {
+            histogram_equalize_u8_u8_host_tensor(static_cast<Rpp8u*>(srcPtr) + srcDescPtr->offsetInBytes,


Seems like this function is only for U8?

Pls add reejction condition for an F32/I8/F16 input then.

Check for src and dst data types.
Perhaps add an appropriate RPP_ERROR for invalid source data type and invalid destination data type

Added, Done.

Copilot

Pull request overview

This pull request implements histogram equalization functionality for both HOST (CPU) and HIP (GPU) backends in the RPP library. Histogram equalization is a color augmentation technique that enhances image contrast by redistributing pixel intensity values.

Changes:

Adds histogram_equalize augmentation with ID 97 for both HOST and HIP backends
Implements color space conversion (RGB ↔ YCbCr) and histogram equalization on the Y channel for RGB images
Adds test suite integration for the new augmentation

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
utilities/test_suite/rpp_test_suite_image.h	Adds HISTOGRAM_EQUALIZE enum and augmentation map entry
utilities/test_suite/common.py	Adds histogram_equalize to Python test configuration and color_augmentations group
utilities/test_suite/HOST/Tensor_image_host.cpp	Integrates histogram_equalize test case for HOST backend
utilities/test_suite/HIP/Tensor_image_hip.cpp	Integrates histogram_equalize test case for HIP backend
src/modules/tensor/rppt_tensor_color_augmentations.cpp	Implements rppt_histogram_equalize API entry point with validation
src/modules/tensor/hip/kernel/histogram_equalize.cpp	Implements HIP kernels for color conversion, histogram collection, LUT building, and application
src/modules/tensor/cpu/kernel/histogram_equalize.cpp	Implements HOST CPU functions with AVX2 SIMD optimizations for histogram equalization
src/include/tensor/host_tensor_executors.hpp	Declares HOST executor function
src/include/tensor/hip_tensor_executors.hpp	Declares HIP executor function
api/rppt_tensor_color_augmentations.h	Adds public API documentation for rppt_histogram_equalize
docs/data/doxygenOutputs/color_augmentations_histogram_equalize_img150x150.png	Adds sample output image for documentation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-20T02:59:58Z

src/modules/tensor/hip/kernel/histogram_equalize.cpp

+__device__ constexpr float coeffGCb = -0.344136f;
+__device__ constexpr float coeffGCr = -0.714136f;


The YCbCr to RGB conversion coefficients have a sign mismatch between HOST and HIP implementations. The HIP version uses negative values (coeffGCb = -0.344136f, coeffGCr = -0.714136f) on lines 42-43, while the HOST version uses positive values (coeffGCb = 0.344136f, coeffGCr = 0.714136f) in cpu/kernel/histogram_equalize.cpp lines 43-44. The standard YCbCr to RGB conversion formula is: G = Y - 0.344136 * Cb - 0.714136 * Cr. Since the HIP implementation already negates coeffGCb and coeffGCr when defining them, but then uses them in addition operations (lines 100-101), this results in incorrect calculations. Either the sign should be removed from the coefficient definitions, or the operations should use subtraction instead of addition.

Modified, Done.

Copilot · 2026-02-20T02:59:58Z

src/modules/tensor/cpu/kernel/histogram_equalize.cpp

+        Rpp8u *scratchBase = reinterpret_cast<Rpp8u *>(handle.GetInitHandle()->mem.mcpu.scratchBufferHost);
+        Rpp8u *yBuf = scratchBase + batchCount * (pixels * 3);
+        Rpp8u *cbBuf = yBuf + pixels;
+        Rpp8u *crBuf = cbBuf + pixels;
+        Rpp8u *dstYBuf = yBuf;


Potential buffer overflow in HOST implementation. The code allocates scratch buffers for each batch element without validating total size against available scratch buffer. Line 299 computes: yBuf = scratchBase + batchCount * (pixels * 3), where pixels = roiWidth * roiHeight. For parallel execution with OpenMP (line 278), multiple threads may simultaneously access the scratch buffer with different batchCount values. If the total requirement (batchSize * max_pixels * 3) exceeds the available scratchBufferHost size, this will cause memory corruption. There's no bounds checking to ensure the scratch buffer is large enough.

Copilot · 2026-02-20T02:59:58Z

src/include/tensor/hip_tensor_executors.hpp


+// -------------------- histogram_equalize --------------------
+
+RppStatus hip_exec_histogram_equalize_tensor(Rpp8u *srcPtr1,


Inconsistent parameter naming. The first parameter is named srcPtr1 in the declaration, but it should be srcPtr for consistency with all other function declarations in this file and with the actual implementation.

Suggested change

RppStatus hip_exec_histogram_equalize_tensor(Rpp8u *srcPtr1,

RppStatus hip_exec_histogram_equalize_tensor(Rpp8u *srcPtr,

Modified, Done.

Copilot · 2026-02-20T02:59:59Z

src/modules/tensor/cpu/kernel/histogram_equalize.cpp

+            cbPtr = cbBuf;
+            crPtr = crBuf;
+#if __AVX2__
+            alignedLength = ((roi.xywhROI.roiWidth / vectorIncrement) - 1) * vectorIncrement;


Potential edge case issue with alignedLength calculation. The formula ((roi.xywhROI.roiWidth / vectorIncrement) - 1) * vectorIncrement will produce negative or zero values when roiWidth is less than vectorIncrement (48). For small ROIs (e.g., roiWidth = 16), this results in alignedLength = ((16 / 48) - 1) * 48 = (0 - 1) * 48 = -48. The loop condition vectorLoopCount < alignedLength would then be false initially, which is correct behavior, but using a potentially negative value is poor practice. Consider using: alignedLength = (roi.xywhROI.roiWidth / vectorIncrement) * vectorIncrement to avoid negative values, or add a check to ensure alignedLength is non-negative.

Suggested change

alignedLength = ((roi.xywhROI.roiWidth / vectorIncrement) - 1) * vectorIncrement;

if (roi.xywhROI.roiWidth >= vectorIncrement)

alignedLength = ((roi.xywhROI.roiWidth / vectorIncrement) - 1) * vectorIncrement;

else

alignedLength = 0;

Copilot · 2026-02-20T02:59:59Z

src/modules/tensor/cpu/kernel/histogram_equalize.cpp

+            cbPtr = cbBuf;
+            crPtr = crBuf;
+#if __AVX2__
+            alignedLength = ((roi.xywhROI.roiWidth / vectorIncrement) - 1) * vectorIncrement;


Potential edge case issue with alignedLength calculation. The formula ((roi.xywhROI.roiWidth / vectorIncrement) - 1) * vectorIncrement will produce negative or zero values when roiWidth is less than vectorIncrement (48). For small ROIs (e.g., roiWidth = 16), this results in alignedLength = ((16 / 48) - 1) * 48 = (0 - 1) * 48 = -48. The loop condition vectorLoopCount < alignedLength would then be false initially, which is correct behavior, but using a potentially negative value is poor practice. Consider using: alignedLength = (roi.xywhROI.roiWidth / vectorIncrement) * vectorIncrement to avoid negative values, or add a check to ensure alignedLength is non-negative.

Suggested change

alignedLength = ((roi.xywhROI.roiWidth / vectorIncrement) - 1) * vectorIncrement;

alignedLength = ((roi.xywhROI.roiWidth / vectorIncrement) - 1) * vectorIncrement;

if (alignedLength < 0)

alignedLength = 0;

Copilot · 2026-02-20T02:59:59Z

src/modules/tensor/hip/kernel/histogram_equalize.cpp

+    unsigned int *d_hist = reinterpret_cast<unsigned int*>(handle.GetInitHandle()->mem.mgpu.scratchBufferHip.floatmem);
+    unsigned char *d_lut = reinterpret_cast<unsigned char*>(d_hist + batchSize * HISTOGRAM_BINS);
+
+    if(srcDescPtr->c == 3)
+    {
+        const size_t planeSize = static_cast<size_t>(srcDescPtr->w) * srcDescPtr->h * srcDescPtr->n;
+        Rpp8u *yuvBuf = reinterpret_cast<Rpp8u *>(d_lut + batchSize * HISTOGRAM_BINS);
+
+        Rpp8u *yBuf = yuvBuf;
+        Rpp8u *cbBuf = yuvBuf + planeSize;
+        Rpp8u *crBuf = yuvBuf + (planeSize * 2);


Potential buffer overflow issue. The scratch buffer allocation assumes sequential memory layout without validation. The code allocates:

d_hist: batchSize * 256 * sizeof(unsigned int) = batchSize * 1024 bytes

d_lut: batchSize * 256 * sizeof(unsigned char) = batchSize * 256 bytes

yuvBuf: 3 * planeSize = 3 * w * h * n bytes (for 3-channel images)

The total requirement is batchSize * 1280 + 3 * w * h * n bytes. For large images (e.g., 3840x2160), this equals batchSize * 1280 + 3 * 3840 * 2160 * n = 24,883,200 * n + 1280 * n bytes per batch element. The scratch buffer is 8,294,400 * sizeof(float) = 33,177,600 bytes total. This may be insufficient for large batches or images, yet there's no validation to ensure the required memory fits within the available scratch buffer, risking memory corruption.

Copilot · 2026-02-20T02:59:59Z

src/include/tensor/host_tensor_executors.hpp


+// -------------------- histogram_equalize --------------------
+
+RppStatus histogram_equalize_u8_u8_host_tensor(Rpp8u *srcPtr1,


Inconsistent parameter naming. The first parameter is named srcPtr1 in the declaration, but it should be srcPtr for consistency with all other function declarations in this file and with the actual implementation in histogram_equalize.cpp line 265 which uses srcPtr.

Suggested change

RppStatus histogram_equalize_u8_u8_host_tensor(Rpp8u *srcPtr1,

RppStatus histogram_equalize_u8_u8_host_tensor(Rpp8u *srcPtr,

Modified, Done.

Copilot · 2026-02-20T03:00:00Z

src/modules/tensor/cpu/kernel/histogram_equalize.cpp

+            collect_hist_pln_tensor_host(srcPtr, hist, roiWidth, roiHeight, srcDescPtr->strides.hStride);
+            build_lut_from_hist_host(hist, lutBatch, pixels);
+            apply_lut_tensor(srcPtr, dstPtr, roiWidth, roiHeight, lutBatch, srcDescPtr->strides.hStride, dstDescPtr->strides.hStride);
+        }


Missing error handling for unsupported layout combinations. The implementation only handles specific layout combinations (NCHW->NCHW, NCHW->NHWC, NHWC->NCHW, NHWC->NHWC for 3-channel, and NCHW->NCHW for 1-channel). If the code reaches a combination not explicitly handled (though unlikely given earlier validation), it will silently skip processing and return RPP_SUCCESS without producing any output or error. Consider adding an else clause to return an appropriate error status for consistency with error handling in other parts of the codebase.

Suggested change

}

}

else

{

// Unsupported channel/layout combination

return RPP_ERROR;

}

Not relevant , we already checking the condition and also it is not possible to return inside openMP loop

Copilot · 2026-02-20T03:00:00Z

utilities/test_suite/HOST/Tensor_image_host.cpp

+                    testCaseName = "histogram_equalize";
+
+                    startWallTime = omp_get_wtime();
+                    startCpuTime = clock();


The startCpuTime = clock() statement is missing in this case block. All other test cases in this file follow the pattern of setting both startWallTime and startCpuTime before calling the function. This line should be added after line 1855 (startWallTime = omp_get_wtime()) for consistency with the rest of the codebase.

Already present, can be ignored.

HazarathKumarM · 2026-02-24T17:18:36Z

@r-abishek all comments are reoslved now

HazarathKumarM added 10 commits October 22, 2025 13:50

GPU support for Histogram Equalize

1444bca

HOST and HIP version of Histogram Equalise with QA for U8

7ee8318

Updated build LUT function with AVX version

906c84f

Merge remote-tracking branch 'origin/develop' into apr/histEqualize

a27883b

Updated docs image and updated AVX logic

5e2d798

Optimize Hist Equalize

0eb8899

Updated Helper function

c675c0a

Updated helpers

ef14d73

Cleanup and clamp condition

804a502

Clean up for histogram equalize HOST

1399d99

Srihari-mcw pushed a commit to Srihari-mcw/rpp that referenced this pull request Dec 8, 2025

License - Updates and fixes (r-abishek#533)

e61cdc7

ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025

License - Updates and fixes (r-abishek#533)

a2ca744

ManasaDattaT added 4 commits December 22, 2025 06:42

Cleanup

d889118

Code cleanup

2c37148

Taking memory from scratch buffer

90dca9f

Changed C style casts

93696af

ManasaDattaT force-pushed the apr/histEqualize branch from 89ac0c2 to 93696af Compare December 24, 2025 05:00

HazarathKumarM added 4 commits February 17, 2026 23:48

Merge branch 'develop' into apr/histEqualize

1de4528

updated unified api for histogram equalize

7aa6375

Merge branch 'hist_temp' into apr/histEqualize

a76e161

updated the documentation

8a541dc

r-abishek requested changes Feb 20, 2026

View reviewed changes

r-abishek assigned HazarathKumarM and RooseweltMcW Feb 20, 2026

r-abishek added the enhancement New feature or request label Feb 20, 2026

r-abishek requested a review from Copilot February 20, 2026 02:55

Copilot started reviewing on behalf of r-abishek February 20, 2026 02:56 View session

Copilot AI reviewed Feb 20, 2026

View reviewed changes

HazarathKumarM added 2 commits February 20, 2026 01:44

Resolved review comments

c89e6bc

Updated case list for histogram equalize

8362336

HazarathKumarM added 2 commits February 24, 2026 00:15

Updated parameter names

f394dd1

modified aligned length

45a2e2e

modified mem allocations

226d386

HazarathKumarM force-pushed the apr/histEqualize branch from 74fbd82 to 226d386 Compare February 24, 2026 17:22

r-abishek changed the base branch from develop to ar/histogram_equalize February 25, 2026 00:24

r-abishek approved these changes Feb 25, 2026

View reviewed changes

r-abishek merged commit 0a6a42f into r-abishek:ar/histogram_equalize Feb 25, 2026

		__device__ constexpr float coeffGCb = -0.344136f;
		__device__ constexpr float coeffGCr = -0.714136f;


		// -------------------- histogram_equalize --------------------

		RppStatus hip_exec_histogram_equalize_tensor(Rpp8u *srcPtr1,


		// -------------------- histogram_equalize --------------------

		RppStatus histogram_equalize_u8_u8_host_tensor(Rpp8u *srcPtr1,

-        }
+        }
+        else
+        {
+            // Unsupported channel/layout combination
+            return RPP_ERROR;
+        }

Conversation

RooseweltMcW commented Nov 26, 2025

Uh oh!

r-abishek left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

HazarathKumarM Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

HazarathKumarM commented Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

HazarathKumarM Feb 24, 2026 •

edited

Loading