Skip to content

RPP Jpeg Compression distortion Tensor Support#408

Merged
r-abishek merged 19 commits intor-abishek:ar/opt_jpeg_distortion_hostfrom
HazarathKumarM:hk/jpeg_distortion
Mar 26, 2025
Merged

RPP Jpeg Compression distortion Tensor Support#408
r-abishek merged 19 commits intor-abishek:ar/opt_jpeg_distortion_hostfrom
HazarathKumarM:hk/jpeg_distortion

Conversation

@HazarathKumarM
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Owner

@r-abishek r-abishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HazarathKumarM Please address first round comments


inline void rpp_load24_f16pkd3_to_f32pln3_avx(Rpp16f *srcPtr, __m256 *p)
{
__m128 p128[8];
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason for removal of this helper?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted back


inline void rpp_load24_f16pln3_to_f32pln3_avx(Rpp16f *srcPtrR, Rpp16f *srcPtrG, Rpp16f *srcPtrB, __m256 *p)
{
p[0] = _mm256_cvtph_ps(_mm_castps_si128(_mm_loadu_ps(reinterpret_cast<Rpp32f *>(srcPtrR))));
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

91: ["tensor_stddev", "HOST", "HIP"],
92: ["slice", "HOST", "HIP"]
92: ["slice", "HOST", "HIP"],
93: ["jpeg_compression_distortion", "HOST", "HIP"]
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, this PR seems to only have HOST. If HIP is a separate PR, remove here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

using namespace std;
#define MAX_IMAGE_DUMP 20
#define MAX_BATCH_SIZE 512
#define GOLDEN_OUTPUT_MAX_HEIGHT 150 // Golden outputs are generated with MAX_HEIGHT set to 150. Changing this constant will result in QA test failures
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 700+ lines changing here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got added in the merge

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

* It introduces artifacts seen in lossy JPEG compression by converting the image to the frequency domain using the Discrete Cosine Transform (DCT),
* applying quantization, and then reconstructing the image using the inverse DCT (IDCT).
* This process introduces compression-related distortions similar to those in JPEG images.
* \param [in] srcPtr source tensor in HOST memory
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add sample input and outputs like we do above

 * \image html lens_img640x480.png  Sample Input
 * \image html geometric_augmentations_lens_correction_img_640x480.png Sample Output

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

x[0] = _mm256_setr_ps(y0, y1, y2, y3, y4, y5, y6, y7);
}

void quantizeBlockAVX2(__m256 *p, float quantTable[8][8])
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Rpp32f, Rpp32s or Rpp32u etc everywhere. Multiple instances.
Only fine to use int for loop indexes and int/float inside hip kernels for now.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this quantTable should be call by reference too

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


void quantizeBlockAVX2(__m256 *p, float quantTable[8][8])
{
float qscale = GetQualityFactorScale(50);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function helpers in snake_case. Please adhere same standards!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


void quantizeBlockAVX2(__m256 *p, float quantTable[8][8])
{
float qscale = GetQualityFactorScale(50);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this just qualityFactor instead of qscale. Same inside the GetQualityFactorScale() function

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

void quantizeBlockAVX2(__m256 *p, float quantTable[8][8])
{
float qscale = GetQualityFactorScale(50);
__m256 scale = _mm256_set1_ps(qscale);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this pQualityFactor to indicate vector float variable

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Convert to Cb/Cr
int chromaIdx = (row / 2) * 8 + (col / 2);
cb[chromaIdx] = std::clamp((-0.168736f * avgR - 0.331264f * avgG + 0.5f * avgB + 128.0f), 0.0f, 255.0f) - 128.0f;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add parentheses for readability for every other instance without.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@r-abishek r-abishek added the enhancement New feature or request label Mar 13, 2025
@r-abishek r-abishek added this to the sow12ms2 milestone Mar 13, 2025
Copy link
Copy Markdown
Owner

@r-abishek r-abishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HazarathKumarM Few more comments

for (Rpp32s col = 0; col < 8; col++)
{
Rpp32s idx = row * stride + col;
Rpp32f Qcoeff = quantTable[row * 8 + col] * qualityFactor; // Directly accessing 1D array
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change to camelCase -> qCoeff and not Qcoeff
Write the loop as for (Rpp32s row = 0, rowTimes8 = 0; row < 8; row++, rowTimes8 += 8)
Please take a look at such optimizations too

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

{
for (Rpp32s col = 0; col < 8; col++)
{
Rpp32s idx = row * stride + col;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please say something like Rpp32s rowIdx = row * stride; outside this loop and dont repeat constant multiplies inside the col loop

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


x1 = dctNormFactor * (dctCoeff1 * tmp4 - dctCoeff3 * tmp5 + dctCoeff4 * tmp6 - dctCoeff6 * tmp7);
x3 = dctNormFactor * (dctCoeff3 * tmp4 + dctCoeff6 * tmp5 - dctCoeff1 * tmp6 + dctCoeff4 * tmp7);
x5 = dctNormFactor * (dctCoeff4 * tmp4 + dctCoeff1 * tmp5 + dctCoeff6 * tmp6 - dctCoeff3 * tmp7);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use parentheses every time there is such compute for readability.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always for all instances:

x1 = dctNormFactor * ((dctCoeff1 * tmp4) - (dctCoeff3 * tmp5) + (dctCoeff4 * tmp6) - (dctCoeff6 * tmp7));

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


temp[0] = x[0] + x[7];
temp[1] = x[1] + x[6];
temp[2] = x[2] + x[5];
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use temp as an array here but not for the two functions above. Lets use an array everywhere

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if constexpr (std::is_same<T, Rpp32f>::value || std::is_same<T, Rpp16f>::value)
{
r[row * 16 + col] = static_cast<Rpp32f>(srcPtrR[idx]) * 255.0f;
g[row * 16 + col] = static_cast<Rpp32f>(srcPtrG[idx]) * 255.0f;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the row * 16 and the row * hStride are constants for the second inner loop. Can be taken out please!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Process 2x2 Y pixels for each Cb/Cr pair
for (Rpp32s sub_row = 0; sub_row < 2; sub_row++)
{
for (Rpp32s sub_col = 0; sub_col < 2; sub_col++)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistently use camelCase for variables

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Rpp32s dstIdx = (row * 2 + sub_row) * hStride + (col * 2 + sub_col) * wStride;

// Prevent out-of-bounds access
if ((row * 2 + sub_row) >= colLimit || (col * 2 + sub_col) >= rowLimit)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again row * 2 + subRow is computed multiple times inside the loop while its a constant. Please take it out of the loop and compute loop-independent computations outside the loop

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

{
for (Rpp32s sub_col = 0; sub_col < 2; sub_col++)
{
Rpp32s y_idx = (row * 2 + sub_row) * 16 + (col * 2 + sub_col);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call it yIdx

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

dct_inv_8x8_1d<1>(cb + row * 8);

for(Rpp32s row = 0; row < 8; row++)
dct_fwd_8x8_1d<1>(cr + row * 8); // Row-wise DCT (stride = 1)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont need a row * 8 and a row++.
Just use one variable and one increment and do a += 8

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for(Rpp32s row = 0; row < 8; row++)
dct_inv_8x8_1d<8>(cr + row);
for(Rpp32s row = 0; row < 8; row++)
dct_inv_8x8_1d<1>(cr + row * 8);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment everywhere

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

void process_jpeg_compression_distortion(__m256* pRgb, __m256* pY, __m256* pCb, __m256* pCr,
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these functions like this one need more comments to be self-explanatory

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@r-abishek
Copy link
Copy Markdown
Owner

@HazarathKumarM Leave a note when the 10 odd comments above are addressed

x[1] = _mm256_cvtss_f32(_mm256_permutevar8x32_ps(pVecDct[0], _mm256_setr_epi32(1, 1, 1, 1, 1, 1, 1, 1)));
x[2] = _mm256_cvtss_f32(_mm256_permutevar8x32_ps(pVecDct[0], _mm256_setr_epi32(2, 2, 2, 2, 2, 2, 2, 2)));
x[3] = _mm256_cvtss_f32(_mm256_permutevar8x32_ps(pVecDct[0], _mm256_setr_epi32(3, 3, 3, 3, 3, 3, 3, 3)));
x[4] = _mm256_cvtss_f32(_mm256_permutevar8x32_ps(pVecDct[0], _mm256_setr_epi32(4, 4, 4, 4, 4, 4, 4, 4)));
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HazarathKumarM We definitely need to take these 'constant register initializations' (at runtime) outside the 4 loops its inside now. Please init them as 8 const __m256 in the either the rpp_cpu_simd_math.hpp or other better location as per new restructure.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, use _mm256_set1_epi32(7) instead of the _mm256_setr_epi32(7, 7, 7, 7, 7, 7, 7, 7)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@HazarathKumarM
Copy link
Copy Markdown
Collaborator Author

@r-abishek I have resolved all the comments you added here and also moved all the coefficient declarations to the top of the file

@r-abishek r-abishek changed the base branch from develop to ar/opt_jpeg_distortion_host March 26, 2025 06:26
@r-abishek r-abishek merged commit 4778243 into r-abishek:ar/opt_jpeg_distortion_host Mar 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants