RPP Jpeg Compression distortion Tensor Support#408
RPP Jpeg Compression distortion Tensor Support#408r-abishek merged 19 commits intor-abishek:ar/opt_jpeg_distortion_hostfrom
Conversation
Add inverse DCT function Add ycbCr to RGB conversion Add test suite support
r-abishek
left a comment
There was a problem hiding this comment.
@HazarathKumarM Please address first round comments
|
|
||
| inline void rpp_load24_f16pkd3_to_f32pln3_avx(Rpp16f *srcPtr, __m256 *p) | ||
| { | ||
| __m128 p128[8]; |
There was a problem hiding this comment.
Reason for removal of this helper?
There was a problem hiding this comment.
reverted back
|
|
||
| inline void rpp_load24_f16pln3_to_f32pln3_avx(Rpp16f *srcPtrR, Rpp16f *srcPtrG, Rpp16f *srcPtrB, __m256 *p) | ||
| { | ||
| p[0] = _mm256_cvtph_ps(_mm_castps_si128(_mm_loadu_ps(reinterpret_cast<Rpp32f *>(srcPtrR)))); |
utilities/test_suite/common.py
Outdated
| 91: ["tensor_stddev", "HOST", "HIP"], | ||
| 92: ["slice", "HOST", "HIP"] | ||
| 92: ["slice", "HOST", "HIP"], | ||
| 93: ["jpeg_compression_distortion", "HOST", "HIP"] |
There was a problem hiding this comment.
As of now, this PR seems to only have HOST. If HIP is a separate PR, remove here.
| using namespace std; | ||
| #define MAX_IMAGE_DUMP 20 | ||
| #define MAX_BATCH_SIZE 512 | ||
| #define GOLDEN_OUTPUT_MAX_HEIGHT 150 // Golden outputs are generated with MAX_HEIGHT set to 150. Changing this constant will result in QA test failures |
There was a problem hiding this comment.
got added in the merge
| * It introduces artifacts seen in lossy JPEG compression by converting the image to the frequency domain using the Discrete Cosine Transform (DCT), | ||
| * applying quantization, and then reconstructing the image using the inverse DCT (IDCT). | ||
| * This process introduces compression-related distortions similar to those in JPEG images. | ||
| * \param [in] srcPtr source tensor in HOST memory |
There was a problem hiding this comment.
Pls add sample input and outputs like we do above
* \image html lens_img640x480.png Sample Input
* \image html geometric_augmentations_lens_correction_img_640x480.png Sample Output
| x[0] = _mm256_setr_ps(y0, y1, y2, y3, y4, y5, y6, y7); | ||
| } | ||
|
|
||
| void quantizeBlockAVX2(__m256 *p, float quantTable[8][8]) |
There was a problem hiding this comment.
Please use Rpp32f, Rpp32s or Rpp32u etc everywhere. Multiple instances.
Only fine to use int for loop indexes and int/float inside hip kernels for now.
There was a problem hiding this comment.
And this quantTable should be call by reference too
|
|
||
| void quantizeBlockAVX2(__m256 *p, float quantTable[8][8]) | ||
| { | ||
| float qscale = GetQualityFactorScale(50); |
There was a problem hiding this comment.
function helpers in snake_case. Please adhere same standards!
|
|
||
| void quantizeBlockAVX2(__m256 *p, float quantTable[8][8]) | ||
| { | ||
| float qscale = GetQualityFactorScale(50); |
There was a problem hiding this comment.
Call this just qualityFactor instead of qscale. Same inside the GetQualityFactorScale() function
| void quantizeBlockAVX2(__m256 *p, float quantTable[8][8]) | ||
| { | ||
| float qscale = GetQualityFactorScale(50); | ||
| __m256 scale = _mm256_set1_ps(qscale); |
There was a problem hiding this comment.
Call this pQualityFactor to indicate vector float variable
|
|
||
| // Convert to Cb/Cr | ||
| int chromaIdx = (row / 2) * 8 + (col / 2); | ||
| cb[chromaIdx] = std::clamp((-0.168736f * avgR - 0.331264f * avgG + 0.5f * avgB + 128.0f), 0.0f, 255.0f) - 128.0f; |
There was a problem hiding this comment.
Please add parentheses for readability for every other instance without.
r-abishek
left a comment
There was a problem hiding this comment.
@HazarathKumarM Few more comments
| for (Rpp32s col = 0; col < 8; col++) | ||
| { | ||
| Rpp32s idx = row * stride + col; | ||
| Rpp32f Qcoeff = quantTable[row * 8 + col] * qualityFactor; // Directly accessing 1D array |
There was a problem hiding this comment.
Please change to camelCase -> qCoeff and not Qcoeff
Write the loop as for (Rpp32s row = 0, rowTimes8 = 0; row < 8; row++, rowTimes8 += 8)
Please take a look at such optimizations too
| { | ||
| for (Rpp32s col = 0; col < 8; col++) | ||
| { | ||
| Rpp32s idx = row * stride + col; |
There was a problem hiding this comment.
Please say something like Rpp32s rowIdx = row * stride; outside this loop and dont repeat constant multiplies inside the col loop
|
|
||
| x1 = dctNormFactor * (dctCoeff1 * tmp4 - dctCoeff3 * tmp5 + dctCoeff4 * tmp6 - dctCoeff6 * tmp7); | ||
| x3 = dctNormFactor * (dctCoeff3 * tmp4 + dctCoeff6 * tmp5 - dctCoeff1 * tmp6 + dctCoeff4 * tmp7); | ||
| x5 = dctNormFactor * (dctCoeff4 * tmp4 + dctCoeff1 * tmp5 + dctCoeff6 * tmp6 - dctCoeff3 * tmp7); |
There was a problem hiding this comment.
Please use parentheses every time there is such compute for readability.
There was a problem hiding this comment.
Always for all instances:
x1 = dctNormFactor * ((dctCoeff1 * tmp4) - (dctCoeff3 * tmp5) + (dctCoeff4 * tmp6) - (dctCoeff6 * tmp7));
|
|
||
| temp[0] = x[0] + x[7]; | ||
| temp[1] = x[1] + x[6]; | ||
| temp[2] = x[2] + x[5]; |
There was a problem hiding this comment.
We use temp as an array here but not for the two functions above. Lets use an array everywhere
| if constexpr (std::is_same<T, Rpp32f>::value || std::is_same<T, Rpp16f>::value) | ||
| { | ||
| r[row * 16 + col] = static_cast<Rpp32f>(srcPtrR[idx]) * 255.0f; | ||
| g[row * 16 + col] = static_cast<Rpp32f>(srcPtrG[idx]) * 255.0f; |
There was a problem hiding this comment.
All the row * 16 and the row * hStride are constants for the second inner loop. Can be taken out please!
| // Process 2x2 Y pixels for each Cb/Cr pair | ||
| for (Rpp32s sub_row = 0; sub_row < 2; sub_row++) | ||
| { | ||
| for (Rpp32s sub_col = 0; sub_col < 2; sub_col++) |
There was a problem hiding this comment.
Consistently use camelCase for variables
| Rpp32s dstIdx = (row * 2 + sub_row) * hStride + (col * 2 + sub_col) * wStride; | ||
|
|
||
| // Prevent out-of-bounds access | ||
| if ((row * 2 + sub_row) >= colLimit || (col * 2 + sub_col) >= rowLimit) |
There was a problem hiding this comment.
Again row * 2 + subRow is computed multiple times inside the loop while its a constant. Please take it out of the loop and compute loop-independent computations outside the loop
| { | ||
| for (Rpp32s sub_col = 0; sub_col < 2; sub_col++) | ||
| { | ||
| Rpp32s y_idx = (row * 2 + sub_row) * 16 + (col * 2 + sub_col); |
| dct_inv_8x8_1d<1>(cb + row * 8); | ||
|
|
||
| for(Rpp32s row = 0; row < 8; row++) | ||
| dct_fwd_8x8_1d<1>(cr + row * 8); // Row-wise DCT (stride = 1) |
There was a problem hiding this comment.
We dont need a row * 8 and a row++.
Just use one variable and one increment and do a += 8
| for(Rpp32s row = 0; row < 8; row++) | ||
| dct_inv_8x8_1d<8>(cr + row); | ||
| for(Rpp32s row = 0; row < 8; row++) | ||
| dct_inv_8x8_1d<1>(cr + row * 8); |
| } | ||
| } | ||
|
|
||
| void process_jpeg_compression_distortion(__m256* pRgb, __m256* pY, __m256* pCb, __m256* pCr, |
There was a problem hiding this comment.
Some of these functions like this one need more comments to be self-explanatory
|
@HazarathKumarM Leave a note when the 10 odd comments above are addressed |
| x[1] = _mm256_cvtss_f32(_mm256_permutevar8x32_ps(pVecDct[0], _mm256_setr_epi32(1, 1, 1, 1, 1, 1, 1, 1))); | ||
| x[2] = _mm256_cvtss_f32(_mm256_permutevar8x32_ps(pVecDct[0], _mm256_setr_epi32(2, 2, 2, 2, 2, 2, 2, 2))); | ||
| x[3] = _mm256_cvtss_f32(_mm256_permutevar8x32_ps(pVecDct[0], _mm256_setr_epi32(3, 3, 3, 3, 3, 3, 3, 3))); | ||
| x[4] = _mm256_cvtss_f32(_mm256_permutevar8x32_ps(pVecDct[0], _mm256_setr_epi32(4, 4, 4, 4, 4, 4, 4, 4))); |
There was a problem hiding this comment.
@HazarathKumarM We definitely need to take these 'constant register initializations' (at runtime) outside the 4 loops its inside now. Please init them as 8 const __m256 in the either the rpp_cpu_simd_math.hpp or other better location as per new restructure.
There was a problem hiding this comment.
Also, use _mm256_set1_epi32(7) instead of the _mm256_setr_epi32(7, 7, 7, 7, 7, 7, 7, 7)
|
@r-abishek I have resolved all the comments you added here and also moved all the coefficient declarations to the top of the file |
No description provided.