Skip to content

RPP F32 QA : Review Comments Resolution#431

Merged
r-abishek merged 5 commits intor-abishek:ar/test_suite_upgrade_9_f32qafrom
HazarathKumarM:hk/f32_QA-pr
May 14, 2025
Merged

RPP F32 QA : Review Comments Resolution#431
r-abishek merged 5 commits intor-abishek:ar/test_suite_upgrade_9_f32qafrom
HazarathKumarM:hk/f32_QA-pr

Conversation

@HazarathKumarM
Copy link
Copy Markdown
Collaborator

  • Resolved review comments added by Rajy
  • Modified SIMD print functions to use union which help us eliminating an extra storeu_ps instruction

@r-abishek r-abishek requested a review from Copilot May 8, 2025 01:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR resolves review comments by modifying SIMD print functions to use union-based conversions, eliminating an extra storeu_ps instruction.

  • Replaced temporary arrays with unions for SIMD print functions
  • Consolidated and standardized the conversions across various data types
Comments suppressed due to low confidence (1)

src/include/common/cpu/rpp_cpu_simd_load_store.hpp:295

  • [nitpick] Consider using the loop variable name 'ct' instead of 'i' for consistency with the other print functions.
for (int i = 0; i < 8; ++i)

Copy link
Copy Markdown
Owner

@r-abishek r-abishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HazarathKumarM Pls address comments

union {
__m128i vec;
char arr[16];
} u;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do the union initialization inside every function. Take it out globally.
Next, we should ideally pass pointer to __m128i (call by reference). So receive it in __m128i* vPrintArrayPtr.
A global u is already defined, so we just use that here next

u.vec = *(vPrintArrayPtr);

This way, two copies are avoided:

  1. The call by value copy when you call rpp_mm_print_epi8(__m128i vPrintArray)
  2. The copy at u.vec = vPrintArray;

Pls check u.arr should also be accessible after that.

@r-abishek r-abishek added the enhancement New feature or request label May 8, 2025
@HazarathKumarM
Copy link
Copy Markdown
Collaborator Author

  • pixel check changes are pending on this PR

@HazarathKumarM
Copy link
Copy Markdown
Collaborator Author

image


// Union for handling 128-bit SIMD data (SSE).
union RppSIMD128 {
__m128i m128i_val;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open braces in next line as per RPP standard everywhere

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

char printArray[16];
_mm_storeu_si128((__m128i *)printArray, vPrintArray);
inline void rpp_mm_print_epi8(__m128i *v) {
RppSIMD128 u;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment. Open braces in next line as per RPP standard everywhere

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Union for handling 128-bit SIMD data (SSE).
union RppSIMD128 {
__m128i m128i_val;
__m128 m128_val;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please dont use snake_case here too, call it m128iVal and m128Val

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Owner

@r-abishek r-abishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HazarathKumarM Please address comments on following similar rpp standards

@r-abishek r-abishek requested a review from Copilot May 14, 2025 04:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR resolves review comments from Rajy and refines SIMD print functions by introducing union-based types, while removing explicit boundary check calls in several tensor processing kernels.

  • Removed calls to rpp_pixel_check_0to1 in threshold, crop, crop_and_patch, and blend kernels
  • Revised SIMD print helper functions to use union-based data for improved performance and clarity

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/modules/tensor/cpu/kernel/threshold.cpp Removed boundary check calls; relies on compute_threshold logic
src/modules/tensor/cpu/kernel/crop_and_patch.cpp Removed boundary check calls in pixel processing
src/modules/tensor/cpu/kernel/crop.cpp Removed boundary check calls in pixel processing
src/modules/tensor/cpu/kernel/blend.cpp Removed boundary check calls during alpha-blending
src/include/common/cpu/rpp_cpu_simd_load_store.hpp Introduced union types and refactored SIMD print helpers
Comments suppressed due to low confidence (4)

src/modules/tensor/cpu/kernel/threshold.cpp:434

  • Removal of boundary checks in threshold functions should be validated; ensure that compute_threshold_* functions or another mechanism guarantees valid pixel ranges. Consider adding a clarifying comment about how boundary conditions are now enforced.
rpp_pixel_check_0to1(p, 1);

src/modules/tensor/cpu/kernel/crop_and_patch.cpp:448

  • With the boundary check removed, verify that pixel values remain within valid ranges as expected downstream. A brief comment explaining this design decision might improve code clarity.
rpp_pixel_check_0to1(p, 3);

src/modules/tensor/cpu/kernel/crop.cpp:231

  • Ensure that the removal of this boundary check does not lead to invalid pixel values in subsequent processing; document the alternative validation mechanism if one exists.
rpp_pixel_check_0to1(p, 3);

src/modules/tensor/cpu/kernel/blend.cpp:341

  • Verify that eliminating the boundary check in the blend function is safe and that the blending operation correctly constrains the pixel values. Consider documenting this behavior for future maintainers.
rpp_pixel_check_0to1(p1, 3);

const __m256i avx_pxMaskG = _mm256_setr_epi8(0x80, 1, 0x80, 0x80, 4, 0x80, 0x80, 7, 0x80, 0x80, 10, 0x80, 0x80, 13, 0x80, 0x80, 16, 0x80, 0x80, 19, 0x80, 0x80, 22, 0x80, 0x80, 25, 0x80, 0x80, 28, 0x80, 0x80, 0x80);
const __m256i avx_pxMaskB = _mm256_setr_epi8(0x80, 0x80, 2, 0x80, 0x80, 5, 0x80, 0x80, 8, 0x80, 0x80, 11, 0x80, 0x80, 14, 0x80, 0x80, 17, 0x80, 0x80, 20, 0x80, 0x80, 23, 0x80, 0x80, 26, 0x80, 0x80, 29, 0x80, 0x80);

// Union for handling 128-bit SIMD data (SSE).
Copy link

Copilot AI May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] It may be helpful to add a brief explanation on why the union-based SIMD structures were introduced and how they contribute to eliminating extra store instructions, to aid future maintenance.

Copilot uses AI. Check for mistakes.
@HazarathKumarM
Copy link
Copy Markdown
Collaborator Author

@HazarathKumarM Please address comments on following similar rpp standards

all review comments are addressed now

@r-abishek r-abishek merged commit 491eeac into r-abishek:ar/test_suite_upgrade_9_f32qa May 14, 2025
kiritigowda added a commit that referenced this pull request Jun 3, 2025
* Add F32 QA Golden outputs

* modify Doxygen comments

* modify range check functions

* RPP F32 QA : Review Comments Resolution (#431)

* Modified SIMD print functions to use union

* remove redundant unions in print functions

* removed pixel checks

* remove pixel check in threshold

* resolve review comments

---------

Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com>
Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
…s/sphinx (r-abishek#431)

Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.7.1 to 1.7.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.7.1...v1.7.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
* Add F32 QA Golden outputs

* modify Doxygen comments

* modify range check functions

* RPP F32 QA : Review Comments Resolution (r-abishek#431)

* Modified SIMD print functions to use union

* remove redundant unions in print functions

* removed pixel checks

* remove pixel check in threshold

* resolve review comments

---------

Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: HazarathKumarM <119284987+HazarathKumarM@users.noreply.github.com>
Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants