Skip to content

[G-API]: kmeans() Standard Kernel Implementation#18857

Merged
alalek merged 13 commits intoopencv:masterfrom
OrestChura:oc/kmeans
Nov 30, 2020
Merged

[G-API]: kmeans() Standard Kernel Implementation#18857
alalek merged 13 commits intoopencv:masterfrom
OrestChura:oc/kmeans

Conversation

@OrestChura
Copy link
Copy Markdown
Contributor

These changes:

  • Add kmeans() standard kernel
    • provide API and documentation - 4 overloads:
      • standard GMat - for any dimensionality
      • GMat without bestLabels initialization for convenience
      • GArray<Point2f> - for 2D
      • GArray<Point3f> - for 3D
    • support OCV backend
  • Accuracy tests:
    • for every input - 2 tests
      1. without initializing. In this case, no comparison with cv::kmeans is done as kmeans uses random auto-initialization
      2. with initialization
    • in both cases, only 1 attempt is done as after first attempt kmeans initializes bestLabels randomly

    Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
build_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2020.3.0:16.04
Xbuild_image:Custom Win=openvino-2020.3.0
Xbuild_image:Custom Mac=openvino-2020.3.0

test_modules:Custom=gapi
test_modules:Custom Win=gapi
test_modules:Custom Mac=gapi

buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
test_opencl:Custom=OFF
test_bigdata:Custom=1
test_filter:Custom=*

 - 4 overloads:
    - standard GMat - for any dimensionality
    - GMat without bestLabels initialization
    - GArray<Point2f> - for 2D
    - GArray<Point3f> - for 3D
 - Accuracy tests:
   - for every input - 2 tests
   1) without initializing. In this case, no comparison with cv::kmeans is done as kmeans uses random auto-initialization
   2) with initialization
   - in both cases, only 1 attempt is done as after first attempt kmeans initializes bestLabels randomly
@OrestChura
Copy link
Copy Markdown
Contributor Author

@AsyaPronina please review

// Checks if the passed mat is a set of n-dimentional points of the given depth
// Returns (quantity, elemSize)
std::tuple<int, int> checkVector(const int chan, const cv::Size &size, const int depth,
const int ddepth)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already saw similar functionality recently.


@dmatveev There is a lot of code going into the public headers:

  • this may break bindings generators because they parse public headers
  • this slowdowns compilation of OpenCV itself and user apps

There is strong recommendation to keep minimal amount of code in public headers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In imgproc.hpp. @OrestChura, can you merge checkVector and isPointsVector?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Alexander, Maxim!
I made attempt to merge them and share the generalized function between gapi/core.hpp & gapi/imgproc.hpp. Put it in gmat.hpp/gmat.cpp

// Checks if the passed mat is a set of n-dimentional points of the given depth
// Returns (quantity, elemSize)
std::tuple<int, int> checkVector(const int chan, const cv::Size &size, const int depth,
const int ddepth)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In imgproc.hpp. @OrestChura, can you merge checkVector and isPointsVector?

 - bestLabels is returned to its original place among parameters
 - checkVector and isPointsVector functions are merged into one, shared between core.hpp & imgproc.hpp by placing it into gmat.hpp (and implementation - to gmat.cpp)
 - typos corrected
 - unified names in tests
 - const added
 - typos
 - fixed the doc note
 - ddepth -> expectedDepth, `< 0 ` -> `== -1`
 - supported: multiple channels, reversed width
 - added test cases for those
 - added notes in docs
 - refactored checkVector to return dimentionality along with quantity
 - makes chackVector smaller and (maybe) clearer
Copy link
Copy Markdown
Contributor

@mpashchenkov mpashchenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it works better than correct.

 - cv::checkVector -> cv::gapi::detail
 - Changed checkVector: returns bool, quantity & dimensionality as references
 - Polishing checkVector
 - FIXME added
@OrestChura OrestChura force-pushed the oc/kmeans branch 2 times, most recently from 00ccc5c to 38952b2 Compare November 26, 2020 18:08
 - checkVector: added overload, separate two different functionalities
 - depth assert - out of the function
@AsyaPronina
Copy link
Copy Markdown
Contributor

Thanks a lot for your efforts, Orest!!

 - quantity -> amount, dimensionality -> dim
 - Fix typos
outMeta(const GMatDesc& in, int K, const GMatDesc& bestLabels, const TermCriteria&, int,
KmeansFlags flags) {
GAPI_Assert(in.depth == CV_32F);
std::vector<int> amount_n_dim = detail::checkVector(in);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmatveev We need new approach to get rid implementation code from public headers.

Keep interface definition only.

This also should get rid several helper functions.

Copy link
Copy Markdown
Contributor Author

@OrestChura OrestChura Nov 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your concerns about implementations in .hpp, will discuss it within the team. I think this has to be solved for all the kernels in our modules. Although, I think this step is not for this PR, now I propose to leave it as is for consistency and then move all the implementations at once.

But what do you mean by helper functions?

Comment on lines +529 to +532
// kmeans sets these labels' sizes when no bestLabels given:
GMatDesc out_labels(CV_32S, 1, Size{1, amount}),
// kmeans always sets these centers' sizes:
centers (CV_32F, 1, Size{dim, K});
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use 2 variable's definitions instead of one (for all non-trivial variables).

This is necessary for:

  • 2 dedicated lines rovided by compiler
  • 2 lines in debugger
  • 2 lines in valgrind reports in case of memory issues.

(+ fix indentation of comments)

// kmeans sets these labels' sizes when no bestLabels given:
GMatDesc out_labels(CV_32S, 1, Size{1, amount});
// kmeans always sets these centers' sizes:
GMatDesc centers   (CV_32F, 1, Size{dim, K});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks. Fixed

or N columns if there is a single channel. Mat should have @ref CV_32F depth.

@note Although, if GMat with height != 1, width != 1, channels != 1 given as data, n-dimensional
samples are considered given in amount of A, where A = height, n = width * channels.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move @note outside of @params section.

Check the documentation page.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. Changed the notes

Comment on lines +1862 to +1867
@return Compactness measure that is computed as
\f[\sum _i \| \texttt{samples} _i - \texttt{centers} _{ \texttt{labels} _i} \| ^2\f]
after every attempt. The best (minimum) value is chosen and the corresponding labels and the
compactness value are returned by the function.
@return Integer array that stores the cluster indices for every sample.
@return Array of the cluster centers.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@return must be used once.

Move all details/notes before @params.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed returns

}

namespace {
std::vector<int> checkVectorImpl(const int width, const int height, const int chan, const int n)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace {

may be just "static"?

(to avoid unnecessary obfuscated names in debugger)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Thanks! Changed

 - fix docs
 - use 2 variable's definitions instead of one (for all non-trivial variables)
@OrestChura OrestChura requested a review from alalek November 30, 2020 09:45
Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alalek alalek merged commit 986ad4f into opencv:master Nov 30, 2020
{
EXPECT_TRUE(compact_gapi == compact_ocv);
EXPECT_TRUE(compareVectorsAbsExact(labels_gapi, labels_ocv));
EXPECT_TRUE(compareVectorsAbsExact(centers_gapi, centers_ocv));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are several reports from coverity tool:

>>>     CID 1470409:    (SWAPPED_ARGUMENTS)
>>>     The positions of arguments in the call to "compareVectorsAbsExact" do not match the ordering of the parameters:
* "centers_gapi" is passed to "outOCV"
* "centers_ocv" is passed to "outGAPI"
1567             EXPECT_TRUE(compareVectorsAbsExact(centers_gapi, centers_ocv));

please take a look

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR: #19044

@OrestChura OrestChura mentioned this pull request Dec 8, 2020
4 tasks
@alalek alalek mentioned this pull request Apr 9, 2021
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
[G-API]: kmeans() Standard Kernel Implementation

* cv::gapi::kmeans kernel implementation
 - 4 overloads:
    - standard GMat - for any dimensionality
    - GMat without bestLabels initialization
    - GArray<Point2f> - for 2D
    - GArray<Point3f> - for 3D
 - Accuracy tests:
   - for every input - 2 tests
   1) without initializing. In this case, no comparison with cv::kmeans is done as kmeans uses random auto-initialization
   2) with initialization
   - in both cases, only 1 attempt is done as after first attempt kmeans initializes bestLabels randomly

* Addressing comments
 - bestLabels is returned to its original place among parameters
 - checkVector and isPointsVector functions are merged into one, shared between core.hpp & imgproc.hpp by placing it into gmat.hpp (and implementation - to gmat.cpp)
 - typos corrected

* addressing comments
 - unified names in tests
 - const added
 - typos

* Addressing comments
 - fixed the doc note
 - ddepth -> expectedDepth, `< 0 ` -> `== -1`

* Fix unsupported cases of input Mat
 - supported: multiple channels, reversed width
 - added test cases for those
 - added notes in docs
 - refactored checkVector to return dimentionality along with quantity

* Addressing comments
 - makes chackVector smaller and (maybe) clearer

* Addressing comments

* Addressing comments
 - cv::checkVector -> cv::gapi::detail

* Addressing comments
 - Changed checkVector: returns bool, quantity & dimensionality as references

* Addressing comments
 - Polishing checkVector
 - FIXME added

* Addressing discussion
 - checkVector: added overload, separate two different functionalities
 - depth assert - out of the function

* Addressing comments
 - quantity -> amount, dimensionality -> dim
 - Fix typos

* Addressing comments
 - fix docs
 - use 2 variable's definitions instead of one (for all non-trivial variables)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants