dnn(OpenCL): fix automatic globalsize adjusting#20655
dnn(OpenCL): fix automatic globalsize adjusting#20655opencv-pushbot merged 1 commit intoopencv:3.4from
Conversation
- if kernel code doesn't support that
|
👍 |
| * @param sync specify whether to wait for OpenCL computation to finish before return. | ||
| * @param q command queue | ||
| */ | ||
| bool run_(int dims, size_t globalsize[], size_t localsize[], bool sync, const Queue& q=Queue()); |
There was a problem hiding this comment.
Can I be in the loop on these quick changes you are making? I don't see anyone reviewing them before you merge.
A strange and difficult to understand API like run vs run_ isn't readable or maintainable.
Instead, if you want a new api, then something like the following is clean and readable. And on the next ABI change can be condensed into a single API
bool run(int dims, size_t globalsize[], size_t localsize[], bool sync, const Queue& q=Queue());
bool run(int dims, size_t globalsize[], size_t localsize[], bool sync, const Queue& q=Queue(), const bool tuneGlobalSize = true);
cc @mshabunin
There was a problem hiding this comment.
@diablodale , IMO this name is fine, it exposes raw private method and often underscores are used to mark such private-public functions (e.g. in Python). I believe this patch is not meant to be final, either dnn kernels should be updated, either globalsize tuning mechanism should be improved (like you suggested).
This series of patches is needed to enable new CI machine which is blocked by hanging and crashing dnn and ocl tests. We will take in account any comments and patches by the community even after this PR is merged.
There was a problem hiding this comment.
This is a public method. Not private! It introduces an ABI promise across 3.x, 4.x, and ongoing maintenance. The ocl.cpp file is quite hectic and the patchwork of changes made over the years continues to grow with less-long-term hacks like this.
DNN is not the core of OpenCV. Meaning it is built on top of the core parts. In OpenCV terms, it is built on top of the core module. The core module can present to modules like DNN a set of public services.
The correction of this API is trivial to make. And provides consistency, legacy support, and ABI changing future support. There is no known downside...compared to this run_ thing.
I found this bug and design flaw. It has existed for ~8 years. I recommend those invested into a solution (like us) not move too quickly. That CI machine can wait a week.
There was a problem hiding this comment.
I through about extra bool parameter.
However it is better to avoid multiple bool parameters in any function / method. (true, false) is confusing and non-readable API.
run() vs run_() usually means the the second one do "less" work and the first one is an "extended" variant.
In general, for properly designed OpenCL kernels we should not use run_() at all. But there are several external contributions which doesn't follow the spirit of OpenCV OpenCL computations and we have such bugs.
Suggestions are welcome for the name of new method.
Alternatives are:
enqueue(as reference toclEnqueueNDRangeKernel, but there is "sync" parameter - so this name is not accurate). Also there would be questions about difference ofrun()andenqueue()and they may be improperly used.runNoTuneGlobalSize(long name, but do less job). It is better to name existed behavior ofrun()asrunWithTuneGlobalSize(), but it is too late.- something else?
BTW, existed "adjusting" empirics are buggy too. It provides weird results for some cases.
There was a problem hiding this comment.
I follow your thinking on multiple booleans. Yet, that is far less concern for me than overall readibility.
https://google.github.io/styleguide/cppguide.html#General_Naming_Rules
Optimize for readability using names that would be clear even to people on a different team. Use names that describe the purpose or intent of the object. Do not worry about saving horizontal space as it is far more important to make your code immediately understandable by a new reader. Minimize the use of abbreviations that would likely be unknown to someone outside your project (especially acronyms and initialisms). Do not abbreviate by deleting letters within a word.
When a new API is required, then please it needs to be readable. A developers should be able to read the code and know what it does.
- The existing run apis are fine. I do not see value in changing the existing "run" family of APIs.
Kernel::run,Kernel::runTask,Kernel::runProfiling()are all readable and provide distinction between each. Kernel::run()is great because it is the root, default, basic functionality. It is short, clear, and (I think) good. It reads in english exact what it does "kernel run" aka "run a kernel".- I recommend follow the same pattern for a new API. It should start with "run" and then provide distinction.
- Your second idea
Kernel::runNoTuneGlobalSize. works for me. Also the shorterKernel::runNoTune. The second I think works because there is no other tuning that is occuring. - Adding the 4th Kernel "run" api demonstrates API proliferation. This is concerning but also a choice already made two times before with
runProfilingandrunTask.
// unreadable example
//...
else
{
if (haveSrc2)
k.args(srcarg, src.cols, (int)src.total(), ngroups, dbarg, src2arg);
else
k.args(srcarg, src.cols, (int)src.total(), ngroups, dbarg);
}
size_t globalsize = ngroups * wgs;
if (k.run_(1, &globalsize, &wgs, true))
{
typedef Scalar (*part_sum)(Mat m);
part_sum funcs[3] = { ocl_part_sum<int>, ocl_part_sum<float>, ocl_part_sum<double> },
func = funcs[ddepth - CV_32S];
Mat mres = db.getMat(ACCESS_READ);
if (calc2)
const_cast<Scalar &>(res2) = func(mres.colRange(ngroups, dbsize));
res = func(mres.colRange(0, ngroups));
return true;
}// good readable example
// ...
else
{
if (haveSrc2)
k.args(srcarg, src.cols, (int)src.total(), ngroups, dbarg, src2arg);
else
k.args(srcarg, src.cols, (int)src.total(), ngroups, dbarg);
}
size_t globalsize = ngroups * wgs;
if (k.runNoTuneGlobalSize(1, &globalsize, &wgs, true))
{
typedef Scalar (*part_sum)(Mat m);
part_sum funcs[3] = { ocl_part_sum<int>, ocl_part_sum<float>, ocl_part_sum<double> },
func = funcs[ddepth - CV_32S];
Mat mres = db.getMat(ACCESS_READ);
if (calc2)
const_cast<Scalar &>(res2) = func(mres.colRange(ngroups, dbsize));
res = func(mres.colRange(0, ngroups));
return true;
}
relates #20615