GPU memory still allocated when fastNlMeansDenoising() returns #17789
Closed
Labels
category: 3rdparty, category: ocl, category: photo, question (invalid tracker): ask questions and other "no action" items here: https://forum.opencv.org
Description
System information (version)
- OpenCV => 4.1.14.3 (installed using vcpkg)
- Operating System / Platform => Windows 10 64 Bit
- Compiler => Visual Studio 2019 version 16.6.5
- GPU => NVIDIA GeForce GTX 1050 with 4 GiB dedicated video memory
- NVIDIA driver version => 446.14
Detailed description
I am running cv::fastNlMeansDenoising() on the GPU via T-API / UMats / OpenCL. I cannot loop over a series of images of the same size: after a few iterations the calculation falls back to the CPU with the error message CL_MEM_OBJECT_ALLOCATION_FAILURE (-4). It seems that when cv::fastNlMeansDenoising() returns, some GPU memory has not been deallocated yet.
Steps to reproduce
Consider the following code:
#include <opencv2/opencv.hpp>
#include <iostream>
#include <chrono>
#include <thread>

using namespace std::chrono_literals;
using namespace std::string_literals;

void loopNlMeans()
{
    auto const imageWidth = 3500;
    auto const h = 14.0;
    auto const kernelSizeParam = 15;
    auto const nLoops = 8;

    auto image = cv::UMat::zeros(imageWidth, imageWidth, CV_8UC1);
    auto dst = cv::UMat::zeros(imageWidth, imageWidth, CV_8UC1);

    cv::theRNG().state = 7;
    cv::randu(image, 0, 256);

    for (int i = 0; i < nLoops; ++i)
    {
        std::cout << "Loop "s << i << "\n"s;
        cv::fastNlMeansDenoising(image, dst, h, kernelSizeParam, kernelSizeParam);
    }
}

int main()
{
    loopNlMeans();
}

I would expect it to run completely on the GPU, whereas I get the following output:
Loop 0
Loop 1
Loop 2
Loop 3
OpenCL error CL_MEM_OBJECT_ALLOCATION_FAILURE (-4) during call: clEnqueueNDRangeKernel('fastNlMeansDenoising', dims=2, globalsize=28160x110x1, localsize=256x1x1) sync=false
If I add std::this_thread::sleep_for(1s); inside the for loop, the calculation goes on for more iterations before falling back to the CPU:
Loop 0
Loop 1
Loop 2
Loop 3
Loop 4
Loop 5
Loop 6
OpenCL error CL_MEM_OBJECT_ALLOCATION_FAILURE (-4) during call: clEnqueueNDRangeKernel('fastNlMeansDenoising', dims=2, globalsize=28160x110x1, localsize=256x1x1) sync=false
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, answers.opencv.org, Stack Overflow, etc and have not found solution
- I updated to latest OpenCV version and the issue is still there
- There is reproducer code and related data files: videos, images, onnx, etc