GPU memory still allocated when fastNlMeansDenoising() returns #17789

@angelo-peronio

System information (version)
  • OpenCV => 4.1.1 4.3 (installed using vcpkg)
  • Operating System / Platform => Windows 10 64 Bit
  • Compiler => Visual Studio 2019 version 16.6.5
  • GPU => NVIDIA GeForce GTX 1050 with 4 GiB dedicated video memory
  • NVIDIA driver version => 446.14

Detailed description

I am running cv::fastNlMeansDenoising() on the GPU via the T-API (cv::UMat / OpenCL). I cannot loop over a series of images of the same size: after a few iterations the computation falls back to the CPU with the error CL_MEM_OBJECT_ALLOCATION_FAILURE (-4). It seems that when cv::fastNlMeansDenoising() returns, some of the GPU memory it used has not been deallocated yet.

Steps to reproduce

Consider the following code:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <chrono>
#include <thread>

using namespace std::chrono_literals;
using namespace std::string_literals;

void loopNlMeans()
{
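    auto const imageWidth = 3500;       // square image, imageWidth x imageWidth pixels
    auto const h = 14.0;                // filter strength
    auto const kernelSizeParam = 15;    // used for both templateWindowSize and searchWindowSize
    auto const nLoops = 8;

    // UMat buffers, so that the T-API can dispatch the call to OpenCL.
    auto image = cv::UMat::zeros(imageWidth, imageWidth, CV_8UC1);
    auto dst = cv::UMat::zeros(imageWidth, imageWidth, CV_8UC1);
    // Fixed seed, so that every run denoises the same random image.
    cv::theRNG().state = 7;
    cv::randu(image, 0, 256);

    // Denoise the same image repeatedly, reusing the same destination buffer.
    for (int i = 0; i < nLoops; ++i)
    {
        std::cout << "Loop "s << i << "\n"s;
        cv::fastNlMeansDenoising(image, dst, h, kernelSizeParam, kernelSizeParam);
    }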
}

int main()
{
    loopNlMeans();
}

I would expect it to run completely on the GPU, whereas I get the following output:

Loop 0
Loop 1
Loop 2
Loop 3
OpenCL error CL_MEM_OBJECT_ALLOCATION_FAILURE (-4) during call: clEnqueueNDRangeKernel('fastNlMeansDenoising', dims=2, globalsize=28160x110x1, localsize=256x1x1) sync=false

If I add std::this_thread::sleep_for(1s); inside the for loop, the calculation goes on for more iterations before falling back to the CPU:

Loop 0
Loop 1
Loop 2
Loop 3
Loop 4
Loop 5
Loop 6
OpenCL error CL_MEM_OBJECT_ALLOCATION_FAILURE (-4) during call: clEnqueueNDRangeKernel('fastNlMeansDenoising', dims=2, globalsize=28160x110x1, localsize=256x1x1) sync=false
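
For scale, here is a rough back-of-the-envelope check of the sizes involved. It is a sketch that assumes only that a CV_8UC1 image stores one byte per pixel. The helper names imageBytes and imagesThatFit are made up for this illustration and are not part of OpenCV. It also ignores that not all of the 4 GiB of dedicated video memory is necessarily available to OpenCL allocations.

```cpp
#include <cstdint>

// Hypothetical helper: bytes occupied by a square, single-channel,
// 8-bit image (CV_8UC1), i.e. one byte per pixel.
constexpr std::uint64_t imageBytes(std::uint64_t side)
{
    return side * side;
}

// Dedicated video memory of the GTX 1050 used in this report: 4 GiB.
constexpr std::uint64_t kVideoMemoryBytes = 4ULL * 1024 * 1024 * 1024;

// Hypothetical helper: how many whole images of the given side length
// fit into the given amount of memory.
constexpr std::uint64_t imagesThatFit(std::uint64_t side, std::uint64_t memoryBytes)
{
    return memoryBytes / imageBytes(side);
}

// One 3500 x 3500 CV_8UC1 image is 12,250,000 bytes (about 11.7 MiB).
static_assert(imageBytes(3500) == 12250000ULL, "unexpected image size");
// About 350 such images would fit into 4 GiB.
static_assert(imagesThatFit(3500, kVideoMemoryBytes) == 350ULL, "unexpected image count");
```

The two UMats in the reproducer (image and dst) together take about 24.5 MB, a small fraction of the video memory. This suggests, without proving it, that the allocation failure comes from intermediate buffers used inside cv::fastNlMeansDenoising() rather than from the input and output images themselves.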

Issue submission checklist
  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues,
    answers.opencv.org, Stack Overflow, etc and have not found solution
  • I updated to latest OpenCV version and the issue is still there
  • There is reproducer code and related data files: videos, images, onnx, etc
