PERF: Make ImageGridSampler multi-threaded #1017

Merged: N-Dekker merged 1 commit into main from GridSampler-MultiThreaded on Jan 16, 2024.

N-Dekker (Member) commented on Jan 7, 2024:

An initial benchmark (4096x4096 image, 2x2 grid spacing, no mask) suggests that multi-threading makes the sampler roughly 1.6 to 1.8 times faster in 2D, close to a factor of 2 (VS2019 Release build, 6 cores, 12 logical processors, 48 work units).

Benchmark code:

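  // GoogleTest test body. CreateImage, DerefRawPointer and elx::DefaultConstruct are assumed to come from the elastix test utilities.
  using PixelType = int;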
  enum
  {
    // Or in 3D:
    //    ImageSizeValue = 200,
    //    Dimension = 3U
    ImageSizeValue = 4096U,
    Dimension = 2U
  };
  using ImageType = itk::Image<PixelType, Dimension>;
  using SamplerType = itk::ImageGridSampler<ImageType>;

  constexpr auto imageSize = ImageType::SizeType::Filled(ImageSizeValue);
  const auto     image = CreateImage<PixelType>(imageSize);

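  // Runs the sampler once (multi-threaded or single-threaded), prints the duration of Update(), and returns the generated samples.
  const auto generateSamples = [image](const bool useMultiThread) {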
    elx::DefaultConstruct<SamplerType> sampler{};
    sampler.SetUseMultiThread(useMultiThread);
    sampler.SetSampleGridSpacing(itk::MakeFilled<SamplerType::SampleGridSpacingType>(2));
    sampler.SetInput(image);

    using namespace std::chrono;
    const auto timePoint = high_resolution_clock::now();
    sampler.Update();
    std::cout << (useMultiThread ? "MT" : "ST")
              << " Duration: " << duration_cast<duration<double>>(high_resolution_clock::now() - timePoint).count()
              << " seconds (" << sampler.GetNumberOfWorkUnits() << " work units)" << std::endl;
    // Move the samples out of the output container, which is owned by the local sampler.
    return std::move(DerefRawPointer(sampler.GetOutput()).CastToSTLContainer());
  };

  for (int i{}; i < 5; ++i)
  {
    const auto multiThreadedlyGeneratedSamples = generateSamples(true);
    const auto singleThreadedlyGeneratedSamples = generateSamples(false);
    EXPECT_EQ(multiThreadedlyGeneratedSamples, singleThreadedlyGeneratedSamples);
  }

Output (2D, 4096x4096):

MT Duration: 0.0322745 seconds (48 work units)
ST Duration: 0.0521161 seconds (48 work units)
MT Duration: 0.0334572 seconds (48 work units)
ST Duration: 0.0613757 seconds (48 work units)
MT Duration: 0.0318485 seconds (48 work units)
ST Duration: 0.0530818 seconds (48 work units)
MT Duration: 0.029546 seconds (48 work units)
ST Duration: 0.0480266 seconds (48 work units)
MT Duration: 0.0299087 seconds (48 work units)
ST Duration: 0.0483876 seconds (48 work units)

Output (3D, 200x200x200):

MT Duration: 0.0032904 seconds (48 work units)
ST Duration: 0.0042067 seconds (48 work units)
MT Duration: 0.0020699 seconds (48 work units)
ST Duration: 0.0033008 seconds (48 work units)
MT Duration: 0.0022022 seconds (48 work units)
ST Duration: 0.0032825 seconds (48 work units)
MT Duration: 0.0017621 seconds (48 work units)
ST Duration: 0.0032061 seconds (48 work units)
MT Duration: 0.0017278 seconds (48 work units)
ST Duration: 0.0032169 seconds (48 work units)

N-Dekker (Member, Author) commented:

Update: I just ran a performance benchmark with a mask (every third pixel inside the mask):

  using PixelType = int;
  enum
  {
    //    ImageSizeValue = 200U,
    //    Dimension = 3U
    ImageSizeValue = 4096U,
    Dimension = 2U
  };
  using ImageType = itk::Image<PixelType, Dimension>;
  using SamplerType = itk::ImageGridSampler<ImageType>;

  constexpr auto imageSize = ImageType::SizeType::Filled(ImageSizeValue);
  const auto     image = CreateImage<PixelType>(imageSize);

  using MaskSpatialObjectType = itk::ImageMaskSpatialObject<Dimension>;
  const auto maskImage = CreateImage<MaskSpatialObjectType::PixelType>(ImageDomain(*image));

  // Every third pixel (by buffer index) is inside the mask.
  unsigned int pixelIndex{};

  for (std::uint8_t & maskValue : itk::ImageBufferRange(*maskImage))
  {
    maskValue = (pixelIndex % 3U == 0) ? std::uint8_t{ 1 } : std::uint8_t{ 0 };
    ++pixelIndex;
  }

  const auto maskSpatialObject = MaskSpatialObjectType::New();
  maskSpatialObject->SetImage(maskImage);
  maskSpatialObject->Update();

  const auto generateSamples = [image, maskSpatialObject](const bool useMultiThread) {
    elx::DefaultConstruct<SamplerType> sampler{};

    sampler.SetInput(image);
    sampler.SetMask(maskSpatialObject);
    sampler.SetUseMultiThread(useMultiThread);
    sampler.SetSampleGridSpacing(itk::MakeFilled<SamplerType::SampleGridSpacingType>(2));

    using namespace std::chrono;
    const auto timePoint = high_resolution_clock::now();
    sampler.Update();
    std::cout << (useMultiThread ? "MT" : "ST")
              << " Duration: " << duration_cast<duration<double>>(high_resolution_clock::now() - timePoint).count()
              << " seconds" << std::endl;
    return std::move(DerefRawPointer(sampler.GetOutput()).CastToSTLContainer());
  };

  for (int i{}; i < 5; ++i)
  {
    generateSamples(true);
    generateSamples(false);
  }

Output (2D, 4096x4096, with mask, VS2019 Release):

MT Duration: 0.0521832 seconds
ST Duration: 0.190727 seconds
MT Duration: 0.0584814 seconds
ST Duration: 0.205479 seconds
MT Duration: 0.0795572 seconds
ST Duration: 0.172589 seconds
MT Duration: 0.0609036 seconds
ST Duration: 0.199334 seconds
MT Duration: 0.0654479 seconds
ST Duration: 0.171322 seconds

So multi-threading also pays off in this case: roughly 2.2 to 3.7 times faster 😃
