Fix memory issues in cuIO due to removal of memory padding#13586

Merged
rapids-bot[bot] merged 19 commits into rapidsai:branch-23.08 from ttnghia:fix_io
Jun 23, 2023

Conversation

@ttnghia
Contributor

@ttnghia ttnghia commented Jun 16, 2023

After rmm removed memory padding (rapidsai/rmm#1278), some cuIO code started to exhibit out-of-bounds access issues, because many of its compute kernels shift input pointers back and forth to satisfy alignment requirements.

This PR adds padding back to various memory buffers so that they have enough extra space for such shifting.

With this fix, the reported issues (#13567, #13571, #13570) no longer show up.

Closes:

@ttnghia ttnghia added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue non-breaking Non-breaking change labels Jun 16, 2023
@ttnghia ttnghia requested a review from a team as a code owner June 16, 2023 00:51
@ttnghia ttnghia self-assigned this Jun 16, 2023
@ttnghia ttnghia requested review from bdice, harrism and vuule June 16, 2023 00:51
@ttnghia
Contributor Author

ttnghia commented Jun 16, 2023

@vuule Please also check the entire codebase more carefully. I tried to find as many related buffers as possible, but I may still have missed some.

@ttnghia
Contributor Author

ttnghia commented Jun 16, 2023

I just realized that there is a duplicate fix for Parquet (#13585), which takes a different approach: always adding 8 bytes of padding to the buffers, instead of rounding the size up (align_up) to the nearest multiple of 8.

@etseidl
Contributor

etseidl commented Jun 16, 2023

I just realized that there is a duplicate fix for Parquet (#13585), which takes a different approach: always adding 8 bytes of padding to the buffers, instead of rounding the size up (align_up) to the nearest multiple of 8.

The approach here is superior, so I closed #13585.

* @return The aligned value
*/
inline std::size_t align_up(std::size_t value, std::size_t alignment)
{
Contributor

@karthikeyann karthikeyann Jun 19, 2023

nit:
If alignment is going to be a compile-time constant, it could be a template parameter, and a static_assert could enforce that it is a power of 2 at compile time.
If not, an assert checking for a power of 2 would be helpful here for debugging purposes.

Member

@harrism harrism left a comment

Most of the use cases here construct a device_buffer with padding (not alignment, as the comments say). I suggest creating a factory function make_padded_device_buffer() and replacing calls to

rmm::device_buffer buf(cudf::detail::align_up(size, 8 /* alignment */), stream);

with

auto buf = make_padded_device_buffer(size, padding, stream);

You may alternatively want to create a globally available memory_resource adaptor, padding_adaptor, which wraps the default memory resource and which make_padded_device_buffer uses by default (so that the padding argument can be optional).


rmm::device_buffer decomp_block_data(uncomp_size, stream);
// Buffer needs to be padded as the compute kernels require aligned data pointers.
rmm::device_buffer decomp_block_data(cudf::detail::align_up(uncomp_size, 8 /*alignment*/),
Member

This and all the other device_buffer use cases are padding out the allocation, not changing its alignment. This might lead to confusion. See my main review comment.

Contributor

How did we know that 8 is the right amount of padding? Also, can we use a constant for this rather than hardcoding the value 8 many times?

Contributor Author

I believe @vuule has some idea of why cuIO kernels require 8-byte-aligned pointers?

Contributor

No idea. For me there's a gap between the reported issue and this solution: I don't see why these buffers need the padding, or why this much padding. I don't see this information in the PR.

@ttnghia
Contributor Author

ttnghia commented Jun 21, 2023

@harrism I finished implementing padding_memory_resource_adaptor and the make_padded_device_buffer factory function returning rmm::device_buffer. However, something came to mind: the output rmm::device_buffer can be returned from some utility function and then resized by the caller. In such cases, we have little idea whether that buffer is already padded, and it won't automatically have its size padded during resize.

I think the better solution is to say "padded" explicitly in the buffer type. That is, we should have a new buffer type like cudf::padded_device_buffer that can be constructed similarly to rmm::device_buffer but has its own private padding mechanism. When using it, we can directly infer from its type that any of its memory allocations will involve padding.

@wence-
Contributor

wence- commented Jun 21, 2023

I think the better solution is to say "padded" explicitly in the buffer type. That is, we should have a new buffer type like cudf::padded_device_buffer that can be constructed similarly to rmm::device_buffer but has its own private padding mechanism. When using it, we can directly infer from its type that any of its memory allocations will involve padding.

I like this idea, because it allows users of buffers to advertise the padding (and/or alignment) that they require. (Aside: this "newtype" idiom is very common in the Haskell and Rust worlds).

I imagine that a nice way to do it is to have a padded_device_buffer, templated on the required padding, that inherits from rmm::device_buffer and just wraps the provided memory resource in one that applies padding. The only thing templating would prevent, I think, is passing something with padding of 8 (say) to a function that expects padding of 2; that would be safe, but the types would disallow it.

* @param alignment The amount of bytes to align, must be a power of 2
* @return The aligned value
*/
inline std::size_t align_up(std::size_t value, std::size_t alignment)
Contributor

Is this the same as round_up_safe?

constexpr S round_up_safe(S number_to_round, S modulus)

Contributor

I think this function could be more useful if it took the number of alignment bits. That removes the power-of-2 requirement and differentiates it from round_up_safe.

Contributor Author

@ttnghia ttnghia Jun 21, 2023

Sorry, I'm confused. What is the conclusion here? Should round_up_safe be used to fix the bugs instead of adding align_up?

Contributor

I would prefer just using round_up_safe here, with a constexpr named variable for the round-up value instead of the hardcoded 8.

Contributor Author

Indeed, round_up_safe in this use case produces exactly the same output as align_up. So I'm going to switch to using it, as suggested by @davidwendt, to get the bugs fixed ASAP. padded_device_buffer will be the follow-up work.

Member

Could we have also used RMM's function?

Contributor Author

Finished adopting round_up_safe, but I'm still struggling to find a good place to put the constexpr that @davidwendt recommended.

Contributor

The changes seem to correspond to temporary compression/decompression buffers. Perhaps gpuinflate.hpp?
I'm OK with defining it in each .cu file too if there is a chance the padding may differ between the implementations.

Contributor Author

Could we have also used RMM's function?

I think we can. However, that is a detail function (rmm::detail::align_up), so I was reluctant to use it.

@davidwendt
Contributor

I would like to consider the device-buffer-padding work in a follow-on PR.
I think the current implementation is fine, along with Bradley's comments, to get the nightly builds working again.

@ttnghia ttnghia requested review from bdice and davidwendt June 22, 2023 02:24
Contributor

@bdice bdice left a comment

A few small comments. I am happy with this design overall; feel free to address them as you see fit.

ttnghia and others added 2 commits June 22, 2023 08:51
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Contributor

@vuule vuule left a comment

Left some notes to help track where the padding requirements originate. It looks like there are two sources: inflate_kernel and nvcomp snappy.
The nvcomp one is surprising; we should follow up on it with the nvcomp team.

Comment on lines +208 to +210
// Buffer needs to be padded.
rmm::device_buffer decomp_block_data(
cudf::util::round_up_safe(uncomp_size, BUFFER_PADDING_MULTIPLE), stream);
Contributor

padding because of inflate_kernel (this is the output of this kernel)

Comment on lines +313 to +317
// Buffer needs to be padded.
rmm::device_buffer scratch(cudf::util::round_up_safe(temp_size, BUFFER_PADDING_MULTIPLE),
stream);
rmm::device_buffer decomp_block_data(
cudf::util::round_up_safe(uncompressed_data_size, BUFFER_PADDING_MULTIPLE), stream);
Contributor

This one is odd; there should be no padding requirement from nvcomp. Have you verified that padding is required in the snappy case?

Contributor Author

Why not? It is the input to nvcompBatchedSnappyDecompressAsync.
That said, we don't have any unit tests for Avro in libcudf, so I can't verify it.

Contributor

Just checked: there are no alignment/padding requirements from nvcomp here. IMO this part of the change can be reverted.

Are cuDF tests useless in this context?

Contributor Author

@ttnghia ttnghia Jun 22, 2023

Then I'll revert this.

Are cuDF tests useless in this context?

There are zero C++ tests for Avro.

Contributor

I was referring to Python tests; we have some limited coverage there.

Contributor Author

We have avro_reader_test in Python, but I don't know how to check it with compute-sanitizer.

Contributor Author

Update: I ran compute-sanitizer on test_avro_reader_fastavro_integration.py and didn't see any issue 👍

Comment on lines +531 to +534
// Buffer needs to be padded.
block_data.resize(cudf::util::round_up_safe(block_data.size(), BUFFER_PADDING_MULTIPLE),
stream);

Contributor

padding because of inflate_kernel? (this is an input of this kernel)

Comment on lines +1073 to +1074
stripe_data.emplace_back(
cudf::util::round_up_safe(total_data_size, BUFFER_PADDING_MULTIPLE), stream);
Contributor

Based on the failing test, this buffer is the input to the nvcomp snappy decompression.

Contributor

Assuming the issue has nothing to do with snappy, maybe this padding is needed in the uncompressed case. @ttnghia do you know which test triggered this one?

Contributor Author

Reverting this will cause a memory access issue at:

 at 0x430 in ...cpp/src/io/comp/gpuinflate.cu:1197:cudf::io::copy_uncompressed_kernel

Contributor

Based on the repro (without combing through copy_uncompressed_kernel), my understanding is that it requires the input buffers to be padded. I would suggest adding this to the places where buffers are padded for this reason, and also adding a note to the gpu_copy_uncompressed_blocks docs. I really want the changes in this PR to be traceable to the root(ish) cause.

Contributor Author

Done.

Comment on lines +322 to +324
// Buffer needs to be padded.
rmm::device_buffer decomp_data(
cudf::util::round_up_safe(total_decomp_size, BUFFER_PADDING_MULTIPLE), stream);
Contributor

Output of the (presumed) nvcomp snappy decompression.

Contributor Author

Issue from gpuDecodeOrcColumnData.

Contributor

This one is about inflate_kernel.

Contributor

@vyasr vyasr left a comment

The fixes look correct to me. I would also like to discuss the adaptor/device_buffer changes in a separate follow-up PR.

rmm::device_buffer decomp_pages(total_decomp_size, stream);
// Dispatch batches of pages to decompress for each codec.
// Buffer needs to be padded.
rmm::device_buffer decomp_pages(
Contributor Author

@ttnghia ttnghia Jun 23, 2023

All changes for buffers in this file are due to gpuDecodePageData.

@ttnghia
Contributor Author

ttnghia commented Jun 23, 2023

/merge

@rapids-bot rapids-bot bot merged commit 0b4e354 into rapidsai:branch-23.08 Jun 23, 2023
@ttnghia ttnghia deleted the fix_io branch June 23, 2023 22:25
rapids-bot bot pushed a commit that referenced this pull request Jun 29, 2023
…kernel (#13643)

Fixes a memcheck regression in the ORC reader's uncompressed logic. The regression was introduced in #13396.
The original fix is copied from #13586 

The error was caught by the nightly build here: https://github.com/rapidsai/cudf/actions/runs/5409203449/jobs/9829059618

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #13643
