Skip to content

GH-39782: [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray#39783

Merged
pitrou merged 8 commits intoapache:mainfrom
jorisvandenbossche:gh-39782
Feb 28, 2024
Merged

GH-39782: [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray#39783
pitrou merged 8 commits intoapache:mainfrom
jorisvandenbossche:gh-39782

Conversation

@jorisvandenbossche
Copy link
Copy Markdown
Member

@jorisvandenbossche jorisvandenbossche commented Jan 24, 2024

Rationale for this change

The Array exporter currently uses buffer->data(), which is documented as "The buffer has to be a CPU buffer (is_cpu() is true). Otherwise, an assertion may be thrown or a null pointer may be returned.".
In practice, we indeed return null for non-CPU data. There is only a debug check that asserts the device.

This means that for non-CPU data the C device interface is currently returning null pointers for the buffers.

What changes are included in this PR?

Use the buffer's address() instead of data()

Are these changes tested?

Not yet

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #39782 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche
Copy link
Copy Markdown
Member Author

jorisvandenbossche commented Jan 24, 2024

I don't think we currently have non-CPU tests for the device interface.
I noticed with testing with nanoarrow, but so not sure there is a quick way to add a test here (apart from setting up CUDA tests for this, which we maybe should do anyway. EDIT: just adding a small test in the existing cuda_test.cc to ensure the buffers are not null is probably, or it properly roundtrips, is probably not that difficult)

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Jan 24, 2024

I don't think we currently have non-CPU tests for the device interface.

Good point. We should set up actual CUDA tests, IMHO.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to distinguish between C Data Interface and C Device Data Interface here? We could use buffer->data() for the former and buffer->address() for the latter - ensuring that non-CPU buffers cannot get exported over the C Data Interface.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, this ArrayExporter is not aware of whether the ArrowArray came from a plain C Data array or a C Device array.

I could add that as a boolean flag to the class so we can distinguish based on that here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should do that.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jan 25, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 15, 2024
@jorisvandenbossche
Copy link
Copy Markdown
Member Author

@github-actions crossbow submit test-cuda-*

@github-actions
Copy link
Copy Markdown

Revision: a3910171759ec4940cf7f1cebf0783060f46c4d7

Submitted crossbow builds: ursacomputing/crossbow @ actions-904d985d92

Task Status
test-cuda-cpp GitHub Actions
test-cuda-python GitHub Actions

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Feb 16, 2024
@jorisvandenbossche
Copy link
Copy Markdown
Member Author

I added a basic roundtrip test to bridge_test.cc with a CUDA device. I first wanted to reuse the existing TestDeviceArrayRoundtrip class, but in the end didn't do that here because that class does several things that don't work for CUDA (like validating the resulting array, checking the allocated pool size (with CUDA we don't have a memory pool that can say how much data is allocated?), etc).
But this might benefit from some refactor in the future.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid CUDA-specific code in non-CUDA directories, perhaps move this test somewhere in src/arrow/gpu?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will move to cuda_test.cc

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting merge Awaiting merge awaiting changes Awaiting changes labels Feb 21, 2024
@jorisvandenbossche
Copy link
Copy Markdown
Member Author

@github-actions crossbow submit test-cuda-*

@github-actions

This comment was marked as outdated.

@jorisvandenbossche
Copy link
Copy Markdown
Member Author

@pitrou updated to move the tests to cuda_test.cc

Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a small improvement. LGTM, thank you!

@pitrou
Copy link
Copy Markdown
Member

pitrou commented Feb 28, 2024

@github-actions crossbow submit test-cuda-*

@github-actions
Copy link
Copy Markdown

Revision: 48eac29

Submitted crossbow builds: ursacomputing/crossbow @ actions-6418ae939e

Task Status
test-cuda-cpp GitHub Actions
test-cuda-python GitHub Actions

@pitrou pitrou merged commit 19e874f into apache:main Feb 28, 2024
@pitrou pitrou removed the awaiting change review Awaiting change review label Feb 28, 2024
@jorisvandenbossche jorisvandenbossche deleted the gh-39782 branch February 28, 2024 12:31
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 19e874f.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about 10 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] ExportDeviceArray for C device interface uses CPU memory address for buffers

3 participants