ARROW-2308: [Python] Make deserialized numpy arrays 64-byte aligned.#1802
ARROW-2308: [Python] Make deserialized numpy arrays 64-byte aligned.#1802robertnishihara wants to merge 9 commits intoapache:masterfrom robertnishihara:numpyalignment
Conversation
|
LGTM! We might also want to have a discussion about the spec and if we want Tensors to be aligned in general/by default. It seems important to me and maybe it's already implied by the sentence Edit: There is a test failure in ipc-write-test we should fix before merging :) |
|
Yes, that seems related, but for Tensors we want 64-byte alignment. |
|
Will review this when I can. I should also revive ARROW-1860 as there are a number of interrelated issues around this stuff |
cpp/src/arrow/ipc/reader.cc
Outdated
|
|
||
| std::unique_ptr<Message> message; | ||
| RETURN_NOT_OK(ReadContiguousPayload(file, &message)); | ||
| RETURN_NOT_OK(ReadContiguousPayload(file, &message, true)); |
There was a problem hiding this comment.
When passing bare bool literals like that as argument value, is it possible to add a comment with the argument name?
Something like:
RETURN_NOT_OK(ReadContiguousPayload(file, &message, true /* aligned */));(not sure the above passes linting, to be honest)
There was a problem hiding this comment.
Good point, I fixed it. If this doesn't pass, then I can just do
bool aligned = true;
RETURN_NOT_OK(ReadContiguousPayload(file, &message, aligned));|
+1, test failures look unrelated |
|
I would appreciate it if someone would try to reconcile the changes in this patch with my existing work in ARROW-1860. I won't be able to look for a few weeks |
|
Additionally, the alignment issues need to be more carefully documented in IPC.md |
| int64_t aligned_offset = PaddedLength(offset); | ||
| int64_t num_extra_bytes = aligned_offset - offset; | ||
| std::shared_ptr<Buffer> dummy_buffer; | ||
| RETURN_NOT_OK(file->Read(num_extra_bytes, &dummy_buffer)); |
There was a problem hiding this comment.
I'm not sure this is the right change. Shouldn't the alignment come from padding after the metadata?
There was a problem hiding this comment.
I'm not sure I understand the comment. This Read is used to skip over the padding. The metadata is read before this block, and the main contents are read after this block.
|
Documentation added in #1837. |
I'm not too sure about the relation to https://issues.apache.org/jira/browse/ARROW-1860, but I thought I'd open this for comment.
This aligns numpy arrays to 64-byte boundaries.
cc @pcmoritz @wesm @yaroslavvb