Skip to content

ARROW-2308: [Python] Make deserialized numpy arrays 64-byte aligned.#1802

Closed
robertnishihara wants to merge 9 commits intoapache:masterfrom
robertnishihara:numpyalignment
Closed

ARROW-2308: [Python] Make deserialized numpy arrays 64-byte aligned.#1802
robertnishihara wants to merge 9 commits intoapache:masterfrom
robertnishihara:numpyalignment

Conversation

@robertnishihara
Copy link
Copy Markdown
Contributor

@robertnishihara robertnishihara commented Mar 27, 2018

I'm not too sure about the relation to https://issues.apache.org/jira/browse/ARROW-1860, but I thought I'd open this for comment.

This aligns numpy arrays to 64-byte boundaries.

cc @pcmoritz @wesm @yaroslavvb

@pcmoritz
Copy link
Copy Markdown
Contributor

pcmoritz commented Mar 28, 2018

LGTM! We might also want to have a discussion about the spec and if we want Tensors to be aligned in general/by default. It seems important to me and maybe it's already implied by the sentence It is required to have all the contiguous memory buffers in an IPC payload aligned at 8-byte boundaries. In other words, each buffer must start at an aligned 8-byte offset.

Edit: There is a test failure in ipc-write-test we should fix before merging :)

@robertnishihara
Copy link
Copy Markdown
Contributor Author

Yes, that seems related, but for Tensors we want 64-byte alignment.

@wesm
Copy link
Copy Markdown
Member

wesm commented Mar 28, 2018

Will review this when I can. I should also revive ARROW-1860 as there are a number of interrelated issues around this stuff

@robertnishihara
Copy link
Copy Markdown
Contributor Author

Thanks @wesm!

@pcmoritz, good catch, the bug should be fixed now.


std::unique_ptr<Message> message;
RETURN_NOT_OK(ReadContiguousPayload(file, &message));
RETURN_NOT_OK(ReadContiguousPayload(file, &message, true));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When passing bare bool literals like that as argument value, is it possible to add a comment with the argument name?
Something like:

  RETURN_NOT_OK(ReadContiguousPayload(file, &message, true /* aligned */));

(not sure the above passes linting, to be honest)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I fixed it. If this doesn't pass, then I can just do

bool aligned = true;
RETURN_NOT_OK(ReadContiguousPayload(file, &message, aligned));

@pcmoritz
Copy link
Copy Markdown
Contributor

pcmoritz commented Apr 4, 2018

+1, test failures look unrelated

@pcmoritz pcmoritz closed this in 26bc4ab Apr 4, 2018
@robertnishihara robertnishihara deleted the numpyalignment branch April 4, 2018 18:34
@wesm
Copy link
Copy Markdown
Member

wesm commented Apr 5, 2018

I would appreciate it if someone would try to reconcile the changes in this patch with my existing work in ARROW-1860. I won't be able to look for a few weeks

@wesm
Copy link
Copy Markdown
Member

wesm commented Apr 5, 2018

Additionally, the alignment issues need to be more carefully documented in IPC.md

int64_t aligned_offset = PaddedLength(offset);
int64_t num_extra_bytes = aligned_offset - offset;
std::shared_ptr<Buffer> dummy_buffer;
RETURN_NOT_OK(file->Read(num_extra_bytes, &dummy_buffer));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right change. Shouldn't the alignment come from padding after the metadata?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the comment. This Read is used to skip over the padding. The metadata is read before this block, and the main contents are read after this block.

@robertnishihara
Copy link
Copy Markdown
Contributor Author

Documentation added in #1837.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants