Skip to content

GH-43833: [Integration] validate that arrow IPC files include a valid embedded IPC stream#43834

Closed
bkietz wants to merge 3 commits intoapache:mainfrom
bkietz:ipc-assert-embedded-stream
Closed

GH-43833: [Integration] validate that arrow IPC files include a valid embedded IPC stream#43834
bkietz wants to merge 3 commits intoapache:mainfrom
bkietz:ipc-assert-embedded-stream

Conversation

@bkietz
Copy link
Copy Markdown
Member

@bkietz bkietz commented Aug 26, 2024

Rationale for this change

IPC files should contain a valid IPC stream with EOS. Some arrow producers don't guarantee that the EOS is written. However there might be readers depending on simply reading the embedded IPC stream (and depending on the EOS to avoid reading the Footer as a corrupted message) so we should validate stream reading from IPC files as well as file reading.

What changes are included in this PR?

arrow-json-integration-test will now also try to read a file as a stream, which will fail if the file doesn't include an EOS or otherwise isn't a valid stream.

Are these changes tested?

yes

Are there any user-facing changes?

no

ARROW_ASSIGN_OR_RAISE(auto footer_cookie, arrow_file->ReadAt(file_size - 10, 10));
auto footer_size =
bit_util::FromLittleEndian(util::SafeLoadAs<int32_t>(footer_cookie->data()));
int64_t footer_offset = 8 + file_size - footer_size - 10;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 8 + is a bit cryptic. I realize it accounts for the 8-byte padded magic at the start of file, but a comment would make this easier to understand.

Or perhaps you can change the checks to something like:

  // The footer is followed by the 10-byte footer length + magic
  const int64_t footer_offset = file_size - footer_size - 10;
  // Make sure the stream ends before the footer
  ARROW_ASSIGN_OR_RAISE(const int64_t stream_size, stream->Tell());
  const int64_t stream_end_offset = stream_size + 8;
  if (stream_end_offset > footer_offset) {

}
ARROW_ASSIGN_OR_RAISE(int64_t stream_size, stream->Tell());

if (footer_offset <= stream_size) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why not footer_offset < stream_size? It should be ok if the stream ends just before the footer.

@github-actions
Copy link
Copy Markdown

Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer.

@github-actions github-actions bot added the Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise label Nov 18, 2025
@github-actions github-actions bot closed this Feb 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting committer review Awaiting committer review Component: C++ Status: stale-warning Issues and PRs flagged as stale which are due to be closed if no indication otherwise

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants