ARROW-2119: [IntegrationTest] Add test case with a stream having no record batches #3871

wesm · 2019-03-12T02:56:41Z

I think it is not a bad idea to support reading and writing streams that have no record batches, rather than raising an error.

This would require code changes in Java and JS at least, so I will need help from others if this is thought to be a good idea. It might also be good to add some unit tests around this.

wesm · 2019-03-12T02:59:43Z

See integration test results:

https://gist.github.com/wesm/df111020b498f3e7b261b8667aced4e2

It looks like only C++ can consume its own output. The other 8 entries of the matrix fail.
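For readers unfamiliar with the integration matrix, here is a minimal sketch of how the "8 of 9" count arises. It assumes the matrix at the time covered three implementations (C++, Java, JS), each acting as both producer and consumer; the list of names is an assumption inferred from this thread, not taken from the test harness itself.

```python
from itertools import product

# Assumption: the three implementations discussed in this thread.
IMPLEMENTATIONS = ["C++", "Java", "JS"]


def producer_consumer_pairs(impls):
    """Return every (producer, consumer) combination, including self-pairs."""
    return list(product(impls, repeat=2))


pairs = producer_consumer_pairs(IMPLEMENTATIONS)

# Per the comment above, only C++ reading C++ output succeeds.
passing = {("C++", "C++")}
failing = [pair for pair in pairs if pair not in passing]

print(len(pairs), len(failing))  # prints "9 8"
```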

wesm · 2019-03-13T01:40:02Z

Hm, well this is a bit concerning. Failures in the integration tests do not fail the build.

fsaintjacques · 2019-03-13T13:51:11Z

Rebase and try again

emkornfield · 2019-03-14T05:20:37Z

@wesm I can take up the Java side of things in the 0.14 release time frame (sorry, a bit swamped at the moment). Does it pay to discuss this on the ML to be sure there is consensus? (Apologies if I missed the thread.)

wesm · 2019-03-14T13:37:24Z

Yeah, I think it would make sense to ensure there is agreement about what should occur with a stream with no batches.

emkornfield · 2019-05-16T00:13:02Z

@wesm were you going to follow up on the ML about this? Want me to?

wesm · 2019-05-20T21:29:51Z

@emkornfield I guess I dropped the ball. Do you want to ping the list about it? Then we can try to fix the Java and C++ implementations at least.

wesm · 2019-05-23T15:21:19Z

I rebased and set the integration test to only run for C++.

wesm · 2019-05-23T15:46:10Z

+1

Re: #3871, [ARROW-2119](https://issues.apache.org/jira/browse/ARROW-2119), and closes [ARROW-5396](https://issues.apache.org/jira/browse/ARROW-5396).

This PR updates the JS Readers and Writers to support files and streams with no RecordBatches. The approach here is two-fold:

1. If the Readers' source message stream terminates after reading the Schema message, the Reader will yield a dummy zero-length RecordBatch with the schema.
2. The Writer always writes the schema for any RecordBatch, but skips writing the RecordBatch field metadata if it's empty.

This is necessary because the reader and writer don't know about each other when they're communicating via the Node and DOM stream i/o primitives; they only know about the values pushed through the streams. Since the RecordBatchReader and Writer don't yield the Schema message as a standalone value, we pump the stream with a zero-length RecordBatch that contains the schema instead.

Author: ptaylor <paul.e.taylor@me.com>
Author: Wes McKinney <wesm+git@apache.org>

Closes #4373 from trxcllnt/js/fix-no-record-batches and squashes the following commits:

c860696 <Wes McKinney> Run no-batches integration test for JS also
86d192d <ptaylor> define an _InternalEmptyRecordBatch class to signal that the reader source stream has no RecordBatches
193b08d <ptaylor> ensure reader and writer support the case where a stream or file has a schema but no recordbatches
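The two-fold approach described in this commit message can be illustrated with a small, language-neutral sketch. This is not the Arrow JS code: `Schema`, `RecordBatch`, `read_batches`, and `write_batches` are hypothetical stand-ins written in Python, and "skipping the RecordBatch field metadata" is simplified here to not emitting the empty batch at all.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union


@dataclass(frozen=True)
class Schema:
    """Hypothetical stand-in for a Schema message: just column names."""
    names: Tuple[str, ...]


@dataclass(frozen=True)
class RecordBatch:
    """Hypothetical stand-in for a RecordBatch: a schema plus rows."""
    schema: Schema
    rows: Tuple[tuple, ...] = ()


Message = Union[Schema, RecordBatch]


def read_batches(messages: Iterable[Message]) -> Iterator[RecordBatch]:
    """Yield batches from a message stream that starts with a Schema.

    If the stream ends right after the Schema, yield one zero-length
    batch so the schema still reaches whatever consumes the reader.
    """
    it = iter(messages)
    schema = next(it)
    if not isinstance(schema, Schema):
        raise ValueError("stream must start with a Schema message")
    saw_batch = False
    for message in it:
        saw_batch = True
        yield message
    if not saw_batch:
        yield RecordBatch(schema)  # dummy zero-length batch


def write_batches(batches: Iterable[RecordBatch]) -> List[Message]:
    """Write the schema once, then only the batches that contain rows."""
    out: List[Message] = []
    schema_written = False
    for batch in batches:
        if not schema_written:
            out.append(batch.schema)
            schema_written = True
        if batch.rows:  # skip the body of a zero-length batch
            out.append(batch)
    return out
```

Under these assumptions, piping a schema-only stream through the reader and then the writer reproduces a schema-only stream, because the zero-length batch carries the schema across without adding a batch to the output.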

trxcllnt mentioned this pull request May 22, 2019

ARROW-5396: [JS] Support files and streams with no record batches #4373

Closed

wesm added 2 commits May 23, 2019 09:04

Add test case with a string having no record batches

f4816fe

Disable no-batches case for JS and Java

acc5ac9

wesm force-pushed the ARROW-2119 branch from 7005718 to acc5ac9 Compare May 23, 2019 14:44

wesm changed the title ~~WIP ARROW-2119: [IntegrationTest] Add test case with a stream having no record batches~~ ARROW-2119: [IntegrationTest] Add test case with a stream having no record batches May 23, 2019

wesm closed this in b3a4e95 May 23, 2019

wesm deleted the ARROW-2119 branch May 23, 2019 15:46

This was referenced May 23, 2019

[C++][Java] Handle Arrow stream with zero record batch #18090

Closed

[CI] Integration test failures do not fail the Travis CI build #21362

Closed

[JS] Ensure reader and writer support files and streams with no RecordBatches #21853

Closed
