Better handle rows that break across splits, and other small related fixes #400
HyukjinKwon merged 3 commits into databricks:master
srowen
left a comment
I'm not super happy about the hack here, but I could not find any other way around it. This is a potential correctness issue, so we need to do something. At least, I added more tests to exercise handling of rows that split across splits, and they still pass.
reader = new InputStreamReader(in, charset)

if (codec == null) {
  // Hack: in the uncompressed case (see more below), we must know how much the
Yeah, I don't like this hack either, but there seems to be no better way.
HyukjinKwon
left a comment
Looks good if the style checks and tests pass.
Codecov Report
@@            Coverage Diff             @@
##           master     #400      +/-   ##
==========================================
- Coverage   87.78%   87.73%   -0.06%
==========================================
  Files          14       14
  Lines         745      758      +13
  Branches       64       65       +1
==========================================
+ Hits          654      665      +11
- Misses         91       93       +2
Thanks @HyukjinKwon - we may want to cut an 0.6.0 release for this, plus the
This attempts to address #398.
See also #399
The change is, I believe, explained in the comments below.