-
Notifications
You must be signed in to change notification settings - Fork 8k
Fixes bug #69181 (csv parsing drops newlines within fields with SplFileObject::DROP_NEW_LINE) #1148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Running this test fails because when DROP_NEW_LINES flag is present when reading CSV files, fields containing quoted new lines are not anticipated.
Fixes bug #69181 (csv parsing drops newlines within fields with SplFileObject::DROP_NEW_LINE) when SPL_FILE_OBJECT_READ_CSV is set, a newline does not necessarily mean the end of the row, so don't drop. CSV rows can contain multiple quoted newlines.
|
The test that fails in the Travis build is This test is titled "Phar object: iterating via SplFileObject and reading csv", so it seems to me that on line 19 the My PR ignores dropping new lines within CSV rows (keeping the original data, as it is stored in CSV), so the outcome is that when SplFileObject::READ_CSV is set, no new lines are dropped (affecting the expected result from the now-failing test). @laruence and @m6w6 may want to give input to this, regarding the Phar test? |
|
Can one of the admins verify this patch? |
|
I'm not sure the result you're getting in ext/phar/tests/phar_oo_009.phpt is correct. Linefeed within delimiters may be passed as is, true, but the linefeed you have there is not within the delimiters. I think you need to handle this case more carefully. |
|
@smalyshev I'm not sure what the phar test is doing, and I have not managed to run it in isolation.
are you addressing the functionality supplied by my patch, or the test in |
|
The test is failing, because there is a empty line in the CSV, which is not skipped, even though SplFileObject::SKIP_EMPTY is given. That is because the modified code puts a non zero length into intern->u.file.current_line_len, but the line would only be skipped if intern->u.file.current_line_len was zero. Actually, it seems to me that unrelated whether SPL_FILE_OBJECT_READ_CSV is given, the buffer should be investigated from the end, replacing all \n and \r with \0 (decreasing line_len as well), and stopping as soon as another character is detected. IOW: don't make a case differentiation, but replace the strcspn() with something that works from right-to-left. |
|
Diregard my suggested solution; that can't work. |
|
Since this targets an unsupported branch of PHP, and since this seems to introduce inconsistencies of it's own, and since the author seems to have abandoned working on it, I'm closing this PR. Please take this action as encouragement to open a clean PR against a supported branch of PHP. |
|
Thanks, I'll look into the affects on 7 and whether it is still valid. |
This pull request fixes buggy behaviour when using
SplFileObjectto work with CSV files.According to the documentation, the
DROP_NEW_LINEflag drops newlines at the end of a line. http://php.net/manual/en/class.splfileobject.php#splfileobject.constants.drop-new-lineHowever, even though CSV files are obviously stored with new lines at the end of each line, the fields within the CSV can also contain any number of newline characters, as long as they are quoted. In this case, the bug appears, as the DROP_NEW_LINE flag causes c function
strcspnto remove the first, and only the first, newline from the entire row of fields.My solution is to ignore the
strcspnfunction when parsing CSV files, because if there are any newlines within fields, the fields MUST be delimited, therefore no newlines should be dropped within the delimiters.