Skip to content

Conversation

@gbidsilva
Copy link
Contributor

@gbidsilva gbidsilva commented Aug 30, 2023

This fix is related to : https://issues.apache.org/jira/browse/CSV-147.

If we have some faulty data in the CSV then the current error message which we are getting is something similar to below.
java.io.IOException: (line 2) invalid char between encapsulated token and delimiter

With this fix, what we will be getting is something similar to below,
java.io.IOException: An error occurred while tying to parse the CSV content. Error in line: 2, position: 94, last parsed content: ...rec4,rec5,rec6,rec7,rec8

Update
It has been decided to only to add the record position into the exception message and treat getLastParsedContent method as a new feature. Therefore this PR only contains the position related changes.

@garydgregory
Copy link
Member

I'm OK with adding the position but I am guessing someone will create a security issue for data exfiltration.

@gbidsilva
Copy link
Contributor Author

@garydgregory :
That is a good point. But IMO, whenever someone log the exception, at that point they should consider the security risks.
Isn't usually exception provide such details? as I remember I have came across such scenarios.
What would be the best practice in this case ? - good to know :-)

@gbidsilva
Copy link
Contributor Author

@elharo: Thank you for the feedback. Changes are updated in the PR now.

@gbidsilva
Copy link
Contributor Author

@garydgregory:
I have looked around what would be the best practice in adding data records in Exceptions. It seems that arguments are there for both the sides. However, as you suggested, we should avoid adding such details in Exception and keep it consistent with how commons were creating exceptions so far.
To rectify this, I have removed the content part from the exception message and made 'getLastParsedContent()' method public. Therefore, if there is any need then anyone can call that method and can get the last parsed content.
Test cases are added to the PR :-)
Appreciate your feedback on this.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbidsilva
Thank you for your updates. Please see my comments.

@gbidsilva
Copy link
Contributor Author

@garydgregory @elharo :
Requested changes are done.

@gbidsilva gbidsilva requested a review from garydgregory August 31, 2023 14:20
@gbidsilva
Copy link
Contributor Author

@garydgregory :
I understand your point. In this case, we will treat getLastParsedContent method as a new feature and decide whether it is actually needed or not in the product.
To make this PR work, I have removed the getLastParsedContent method content and only kept the positioning related changes.
Appreciate your feedback :-)

@gbidsilva
Copy link
Contributor Author

@garydgregory : let us know if there is anymore change to be done in this PR.

@gbidsilva
Copy link
Contributor Author

@garydgregory @elharo
Exception handling moved to Lexer class.

// error invalid char between token and next delimiter
throw new IOException("(line " + getCurrentLineNumber() +
") invalid char between encapsulated token and delimiter");
throw new IOException("Invalid char between encapsulated token and delimiter at line: " + getCurrentLineNumber() + ", position: " + getCharacterPosition());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably shouldn't be an IOException but that issue is not new with this PR

.build();

CSVParser csvParser = csvFormat.parse(stringReader);
Exception exception = assertThrows(UncheckedIOException.class, () -> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UncheckedIOException is not right either, but again not new in this PR

@gbidsilva
Copy link
Contributor Author

@garydgregory : Checkstyle issue has been fixed.

@gbidsilva
Copy link
Contributor Author

@garydgregory : Anything pending from development side for this to be merged ?

@gbidsilva
Copy link
Contributor Author

@gbidsilva Thank you for your updates. Please see my comments.

Completed.

@codecov-commenter
Copy link

codecov-commenter commented Sep 13, 2023

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.87%. Comparing base (7104598) to head (c0759cd).
⚠️ Report is 844 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master     #347   +/-   ##
=========================================
  Coverage     97.87%   97.87%           
  Complexity      549      549           
=========================================
  Files            11       11           
  Lines          1178     1179    +1     
  Branches        204      204           
=========================================
+ Hits           1153     1154    +1     
  Misses           13       13           
  Partials         12       12           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@garydgregory garydgregory merged commit a33b24c into apache:master Sep 13, 2023
@garydgregory garydgregory changed the title [CSV-147] Error message optimization during faulty CSV record read [CSV-147] Better error message during faulty CSV record read Sep 13, 2023
asfgit pushed a commit that referenced this pull request Sep 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants