
Add support in Commons CSV for tracking byte positions during parsing #10

Merged
DarrenJAN merged 1 commit into marklogic:1.12.1-marklogic-release from DarrenJAN:1.12.1-marklogic-release on Nov 7, 2024

@DarrenJAN

Add support in Commons CSV for tracking byte positions during parsing.

Summary of Modifications

Test data files: Added new test data files and updated pom.xml to exclude them from the RAT checks, so they are not reported as files with unapproved licenses.
CSVParser class:
- Constructor enhancements: added an optional String encoding parameter that specifies the character encoding used by the reader.
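The encoding parameter matters because a byte offset depends on the charset: the same characters can occupy a different number of bytes in each encoding. The following is a minimal standalone sketch (not the PR's code) that illustrates this. The class and helper names are hypothetical, and falling back to UTF-8 when no encoding is given is an assumption made only for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of why byte positions require a known encoding:
// the same text has a different byte length in each charset.
class EncodingByteLengths {

    // Hypothetical helper: resolve an optional encoding name.
    // Assumption: fall back to UTF-8 when no encoding is supplied.
    static Charset resolveCharset(String encoding) {
        return encoding == null ? StandardCharsets.UTF_8 : Charset.forName(encoding);
    }

    // Number of bytes the text occupies when encoded with the given charset.
    static int byteLength(String text, String encoding) {
        return text.getBytes(resolveCharset(encoding)).length;
    }

    public static void main(String[] args) {
        // "cafe,1\n" with an e-acute (U+00E9): 7 characters
        String record = "caf\u00e9,1\n";
        System.out.println(byteLength(record, "UTF-8"));      // 8 (e-acute takes 2 bytes)
        System.out.println(byteLength(record, "ISO-8859-1")); // 7 (1 byte per character)
        System.out.println(byteLength(record, "UTF-16BE"));   // 14 (2 bytes per character, no BOM)
    }
}
```

A parser that counted one byte per character would report correct offsets only for single-byte encodings such as ISO-8859-1.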
CSVRecord class:
- private long characterByte: the byte position at which this record starts.
- New constructor: accepts the starting byte position, so that each record tracks its byte offset.
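To illustrate what a per-record start byte enables, here is a hypothetical sketch: SimpleRecord stands in for CSVRecord and the naive line-based parse method stands in for CSVParser (no quoting support, UTF-8 assumed). Each record's characterByte is the number of bytes consumed before it, so it can be used to jump directly to that record in the raw input.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CSVRecord: field values plus the byte offset
// at which the record starts (mirroring the PR's characterByte field).
final class SimpleRecord {
    final String[] values;
    final long characterByte;

    SimpleRecord(String[] values, long characterByte) {
        this.values = values;
        this.characterByte = characterByte;
    }
}

class RecordOffsets {
    // Naive parser for illustration only: one record per '\n'-terminated line,
    // fields split on ',', no quoting, UTF-8 byte counting.
    static List<SimpleRecord> parse(String input) {
        List<SimpleRecord> records = new ArrayList<>();
        long bytePos = 0;
        int start = 0;
        while (start < input.length()) {
            int end = input.indexOf('\n', start);
            if (end < 0) {
                end = input.length();
            }
            String line = input.substring(start, end);
            // The record starts at the number of bytes consumed so far.
            records.add(new SimpleRecord(line.split(",", -1), bytePos));
            // Advance past the line and its '\n' terminator (if present).
            int next = Math.min(end + 1, input.length());
            bytePos += input.substring(start, next).getBytes(StandardCharsets.UTF_8).length;
            start = next;
        }
        return records;
    }

    public static void main(String[] args) {
        // Second row is "Zoe,Koln" written with U+00EB and U+00F6 (2 bytes each in UTF-8).
        String csv = "name,city\nZo\u00eb,K\u00f6ln\nBob,Oslo\n";
        byte[] bytes = csv.getBytes(StandardCharsets.UTF_8);
        for (SimpleRecord r : parse(csv)) {
            // The stored offset lets us decode the raw bytes starting at the record.
            int offset = (int) r.characterByte;
            String fromOffset = new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
            System.out.println(r.characterByte + " -> " + fromOffset.split("\n")[0]);
        }
    }
}
```

For this input the recorded offsets are 0, 10 and 21: the second row is 9 characters but 11 bytes, so a character count would place the third record one position too early for every multi-byte character before it.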
ExtendedBufferedReader class:
- private long bytesRead: tracks the number of bytes read so far.
- private long bytesReadMark: stores the byte position saved by mark().
- CharsetEncoder encoder: the encoder used to calculate the byte size of characters.
- getCharBytes(int current): calculates how many bytes a character occupies in UTF-8. Note: only UTF-8 is currently supported because of the calculation algorithm used; supporting all encodings is possible but requires additional work.
- mark() and reset(): enhanced so that characters and bytes are not consumed unintentionally.
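The reader-side bookkeeping can be sketched roughly as follows. This is a hypothetical, simplified analogue rather than the PR's ExtendedBufferedReader: it counts UTF-8 bytes arithmetically instead of using a CharsetEncoder, tracks only the single-character read() method, and saves and restores the byte count in mark() and reset() so that reading ahead and then resetting does not advance the byte position.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical, simplified analogue of the byte tracking described above.
// Only read() is tracked; read(char[], int, int) and readLine() are not overridden.
class ByteTrackingReader extends BufferedReader {
    private long bytesRead;     // bytes consumed so far
    private long bytesReadMark; // byte count saved by mark()

    ByteTrackingReader(Reader in) {
        super(in);
    }

    // UTF-8 width of one UTF-16 char. Each half of a surrogate pair counts 2 bytes,
    // so a supplementary character (4 bytes in UTF-8) adds up correctly as a pair.
    static int getCharBytes(int current) {
        if (current < 0x80) return 1;
        if (current < 0x800) return 2;
        if (Character.isSurrogate((char) current)) return 2;
        return 3;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            bytesRead += getCharBytes(c);
        }
        return c;
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        super.mark(readAheadLimit);
        bytesReadMark = bytesRead; // remember the byte position with the char position
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        bytesRead = bytesReadMark; // undo bytes counted while reading ahead
    }

    long getBytesRead() {
        return bytesRead;
    }

    public static void main(String[] args) throws IOException {
        // 'a' (1 byte), U+00E9 (2), U+20AC (3), U+1F600 as a surrogate pair (4)
        String text = "a\u00e9\u20ac\uD83D\uDE00";
        try (ByteTrackingReader r = new ByteTrackingReader(new StringReader(text))) {
            r.read();       // 'a': 1 byte so far
            r.mark(16);
            r.read();       // U+00E9: 3 bytes so far
            r.read();       // U+20AC: 6 bytes so far
            r.reset();      // back to the mark: count returns to 1
            System.out.println(r.getBytesRead()); // 1
            while (r.read() != -1) {
                // drain the rest of the input
            }
            System.out.println(r.getBytesRead()); // 10
        }
    }
}
```

The final count of 10 matches the length of the same string encoded with String.getBytes(StandardCharsets.UTF_8), which is a convenient cross-check when testing this kind of counter.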

Test result: ran mvn; all unit tests and the other build checks pass.

…apache#9)

Add support in Commons CSV for tracking byte positions during parsing
@DarrenJAN DarrenJAN merged commit f0a2398 into marklogic:1.12.1-marklogic-release Nov 7, 2024