Add support in Commons CSV for tracking byte positions during parsing by DarrenJAN · Pull Request #10 · marklogic/commons-csv

DarrenJAN · 2024-11-07T02:28:47Z

Add support in Commons CSV for tracking byte positions during parsing.

Summary of Modifications

Test data files: Added new test data files and updated pom.xml to exclude them from RAT checks, so the build does not flag them as having unapproved licenses.
CSVParser class:
Constructor enhancements:
a. Added an optional parameter, String encoding, which specifies the encoding used by the reader.
CSVRecord class:
private long characterByte: The byte position at which this record starts.
New constructor: Added so that a record can track its byte position.
ExtendedBufferedReader class:
private long bytesRead: Tracks the number of bytes read so far.
private long bytesReadMark: Stores the byte position at the time of the mark.
CharsetEncoder encoder: Encoder used to calculate the byte size of characters.
getCharBytes(int current): Calculates the number of bytes a character occupies in UTF-8. Note: only UTF-8 is supported for now because of the encoding algorithm used; other encodings can be supported but need more work.
reset() and mark() methods: Updated so that characters and bytes are not consumed unintentionally.
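
To illustrate the byte-tracking idea described for ExtendedBufferedReader, here is a minimal sketch. It is not the code from this PR: the class name ByteCountingReader and the helper utf8Length are hypothetical, and the sketch replaces the CharsetEncoder with simple UTF-8 length arithmetic. It assumes well-formed input, so each half of a surrogate pair is counted as 2 of the 4 bytes of the encoded code point.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only; not the ExtendedBufferedReader from this PR.
class ByteCountingReader {
    private final Reader in;
    private long bytesRead;      // UTF-8 bytes consumed so far
    private long bytesReadMark;  // value of bytesRead when mark() was called

    ByteCountingReader(Reader in) {
        this.in = in;
    }

    // Number of UTF-8 bytes contributed by one UTF-16 code unit.
    // Assumption: surrogates always come in valid pairs (2 + 2 = 4 bytes).
    static int utf8Length(int c) {
        if (c < 0x80) {
            return 1;
        }
        if (c < 0x800) {
            return 2;
        }
        if (Character.isSurrogate((char) c)) {
            return 2;
        }
        return 3;
    }

    int read() throws IOException {
        int c = in.read();
        if (c != -1) {
            bytesRead += utf8Length(c);
        }
        return c;
    }

    // Save the byte count together with the character position.
    void mark(int readAheadLimit) throws IOException {
        in.mark(readAheadLimit);
        bytesReadMark = bytesRead;
    }

    // Restore both, so look-ahead reads are not counted twice.
    void reset() throws IOException {
        in.reset();
        bytesRead = bytesReadMark;
    }

    long getBytesRead() {
        return bytesRead;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,\u00e9\n";
        ByteCountingReader reader = new ByteCountingReader(new StringReader(csv));
        long recordStart = 0;
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\n') {
                System.out.println("record starts at byte " + recordStart);
                recordStart = reader.getBytesRead();
            }
        }
    }
}
```

The point of saving bytesRead in mark() and restoring it in reset() is that a parser which peeks ahead and then rewinds would otherwise report byte positions that drift past the true offset.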

Test result:
mvn

All unit tests and the other build checks pass.

…apache#9) Add support in Commons CSV for tracking byte positions during parsing

Add support in Commons CSV for tracking byte positions during parsing (…

3e13b9d

DarrenJAN merged commit f0a2398 into marklogic:1.12.1-marklogic-release Nov 7, 2024

DarrenJAN merged 1 commit into marklogic:1.12.1-marklogic-release from DarrenJAN:1.12.1-marklogic-release