fix: create multiple dataframes from same CSVReader by AuPath · Pull Request #44 · cefriel/mapping-template

AuPath · 2025-03-27T08:26:04Z

- Fixed by moving to version 3.6.0 of the FastCSV library (closes #38, "CsvReader is consumed after extracting DataFrame")
- Fixed handling of files encoded with a BOM (closes #42, "UTF-8 BOM encoded CSV file")
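
For context on the BOM issue: Java's UTF-8 decoder does not drop a leading byte order mark, so the first header name can end up prefixed with U+FEFF and column lookups by name fail. The following is a minimal stdlib-only sketch of skipping the BOM before parsing. It is not the code in this PR, and the class and method names (`BomSkippingSketch`, `openUtf8`, `firstLine`) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class BomSkippingSketch {

    // Returns a UTF-8 reader over the stream, skipping a leading BOM (EF BB BF) if present.
    static BufferedReader openUtf8(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pushback.readNBytes(head, 0, 3);
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: put the bytes back so the parser still sees them.
            pushback.unread(head, 0, n);
        }
        return new BufferedReader(new InputStreamReader(pushback, StandardCharsets.UTF_8));
    }

    // Convenience wrapper: first line of the content, without any BOM.
    static String firstLine(byte[] bytes) {
        try (BufferedReader reader = openUtf8(new ByteArrayInputStream(bytes))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFFid,name\n1,a\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(withBom));
    }
}
```

With this in place, the header line of a BOM-prefixed file and of a plain UTF-8 file come out identical.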

The behaviour is now to read the whole CSV file (or String) into a List<NamedCsvRecord> rather than a stream, as was previously (accidentally) the case. This works and fixes #38, but it also means that the whole file is always kept in memory.
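
To illustrate why a stream-backed reader can only produce one dataframe, here is a small stdlib-only sketch. It uses `String[]` as a stand-in for a parsed record, and all names in it are hypothetical; it only demonstrates the general `java.util.stream` rule that a stream cannot be traversed twice, while a list can hand out a fresh stream each time.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; String[] stands in for a parsed CSV record.
final class StreamVsListSketch {

    // Extracts one column from a traversal of the rows.
    static List<String> column(Stream<String[]> rows, int index) {
        return rows.map(r -> r[index]).collect(Collectors.toList());
    }

    // Returns true if reusing the same Stream for a second extraction fails.
    static boolean secondTraversalFails() {
        Stream<String[]> rows = Stream.of(new String[] {"1", "a"}, new String[] {"2", "b"});
        column(rows, 0); // first extraction consumes the stream
        try {
            column(rows, 1);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // With the rows held in a List, every extraction gets its own stream.
    static List<String> secondColumnFromList() {
        List<String[]> rows = List.of(new String[] {"1", "a"}, new String[] {"2", "b"});
        column(rows.stream(), 0);
        return column(rows.stream(), 1);
    }

    public static void main(String[] args) {
        System.out.println(secondTraversalFails());
        System.out.println(secondColumnFromList());
    }
}
```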

This does not necessarily make sense: I might have a huge CSV file and be interested in a single column, which I can obtain with the getDataframe(String... columns) method. Although the created dataframe is small, the whole file stays in memory, even if no other dataframe with more than that column is ever created.

We should discuss possible optimizations for this scenario.
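
As a starting point for that discussion, one possible shape for the single-column case is to read the file line by line and retain only the requested values. The sketch below is stdlib-only and deliberately naive: it assumes plain comma splitting with no quoted fields or embedded newlines, and `SingleColumnSketch` and `readColumn` are hypothetical names, not part of the library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of column extraction without caching whole records.
final class SingleColumnSketch {

    // Reads one named column line by line; only the extracted values are retained.
    // Assumption: naive comma splitting, no quoted fields or embedded newlines.
    static List<String> readColumn(Reader source, String columnName) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                return List.of();
            }
            int index = Arrays.asList(headerLine.split(",", -1)).indexOf(columnName);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown column: " + columnName);
            }
            List<String> values = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",", -1);
                // Short rows yield an empty value instead of failing.
                values.add(index < fields.length ? fields[index] : "");
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readColumn(new StringReader("id,name\n1,a\n2,b\n"), "name"));
    }
}
```

Memory use here grows with the size of the extracted column rather than with the size of the file, at the cost of re-reading the source for every new dataframe.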


marioscrock · 2025-03-28T08:56:43Z

Thanks for fixing this! Reading the file as a stream was not "accidental" but deliberate, to reduce the memory footprint. We should find the best trade-off that solves the multiple-readings issue while keeping the possibility of processing the CSV iteratively (e.g., a large CSV where I need to access only one column). Maybe we can have a CSVStreamReader and a CSVReader, or something similar?

marioscrock · 2025-04-09T15:00:05Z

@AuPath I pushed an attempt to introduce CSVStreamReader and to refactor duplicated operations into CSVReaderAbstract. Please review the latest commit.
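
For readers following the thread, the general idea of an abstract base with an eager and a streaming variant could look roughly like the sketch below. This is not the code from the commit: the class names (`AbstractCsvSourceSketch`, `InMemoryCsvSourceSketch`, `StreamingCsvSourceSketch`), the `rows()` hook, and the use of `List<List<String>>` as a dataframe are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical base class: shared projection logic lives here,
// subclasses only decide how rows are supplied.
abstract class AbstractCsvSourceSketch {
    private final List<String> header;

    AbstractCsvSourceSketch(List<String> header) {
        this.header = header;
    }

    // Must return a fresh traversal of the data rows on every call.
    protected abstract Stream<List<String>> rows();

    // Projects every row onto the requested columns, in the requested order.
    List<List<String>> getDataframe(String... columns) {
        int[] idx = new int[columns.length];
        for (int i = 0; i < columns.length; i++) {
            idx[i] = header.indexOf(columns[i]);
            if (idx[i] < 0) {
                throw new IllegalArgumentException("Unknown column: " + columns[i]);
            }
        }
        try (Stream<List<String>> stream = rows()) {
            return stream.map(row -> {
                List<String> projected = new ArrayList<>(idx.length);
                for (int i : idx) {
                    projected.add(row.get(i));
                }
                return projected;
            }).collect(Collectors.toList());
        }
    }
}

// Eager variant: all rows are held in memory, so repeated extraction needs no re-read.
final class InMemoryCsvSourceSketch extends AbstractCsvSourceSketch {
    private final List<List<String>> allRows;

    InMemoryCsvSourceSketch(List<String> header, List<List<String>> allRows) {
        super(header);
        this.allRows = List.copyOf(allRows);
    }

    @Override
    protected Stream<List<String>> rows() {
        return allRows.stream();
    }
}

// Streaming variant: each call re-opens the source instead of caching rows.
final class StreamingCsvSourceSketch extends AbstractCsvSourceSketch {
    private final Supplier<Stream<List<String>>> opener;

    StreamingCsvSourceSketch(List<String> header, Supplier<Stream<List<String>>> opener) {
        super(header);
        this.opener = opener;
    }

    @Override
    protected Stream<List<String>> rows() {
        return opener.get();
    }
}
```

In this sketch the trade-off is explicit: the in-memory variant pays with memory, the streaming variant pays with a re-read per dataframe, and both share one `getDataframe` implementation.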

marioscrock · 2025-04-10T15:16:27Z

Also fixed the setOnlyDistinct behaviour in CSVReader and the other readers (cf. #45).

fix: create multiple dataframes from same CSVReader

6d64704

- fixed by moving to version 3.6.0 of the fastcsv library (closes #38)
- fixed handling of file encoded with BOM (closes #42)

AuPath requested a review from marioscrock March 27, 2025 08:26

AuPath self-assigned this Mar 27, 2025

Add CSVStreamReader and abstract class CSVReaderAbstract

0da8a37

marioscrock approved these changes Apr 9, 2025


marioscrock assigned marioscrock and unassigned AuPath Apr 9, 2025

Fix setOnlyDistinct behaviour in Reader (close #45)

5a6d930

marioscrock added 2 commits May 5, 2025 12:13

fix resolveIRI behaviour if not absolute URI

03c8ec2

Add slash encoding

117f2a6

marioscrock merged commit 4a769e9 into main Jun 9, 2025

AuPath deleted the fix-csvreader branch June 17, 2025 12:59

marioscrock merged 5 commits into main from fix-csvreader
