remove utf8 encoding from csv read for latest CEC modules #141

Closed
sjanzou wants to merge 1 commit into develop from unicode_csvread

Conversation

sjanzou commented Jun 3, 2022

Collaborator

No description provided.

sjanzou commented Jun 4, 2022

Collaborator Author

Goes with SAM pull request NatLabRockies/SAM#1066

brtietz changed the base branch from patch to develop on June 7, 2022 21:47
cpaulgilman commented Jun 8, 2022

Collaborator

@sjanzou It looks like we originally added UTF-8 to support unicode in file names: 0fbe95d. I don't see a related issue in WEX, SSC, or SAM so am not sure what problem that fixed, but we should avoid reintroducing whatever that problem was.

The currently proposed fix is meant to avoid issues when reading CSV files created from the CEC module and inverter equipment list Excel files. I think we can address this in our effort to improve our CEC library process instead of modifying WEX: when we convert the CEC data from Excel into SAM's inverter and module library CSV files, the Unicode characters can cause problems in the SAM user interface, and may be the cause of some problems we are having with SSC/6parsolve. One workaround is to manually remove the non-ASCII Unicode characters from the CSV file with a regular expression search and replace, replacing [^\x00-\x7F]+ with an empty string. Another is to use Python, with its more robust Excel reading packages, instead of LK for the conversion process. (These problematic characters appear in both string labels and data columns.)
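
For illustration only, the regular expression workaround could be sketched in Python roughly as follows. This is a minimal sketch, not the project's actual conversion script: the helper names `strip_non_ascii` and `clean_csv_file` are hypothetical, and it assumes the exported CSV is UTF-8 encoded (possibly with a byte order mark).

```python
import re

# Same character class as in the comment above:
# one or more characters outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text: str) -> str:
    """Return text with every run of non-ASCII characters removed."""
    return NON_ASCII.sub("", text)


def clean_csv_file(src_path: str, dst_path: str) -> None:
    """Write an ASCII-only copy of a CSV file.

    Hypothetical helper: the paths and the UTF-8 input encoding are
    assumptions made for this sketch.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    # newline="" leaves the original line endings untouched.
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        cleaned = strip_non_ascii(src.read())
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(cleaned)
```

Note that this approach drops the offending characters rather than transliterating them, so a label such as a degree sign or registered-trademark symbol simply disappears. Since the problematic characters also appear in data columns, the cleaned file would still need a check that numeric fields parse as expected.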

sjanzou commented Jun 9, 2022

Collaborator Author

> @sjanzou It looks like we originally added UTF-8 to support unicode in file names: 0fbe95d. I don't see a related issue in WEX, SSC, or SAM so am not sure what problem that fixed, but we should avoid reintroducing whatever that problem was.
>
> The currently proposed fix is meant to avoid issues when reading CSV files created from the CEC module and inverter equipment list Excel files. I think we can address this in our effort to improve our CEC library process instead of modifying WEX: when we convert the CEC data from Excel into SAM's inverter and module library CSV files, the Unicode characters can cause problems in the SAM user interface, and may be the cause of some problems we are having with SSC/6parsolve. One workaround is to manually remove the non-ASCII Unicode characters from the CSV file with a regular expression search and replace, replacing [^\x00-\x7F]+ with an empty string. Another is to use Python, with its more robust Excel reading packages, instead of LK for the conversion process. (These problematic characters appear in both string labels and data columns.)

@cpaulgilman, agreed. I will close this pull request and the related SAM pull request 1066. We will handle Unicode in the CEC processing separately and can revisit this in the future, if necessary.

sjanzou closed this Jun 9, 2022

3 participants