
Fix encoding problem with comprehensive tests #68

Merged
simonw merged 1 commit into main from claude/fix-encoding-issue-JQDw8 on Dec 21, 2025


simonw commented Dec 21, 2025

This addresses the encoding issue reported in PR #52 where CSV files encoded in Latin-1/ISO-8859-1 couldn't be processed because the default encoding was hardcoded as UTF-8.

Changes:

  • Add --encoding option to specify character encoding for CSV files
  • Modify build_csv_convert_string() to accept and use the encoding parameter
  • Add validation that --encoding requires --csv or --dialect
  • Add comprehensive tests for Latin-1, ISO-8859-1, and UTF-16 encodings
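As a rough illustration of why an explicit encoding matters, here is a minimal sketch that decodes raw CSV bytes with a caller-supplied encoding before parsing. The helper name `parse_csv_bytes` is hypothetical and is not the project's `build_csv_convert_string()`; it only shows the general technique, assuming UTF-8 as the default.

```python
import csv
import io


def parse_csv_bytes(content: bytes, encoding: str = "utf-8") -> list:
    """Decode raw CSV bytes with the given encoding and parse rows as dicts.

    Hypothetical helper for illustration only.
    """
    # newline="" lets the csv module handle line endings itself
    text = io.StringIO(content.decode(encoding), newline="")
    return list(csv.DictReader(text))


# In Latin-1, "é" is the single byte 0xE9, which is not valid UTF-8 here
latin1_bytes = "id,name\n1,José\n".encode("latin-1")

try:
    parse_csv_bytes(latin1_bytes)  # default UTF-8 decoding fails
except UnicodeDecodeError as ex:
    print("UTF-8 decoding failed:", ex.reason)

rows = parse_csv_bytes(latin1_bytes, encoding="latin-1")
print(rows)  # → [{'id': '1', 'name': 'José'}]
```

The same call with `encoding="utf-16"` would handle UTF-16 input, since Python's `utf-16` codec consumes the byte order mark on decode.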

Examine all currently open issues and PRs and craft a solution to the reported encoding problem with tests


simonw commented Dec 21, 2025

You forgot to add --encoding to the README, and the tests are failing.

This addresses the encoding issue reported in PR #52 where CSV files
encoded in Latin-1/ISO-8859-1 couldn't be processed because the default
encoding was hardcoded as UTF-8.

Changes:
- Add --encoding option to specify character encoding for CSV files
- Modify build_csv_convert_string() to accept and use the encoding parameter
- Add validation that --encoding requires --csv or --dialect
- Add comprehensive tests for Latin-1, ISO-8859-1, and UTF-16 encodings
- Document --encoding option in README
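
The rule that --encoding requires --csv or --dialect could be checked along these lines. This is a sketch under assumptions: the function name, the argument names, the error message, and the UTF-8 fallback are hypothetical, and the real check lives in the project's command-line handling.

```python
def validate_encoding_option(encoding, csv_flag, dialect):
    """Reject an explicit encoding unless CSV parsing was requested.

    Hypothetical helper: returns the encoding to use, falling back to
    UTF-8 (an assumed default) when none was given.
    """
    if encoding is not None and not (csv_flag or dialect):
        raise ValueError("--encoding requires --csv or --dialect")
    return encoding or "utf-8"


print(validate_encoding_option("latin-1", True, None))
print(validate_encoding_option(None, False, None))
```

Failing early like this avoids silently ignoring an encoding that would never be applied to non-CSV input.
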
simonw force-pushed the claude/fix-encoding-issue-JQDw8 branch from 7a571a9 to bea5847 on December 21, 2025 19:01

simonw commented Dec 21, 2025

[Screenshot: CleanShot 2025-12-21 at 11 30 12@2x]

simonw merged commit 51209c3 into main Dec 21, 2025
10 checks passed
simonw added the enhancement label Dec 21, 2025

simonw commented Dec 21, 2025

While manually testing this, I found that the following didn't work:

cd /tmp && mkdir -p encoding-demo && cd encoding-demo
git init
printf 'id,name,city\n1,José,São Paulo\n2,François,Montréal\n3,Müller,München\n' | iconv -f UTF-8 -t ISO-8859-1 > places.csv
git add places.csv && git commit -m "Add places data"

Then, from my ~/dev/git-history checkout, I ran:

uv run git-history file places.db places.csv --csv --id id --encoding latin-1 --repo /tmp/encoding-demo

That gave an error:

Error: File not found: /Users/simon/Dropbox/dev/git-history/places.csv
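
The path in that error points inside the git-history checkout rather than /tmp/encoding-demo, which suggests the relative file path was resolved against the current working directory instead of the --repo directory. That is an inference from the message, not something confirmed in this thread. A minimal sketch of resolving the path against the repository root instead, with the hypothetical helper name `resolve_in_repo`:

```python
from pathlib import Path


def resolve_in_repo(repo: str, filepath: str) -> Path:
    """Resolve filepath against the repository root rather than the cwd.

    Hypothetical helper for illustration; absolute paths are kept as given.
    """
    repo_root = Path(repo).resolve()
    candidate = Path(filepath)
    if not candidate.is_absolute():
        candidate = repo_root / candidate
    return candidate.resolve()


# The relative name is joined to the repo directory, not the cwd
print(resolve_in_repo("/tmp/encoding-demo", "places.csv"))
```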


simonw commented Dec 21, 2025

simonw added a commit that referenced this pull request Dec 21, 2025
simonw added a commit that referenced this pull request Dec 21, 2025

2 participants