gh-130197: Test various encodings with pygettext #132244
tomasr8 wants to merge 4 commits into python:main from
Conversation
serhiy-storchaka
left a comment
I do not think that duplicating this test with multiple encodings is needed. It is enough to test with one encoding, and it should not be Latin-1 or Windows-1252, which are often the default encoding. The CPU time can be spent on other tests.
Please also add non-ASCII comments.
Finally, we need to add tests for non-ASCII filenames on a non-UTF-8 locale. I am afraid that i18n_data cannot be used for this; we need to try several locales with different encodings and generate an input file with a corresponding name.
We also need to test the stderr output for files with a non-ASCII file name and a non-ASCII source encoding on a non-UTF-8 locale. It contains a file name and may contain a fragment of the source text.
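The per-locale file generation suggested above could look roughly like the following sketch. Note this is a hypothetical helper, not code from the PR: `make_input_file` and the sample strings are illustrative, and a real test would iterate over candidate locales and skip those whose encoding cannot represent the name.

```python
import os
import tempfile

# Hypothetical helper (not part of the test suite): create an input file
# whose non-ASCII name and contents use a given locale encoding, skipping
# encodings that cannot represent the file name.
def make_input_file(dirname, stem, encoding):
    try:
        stem.encode(encoding)              # can the name be represented?
    except UnicodeEncodeError:
        return None                        # skip this locale/encoding
    path = os.path.join(dirname, stem + ".py")
    with open(path, "w", encoding=encoding) as f:
        f.write('_("h\u00e9llo")  # non-ASCII source text\n')
    return path

with tempfile.TemporaryDirectory() as tmp:
    print(make_input_file(tmp, "f\u00efle", "latin-1"))  # path is created
    print(make_input_file(tmp, "f\u00efle", "ascii"))    # None: name not representable
```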
For context: #131902 (comment)
We currently set the charset of the POT file to the default encoding on the system (`fp.encoding`):

cpython/Tools/i18n/pygettext.py, lines 574 to 576 in f5639d8
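The platform dependence can be illustrated with a small standalone sketch (this is not pygettext's code, just the same mechanism): a file opened in text mode without an explicit encoding gets the platform default, which is what `fp.encoding` then reports. `codecs.lookup` is used only to normalize encoding-name spellings for comparison.

```python
import codecs
import locale
import tempfile

# A text-mode file opened without an explicit encoding uses the
# platform default; fp.encoding exposes the resolved value.
with tempfile.TemporaryFile("w") as fp:
    default = fp.encoding

# Under `python -X utf8` (or PYTHONUTF8=1) this is always UTF-8;
# otherwise it follows the current locale.
print(default)
```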
To have reproducible tests regardless of the OS they are running on, we set `-X utf8` in the tests. As a consequence, the POT charset is always set to `utf-8`. I don't think there's an easy way to control that if we want to test other output encodings. At least with these tests we know that non-UTF-8 input files can be read correctly.

cc @serhiy-storchaka Let me know if this is what you had in mind for the tests!