Skip to content

Patch edflib.c on Windows to enable opening UTF-8 encoded paths#237

Merged
skjerns merged 2 commits intoholgern:masterfrom
myd7349:fopen-utf8
Nov 9, 2023
Merged

Patch edflib.c on Windows to enable opening UTF-8 encoded paths#237
skjerns merged 2 commits intoholgern:masterfrom
myd7349:fopen-utf8

Conversation

@myd7349
Copy link
Copy Markdown
Contributor

@myd7349 myd7349 commented Sep 4, 2023

This should fix file opening issue with non-ASCII characters in paths when local encoding is not UTF-8.

This should fix file opening issue with non-ASCII characters in paths
when local encoding is not UTF-8.
@skjerns
Copy link
Copy Markdown
Collaborator

skjerns commented Sep 5, 2023

that looks good :) could you also write a test case that tests for the new behaviour?

@myd7349
Copy link
Copy Markdown
Contributor Author

myd7349 commented Sep 5, 2023

Hi! @skjerns I have tested this PR on this branch: https://github.com/myd7349/pyedflib/commits/ci

And it seems that the current test cases are able to cover the changes made in this PR.

Without changes made by this PR:

https://github.com/myd7349/pyedflib/actions/runs/6086619850/job/16513382367

We can see that there are a lot of UserWarnings caused by Unicode path (windows-2019 Python 3.9):

pyedflib\tests\test_edfreader.py ..............                          [ 25%]
pyedflib\tests\test_edfwriter.py ...........................             [ 73%]
pyedflib\tests\test_highlevel.py ...............                         [100%]

============================== warnings summary ===============================
pyedflib/tests/test_edfreader.py::TestEdfReader::test_BdfReader_Read_accented_file
pyedflib/tests/test_edfreader.py::TestEdfReader::test_read_incorrect_file
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:224: UserWarning: Attempting to write Unicode file D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_file_��'���.bdf on Windows. Consider changing your locale to UTF8.
    self.handle = open_file_writeonly(self.path, self.file_type, self.n_channels)

pyedflib/tests/test_edfreader.py: 12 warnings
pyedflib/tests/test_edfwriter.py: 626 warnings
pyedflib/tests/test_highlevel.py: 590 warnings
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:953: DeprecationWarning: `sample_rate` is deprecated and will be removed in a future release.                           Please use `sample_frequency` instead
    warnings.warn("`sample_rate` is deprecated and will be removed in a future release. \

pyedflib/tests/test_edfreader.py::TestEdfReader::test_BdfReader_Read_accented_file
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfreader.py:320: UserWarning: the filename D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_file_��'���.bdf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) 
    f = pyedflib.EdfReader(self.bdf_accented_file, pyedflib.READ_ALL_ANNOTATIONS, pyedflib.DO_NOT_CHECK_FILE_SIZE)

pyedflib/tests/test_edfreader.py::TestEdfReader::test_EdfReader_Legacy_Header_Info
  C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\case.py:550: DeprecationWarning: Variable 'gender' is deprecated, use 'sex' instead.
    method()

pyedflib/tests/test_edfreader.py::TestEdfReader::test_EdfReader_headerInfos
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfreader.py:182: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    np.testing.assert_equal(f.getGender(), 'Male')  # deprecated

pyedflib/tests/test_edfreader.py::TestEdfReader::test_read_incorrect_file
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfreader.py:368: UserWarning: the filename D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_file_��'���.bdf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) 
    f = pyedflib.EdfReader(self.bdf_accented_file)

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:801: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    self.assertEqual(f.getGender(), expected,

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:808: DeprecationWarning: Function 'setGender' is deprecated, use 'setSex' instead.
    f.setGender(sex)  # deprecated

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:822: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    self.assertEqual(f.getGender(), expected,

pyedflib/tests/test_edfwriter.py: 1 warning
pyedflib/tests/test_highlevel.py: 14 warnings
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:800: DeprecationWarning: The 'sample_rate' parameter is deprecated. Please use 'sample_frequency' instead.
    warnings.warn("The 'sample_rate' parameter is deprecated. Please use "

pyedflib/tests/test_highlevel.py::TestHighLevel::test_fortran_write
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:811: UserWarning: signals are in Fortran order. Will automatically transfer to C order for compatibility with edflib.
    warnings.warn('signals are in Fortran order. Will automatically '

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_unicode
  D:\\a\\pyedflib\\pyedflib\\pyedflib\\highlevel.py:354: UserWarning: the filename D:\\a\\pyedflib\\pyedflib\\pyedflib\\tests\\data\\tmp_utf8-\u4e2d\u6587\u017a\u0105\u015f\u3186\uc6b4\u02b7\u1a04\u2161\u0259\u041f\u0440\U0001f916.edf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) \n    with pyedflib.EdfReader(edf_file) as f:

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_unicode_at_start
  D:\\a\\pyedflib\\pyedflib\\pyedflib\\edfwriter.py:224: UserWarning: Attempting to write Unicode file D:\\a\\pyedflib\\pyedflib\\pyedflib\\tests\\data\\\u4e2d\u6587\u017a\u0105\u015f\u3186\uc6b4\u02b7\u1a04\u2161\u0259\u041f\u0440\U0001f916.edf on Windows. Consider changing your locale to UTF8.\n    self.handle = open_file_writeonly(self.path, self.file_type, self.n_channels)

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_unicode_at_start
  D:\\a\\pyedflib\\pyedflib\\pyedflib\\highlevel.py:354: UserWarning: the filename D:\\a\\pyedflib\\pyedflib\\pyedflib\\tests\\data\\\u4e2d\u6587\u017a\u0105\u015f\u3186\uc6b4\u02b7\u1a04\u2161\u0259\u041f\u0440\U0001f916.edf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) \n    with pyedflib.EdfReader(edf_file) as f:

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_write_accented
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:224: UserWarning: Attempting to write Unicode file D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_��'���.edf on Windows. Consider changing your locale to UTF8.
    self.handle = open_file_writeonly(self.path, self.file_type, self.n_channels)

pyedflib/tests/test_highlevel.py::TestHighLevel::test_read_write_accented
  D:\a\pyedflib\pyedflib\pyedflib\highlevel.py:354: UserWarning: the filename D:\a\pyedflib\pyedflib\pyedflib\tests\data\tmp_��'���.edf contains Unicode, but Windows does not fully support this. Please consider changing your locale to support UTF8. Attempting to load file via workaround (https://github.com/holgern/pyedflib/pull/100) 
    with pyedflib.EdfReader(edf_file) as f:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

With this PR, https://github.com/myd7349/pyedflib/actions/runs/6086686496/job/16513596128, we can see that those UserWarning caused by Unicode path on Windows are gone (windows-2019 Python3.9):

pyedflib\tests\test_edfreader.py ..............                          [ 25%]
pyedflib\tests\test_edfwriter.py ...........................             [ 73%]
pyedflib\tests\test_highlevel.py ...............                         [100%]

============================== warnings summary ===============================
pyedflib/tests/test_edfreader.py: 12 warnings
pyedflib/tests/test_edfwriter.py: 626 warnings
pyedflib/tests/test_highlevel.py: 590 warnings
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:953: DeprecationWarning: `sample_rate` is deprecated and will be removed in a future release.                           Please use `sample_frequency` instead
    warnings.warn("`sample_rate` is deprecated and will be removed in a future release. \

pyedflib/tests/test_edfreader.py::TestEdfReader::test_EdfReader_Legacy_Header_Info
  C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\unittest\case.py:550: DeprecationWarning: Variable 'gender' is deprecated, use 'sex' instead.
    method()

pyedflib/tests/test_edfreader.py::TestEdfReader::test_EdfReader_headerInfos
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfreader.py:182: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    np.testing.assert_equal(f.getGender(), 'Male')  # deprecated

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:801: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    self.assertEqual(f.getGender(), expected,

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:808: DeprecationWarning: Function 'setGender' is deprecated, use 'setSex' instead.
    f.setGender(sex)  # deprecated

pyedflib/tests/test_edfwriter.py: 12 warnings
  D:\a\pyedflib\pyedflib\pyedflib\tests\test_edfwriter.py:822: DeprecationWarning: Method 'getGender' is deprecated, use 'getSex' instead.
    self.assertEqual(f.getGender(), expected,

pyedflib/tests/test_edfwriter.py: 1 warning
pyedflib/tests/test_highlevel.py: 14 warnings
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:800: DeprecationWarning: The 'sample_rate' parameter is deprecated. Please use 'sample_frequency' instead.
    warnings.warn("The 'sample_rate' parameter is deprecated. Please use "

pyedflib/tests/test_highlevel.py::TestHighLevel::test_fortran_write
  D:\a\pyedflib\pyedflib\pyedflib\edfwriter.py:811: UserWarning: signals are in Fortran order. Will automatically transfer to C order for compatibility with edflib.
    warnings.warn('signals are in Fortran order. Will automatically '

@myd7349
Copy link
Copy Markdown
Contributor Author

myd7349 commented Sep 5, 2023

In the CyEdfReader.open function, it first attempts to encode the file name as utf-8 and then passes it to the EDF file reading interface provided by edflib. If opening the file fails, it will then attempt to use get_short_file_name. However, in c_edf.edfopen_file_writeonly, it first tries get_short_file_name and then attempts utf-8 encoding. To make use of the function interfaces that support utf-8 paths, which we have patched, I have removed the get_short_file_name from the latter, while the corresponding logic in the former has been retained.

@skjerns skjerns merged commit 51f4db8 into holgern:master Nov 9, 2023
@skjerns
Copy link
Copy Markdown
Collaborator

skjerns commented Nov 9, 2023

thanks for your addition :)

@myd7349 myd7349 deleted the fopen-utf8 branch December 3, 2023 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants