Fix unicode support for python binding on windows#21961
Fix unicode support for python binding on windows#21961liuxingbaoyu wants to merge 10 commits intoopencv:4.xfrom
Conversation
VadimLevin
left a comment
There was a problem hiding this comment.
Thank you for PR. Can you verify if this solution works if MBCS is not enabled (Based on this comment the issue is not always reproducible)?
Thank you so much for your review! Do you mean to test the utf8 mode introduced since Windows 10? On my computer, when this feature is enabled, python starts abnormally (because the username contains unicode characters). I checked python's source and it should work fine in utf8 mode. With one exception, utf8 mode was introduced in April 2018, and Very few people use utf8 mode though, especially before version 1903 (because that would break most software). In #18305 (comment) I explained why the code in #18305 (comment) cannot be reproduced . You can try the tests in this PR, it should be 100% reproducible. |
|
I don't have an access to the computer with Windows right now, but overall idea is to check if this patch don't introduce the regression to Windows users with enabled UTF-8 mode. static inline PyObject* PyString_FromString(const char* u)
{
#if defined(_WIN32)
PyObject* str = PyUnicode_DecodeMBCS(u, strlen(u), NULL);
if (str) {
return str;
}
// clears error indicator from failed conversion
PyErr_Clear();
#endif
return PyUnicode_FromString(u);
}
// PyString_FromStringAndSize looks similar to the above implementation.and static inline PyObject* PyUnicode_AsBytes(PyObject* unicode) {
#if defined(_WIN32)
PyObject* bytes = PyUnicode_AsMBCSString(obj);
if (bytes) {
return bytes;
}
// clears error indicator from failed conversion
PyErr_Clear();
#endif
return PyUnicode_AsUTF8String(obj);
} |
|
You are right! |
Do you mean the case when MBCS conversion fails, but UTF8 produces wrong result? But in this case it means that user provided a string in unnatural encoding for the OS. In my opinion, the only way to resolve such situation is to require to provide an encoding. Can you check whenever MBCS encoding fails if provided string is in UTF8 on you Windows machine? |
That's what I meant! I'm worried about garbled characters instead of throwing exceptions when utf8 mode is enabled (when passing a utf8 string to But it's solved now! |
|
I've run into the same issue, and it looks like this PR just got left hanging. @VadimLevin , do you have the bandwidth to take another look since you looked at it in the past? |
92cebcc to
ff457dd
Compare
|
The CI failure after rebasing doesn't seem to be related to this PR, it succeeds on windows, and this PR only changes things under the _WIN32 macro.🤔 |
ff457dd to
9c2cc54
Compare
792dcdf to
94d04eb
Compare
|
@asmorkalov Hi, sorry to bother you, can you approve the CI to run? Thanks! |
|
Thanks! |
The python binding always uses utf8 encoding to handle strings between c and python.
But the opencv c library encoding on windows is mbcs.
Therefore, parameters or returns containing non-ascii characters will become garbled or throw exceptions.
This PR makes python binding use mbcs encoding on windows to process strings in and out of the c library, making it support unicode.
Fixes #4242
Fixes #17600
Fixes #18305
Fixes #21960
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.