New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-99593: Add tests for Unicode C API (part 1) #99651
gh-99593: Add tests for Unicode C API (part 1) #99651
Conversation
Add tests for functions corresponding to the str class methods.
| @unittest.skipIf(_testcapi is None, 'need _testcapi module') | ||
| def test_fromobject(self): | ||
| """Test PyUnicode_FromObject()""" | ||
| from _testcapi import unicode_fromobject as fromobject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might check with str subclass and check that the result is not the same object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Lib/test/test_capi/test_unicode.py
Outdated
| self.assertRaises(ValueError, split, 'a|b|c|d', '') | ||
| self.assertRaises(TypeError, split, 'a|b|c|d', ord('|')) | ||
| self.assertRaises(TypeError, split, [], '|') | ||
| # split(NULL, '|') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this comment stand for? Does the function crash with NULL? Same question for similar rsplit() comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.
| self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
| self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
| self.assertEqual(translate('abc', []), 'abc') | ||
| self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. None is supposed to delete the "b" character: https://docs.python.org/dev/library/stdtypes.html#text-sequence-type-str
The mapping table must map Unicode ordinal integers to Unicode ordinal integers or None (causing deletion of the character).
Is the doc wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. The surprising part is that str.translate() treats None as "delete:
>>> "abc".translate(str.maketrans({'b': None}))
'ac'
Well, it would be nice to update the doc (maybe in a separated PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because str.translate calls PyUnicode_Translate() with the error handler "ignore".
| self.assertEqual(join('', ['a', 'b', 'c']), 'abc') | ||
| self.assertEqual(join(NULL, ['a', 'b', 'c']), 'a b c') | ||
| self.assertEqual(join('|', ['а', 'б', 'в']), 'а|б|в') | ||
| self.assertEqual(join('ж', ['а', 'б', 'в']), 'ажбжв') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind to add a test with empty strings? Like:
>>> '|'.join(('a', '', 'c'))
'a||c'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Lib/test/test_capi/test_unicode.py
Outdated
| #for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1": | ||
| #for i, ch in enumerate(str): | ||
| #self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i) | ||
| #self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this code commented? if it is meaningless for tailmatch, just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.
Lib/test/test_capi/test_unicode.py
Outdated
| @support.cpython_only | ||
| @unittest.skipIf(_testcapi is None, 'need _testcapi module') | ||
| def test_format(self): | ||
| """Test PyUnicode_Contains()""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| self.assertEqual(isidentifier(" "), 0) | ||
| self.assertEqual(isidentifier("["), 0) | ||
| self.assertEqual(isidentifier("©"), 0) | ||
| self.assertEqual(isidentifier("0"), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind to add a test on 32MB? :-) I often want to create such constant, and each time I forgot that it's an invalid identifier :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| @unittest.skipIf(_testcapi is None, 'need _testcapi module') | ||
| def test_fromobject(self): | ||
| """Test PyUnicode_FromObject()""" | ||
| from _testcapi import unicode_fromobject as fromobject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Lib/test/test_capi/test_unicode.py
Outdated
| self.assertRaises(ValueError, split, 'a|b|c|d', '') | ||
| self.assertRaises(TypeError, split, 'a|b|c|d', ord('|')) | ||
| self.assertRaises(TypeError, split, [], '|') | ||
| # split(NULL, '|') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.
| self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
| self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
| self.assertEqual(translate('abc', []), 'abc') | ||
| self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc is wrong.
Lib/test/test_capi/test_unicode.py
Outdated
| #for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1": | ||
| #for i, ch in enumerate(str): | ||
| #self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i) | ||
| #self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.
Lib/test/test_capi/test_unicode.py
Outdated
| @support.cpython_only | ||
| @unittest.skipIf(_testcapi is None, 'need _testcapi module') | ||
| def test_format(self): | ||
| """Test PyUnicode_Contains()""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| self.assertEqual(isidentifier(" "), 0) | ||
| self.assertEqual(isidentifier("["), 0) | ||
| self.assertEqual(isidentifier("©"), 0) | ||
| self.assertEqual(isidentifier("0"), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| self.assertEqual(join('', ['a', 'b', 'c']), 'abc') | ||
| self.assertEqual(join(NULL, ['a', 'b', 'c']), 'a b c') | ||
| self.assertEqual(join('|', ['а', 'б', 'в']), 'а|б|в') | ||
| self.assertEqual(join('ж', ['а', 'б', 'в']), 'ажбжв') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d') | ||
| self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г') | ||
| self.assertEqual(translate('abc', []), 'abc') | ||
| self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. The surprising part is that str.translate() treats None as "delete:
>>> "abc".translate(str.maketrans({'b': None}))
'ac'
Well, it would be nice to update the doc (maybe in a separated PR).
|
Thanks @serhiy-storchaka for the PR |
|
Sorry, @serhiy-storchaka, I could not cleanly backport this to |
|
Sorry @serhiy-storchaka, I had trouble checking out the |
|
Oh, I didn't notice that you want to backport these tests to Python 3.10 and 3.11. You're motivated :-) If it's too complicated, maybe just add them to Python 3.12, no? _testcapi changed a lot since Python 3.11 (splited into multiple files). |
|
I think that we should backport as many tests as possible, otherwise we risk to miss a regression introduced before the particular test was added. Especially if we do so many changes in C API. |
Add tests for functions corresponding to the str class methods.