gh-99593: Add tests for Unicode C API (part 1) #99651

serhiy-storchaka · 2022-11-21T14:59:59Z

Add tests for functions corresponding to the str class methods.

Issue: Add tests for Unicode C API #99593

vstinner

Very nice! Here is my first review :-)

vstinner · 2022-11-24T14:24:35Z

Lib/test/test_capi/test_unicode.py

+    @unittest.skipIf(_testcapi is None, 'need _testcapi module')
+    def test_fromobject(self):
+        """Test PyUnicode_FromObject()"""
+        from _testcapi import unicode_fromobject as fromobject


You might also test with a str subclass and check that the result is not the same object.
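A minimal sketch of the check suggested above. The `unicode_fromobject` wrapper name comes from the diff; the `StrSubclass` helper and the fallback to the built-in `str()` (used when `_testcapi` or the wrapper is not available) are assumptions made for illustration and are not part of the PR.

```python
try:
    # Wrapper name as imported in the diff; only present in CPython test builds.
    from _testcapi import unicode_fromobject as fromobject
except ImportError:
    # Assumption: str() stands in for PyUnicode_FromObject() here, since it
    # also converts a str subclass instance to an exact str.
    fromobject = str


class StrSubclass(str):
    """Hypothetical helper: a trivial str subclass."""


def check_fromobject_with_subclass():
    sub = StrSubclass('abc')
    result = fromobject(sub)
    # The result compares equal, is an exact str, and is a new object.
    assert result == 'abc'
    assert type(result) is str
    assert result is not sub
    return result
```

In the real test suite these checks would presumably be written with `self.assertEqual()`, `self.assertIs()` and `self.assertIsNot()`.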

vstinner · 2022-11-24T14:24:35Z

Lib/test/test_capi/test_unicode.py

+        self.assertRaises(ValueError, split, 'a|b|c|d', '')
+        self.assertRaises(TypeError, split, 'a|b|c|d', ord('|'))
+        self.assertRaises(TypeError, split, [], '|')
+        # split(NULL, '|')


What does this comment stand for? Does the function crash with NULL? Same question for the similar rsplit() comment below.

serhiy-storchaka · 2022-11-27T07:52:12Z

It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I forgot to add the word CRASHES here.

vstinner · 2022-11-24T14:24:35Z

Lib/test/test_capi/test_unicode.py

+        self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d')
+        self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г')
+        self.assertEqual(translate('abc', []), 'abc')
+        self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None})


I don't understand. None is supposed to delete the "b" character: https://docs.python.org/dev/library/stdtypes.html#text-sequence-type-str

The mapping table must map Unicode ordinal integers to Unicode ordinal integers or None (causing deletion of the character).

Is the doc wrong?

serhiy-storchaka · 2022-11-27T07:52:12Z

The doc is wrong.

vstinner · 2022-11-28T09:25:04Z

Ah. The surprising part is that str.translate() treats None as "delete":

>>> "abc".translate(str.maketrans({'b': None}))
'ac'

Well, it would be nice to update the doc (maybe in a separate PR).

serhiy-storchaka

That is because str.translate() calls PyUnicode_Translate() with the error handler "ignore".
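The difference discussed in this thread can be sketched as follows. This is an illustration only: it calls the C function directly through `ctypes.pythonapi` (CPython only) rather than through the PR's `_testcapi` wrapper, and the variable names are made up for the example. The strict-mode outcome is the one the test in the diff asserts.

```python
import ctypes

# Assumption: call the C function directly through ctypes (CPython only)
# instead of the PR's _testcapi wrapper.
c_translate = ctypes.pythonapi.PyUnicode_Translate
c_translate.restype = ctypes.py_object
c_translate.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_char_p]

table = {ord('b'): None}

# str.translate() deletes characters that are mapped to None.
deleted = 'abc'.translate(table)

# With errors=NULL (strict handling) the C function reports an error instead.
try:
    c_translate('abc', table, None)
    strict_outcome = 'no error'
except UnicodeTranslateError:
    strict_outcome = 'UnicodeTranslateError'

# Passing "ignore" explicitly reproduces the str.translate() result.
ignored = c_translate('abc', table, b'ignore')
```

So the mapping itself is treated the same way in both cases; only the error handler decides whether a None entry deletes the character or raises.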

vstinner · 2022-11-24T14:24:35Z

Lib/test/test_capi/test_unicode.py

+        self.assertEqual(join('', ['a', 'b', 'c']), 'abc')
+        self.assertEqual(join(NULL, ['a', 'b', 'c']), 'a b c')
+        self.assertEqual(join('|', ['а', 'б', 'в']), 'а|б|в')
+        self.assertEqual(join('ж', ['а', 'б', 'в']), 'ажбжв')


Would you mind adding a test with empty strings? Like:

>>> '|'.join(('a', '', 'c'))
'a||c'
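A few empty-string cases of the kind requested above, as a hedged sketch. Here `join()` is a hypothetical stand-in built on `str.join()`, not the PR's `_testcapi` wrapper around `PyUnicode_Join()`, and the list of cases is an assumption about what such a test might cover.

```python
def join(sep, seq):
    # Assumption: str.join() stands in for the wrapper around PyUnicode_Join().
    return sep.join(seq)


# (separator, items, expected result)
cases = [
    ('|', ('a', '', 'c'), 'a||c'),  # empty item in the middle
    ('|', ('', ''), '|'),           # only empty items
    ('|', ('',), ''),               # single empty item
    ('', ('a', '', 'c'), 'ac'),     # empty separator and empty item
]

results = [join(sep, seq) for sep, seq, _ in cases]
```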

vstinner · 2022-11-24T14:24:35Z

Lib/test/test_capi/test_unicode.py

+        #for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1":
+            #for i, ch in enumerate(str):
+                #self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i)
+                #self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i)


Why is this code commented out? If it is meaningless for tailmatch, just remove it?

serhiy-storchaka · 2022-11-27T07:52:12Z

I copied it from other tests (for find/index/count), but did not adapt it to tailmatch yet. I think it is easier to remove it now.

vstinner · 2022-11-24T14:24:35Z

Lib/test/test_capi/test_unicode.py

+    @support.cpython_only
+    @unittest.skipIf(_testcapi is None, 'need _testcapi module')
+    def test_format(self):
+        """Test PyUnicode_Contains()"""


really? :-)

vstinner · 2022-11-24T14:24:35Z

Lib/test/test_capi/test_unicode.py

+        self.assertEqual(isidentifier(" "), 0)
+        self.assertEqual(isidentifier("["), 0)
+        self.assertEqual(isidentifier("©"), 0)
+        self.assertEqual(isidentifier("0"), 0)


Would you mind adding a test on 32MB? :-) I often want to create such a constant, and each time I forget that it's an invalid identifier :-)
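Why `32MB` is rejected can be shown with `str.isidentifier()`, the Python-level counterpart of `PyUnicode_IsIdentifier()`. The candidate names other than `32MB` are made up for this illustration.

```python
# An identifier may contain digits, but it must not start with one.
candidates = ['32MB', 'MB32', '_32MB', 'SIZE_32MB']

validity = {name: name.isidentifier() for name in candidates}
```

Only `32MB` is invalid here; prefixing it with a letter or an underscore makes it a valid name.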

serhiy-storchaka

Thank you for your review, Victor. I have a problem with reviewing such a large volume of code, especially if many lines look similar, so I can easily miss some types of errors. Without your help I would not have found them.

vstinner

LGTM.

miss-islington · 2022-11-29T07:59:59Z

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington · 2022-11-29T08:00:03Z

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.11

miss-islington · 2022-11-29T08:00:06Z

Sorry @serhiy-storchaka, I had trouble checking out the 3.10 backport branch.
Please retry by removing and re-adding the "needs backport to 3.10" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.10

vstinner · 2022-11-29T10:51:01Z

Oh, I didn't notice that you want to backport these tests to Python 3.10 and 3.11. You're motivated :-) If it's too complicated, maybe just add them to Python 3.12, no? _testcapi changed a lot since Python 3.11 (split into multiple files).

serhiy-storchaka · 2022-11-29T14:29:04Z

I think that we should backport as many tests as possible, otherwise we risk missing a regression introduced before the particular test was added, especially when we make so many changes to the C API.

pythongh-99593: Add tests for Unicode C API (part 1)

7f5362f

Add tests for functions corresponding to the str class methods.

serhiy-storchaka added the "needs backport to 3.10" and "needs backport to 3.11" labels Nov 21, 2022

serhiy-storchaka requested a review from vstinner Nov 21, 2022

bedevere-bot mentioned this pull request Nov 21, 2022

Add tests for Unicode C API #99593

Open

bedevere-bot added the awaiting core review label Nov 21, 2022

serhiy-storchaka mentioned this pull request Nov 23, 2022

gh-99593: Add tests for Unicode C API #99594

Draft

vstinner reviewed Nov 24, 2022

serhiy-storchaka commented Nov 27, 2022

Address review comments.

545400a

vstinner approved these changes Nov 28, 2022

bedevere-bot added awaiting merge and removed awaiting core review labels Nov 28, 2022

serhiy-storchaka merged commit deaa8de into python:main Nov 29, 2022
15 checks passed

bedevere-bot removed the awaiting merge label Nov 29, 2022

miss-islington assigned serhiy-storchaka Nov 29, 2022

serhiy-storchaka mentioned this pull request Nov 29, 2022

gh-93649: Split unicode tests from _testcapimodule.c & add some more #95819

Merged
