Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99593: Add tests for Unicode C API (part 1) #99651

Merged
merged 2 commits into from Nov 29, 2022

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Nov 21, 2022

Add tests for functions corresponding to the str class methods.

Add tests for functions corresponding to the str class methods.
Copy link
Member

@vstinner vstinner left a comment

Very nice! Here is my first review :-)

@unittest.skipIf(_testcapi is None, 'need _testcapi module')
def test_fromobject(self):
"""Test PyUnicode_FromObject()"""
from _testcapi import unicode_fromobject as fromobject
Copy link
Member

@vstinner vstinner Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might check with str subclass and check that the result is not the same object.

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.assertRaises(ValueError, split, 'a|b|c|d', '')
self.assertRaises(TypeError, split, 'a|b|c|d', ord('|'))
self.assertRaises(TypeError, split, [], '|')
# split(NULL, '|')
Copy link
Member

@vstinner vstinner Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this comment stand for? Does the function crash with NULL? Same question for similar rsplit() comment below.

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.

self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d')
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г')
self.assertEqual(translate('abc', []), 'abc')
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None})
Copy link
Member

@vstinner vstinner Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. None is supposed to delete the "b" character: https://docs.python.org/dev/library/stdtypes.html#text-sequence-type-str

The mapping table must map Unicode ordinal integers to Unicode ordinal integers or None (causing deletion of the character).

Is the doc wrong?

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc is wrong.

Copy link
Member

@vstinner vstinner Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. The surprising part is that str.translate() treats None as "delete:

>>> "abc".translate(str.maketrans({'b': None}))
'ac'

Well, it would be nice to update the doc (maybe in a separated PR).

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because str.translate calls PyUnicode_Translate() with the error handler "ignore".

self.assertEqual(join('', ['a', 'b', 'c']), 'abc')
self.assertEqual(join(NULL, ['a', 'b', 'c']), 'a b c')
self.assertEqual(join('|', ['а', 'б', 'в']), 'а|б|в')
self.assertEqual(join('ж', ['а', 'б', 'в']), 'ажбжв')
Copy link
Member

@vstinner vstinner Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to add a test with empty strings? Like:

>>> '|'.join(('a', '', 'c'))
'a||c'

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1":
#for i, ch in enumerate(str):
#self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i)
#self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i)
Copy link
Member

@vstinner vstinner Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this code commented? if it is meaningless for tailmatch, just remove it?

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.

@support.cpython_only
@unittest.skipIf(_testcapi is None, 'need _testcapi module')
def test_format(self):
"""Test PyUnicode_Contains()"""
Copy link
Member

@vstinner vstinner Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? :-)

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

self.assertEqual(isidentifier(" "), 0)
self.assertEqual(isidentifier("["), 0)
self.assertEqual(isidentifier("©"), 0)
self.assertEqual(isidentifier("0"), 0)
Copy link
Member

@vstinner vstinner Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to add a test on 32MB? :-) I often want to create such constant, and each time I forgot that it's an invalid identifier :-)

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka left a comment

Thank you for your review Victor. I have a problem with reviewing such large volume of code, especially if many lines looks similar, so I can easily miss some types of errors. Without your help I would not find them.

@unittest.skipIf(_testcapi is None, 'need _testcapi module')
def test_fromobject(self):
"""Test PyUnicode_FromObject()"""
from _testcapi import unicode_fromobject as fromobject
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.assertRaises(ValueError, split, 'a|b|c|d', '')
self.assertRaises(TypeError, split, 'a|b|c|d', ord('|'))
self.assertRaises(TypeError, split, [], '|')
# split(NULL, '|')
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It crashes. It was the first test written by me 4 years ago, before I lost my sign, so I missed to add word CRASHES here.

self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d')
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г')
self.assertEqual(translate('abc', []), 'abc')
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None})
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc is wrong.

#for str in "\xa1", "\u8000\u8080", "\ud800\udc02", "\U0001f100\U0001f1f1":
#for i, ch in enumerate(str):
#self.assertEqual(tailmatch(str, ch, 0, len(str), 1), i)
#self.assertEqual(tailmatch(str, ch, 0, len(str), -1), i)
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied it from other tests (for find/index/count), but did not adapted it to tailmatch yet. I think it is easier to remove it now.

@support.cpython_only
@unittest.skipIf(_testcapi is None, 'need _testcapi module')
def test_format(self):
"""Test PyUnicode_Contains()"""
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

self.assertEqual(isidentifier(" "), 0)
self.assertEqual(isidentifier("["), 0)
self.assertEqual(isidentifier("©"), 0)
self.assertEqual(isidentifier("0"), 0)
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.assertEqual(join('', ['a', 'b', 'c']), 'abc')
self.assertEqual(join(NULL, ['a', 'b', 'c']), 'a b c')
self.assertEqual(join('|', ['а', 'б', 'в']), 'а|б|в')
self.assertEqual(join('ж', ['а', 'б', 'в']), 'ажбжв')
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka Nov 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@vstinner vstinner left a comment

LGTM.

self.assertEqual(translate('abcd', {ord('a'): 'A', ord('b'): ord('B'), ord('c'): '<>'}), 'AB<>d')
self.assertEqual(translate('абвг', {ord('а'): 'А', ord('б'): ord('Б'), ord('в'): '<>'}), 'АБ<>г')
self.assertEqual(translate('abc', []), 'abc')
self.assertRaises(UnicodeTranslateError, translate, 'abc', {ord('b'): None})
Copy link
Member

@vstinner vstinner Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. The surprising part is that str.translate() treats None as "delete:

>>> "abc".translate(str.maketrans({'b': None}))
'ac'

Well, it would be nice to update the doc (maybe in a separated PR).

@serhiy-storchaka serhiy-storchaka merged commit deaa8de into python:main Nov 29, 2022
15 checks passed
@miss-islington
Copy link
Contributor

miss-islington commented Nov 29, 2022

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒🤖 I'm not a witch! I'm not a witch!

@miss-islington
Copy link
Contributor

miss-islington commented Nov 29, 2022

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.11

@miss-islington
Copy link
Contributor

miss-islington commented Nov 29, 2022

Sorry @serhiy-storchaka, I had trouble checking out the 3.10 backport branch.
Please retry by removing and re-adding the "needs backport to 3.10" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker deaa8dee48beeae9928a418736da0608f2f18361 3.10

@vstinner
Copy link
Member

vstinner commented Nov 29, 2022

Oh, I didn't notice that you want to backport these tests to Python 3.10 and 3.11. You're motivated :-) If it's too complicated, maybe just add them to Python 3.12, no? _testcapi changed a lot since Python 3.11 (splited into multiple files).

@serhiy-storchaka
Copy link
Member Author

serhiy-storchaka commented Nov 29, 2022

I think that we should backport as many tests as possible, otherwise we risk to miss a regression introduced before the particular test was added. Especially if we do so many changes in C API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants