Adding support to Unicode characters over codepoint 0xffff #63

Merged
sigmavirus24 merged 5 commits into yaml:master from peterkmurphy:master
Aug 8, 2017

@peterkmurphy
Contributor

This patch aims to fix issue #25. As a side effect, the testing code has been trimmed to accommodate the fix. Why have I done that? I might as well repeat what I said on Gmail:

The problem I am having is with the testing code, which will need a lot of untangling. It makes some frankly "bizarre" assumptions about which inputs cannot be parsed, yet some of those inputs turn out to be parseable by accident. For example:


def test_unicode_input_errors(unicode_filename, verbose=False):
    data = open(unicode_filename, 'rb').read().decode('utf-8')
    for input in [data.encode('latin1', 'ignore'), # <--- Look at this!
                    data.encode('utf-16-be'), data.encode('utf-16-le'),
                    codecs.BOM_UTF8+data.encode('utf-16-be'),
                    codecs.BOM_UTF16_BE+data.encode('utf-16-le'),
                    codecs.BOM_UTF16_LE+data.encode('utf-8')+'!']:
        try:
            yaml.load(input)
        except yaml.YAMLError, exc:
            if verbose:
                print exc
        else:
            raise AssertionError("expected an exception")

The idea is to generate some bizarre byte sequences, attempt to parse them, and raise an AssertionError if no YAMLError is thrown. However, calling data.encode('latin1', 'ignore') on data yields just ten line breaks, which parse happily as YAML. So no exception is raised, and the test fails with an AssertionError.
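To illustrate the point about `'ignore'`, here is a minimal Python 3 sketch. The sample text and the helper name `latin1_ignore` are hypothetical stand-ins, not the actual test fixture; the sketch only assumes that the fixture consists of lines whose characters all fall outside Latin-1.

```python
def latin1_ignore(text: str) -> bytes:
    """Encode to Latin-1, silently dropping anything Latin-1 cannot represent."""
    return text.encode("latin1", "ignore")


# Hypothetical stand-in for the fixture: ten lines of Cyrillic text.
sample = "Привет\n" * 10

encoded = latin1_ignore(sample)

# Every Cyrillic character is dropped; only the ASCII newlines survive.
print(repr(encoded))
```

A stream that contains nothing but line breaks is an empty YAML document, so a loader has no reason to reject it.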

What should I do in this case: remove test_unicode_input_errors from the PyYAML testing code? Yes, the number of tests will go down, which is generally not a good thing, but if the tests are based on dodgy assumptions...

In some cases I have altered the testing code; in others I have removed it.
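For readers unfamiliar with the phrase "over codepoint 0xffff" in the title: such characters lie outside the Basic Multilingual Plane and need two 16-bit code units (a surrogate pair) in UTF-16. The Python 3 sketch below uses only the standard library to show the difference. It does not reproduce the patch itself, and the helper name `describe` is hypothetical.

```python
def describe(ch: str) -> dict:
    """Summarise how a single character is represented in common encodings."""
    code_point = ord(ch)
    utf16 = ch.encode("utf-16-be")  # big-endian, no BOM
    return {
        "code_point": hex(code_point),
        "above_0xffff": code_point > 0xFFFF,
        "utf16_code_units": len(utf16) // 2,  # each code unit is 2 bytes
        "utf8_bytes": len(ch.encode("utf-8")),
    }


print(describe("\u00e9"))      # U+00E9, inside the Basic Multilingual Plane
print(describe("\U0001F389"))  # U+1F389, above 0xffff
```

Code that assumes one 16-bit unit per character tends to mishandle the second case, which is the kind of input this pull request is concerned with.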


PyYAML is written by Kirill Simonov <xi@resolvent.net>. It is released
under the MIT license. See the file LICENSE for more details.


Removing trailing newlines is bad form, as it's diff noise. I'd undo it in all the files where you've done it, which seems to be every file you've touched. Also, files should end with a trailing newline to be POSIX valid.

if index == 0:
# Leading indicators are special characters.
if ch in u'#,[]{}&*!|>\'\"%@`':
if ch in u'#,[]{}&*!|>\'\"%@`':

trimming trailing whitespace is bad form as it's diff noise

@peterkmurphy
Contributor Author

peterkmurphy commented May 10, 2017 via email

@adamchainz

@peterkmurphy I'm not an admin on this project, I can't do anything to your PR.

@samdmarshall

You may need to ping @sigmavirus24 or another committer on this project to get this merged and a new release made.

adamchainz pushed a commit to adamchainz/pyyaml that referenced this pull request May 16, 2017
@adamchainz

I copied this and tidied it up in #65

@peterkmurphy
Contributor Author

peterkmurphy commented May 16, 2017 via email

@sigmavirus24 sigmavirus24 merged commit 94c3f07 into yaml:master Aug 8, 2017
@sigmavirus24
Contributor

Thanks @peterkmurphy! 🎉 ✨

@jborean93

@ingydotnet is there any chance this fix could be backported to the 3.x branch and a new release made? I can install the pre-release 4.x builds, but I haven't seen any recent activity indicating that a full release will be made with those changes.

cc @nitzmahone

@nitzmahone
Member

I'd be +1 for that

@ingydotnet
Member

@perlpunk and I are meeting up in a week. We might be able to discuss it then.


7 participants