Skip to content

fix: non-utf8 source causes panic, close #3798#3884

Merged
youknowone merged 4 commits intoRustPython:mainfrom
killme2008:fix/non-utf8-panic
Jul 15, 2022
Merged

fix: non-utf8 source causes panic, close #3798#3884
youknowone merged 4 commits intoRustPython:mainfrom
killme2008:fix/non-utf8-panic

Conversation

@killme2008
Copy link
Copy Markdown
Contributor

Try to fix #3798 .

I found that it's panic at unicode_name2:character : https://github.com/progval/unicode_names2/blob/master/src/lib.rs#L323

  let mut buf = [0u8; MAX_NAME_LENGTH + 1];
    for (place, byte) in buf.iter_mut().zip(name.bytes()) {
        *place = ASCII_UPPER_MAP[byte as usize]
    }
    let search_name = &buf[..name.len()];

The unicode name's length exceeds MAX_NAME_LENGTH:

pub const MAX_NAME_LENGTH: usize = 88;

The patch just added a checking before calling unicode_name2:character. But MAX_NAME_LENGTH is not public in unicode_name2, so i added a const to lexer.rs:

const MAX_UNICODE_NAME: usize = 88;

@killme2008
Copy link
Copy Markdown
Contributor Author

killme2008 commented Jul 14, 2022

The extra_tests/snippets/non_utf8.txt is in fact a text file taken from the issue example but github recognize it as binary.

$ cat extra_tests/snippets/non_utf8.txt
" ġÄ%\N{\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\222{{˘"ġ›"Ûˇˇ�$"�ˇÄ%\N{222+-. ġÄ%\N{222{{˘"ġ›

@killme2008 killme2008 force-pushed the fix/non-utf8-panic branch from 62acbd2 to 96488a8 Compare July 14, 2022 11:51
Copy link
Copy Markdown
Contributor

@fanninpm fanninpm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the assert_raises context manager from the testutils module1. (My suggestion will cause RustPython to fail the test because it raises the wrong exception.)

Footnotes

  1. https://github.com/RustPython/RustPython/blob/main/extra_tests/snippets/testutils.py

Comment thread extra_tests/snippets/syntax_non_utf8.py Outdated
Comment thread extra_tests/snippets/syntax_non_utf8.py
Comment thread parser/src/lexer.rs
killme2008 and others added 3 commits July 15, 2022 10:40
Co-authored-by: fanninpm <fanninpm@miamioh.edu>
Co-authored-by: fanninpm <fanninpm@miamioh.edu>
Copy link
Copy Markdown
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch and fix, thank you!

@youknowone youknowone merged commit 7be4860 into RustPython:main Jul 15, 2022
@killme2008 killme2008 deleted the fix/non-utf8-panic branch July 15, 2022 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

non-utf8 source causes panic

3 participants