Description
I upgraded to ics v0.6 (still Python 3.7.1 Anaconda) after posting my last issue and found some new problems. The EBNF file doesn't allow for Unicode characters above \uFFFF, such as most smileys. (Grinning face with smiling eyes is U+1F601, written \U0001F601 in Python, for example.)
I've modified the .ebnf file with new regular expressions and that got past the smiley issue, but parsing then stopped on an EN DASH (U+2013) and I don't know why...
Here's the file content now with the old/original values commented out (I'm not saying these are correct as I haven't checked the appropriate RFCs):
# QSAFE_CHAR = ?"[ \x21\x23-\x7E\u0080-\uffff]";
# QSAFE_CHAR_STAR = ?"[ \x21\x23-\x7E\u0080-\uffff]*";
QSAFE_CHAR = ?"[^\x00-\x1F\x22\x7F]";
QSAFE_CHAR_STAR = ?"[^\x00-\x1F\x22\x7F]*";
# SAFE_CHAR = ?"[ \x21\x23-\x2B\x2D-\x39\x3C-\x7E\u0080-\uffff]" ;
# SAFE_CHAR_STAR = ?"[ \x21\x23-\x2B\x2D-\x39\x3C-\x7E\u0080-\uffff]*" ;
SAFE_CHAR = ?"[^\x00-\x1F\x22\x2C\x3A\x3B\x7F]" ;
SAFE_CHAR_STAR = ?"[^\x00-\x1F\x22\x2C\x3A\x3B\x7F]*" ;
# VALUE_CHAR = ?"[ \x21-\x7E\u0080-\uffff]";
# VALUE_CHAR_STAR = ?"[ \x21-\x7E\u0080-\uffff]*";
VALUE_CHAR = ?"[^\x00-\x1F\x7F]";
VALUE_CHAR_STAR = ?"[^\x00-\x1F\x7F]*";
I inverted the REs so that they list the characters I don't want rather than those I do. Not only does the list get shorter, but all other Unicode characters are implicitly allowed. Or should be, at least.
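As a sanity check, the old and new QSAFE_CHAR_STAR classes can be tried directly against Python's built-in re module, outside of tatsu. This is only a sketch: the sample strings are made up for illustration, and it exercises the character classes themselves, not the grammar or the parser.

```python
import re

# Character classes copied from the QSAFE_CHAR_STAR lines above.
OLD_QSAFE = re.compile(r"[ \x21\x23-\x7E\u0080-\uffff]*")
NEW_QSAFE = re.compile(r"[^\x00-\x1F\x22\x7F]*")

# Hypothetical sample values, chosen only to exercise each case.
samples = {
    "ascii": "Team meeting",
    "en dash": "9\u201310 am",        # U+2013, inside \u0080-\uffff
    "emoji": "Party \U0001F601",      # U+1F601, above \uffff
    "quote": 'say "hi"',              # \x22 is excluded by both classes
}

# For each sample: (matches old class, matches new class)
results = {
    name: (bool(OLD_QSAFE.fullmatch(text)), bool(NEW_QSAFE.fullmatch(text)))
    for name, text in samples.items()
}

for name, (old_ok, new_ok) in results.items():
    print(f"{name:8} old={old_ok} new={new_ok}")
```

Under re, the en dash passes both the old and the new class, and only the emoji distinguishes them. If that holds in the parser as well, the EN DASH failure probably comes from somewhere other than these character classes, but this check alone cannot confirm that.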
I've stepped through the code all the way into the GRAMMAR and a little bit deeper, but debugging a parser is not something I want to do in my spare time. ;) Maybe the tatsu library has its own problems...?
I did notice that the tatsu parser uses Python's built-in re module unless regex is available. Given that the re module has a number of deficiencies, I'll try installing regex and have another go at it.
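To see which of the two modules would be picked up in a given environment, an optional import along these lines can be run. The try/except fallback and the name re_impl are assumptions for illustration, a common way to write such a fallback and not necessarily how tatsu does it.

```python
# Prefer the third-party regex module if installed, else fall back to re.
# (re_impl is a hypothetical alias used only in this sketch.)
try:
    import regex as re_impl
except ImportError:
    import re as re_impl

print("using:", re_impl.__name__)

# The new VALUE_CHAR_STAR class from the grammar, tried on a string that
# contains both an EN DASH (U+2013) and an emoji (U+1F601).
pattern = re_impl.compile(r"[^\x00-\x1F\x7F]*")
matched = pattern.fullmatch("9\u201310 am \U0001F601") is not None
print("matches:", matched)
```

Running this before and after installing regex shows whether the environment changed and whether the pattern itself behaves the same under both modules.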