New grammar approach doesn't allow for some Unicode chars #211

@Azhrei

I upgraded to ics v0.6 (still on Python 3.7.1, Anaconda) after posting my last issue and found some new problems. The EBNF file doesn't allow Unicode characters above U+FFFF, such as most smileys. (Grinning face with smiling eyes is U+1F601, written \U0001F601 in Python, for example.)

I've modified the .ebnf file with new regular expressions, and that got past the smiley issue, but parsing then stopped on an en dash (U+2013) and I don't know why...

Here's the file content now with the old/original values commented out (I'm not saying these are correct as I haven't checked the appropriate RFCs):

# QSAFE_CHAR       = ?"[ \x21\x23-\x7E\u0080-\uffff]";
# QSAFE_CHAR_STAR  = ?"[ \x21\x23-\x7E\u0080-\uffff]*";
QSAFE_CHAR         = ?"[^\x00-\x1F\x22\x7F]";
QSAFE_CHAR_STAR    = ?"[^\x00-\x1F\x22\x7F]*";

# SAFE_CHAR        = ?"[ \x21\x23-\x2B\x2D-\x39\x3C-\x7E\u0080-\uffff]" ;
# SAFE_CHAR_STAR   = ?"[ \x21\x23-\x2B\x2D-\x39\x3C-\x7E\u0080-\uffff]*" ;
SAFE_CHAR          = ?"[^\x00-\x1F\x22\x2C\x3A\x3B\x7F]" ;
SAFE_CHAR_STAR     = ?"[^\x00-\x1F\x22\x2C\x3A\x3B\x7F]*" ;

# VALUE_CHAR       = ?"[ \x21-\x7E\u0080-\uffff]";
# VALUE_CHAR_STAR  = ?"[ \x21-\x7E\u0080-\uffff]*";
VALUE_CHAR         = ?"[^\x00-\x1F\x7F]";
VALUE_CHAR_STAR    = ?"[^\x00-\x1F\x7F]*";

By inverting the REs so that they exclude the characters I don't want instead of listing the ones I do, the character classes get shorter and all other Unicode characters are implicitly allowed. Or should be, at least.
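To make the difference concrete, here is a minimal sketch that runs the original and the negated VALUE_CHAR classes through Python's built-in re module. It only exercises the regular expressions on their own; the names `OLD_VALUE_CHAR`, `NEW_VALUE_CHAR`, `SAMPLES` and `accepts` are made up for this illustration, and nothing here goes through tatsu or the ics grammar.

```python
import re

# Original BMP-only class vs. the negated class from the modified .ebnf file.
OLD_VALUE_CHAR = re.compile(r"[ \x21-\x7E\u0080-\uffff]")
NEW_VALUE_CHAR = re.compile(r"[^\x00-\x1F\x7F]")

# A few single characters to probe both classes with (illustrative only).
SAMPLES = {
    "ascii letter": "a",
    "en dash": "\u2013",
    "emoji U+1F601": "\U0001F601",
    "control char": "\x07",
}


def accepts(pattern, ch):
    """Return True if the whole one-character string matches the class."""
    return pattern.fullmatch(ch) is not None


for label, ch in SAMPLES.items():
    old = accepts(OLD_VALUE_CHAR, ch)
    new = accepts(NEW_VALUE_CHAR, ch)
    print(f"{label:15} old={old} new={new}")
```

With plain re, the emoji is rejected by the old class and accepted by the new one, while the en dash is accepted by both. That suggests the character classes by themselves are not what rejects the en dash, though it says nothing about what happens once the patterns are embedded in the grammar.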

I've stepped through the code all the way into the GRAMMAR and a little bit deeper, but debugging a parser is not something I want to do in my spare time. ;) Maybe the tatsu library has its own problems...?

I did notice that the tatsu parser uses Python's built-in re module unless regex is available. Given that the re module has a number of deficiencies, I'll try installing regex and have another go at it.
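A quick way to see which of the two modules is importable in a given environment is sketched below. The helper name `regex_backend_name` is hypothetical, and the check only reports whether the third-party regex package can be found; it does not inspect what tatsu actually imports.

```python
import importlib.util


def regex_backend_name():
    """Return 'regex' if the third-party package is importable, else 're'."""
    if importlib.util.find_spec("regex") is not None:
        return "regex"
    return "re"


print(regex_backend_name())
```

Running this before and after installing regex shows whether the environment changed between the two parsing attempts.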
