Bad escapes in broken links regexp #672

@jwilk

proselint/checks/links/broken.py contains the following code (with boring parts of the regexp omitted):

    regex = re.compile(
        r"""...boring...
        |[^\s`!()\[\]{};:\'".,<>?\xab\xbb\u201c\u201d\u2018\u2019\u21a9]))""",
        re.U)

But the re module has supported \uXXXX escape sequences only since Python 3.3.
In earlier versions, the regexp engine treats \u as a literal u, and the hex digits that follow as ordinary characters.
In other words, in Python 2.7, this code is equivalent to:

    regex = re.compile(
        r"""...boring...
        |[^\s`!()\[\]{};:\'".,<>?\xab\xbb01289acdu]))""",
        re.U)

...which is certainly not what you wanted.
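
One way to sidestep the problem, sketched here under assumptions rather than taken from the actual fix: put the characters into a non-raw unicode string, so that the Python parser decodes the escapes instead of the re module. The names `SPECIALS`, `TRAILING` and `is_allowed_last_char` are hypothetical, and the pattern is a simplified stand-in for the last alternative of the real regexp, not the full thing.

```python
import re

# Hypothetical, simplified stand-in for the tail of the character class.
# This is a NON-raw string, so Python itself turns \xab, \u201c, ... into
# the actual characters before the re module ever sees the pattern.
# The u"" prefix is needed on Python 2 and accepted on Python 3.3+.
SPECIALS = u"\xab\xbb\u201c\u201d\u2018\u2019\u21a9"

# "\\s" is written with a doubled backslash because the string is not raw.
TRAILING = re.compile(u"[^\\s" + SPECIALS + u"]", re.U)


def is_allowed_last_char(ch):
    """Return True if ch is not excluded by the character class."""
    return TRAILING.match(ch) is not None
```

With this pattern, ordinary letters and digits such as `u` or `2` are accepted again, while the curly quotes and guillemets are still excluded.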

The dubious regexp was found using pydiatra.


Labels: bug: fixed (a bug that has been fixed in a release), type: fix (a bug fix)
