Probable bug: IterableCodeExtractor::calculateMatchScore() - mbstring vs PCRE #117

@jrfnl

Note: This is based purely on a code review, so I currently have no test case (though I can put one together if need be).

In the IterableCodeExtractor::calculateMatchScore() method there are two things which caught my attention:

  1. The preg_quote() function is used to escape arbitrary strings, while mb_ereg() is used for the regex matching.
    This is problematic as preg_quote() is part of the PCRE extension, which uses the PCRE regex engine, while mb_ereg() is part of the MBString extension, which uses the Oniguruma regex engine.
    The two engines are not fully compatible, so preg_quote() may escape too much or too little for mb_ereg(). One example is the pattern delimiter, which PCRE requires and mb_ereg() does not use.
  2. The mb_ereg() function is used without the mb_internal_encoding or the mb_regex_encoding being set.
    Whether mb_internal_encoding is needed may depend on where the input comes from, but mb_regex_encoding should definitely be set.
    The default mb_regex_encoding was EUC-JP in PHP 5.4 and 5.5 and only became UTF-8 in PHP 5.6. Even then the default cannot be relied upon, as other code may already have called mb_regex_encoding(), so the encoding should be set explicitly before using mb_ereg().
    Note: the mbstring.internal_encoding ini setting was deprecated in PHP 5.6 in favour of default_charset, so a compatibility layer may be needed here.
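
To make both points concrete, here is a minimal sketch of what a fix could look like. It is not the project's code: `quoteForMbEreg()` and `mbEregMatchUtf8()` are hypothetical helper names, the list of escaped characters is my assumption about what is special in Oniguruma's default (Ruby) syntax, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helpers for illustration; neither exists in the project or in PHP itself.

/**
 * Escape the characters assumed to be special in mb_ereg() patterns
 * (Oniguruma, default Ruby syntax). Unlike preg_quote(), no delimiter
 * and no PCRE-only characters are escaped.
 */
function quoteForMbEreg($text)
{
    $map = [];
    foreach (str_split('\\.+*?[]^$(){}|') as $char) {
        $map[$char] = '\\' . $char;
    }

    // strtr() with an array replaces each character once, so the added
    // backslashes are never escaped a second time.
    return strtr($text, $map);
}

/**
 * Run mb_ereg() with the regex encoding pinned to UTF-8 (assumed input
 * encoding), then restore whatever encoding was active before.
 */
function mbEregMatchUtf8($pattern, $subject)
{
    $previous = mb_regex_encoding();
    mb_regex_encoding('UTF-8');

    // mb_ereg() returns false when there is no match.
    $matched = mb_ereg($pattern, $subject) !== false;

    mb_regex_encoding($previous);

    return $matched;
}

$pattern = quoteForMbEreg('foo.bar(1)');
var_dump(mbEregMatchUtf8($pattern, 'call foo.bar(1) here'));
var_dump(mbEregMatchUtf8($pattern, 'call fooxbar(1) here'));
```

The first call should report a match and the second should not, since the escaped `.` no longer matches an arbitrary character. Restoring the previous regex encoding keeps the helper from changing behaviour for other code that relies on its own setting.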
