Default to using strtolower and only use mb_strtolower when available and necessary#92
Conversation
1e32678 to
6705488
Compare
| $kwd = $this->language->case_insensitive ? mb_strtolower($match[0]) : $match[0]; | ||
| $kwd = $this->language->case_insensitive ? self::strToLower($match[0]) : $match[0]; |
There was a problem hiding this comment.
Is there any language that has a keyword with multibyte characters? If not, maybe using mb_strtolower() was overkill in the first place?
There was a problem hiding this comment.
Maybe the warning about mb_strtolower() not being available should be conditional based whether the language has multibyte keywords?
There was a problem hiding this comment.
Another possibility would be to check if the string contains any non-printable ASCII like preg_match( '/[^\x20-\x7E]/', $match[0] ) and then only try to use mb_strtolower() if that is the case.
There was a problem hiding this comment.
I ended up using /[^\x00-\x7F]/, courtesy of a SO answer, to check if the code we're highlighting has any Unicode characters. Right now, I'm defaulting to using mb_strtolower if it's available. Is there a performance gain if I default to using strtolower?
There was a problem hiding this comment.
Yes, very much so: https://3v4l.org/gRXk3
mb_strtolower() is 10x slower than strtolower().
There was a problem hiding this comment.
Even throwing in a preg_match() call for each iteration with strtolower(), mb_strtolower() is still slower: https://3v4l.org/Rj9bo
There was a problem hiding this comment.
Oof, I did not expect mb_strtolower() to be such a performance hit. I appreciate the testcase and example!
I've switched the logic to default to strtolower and only use mb_strtolower if needed and when available.
There was a problem hiding this comment.
I've found strtolower being up to 50x faster ;)
strtolower and only use mb_strtolower when available and necessary
westonruter
left a comment
There was a problem hiding this comment.
LGTM, other than the test failure.
|
I have no idea what's going on with the PHP 7.3 environment, it was originally passing but re-running passed tests now fail. I ran locally with two different 7.3 versions and all tests passed, so I'm not too concerned about merging. |
Currently,
ext-mbstringis a hard Composer dependency, if this plugin is bundled and redistributed, like in a WordPress plugin, then that can't always be met. This has brought into question the necessity of usingmb_strtolower.After some discussion, this PR changes
ext-mbstringinto a suggested dependency for when you need to highlight code snippets with Unicode characters. We now default to usingstrtolowersince it is much faster than themb_equivalent.The new behavior is as follows: if Unicode is detected in the code snippet, then we look to see if
mb_strtoloweris available and use it. If it is not available, it will throw a warning once indicating that the extension or a polyfill should be installed and continue to usestrtoloweras to fail gracefully./cc @westonruter