bc49b95 to 4b031cb
4b031cb to f16f0c6
f16f0c6 to bdcf827
move regexes near implementation
commented verbose regex for http pattern
renamed extra_uri_schemes to extra_schemes
bdcf827 to be83e7e
don't try other url types if one already matched
no-op function if trim is not enabled
avoid backtracking when matching trailing punctuation
match head and tail punctuation separately
don't scan for unbalanced parentheses more than necessary
ensure email domain starts and ends with a word character
After looking over the code more, I identified a number of optimizations, so I committed another refactor.
for i, word in enumerate(words):
    match = _punctuation_re.match(word)
    head, middle, tail = "", word, ""
    match = re.match(r"^([(<]|&lt;)+", middle)
This calls re.match() with the same pattern string for each iteration of the loop (and similar is done below). I don’t think the re module caches recently- or frequently-compiled patterns, in which case these calls would be needlessly recompiling the same pattern in every iteration, is that right? If so, is it worth compiling these patterns once, and reusing the resulting pattern objects here?
re caches the last 512 patterns used. Compiling every single pattern at import time increases startup time for projects. I'd seen some more general discussions about this recently, so I decided these could be inlined.
Oh interesting, good to know, thanks!
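For anyone following the thread, here is a minimal sketch of the two approaches being compared. The pattern and the helper names (HEAD_PATTERN, strip_head_inline, strip_head_precompiled) are made up for illustration and are not taken from this PR. The identity check at the end relies on the cache in CPython's re module, which is an implementation detail.

```python
import re

# Hypothetical pattern for illustration: one or more leading "(" or "<".
HEAD_PATTERN = r"^[(<]+"

# Approach 1: compile once when the module is imported.
_head_re = re.compile(HEAD_PATTERN)


def strip_head_inline(word):
    """Split off leading punctuation using the module-level re.match()."""
    # The pattern string is compiled on first use; later calls with the
    # same string and flags are served from the re module's internal cache.
    match = re.match(HEAD_PATTERN, word)
    if match:
        return match.group(), word[match.end():]
    return "", word


def strip_head_precompiled(word):
    """Same logic, using the pattern object compiled at import time."""
    match = _head_re.match(word)
    if match:
        return match.group(), word[match.end():]
    return "", word


words = ["(see", "<http://example.com", "plain"]
inline = [strip_head_inline(w) for w in words]
precompiled = [strip_head_precompiled(w) for w in words]

# In CPython, compiling the same string twice returns the cached object.
same_object = re.compile(HEAD_PATTERN) is re.compile(HEAD_PATTERN)
```

Both helpers return the same results. The inline form defers compilation until the function is first called, which is the startup-time trade-off mentioned above.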
Fixes #522
Fixes #827
Fixes #1172
The 'urlize' function does not behave as expected, in part because of some missing support. Among the issues were a lack of support for 5/8 of the original TLDs, emails prefixed with mailto: not being returned as links, and no support for ftp://.

This PR does not "fix" the latter point, but does add a policy that can be set so that the function will create the links. This can also be used to add support for any arbitrary URI scheme.

Specifically:
- extra_uri_schemes policy and parameter to address lack of ftp: support and allow for other schemes (for example tel:) to be added as needed. (issue "urlize filter ignores ftp:// URLs" #522 and PR "Added ftp urls to be catched in urlize" #587)
- urlize function does not compose parentheses correctly at the end #827)
- https// and https:// links are well-formed.
- https instead of http
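To make the extra-schemes idea concrete, here is a rough, self-contained sketch. It is not the urlize implementation from this PR: linkify is a hypothetical helper, its default scheme list is an assumption, and it skips the punctuation trimming, email, and TLD handling discussed above. It only shows how a caller-supplied list of scheme prefixes can opt additional URI schemes into link generation.

```python
import re
from html import escape


def linkify(text, extra_schemes=None):
    """Wrap words that start with a known scheme in <a> tags.

    Hypothetical helper for illustration only.
    """
    # Assumed defaults; anything else must be opted in by the caller.
    schemes = ["http://", "https://"]
    schemes.extend(extra_schemes or [])

    out = []
    # Split on whitespace but keep the separators so spacing is preserved.
    for word in re.split(r"(\s+)", text):
        if any(word.startswith(scheme) for scheme in schemes):
            safe = escape(word, quote=True)
            out.append(f'<a href="{safe}">{safe}</a>')
        else:
            out.append(escape(word))
    return "".join(out)


default_result = linkify("get ftp://example.com/file now")
ftp_result = linkify("get ftp://example.com/file now", extra_schemes=["ftp:"])
tel_result = linkify("call tel:+15550100", extra_schemes=["tel:"])
```

With no extra schemes the ftp:// word is left as plain text. Passing "ftp:" or "tel:" turns the matching word into a link, which is the opt-in behavior the policy is meant to provide.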