Handle quoted external CSS URLs in privacy plugin #7651
squidfunk merged 1 commit into squidfunk:master
Same as #7650 (comment), need a little more input.
Ah, nice catch, missed that one! Will adapt the PR. Here's a reproduction zip with all 3 quote styles: 9.5.42-css-quoted-urls-missing-from-privacy-plugin.zip. With the current version, only the unquoted external URL gets fetched. Let me know if these 2 repro zips are clear enough @squidfunk 🙇

Edit: pushed the new regex and also updated the regex101 link above. Should be ready for another round 🏓
Branch updated from 7bca798 to c4105b0.
Thanks! I believe we need to allow whitespace before and after the URL, according to CSS Syntax Module Level 3, specifically how URL tokens are parsed. URLs are handled in two ways: if `url(` contains a string, it's just treated as a normal function token, which definitely requires us to handle whitespace before and after the string. If it contains a verbatim (unquoted) URL, I believe we need to support whitespace as well:

```
url\(\s*([\"']?)(?P<url>http?[^)'\"]+)\1\s*\)
```

This now works correctly with the following strings, although it consumes the trailing whitespace in the verbatim version, which should not be a problem:

```css
/* correct */
url("https://example.com/images/myImg.jpg");
url('https://example.com/images/myImg.jpg');
url( 'https://example.com/images/myImg.jpg' );
url( "https://example.com/images/myImg.jpg" );
url(https://example.com/images/myImg.jpg);
url( https://example.com/images/myImg.jpg );
/* mismatching */
url('https://example.com/images/myImg.jpg);
url("https://example.com/images/myImg.jpg);
url('https://example.com/images/myImg.jpg");
url(https://example.com/images/myImg.jpg');
url(https://example.com/images/myImg.jpg");
/* non-http links */
url("data:image/jpg;base64,iRxVB0…");
url(myImg.jpg);
url(#IDofSVGpath);
```

PS: How can I share on regex101? I'm too stupid to find the share button 😅
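To check the pattern from the comment against its own sample strings, here is a minimal Python sketch using only the standard `re` module. It is not the plugin's actual code: the helper `extract_urls` and the list names are hypothetical, and no regex flags are added. It collects the named `url` group of each match via `re.finditer`.

```python
import re

# Pattern copied from the review comment (no flags added here).
URL_PATTERN = re.compile(r"url\(\s*([\"']?)(?P<url>http?[^)'\"]+)\1\s*\)")


def extract_urls(css: str) -> list[str]:
    """Hypothetical helper: return the named `url` group of every match."""
    return [match.group("url") for match in URL_PATTERN.finditer(css)]


# Sample declarations from the comment, grouped by expected outcome.
CORRECT = [
    'url("https://example.com/images/myImg.jpg");',
    "url('https://example.com/images/myImg.jpg');",
    "url( 'https://example.com/images/myImg.jpg' );",
    'url( "https://example.com/images/myImg.jpg" );',
    "url(https://example.com/images/myImg.jpg);",
    "url( https://example.com/images/myImg.jpg );",
]
MISMATCHING = [
    "url('https://example.com/images/myImg.jpg);",
    'url("https://example.com/images/myImg.jpg);',
    "url('https://example.com/images/myImg.jpg\");",
    "url(https://example.com/images/myImg.jpg');",
    'url(https://example.com/images/myImg.jpg");',
]
NON_HTTP = [
    'url("data:image/jpg;base64,iRxVB0…");',
    "url(myImg.jpg);",
    "url(#IDofSVGpath);",
]

for css in CORRECT + MISMATCHING + NON_HTTP:
    # repr() makes the trailing space of the verbatim match visible
    print(f"{css!r:55} -> {extract_urls(css)!r}")
```

Each "correct" declaration yields exactly one URL (the last one with a trailing space, as noted in the comment), while the mismatching and non-http declarations yield none.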
Branch updated from c4105b0 to e3268f7.
Perfect, thanks! Added the new pattern. Hehe, Ctrl+S should save it and give you a modal with the link: https://regex101.com/r/LVJJfK/1. Not sure if it requires starting over when receiving a link, though. I also just checked, and as you say, urlparse at least seems happy with trailing whitespace: it parses the URL without complaint and keeps the space at the end of the path:

```python
>>> from urllib.parse import urlparse
>>> urlparse("https://example.com/images/myImg.jpg ")
ParseResult(scheme='https', netloc='example.com', path='/images/myImg.jpg ', params='', query='', fragment='')
```
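Since the verbatim form `url( https://… )` is captured with its trailing space and `urlparse` keeps that space in `path`, one option would be to strip the captured value before parsing. This is a hypothetical sketch of that extra step, not something the thread says the plugin does:

```python
from urllib.parse import urlparse

# What the pattern captures for the verbatim form with surrounding spaces
captured = "https://example.com/images/myImg.jpg "

raw = urlparse(captured)            # trailing space ends up in the path
clean = urlparse(captured.strip())  # hypothetical extra step: strip first

print(repr(raw.path))    # '/images/myImg.jpg '
print(repr(clean.path))  # '/images/myImg.jpg'
```

Scheme and host are identical either way; only the path differs by the trailing space.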
Perfect, thanks for investigating! I think this is safe to merge then 🤟
Released as part of 9.5.43.


This one is very similar to #7650, but I'm keeping it separate since it covers different files. I was wondering why our external woff/woff2 files weren't being downloaded and noticed this issue.
The back-reference helps extract the exact URL and avoids more false positives on invalid quoting. Some false positives remain, but IMO this is still better than just accepting an optional quote on either side independently. Invalid quoting like this is probably also caught by CSS linters and build systems anyway.
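To illustrate why the back-reference matters, this sketch compares the pattern from the thread with a hypothetical variant that allows an optional quote on each side independently. `INDEPENDENT_QUOTES` is made up for the comparison and is not from the PR.

```python
import re

# Hypothetical variant for comparison: each quote is optional on its own.
INDEPENDENT_QUOTES = re.compile(r"url\(\s*[\"']?(?P<url>http?[^)'\"]+)[\"']?\s*\)")

# Pattern from the thread: \1 forces the closing quote to repeat the opening one.
BACKREFERENCE = re.compile(r"url\(\s*([\"']?)(?P<url>http?[^)'\"]+)\1\s*\)")

mismatched = "url('https://example.com/images/myImg.jpg\");"
balanced = "url('https://example.com/images/myImg.jpg');"

for name, pattern in (("independent", INDEPENDENT_QUOTES), ("back-reference", BACKREFERENCE)):
    # True means the pattern accepted the declaration
    print(name, pattern.search(mismatched) is not None, pattern.search(balanced) is not None)
```

Both patterns accept the balanced declaration, but only the independent variant also accepts `'…"`, because `\1` requires the closing quote to equal the opening one (or both to be absent).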
I had to introduce named groups to handle this, hence the slight change in logic: with more than one capturing group, `findall` returns tuples of all groups instead of letting you access them by name. But IMO this might also help in the future if more edge cases show up.

🛠️ with ❤️ at Siemens
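The `findall` point can be seen with plain Python `re`. In this sketch (the sample CSS and variable names are hypothetical, and this is not the plugin's code), the quote group needed for the back-reference makes `findall` return `(quote, url)` tuples, whereas `finditer` yields match objects whose `url` group can be read by name:

```python
import re

PATTERN = re.compile(r"url\(\s*([\"']?)(?P<url>http?[^)'\"]+)\1\s*\)")

# Hypothetical stylesheet snippet with one quoted and one verbatim URL.
css = (
    "@font-face { src: url('https://example.com/font.woff2'); }\n"
    "body { background: url(https://example.com/bg.png); }"
)

# Two capturing groups, so findall returns (quote, url) tuples.
pairs = PATTERN.findall(css)

# finditer yields match objects, so the URL is available by name.
urls = [match.group("url") for match in PATTERN.finditer(css)]

print(pairs)
print(urls)
```

Reading the group by name keeps the extraction stable if further groups are added to the pattern later.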