3.7 regex fix#2162
Conversation
To make debugging easier, log contents of fetched URLs at trace level. Browsers process Javascript, handle iframes, and more that flexget doesn't; seeing the actual data processed by flexget rather than looking at the URL in your browser makes it easier to figure out why you're not getting any results.
#1558 will systemically handle this problem.
https://docs.python.org/3.7/library/re.html#re.sub In Python 3.6 onwards, the replacement text can't contain \w or any similar regex sequences. Use re.split and join instead.
|
An alternate fix is duplicating backslashes in the replacement string: res = re.sub(' +', blank.replace('\\', r'\\') + '*', res, flags=re.UNICODE)Note also that the third positional argument of re.sub() is a maximal count of replacements, not flags. Passing re.UNICODE as a third positional argument you just order re.sub() to make not more than 128 replacements. There is the same bug in other places. |
|
I can do that instead - would that be preferred? Thanks, |
|
@tobinjt |
|
This is currently blocking 3.7 support and the only issue left discussed seems to be how to style the line. The proposed fix works as far as I can see and the |
|
I've changed it to use replace() instead of split() and join(). |
https://docs.python.org/3.7/library/re.html#re.sub In Python 3.6 onwards, the replacement text can't contain \w or any similar regex sequences.
https://docs.python.org/3/library/re.html#re.escape "Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped." "/" does not have special meaning in regular expressions so it is no longer escaped.
|
Added commits fixing another 2 failing tests, 1 failing test remains. |
flexget/tests/test_rtorrent.py
Outdated
| assert 'd.directory.set=\\/data\\/downloads' in fields | ||
| assert ('d.directory.set=\\/data\\/downloads' in fields | ||
| # Python 3.7+. | ||
| or 'd.directory.set=/data/downloads' in fields) |
There was a problem hiding this comment.
Isn't that additional slash intentional ?
There was a problem hiding this comment.
In Python <= 3.6 more characters were escaped by re.escape(), including '/', but in Python >= 3.7 re.escape() only escapes regex metacharacters like '$' and '.'. To make the test pass with Python >= 3.7 we need to look for 'd.directory.set=/data/downloads', with earlier versions we need to look for 'd.directory.set=\\/data\\/downloads'.
I've pushed a new commit that expands the comment, let me know if it's clear enough or it needs more details.
flexget/tests/test_rtorrent.py
Outdated
| fields = [p for p in called_args[2:]] | ||
| assert len(fields) == 3 | ||
| assert 'd.directory.set=\\/data\\/downloads' in fields | ||
| assert ('d.directory.set=\\/data\\/downloads' in fields |
There was a problem hiding this comment.
You could use implementation independent test:
assert ('d.directory.set=' + re.escape('/data/downloads')) in fields|
Hmm, I don't think those failing tests are my fault: How can I trigger those tests again? |
|
@tobinjt The rtorrent test case isn't doing anything regexp related, it is testing the exact data sent to the rtorrent API. That's why I find it all a bit weird and question the slash change. |
|
@JohnDoee https://github.com/Flexget/Flexget/blob/develop/flexget/plugins/clients/rtorrent.py#L315 |
|
What's the status on this? |
|
@cvium To me it looks like it can be merged as is. The rtorrent regex escape stuff is just plain strange and completely unrelated to how rtorrent API works. Might be smart to fix correctly in a different pull request if the tests doesn't pass (although I think they were fixed). |
|
@tobinjt @JohnDoee I had to revert these changes. I should've looked at it more carefully. The changes in this PR are definitely not the correct way to handle this.
The keyword here is unknown escapes. There are quite a few known escapes that are affected by simply replacing I think a better solution for the |
|
@cvium should I report bug here or IRC is enough? |
|
There is no bug (anymore). |
|
As the author of this change in Python 3.7 I found this PR correct. Have I missed something? |
|
Well, it broke something that worked previously. This is an excerpt from IRC: I also added a unit test that verifies that the capture groups could be referenced with |
|
Ah, if |
|
@serhiy-storchaka well, then you should update wiki, I think. Because right now |
|
@serhiy-storchaka oh, I see, it was notice for yourself, not for me. |
|
Sorry for the breakage :( @serhiy-storchaka Is there a function in Python 3.7 that escapes the necessary characters but doesn't touch others? I wasn't able to find one. @cvium It would be great if there were tests for that functionality. Are there other places where this is likely to break? |
Motivation for changes:
Fix breakage with Python 3.7.
Detailed changes:
https://docs.python.org/3.7/library/re.html#re.sub
In Python 3.6 onwards, the replacement text can't contain \w or any similar regex sequences. Use re.split and join instead.
Addressed issues:
Without this change this crash occurs:
To Do:
3 tests still fail with similar errors, but tracking them down is hard because an exception is raised when handling the exception :(