3.7 regex fix by tobinjt · Pull Request #2162 · Flexget/Flexget

tobinjt · 2018-07-01T22:24:33Z

Motivation for changes:

Fix breakage with Python 3.7.

Detailed changes:

https://docs.python.org/3.7/library/re.html#re.sub
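From Python 3.7 onwards, the replacement text passed to re.sub() can't contain \w or any other unknown escape (a backslash followed by an ASCII letter); this was only deprecated in earlier versions. Use re.split and join instead.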
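
A minimal sketch of the failure and of the split-and-join workaround described above. The value of `blank` is an assumption, chosen only because it puts `\w` at offset 5 as in the reported error; the real value in parser_common.py may differ, and the sample title is made up.

```python
import re

# Assumed value for illustration; the real `blank` in parser_common.py may differ.
blank = r'(?:[^\w&]|_)'
res = 'some   show name'  # made-up input

# re.sub() parses its second argument as a replacement template, so on
# Python 3.7+ the unknown escape \w inside it raises re.error.
try:
    re.sub(' +', blank + '*', res)
    template_error = None
except re.error as exc:
    template_error = str(exc)

# Workaround: split on the separator and join with the literal text,
# which never goes through template parsing.
joined = (blank + '*').join(re.split(' +', res))

print(template_error)
print(joined)
```

The join keeps `blank` byte-for-byte, so the resulting string can still be compiled as a pattern later.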

Addressed issues:

Without this change this crash occurs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flexget/task.py", line 486, in __run_plugin
    return method(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flexget/event.py", line 23, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flexget/plugins/filter/all_series.py", line 48, in on_task_metainfo
    if guess_entry(entry, config=group_settings):
  File "/usr/local/lib/python3.7/site-packages/flexget/plugins/metainfo/series.py", line 47, in guess_entry
    allow_seasonless=allow_seasonless)
  File "/usr/local/lib/python3.7/site-packages/flexget/plugins/parsers/plugin_parsing.py", line 74, in parse_series
    return parser.parse_series(data, name=name, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flexget/plugins/parsers/parser_internal.py", line 47, in parse_series
    parser.parse(data)
  File "/usr/local/lib/python3.7/site-packages/flexget/utils/titles/series.py", line 225, in parse
    name_to_re(name, self.ignore_prefixes, self) for name in [self.name] + self.alternate_names)
  File "/usr/local/lib/python3.7/site-packages/flexget/utils/tools.py", line 205, in __init__
    list.__init__(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flexget/utils/titles/series.py", line 225, in <genexpr>
    name_to_re(name, self.ignore_prefixes, self) for name in [self.name] + self.alternate_names)
  File "/usr/local/lib/python3.7/site-packages/flexget/plugins/parsers/parser_common.py", line 85, in name_to_re
    res = re.sub(' +', blank + '*', res, re.UNICODE)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \w at position 5

To Do:

3 tests still fail with similar errors, but tracking them down is hard because an exception is raised when handling the exception :(

To make debugging easier, log contents of fetched URLs at trace level. Browsers process Javascript, handle iframes, and more that flexget doesn't; seeing the actual data processed by flexget rather than looking at the URL in your browser makes it easier to figure out why you're not getting any results.


serhiy-storchaka · 2018-07-03T17:43:31Z

An alternate fix is duplicating backslashes in the replacement string:

res = re.sub(' +', blank.replace('\\', r'\\') + '*', res, flags=re.UNICODE)

Note also that the fourth positional argument of re.sub() is the maximum count of replacements, not the flags. By passing re.UNICODE positionally in that slot you just tell re.sub() to make no more than 32 replacements (the integer value of re.UNICODE). The same bug exists in other places.
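
A small sketch of the count-versus-flags mix-up, using a made-up input string. It only assumes the documented signature re.sub(pattern, repl, string, count=0, flags=0).

```python
import re
import warnings

text = 'a ' * 40  # made-up input containing 40 spaces

with warnings.catch_warnings():
    # Newer Python versions warn when count is passed positionally.
    warnings.simplefilter('ignore', DeprecationWarning)
    # re.UNICODE lands in the count slot, so it caps the number of replacements.
    capped = re.sub(' ', '_', text, re.UNICODE)

# Keyword form: the flag is treated as a flag and every match is replaced.
full = re.sub(' ', '_', text, flags=re.UNICODE)

print(capped.count('_'), full.count('_'))
```

Passing `flags=` by keyword avoids the problem regardless of how many optional arguments precede it.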

tobinjt · 2018-07-06T21:40:15Z

I can do that instead - would that be preferred?

Thanks,

paranoidi · 2018-08-16T13:34:35Z

@tobinjt
Hmm, I think doubling the backslashes is preferred; it's proper escaping after all.

JohnDoee · 2018-08-21T09:19:09Z

This is currently blocking 3.7 support, and the only issue still under discussion seems to be how to style the line.

The proposed fix works as far as I can see and the flags issue highlighted by @serhiy-storchaka should be moved to a different issue.

tobinjt · 2018-08-21T23:55:40Z

I've changed it to use replace() instead of split() and join().
There are still a small number of test failures but this resolves the vast majority.
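
As a rough check that the two approaches are interchangeable for this call, the sketch below compares them on a made-up title. The value of `blank` is again an assumption rather than the project's actual constant.

```python
import re

blank = r'(?:[^\w&]|_)'   # assumed value, for illustration only
res = 'some   show name'  # made-up input

# Double every backslash so the template parser turns each pair back into one.
escaped = blank.replace('\\', r'\\')
via_replace = re.sub(' +', escaped + '*', res, flags=re.UNICODE)

# The earlier split-and-join variant, which bypasses template parsing entirely.
via_join = (blank + '*').join(re.split(' +', res))

print(via_replace == via_join)
```

This equivalence holds only because `blank` is meant to be inserted literally; a replacement that relies on template escapes would behave differently once its backslashes are doubled.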

https://docs.python.org/3/library/re.html#re.escape "Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped." "/" does not have special meaning in regular expressions so it is no longer escaped.

tobinjt · 2018-08-25T16:34:19Z

Added commits fixing another 2 failing tests, 1 failing test remains.

JohnDoee · 2018-08-25T16:38:08Z

flexget/tests/test_rtorrent.py

-        assert 'd.directory.set=\\/data\\/downloads' in fields
+        assert ('d.directory.set=\\/data\\/downloads' in fields
+                # Python 3.7+.
+                or 'd.directory.set=/data/downloads' in fields)


Isn't that additional slash intentional?

tobinjt · 2018-08-25

In Python <= 3.6 more characters were escaped by re.escape(), including '/', but in Python >= 3.7 re.escape() only escapes regex metacharacters like '$' and '.'. To make the test pass with Python >= 3.7 we need to look for 'd.directory.set=/data/downloads'; with earlier versions we need to look for 'd.directory.set=\\/data\\/downloads'.

I've pushed a new commit that expands the comment; let me know if it's clear enough or if it needs more detail.
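
A short sketch of the re.escape() difference described above. The `fields` list is a stand-in built here for illustration; in the real test it comes from the arguments the rtorrent plugin passes to its client.

```python
import re
import sys

path = '/data/downloads'
escaped = re.escape(path)

# Stand-in for the fields the plugin would send (hypothetical, for illustration).
fields = ['d.directory.set=' + escaped]

# What re.escape() is expected to return on each side of the 3.7 change.
if sys.version_info >= (3, 7):
    expected = '/data/downloads'
else:
    expected = '\\/data\\/downloads'

# Real metacharacters are still escaped on every version.
metachars = re.escape('a.b$')

print(escaped, metachars)
```

Building the expected value with re.escape() itself, rather than hard-coding either spelling, keeps an assertion valid on both sides of the change.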

serhiy-storchaka · 2018-08-25T17:12:11Z

flexget/tests/test_rtorrent.py

        fields = [p for p in called_args[2:]]
        assert len(fields) == 3
-        assert 'd.directory.set=\\/data\\/downloads' in fields
+        assert ('d.directory.set=\\/data\\/downloads' in fields


You could use an implementation-independent test:

assert ('d.directory.set=' + re.escape('/data/downloads')) in fields

tobinjt · 2018-08-25

Good idea, done.

tobinjt · 2018-08-25T18:11:51Z

Hmm, I don't think those failing tests are my fault:

Build-agent version 0.1.437-d180d4d7 (2018-08-22T16:07:19+0000)
Starting container flexget/cci-python:2.7
  image cache not found on this host, downloading flexget/cci-python:2.7

  Error pulling image flexget/cci-python:2.7: Error response from daemon: received unexpected HTTP status: 503 Service Unavailable... retrying
  image cache not found on this host, downloading flexget/cci-python:2.7

  Error pulling image flexget/cci-python:2.7: Error response from daemon: received unexpected HTTP status: 503 Service Unavailable... retrying
  image cache not found on this host, downloading flexget/cci-python:2.7

How can I trigger those tests again?

JohnDoee · 2018-08-25T18:16:56Z

@tobinjt The rtorrent test case isn't doing anything regexp related, it is testing the exact data sent to the rtorrent API. That's why I find it all a bit weird and question the slash change.

tobinjt · 2018-08-25T18:24:45Z

@JohnDoee https://github.com/Flexget/Flexget/blob/develop/flexget/plugins/clients/rtorrent.py#L315
re.escape() is used by the plugin, not by the test, and the test validates that the data is set correctly by the plugin so it needs to change.

cvium · 2018-09-11T06:52:42Z

What's the status on this?

JohnDoee · 2018-09-11T07:37:36Z

@cvium To me it looks like it can be merged as is.

The rtorrent regex escape stuff is just plain strange and completely unrelated to how the rtorrent API works. It might be smart to fix it properly in a different pull request if the tests don't pass (although I think they were fixed).

tobinjt · 2018-09-12T19:57:59Z

@JohnDoee @cvium As far as I know it can be merged - there are no outstanding comments and it fixes a huge number of test failures under Python 3.7.
There are other problems with Python 3.7 but they will need to be fixed separately.

Thanks,

This reverts commit a72fd85.

cvium · 2018-09-30T17:53:38Z

@tobinjt @JohnDoee I had to revert these changes. I should've looked at it more carefully. The changes in this PR are definitely not the correct way to handle this.

Changed in version 3.7: Unknown escapes in repl consisting of '\' and an ASCII letter now are errors.

The keyword here is unknown escapes. There are quite a few known escapes, such as the group references \1 and \g<name>, that are broken by simply replacing \ with \\ in the repl string.

I think a better solution for the manipulate plugin may be to restrict the use of backslashes in the config schema.

asm0dey · 2018-09-30T18:02:49Z

@cvium should I report the bug here, or is IRC enough?

cvium · 2018-09-30T18:05:02Z

There is no bug (anymore).

serhiy-storchaka · 2018-09-30T18:47:09Z

As the author of this change in Python 3.7 I found this PR correct. Have I missed something?

cvium · 2018-09-30T19:01:57Z

Well, it broke something that worked previously. This is an excerpt from IRC:

<removed> Hey guys
<removed> Suddenly my replace patterns are broken
<removed> they look like this: replace:
<removed> regexp: 'e(\d{2}-)?(\d{2})'
<removed> format: 'e\2'
<removed> and suddenly I see \2 in names!

I also added a unit test that verifies that the capture groups could be referenced with \1 prior to the changes, but not after.

serhiy-storchaka · 2018-09-30T19:05:23Z

Ah, if replace_config['format'] can contain group references, it shouldn't be escaped.
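
The regression can be reproduced in a few lines. The pattern and format below are the ones quoted from IRC; the sample title is made up.

```python
import re

regexp = r'e(\d{2}-)?(\d{2})'  # pattern from the IRC excerpt
fmt = r'e\2'                   # format from the IRC excerpt
title = 's01e01-02'            # made-up sample title

# Passed through untouched, \2 is a group reference.
as_template = re.sub(regexp, fmt, title)

# With backslashes doubled, \2 becomes a literal backslash followed by "2".
escaped_fmt = fmt.replace('\\', r'\\')
as_literal = re.sub(regexp, escaped_fmt, title)

print(as_template)  # s01e02
print(as_literal)   # s01e\2
```

So doubling backslashes is only safe for replacement text that is meant to be inserted literally, not for user-supplied formats that may contain group references.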

asm0dey · 2018-09-30T19:23:35Z

@serhiy-storchaka well, then you should update the wiki, I think, because right now the Manipulate page on the wiki says that this syntax is correct.
@cvium thank you for reverting the merge and fixing the bug this way, but I suppose one day you'll need to add compatibility with Python 3.7.

asm0dey · 2018-09-30T19:31:58Z

@serhiy-storchaka oh, I see, it was a note to yourself, not to me.

tobinjt · 2018-09-30T21:16:10Z

Sorry for the breakage :(

@serhiy-storchaka Is there a function in Python 3.7 that escapes the necessary characters but doesn't touch others? I wasn't able to find one.

@cvium It would be great if there were tests for that functionality. Are there other places where this is likely to break?

tobinjt added 13 commits December 25, 2016 17:04

cce83b3  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
1b578ff  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
0e22d0c  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
a3850ed  Merge branch 'develop' of github.com:tobinjt/Flexget into develop
1909bc4  Collapse log lines; add comments.
0307fdf  Remove exception handling. (#1558 will systemically handle this problem.)
d0773b7  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
2e9f445  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
0fbe1e8  Merge branch 'develop' of github.com:tobinjt/Flexget into develop
4dd4993  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
b320328  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
0f8686a  Workaround 3.7 re.sub() restrictions.

JohnDoee mentioned this pull request Aug 19, 2018: Python3.7 support #2193 (Closed, 4 tasks)

tobinjt added 2 commits August 22, 2018 00:38

ac5eee1  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
3bbbf95  Use replace() rather than split() and join().

tobinjt mentioned this pull request Aug 22, 2018: Upgrade rpyc and pip-tools. #2195 (Closed)

tobinjt added 6 commits August 22, 2018 22:38

59f374a  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
989d631  Merge branch 'develop' into 3.7_regex_fix
da73b24  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
d6cd4e4  Workaround 3.7 re.sub() restrictions.
74b2ed7  Merge branch 'develop' into 3.7_regex_fix

JohnDoee reviewed Aug 25, 2018

82bb14b  Expand explanatory comment.

serhiy-storchaka reviewed Aug 25, 2018

a1a2cb0  Use re.escape() to be version independent.

tobinjt added 2 commits August 29, 2018 23:04

0765a4f  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
6e2f8c7  Merge branch 'develop' into 3.7_regex_fix

tobinjt added 2 commits September 12, 2018 20:38

5068985  Merge branch 'develop' of https://github.com/Flexget/Flexget into develop
30b88d3  Merge branch 'develop' into 3.7_regex_fix

cvium merged commit a72fd85 into Flexget:develop Sep 27, 2018

tobinjt deleted the 3.7_regex_fix branch September 27, 2018 22:35

cvium added a commit that referenced this pull request Sep 30, 2018

b864f32  Revert "3.7 regex fix (#2162)"
