added urlrewrite plugin for rmz.cr (rapidmoviez.com|rapidmoviez.eu)#1879
added urlrewrite plugin for rmz.cr (rapidmoviez.com|rapidmoviez.eu)#1879liiight merged 11 commits intoFlexget:developfrom
Conversation
flexget/plugins/sites/rmz.py
Outdated
| accept = True | ||
| log.debug('Url: "%s" matched filter: %s', urls[i], regexp) | ||
| break | ||
| if not accept: |
There was a problem hiding this comment.
Since you break when finding a match, you could just use for-else instead of the accept variable.
There was a problem hiding this comment.
Thanks for the tip, I'll try that.
flexget/plugins/sites/rmz.py
Outdated
| schema = { | ||
| 'type': 'object', | ||
| 'properties': { | ||
| 'links_re': {'type': 'array', 'items': {'format': 'regexp'} } |
There was a problem hiding this comment.
- there's a mismatch between this value and the one in the doc (
url_re). - I'm not sure why is this needed. Don't you know the TLD and format you need for the rewrite? why does the user need to input this? URL rewrite plugin do not need to rely on config.
There was a problem hiding this comment.
- I'll fix that asap.
- The link_re config parameter works just like the link_re parameter in the html input plugin (thats why I renamed it).
The reason for using this parameter is that on the rmz.cr site, they often offer links to several filehosters. This parameter allows the user to choose which filehosters he prefers. For example, the user may have an account for one of the filehosters or he may prefer some filehosters because they offer superior download speed or no file size restrictions for free users.
There was a problem hiding this comment.
ah, gotcha. rmz.cr is an indexing site and the links are to external sites.
maybe name it filehoster instead then?
There was a problem hiding this comment.
good idea, done in the latest commit.
flexget/plugins/sites/rmz.py
Outdated
| def url_rewritable(self, task, entry): | ||
| url = entry['url'] | ||
| rewritable_regex = '^https?:\/\/(www.)?(rmz\.cr|rapidmoviez\.(com|eu))\/.*' | ||
| if re.match(rewritable_regex, url): |
There was a problem hiding this comment.
suggestion:
return re.match(rewritable_regex, url) is not None
flexget/plugins/sites/rmz.py
Outdated
| log.debug('Using links_re filters: %s', str(regexps)) | ||
| else: | ||
| log.debug('No link filters configured, using all found links.') | ||
| for i in reversed(range(len(urls))): |
There was a problem hiding this comment.
I need to go through the array in reversed order because I am removing some elements within the for loop.
I just tried and enumerate did not work well with reversed:
when I tried to use reversed(enumerate(urls)), I got the error:
TypeError: argument to reversed() must be a sequence
Then I tried reversed(list(enumerate(urls))) (which is imho at least as ugly as what I used originally) but that gave me the error:
TypeError: list indices must be integers, not tuple
so I gave up and reverted to reversed(range(len(urls)))
flexget/plugins/sites/rmz.py
Outdated
| urls[i]=urls[i].encode('ascii','ignore') | ||
| if regexps: | ||
| accept = False | ||
| for regexp in regexps: |
There was a problem hiding this comment.
this will crash if regexps is None
There was a problem hiding this comment.
ah ok, maybe just do regexps = self.config.get('links_re', []) and avoid the if statement
There was a problem hiding this comment.
o.k. got it to work like this.
flexget/plugins/sites/rmz.py
Outdated
| if not accept: | ||
| log.debug('Url: "%s" does not match any of the given filters: %s', urls[i], str(regexps)) | ||
| del(urls[i]) | ||
| numlinks=len(urls) |
flexget/plugins/sites/rmz.py
Outdated
| del(urls[i]) | ||
| numlinks=len(urls) | ||
| log.info('Found %d links at %s.',numlinks, entry['url']) | ||
| if numlinks>0: |
There was a problem hiding this comment.
not pythonic, no need to check if higher than 0, just do:
if numlinks:
flexget/plugins/sites/rmz.py
Outdated
| else: | ||
| log.debug('No link filters configured, using all found links.') | ||
| for i in reversed(range(len(urls))): | ||
| urls[i]=urls[i].encode('ascii','ignore') |
There was a problem hiding this comment.
Lacking some whitespace around equals and after comma.
flexget/plugins/sites/rmz.py
Outdated
| if not accept: | ||
| log.debug('Url: "%s" does not match any of the given filters: %s', urls[i], str(regexps)) | ||
| del(urls[i]) | ||
| numlinks=len(urls) |
flexget/plugins/sites/rmz.py
Outdated
| del(urls[i]) | ||
| numlinks=len(urls) | ||
| log.info('Found %d links at %s.',numlinks, entry['url']) | ||
| if numlinks>0: |
flexget/plugins/sites/rmz.py
Outdated
| log.info('Found %d links at %s.',numlinks, entry['url']) | ||
| if numlinks>0: | ||
| entry.setdefault('urls', urls) | ||
| entry['url']=urls[0] |
flexget/plugins/sites/rmz.py
Outdated
|
|
||
| @event('plugin.register') | ||
| def register_plugin(): | ||
| plugin.register(UrlRewriteRmz, 'rmz', interfaces=['urlrewriter', 'task'], api_ver=2) No newline at end of file |
There was a problem hiding this comment.
There should be a newline at the end of the file.
There was a problem hiding this comment.
I added a newline a few commits ago and I just checked on my local machine and with the online editor and there is definitely a newline at the end of the file now.
Don't know why this review did not get marked as outdated...
flexget/plugins/sites/rmz.py
Outdated
| urls=[] | ||
| for element in link_elements: | ||
| urls.extend(element.text.splitlines()) | ||
| regexps = self.config.get('links_re', None) |
There was a problem hiding this comment.
None is already the default value returned by dict.get().
There was a problem hiding this comment.
I replaced this with [] to make sure it does not brake my for loop in line 86
flexget/plugins/sites/rmz.py
Outdated
| def url_rewrite(self, task, entry): | ||
| try: | ||
| page = task.requests.get(entry['url']) | ||
| except Exception as e: |
There was a problem hiding this comment.
Could you please suggest an error class to use? I looked at the html plugin but they do not catch errors at all.
I tried to use requests.exceptions.RequestException but got an error.
There was a problem hiding this comment.
that is the correct exception. What error did you get? did you import it correctly?
There was a problem hiding this comment.
I got this fixed now. I had to add from requests.exceptions import RequestException. Now I am using RequestException instead of Exception.
flexget/plugins/sites/rmz.py
Outdated
| try: | ||
| page = task.requests.get(entry['url']) | ||
| except Exception as e: | ||
| raise UrlRewritingError(e) |
There was a problem hiding this comment.
Probably need to wrap e in str()
flexget/plugins/sites/rmz.py
Outdated
| try: | ||
| soup = get_soup(page.text) | ||
| except Exception as e: | ||
| raise UrlRewritingError(e) |
There was a problem hiding this comment.
Probably need to wrap e in str()
| raise UrlRewritingError(e) | ||
| try: | ||
| soup = get_soup(page.text) | ||
| except Exception as e: |
There was a problem hiding this comment.
This one is also too broad, but since BS4 can raise different exceptions it's fine in this case...
flexget/plugins/sites/rmz.py
Outdated
| else: | ||
| log.debug('Url: "%s" does not match any of the given filehoster filters: %s', urls[i], str(regexps)) | ||
| del(urls[i]) | ||
| num_links=len(urls) |
flexget/plugins/sites/rmz.py
Outdated
| log.debug('Url: "%s" does not match any of the given filehoster filters: %s', urls[i], str(regexps)) | ||
| del(urls[i]) | ||
| num_links=len(urls) | ||
| log.info('Found %d links at %s.',num_links, entry['url']) |
flexget/plugins/sites/rmz.py
Outdated
| break | ||
| else: | ||
| log.debug('Url: "%s" does not match any of the given filehoster filters: %s', urls[i], str(regexps)) | ||
| del(urls[i]) |
There was a problem hiding this comment.
It's a bad idea to iterate over a list and delete elements from it. You should either create a copy of the list or add to another list instead of deleting.
There was a problem hiding this comment.
I never experienced any problem when I was traveling the list in reversed order. Are there any performance implications (although we are talking about 100 entries in the list tops)?
There was a problem hiding this comment.
I read up on the performance implications of del.
switched to adding to a new list.
flexget/plugins/sites/rmz.py
Outdated
| log.debug('Url: "%s" does not match any of the given filehoster filters: %s', urls[i], str(regexps)) | ||
| del(urls[i]) | ||
| num_links = len(urls) | ||
| log.info('Found %d links at %s.',num_links, entry['url']) |
There was a problem hiding this comment.
I think log.verbose level would fit here better
There was a problem hiding this comment.
changed that and added the space after the comma.
…ht for the code review.
flexget/plugins/sites/rmz.py
Outdated
| log.debug('Url: "%s" does not match any of the given filehoster filters: %s', urls[i], str(regexps)) | ||
| if regexps: | ||
| log.debug('Using filehosters_re filters: %s', str(regexps)) | ||
| urls=filtered_urls |
There was a problem hiding this comment.
Missing spaces. See about running pep8 on your code. You can also use pycharm which is free
There was a problem hiding this comment.
Thanks for the tip. I used autopep8 (via eclipse).
|
Thanks, care to update the wiki? |
|
@liiight Sure, I'll write a wiki page and link to it under the "Data operations" section next to the generic urlrewrite plugin. Thanks again for your help with improving the code! |
|
Wiki page is done: and linked from the plugins page (section Modification/Data Operations): |
Motivation for changes:
rmz.cr needs an urlrewrite plugin for the rss feed (rmz.cr/feed) because the feed only provides links back to the rmz.cr release page.
Detailed changes:
Config usage if relevant (new plugin or updated schema):
Log and/or tests output (preferably both):
To Do: