Add url rewrite support for bt.hliang.com#1630
Add url rewrite support for bt.hliang.com#1630liiight merged 7 commits intoFlexget:developfrom jonathanou:develop
Conversation
bt.hliang.com provides asian tv show torrents. However, it does not support rss feeds. To auto download shows, use http plugin to download the homepage of bt.hliang.com, and this url rewrite plugin will do the following: 1. Go through each entry url and follow the link to open the details page. 2. In the details page, search for the actual torrent download url (any link that contains "down.php?[unique id]" 3. Return the url if found, or raise UrlRewriteError.
Forgot to escape '.' and '?' for the "down.php?" regex pattern.
flexget/plugins/sites/hliang.py
Outdated
| @plugin.internet(log) | ||
| def parse_download_page(self, url, requests): | ||
| txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'} | ||
| page = requests.get(url, headers=txheaders) |
There was a problem hiding this comment.
This needs to be in a try except block
flexget/plugins/sites/hliang.py
Outdated
|
|
||
| @event('plugin.register') | ||
| def register_plugin(): | ||
| plugin.register(UrlRewriteHliang, 'hliang', groups=['urlrewriter'], api_ver=2) |
1. Wrapped requests.get() call inside try-except 2. Changed plugin registration to specify "interfaces" instead of "groups" Additional changes: 1. Removed unused urllib2 import. 2. Grouped import statements.
| def parse_download_page(self, url, requests): | ||
| txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'} | ||
| try: | ||
| page = requests.get(url, headers=txheaders) |
There was a problem hiding this comment.
Please split this to two try except blocks and try avoiding catching Exception if it can be done
There was a problem hiding this comment.
Can you explain why we need 2 try-except blocks instead of one? Also, why is catching Exception not desirable? I'm new to the flexget code, is there some top level catch-all that will handle uncaught exceptions? I think I've seen logs where flexget crashed because Exceptions weren't caught.
There was a problem hiding this comment.
Yeah, there is a catch all but this is irrelevant to my point.
You have two separate operations, request and soup, and each one can fail independently. Having one big block with all poss exceptions is just poor code which is hard to debug and maintain
There was a problem hiding this comment.
I apologize for the questions, I'm not super experienced in python programming. I appreciate any insight you can give me.
So just to clarify a bit:
-
Do you have an example somewhere that demonstrates what you're looking for here? I based my code from Flexget/flexget/plugins/sites/deadfrog.py, which doesn't even put the request call in the try block (a bug, I guess). I took a brief look at other py files in the sites folder and they are all very similar.
-
In my experience with C++/C# programming, when a client calls an interface, it needs to know what exception types to expect if the operation fails. UrlRewritingError seems to be this agreed exception type.
Therefore my thought was it is not appropriate to leak exceptions solely due to implementation details out to the caller. I could one day change my implementation and throw a completely different set of exceptions. Catching all exception types, wrapping it in UrlRewritingError, and rethrowing allows the caller to only ever expect one type of exception. It seems like the right thing to do.
- I don't quite understand the logic behind using different try blocks for different function calls. Yes each one can fail independently, but if all I'm going to do is wrap the exception and throw UrlRewritingError, why is it better for debugging and maintainability for separate try-blocks? We can set breakpoints to figure out which line threw exceptions.
And is having separate try blocks really that much maintainable than a single try with all your implementation in it? I'd almost argue it seems like the code is more readable with a single try-block.
There was a problem hiding this comment.
- Just that a plugin code is what it is, doesn't mean it's OK. We try to constantly improve the code quality of stuff, especially with new PRs where it's easier (since the plugin dev is still around).
- I didn't mean that you shouldn't raise a UrlRewritingError, I meant that if possible, avoiding catching all Exception, since that can swallow up stuff you maybe would like to treat differently. Again, this is more of a global guide line than a rule, and it certainly can be that this fits here.
- By having as minimal code as possible in each
try, you make sure the outcome is expected. This is especially true since you are catchingExceptionin the second part, which could be more relevant to the soup part than request part. So if ANY request error that isn't expected we will still raise that error.
I hope this comes off in the manner which I mean it to, which is just trying to improve the global health of the plugin and not deter you from this in any way.
There was a problem hiding this comment.
Sure, I certainly can understand wanting to maintain a high level of code quality. I just wanted to make sure I understood everything.
I've updated the PR after reading what Exception type is thrown by the requests library. For the soup library, however, it did not seem to guarantee a specific type of Exception. So I've left it still catching Exception as before.
When making request calls, catch request specific exceptions and log the error before raising UrlRewritingError. For the soup library call, it is not immediately clear what Exception types are guaranteed to throw, therefore it is left unchanged.
flexget/plugins/sites/hliang.py
Outdated
| try: | ||
| page = requests.get(url, headers=txheaders) | ||
| except requests.exceptions.RequestException as e: | ||
| log.exception('Cannot open "%s"' % url, e) |
There was a problem hiding this comment.
this should be:
log.error('Cannot open "%s": %s' ,url, e)There was a problem hiding this comment.
I'll change it, but can you explain why we should prefer log.error over log.exception? I read the documentation that said log.exception is the same as log.error and should be called from the exception handler.
From stackoverflow it seems that log.exception will automatically show the traceback and log.error won't, is this something we want in the log?
Change from log.exception to log.error as requested in the pull request. Also fixed how the arguments are being passed, the message takes a format string and the arguments that follow are the values. The previous implementation was incorrect.
flexget/plugins/sites/hliang.py
Outdated
| try: | ||
| soup = get_soup(page.text) | ||
| except Exception as e: | ||
| raise UrlRewritingError(e) |
There was a problem hiding this comment.
Last thing, you need to use str(e) or e.args[0] since you need to pass a str
There was a problem hiding this comment.
I used e.message instead, I think that gives better information.
There was a problem hiding this comment.
Exception.message has been deprecated. Don't use it.
There was a problem hiding this comment.
So what should I use? str(e) like @liiight mentioned?
There was a problem hiding this comment.
Ah I found the answer, I'll use str(e), thanks.
flexget/plugins/sites/hliang.py
Outdated
| page = requests.get(url, headers=txheaders) | ||
| except requests.exceptions.RequestException as e: | ||
| log.error('Cannot open "%s" : %s', url, e) | ||
| raise UrlRewritingError(e) |
Instead of storing the actual exception, store exception message in UrlRewritingError. Printing the exception as a string shows the traceback, which for log files is a little verbose. Showing the the exception message helps give more meaningful info.
Exception.message is deprecated, use str(e) as recommended in https://stackoverflow.com/questions/1272138/baseexception-message-deprecated-in-python-2-6 .
|
@liiight , it's been a couple of days now, is there anything else you'd like me to change? Or is this ready to be merged? |
|
Thanks for this! |
bt.hliang.com provides asian tv show torrents. However, it does not support rss feeds. To auto download shows, use http plugin to download the homepage of bt.hliang.com, and this url rewrite plugin will do the following: