Add url rewrite support for bt.hliang.com by jonathanou · Pull Request #1630 · Flexget/Flexget

jonathanou · 2017-01-15T11:59:03Z

bt.hliang.com provides asian tv show torrents. However, it does not support rss feeds. To auto download shows, use http plugin to download the homepage of bt.hliang.com, and this url rewrite plugin will do the following:

Go through each entry url and follow the link to open the details page.
In the details page, search for the actual torrent download url (any link that contains "down.php?[unique id]"
Return the url if found, or raise UrlRewriteError.

bt.hliang.com provides asian tv show torrents. However, it does not support rss feeds. To auto download shows, use http plugin to download the homepage of bt.hliang.com, and this url rewrite plugin will do the following: 1. Go through each entry url and follow the link to open the details page. 2. In the details page, search for the actual torrent download url (any link that contains "down.php?[unique id]" 3. Return the url if found, or raise UrlRewriteError.

Forgot to escape '.' and '?' for the "down.php?" regex pattern.

liiight · 2017-01-15T12:01:31Z

flexget/plugins/sites/hliang.py

+    @plugin.internet(log)
+    def parse_download_page(self, url, requests):
+        txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
+        page = requests.get(url, headers=txheaders)


This needs to be in a try except block

liiight · 2017-01-15T12:02:40Z

flexget/plugins/sites/hliang.py

+
+@event('plugin.register')
+def register_plugin():
+    plugin.register(UrlRewriteHliang, 'hliang', groups=['urlrewriter'], api_ver=2)


groups was changed interfaces

1. Wrapped requests.get() call inside try-except 2. Changed plugin registration to specify "interfaces" instead of "groups" Additional changes: 1. Removed unused urllib2 import. 2. Grouped import statements.

liiight · 2017-01-15T15:00:24Z

flexget/plugins/sites/hliang.py

+    def parse_download_page(self, url, requests):
+        txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
+        try:
+            page = requests.get(url, headers=txheaders)


Please split this to two try except blocks and try avoiding catching Exception if it can be done

Can you explain why we need 2 try-except blocks instead of one? Also, why is catching Exception not desirable? I'm new to the flexget code, is there some top level catch-all that will handle uncaught exceptions? I think I've seen logs where flexget crashed because Exceptions weren't caught.

Yeah, there is a catch all but this is irrelevant to my point.
You have two separate operations, request and soup, and each one can fail independently. Having one big block with all poss exceptions is just poor code which is hard to debug and maintain

I apologize for the questions, I'm not super experienced in python programming. I appreciate any insight you can give me.

So just to clarify a bit:

Do you have an example somewhere that demonstrates what you're looking for here? I based my code from Flexget/flexget/plugins/sites/deadfrog.py, which doesn't even put the request call in the try block (a bug, I guess). I took a brief look at other py files in the sites folder and they are all very similar.

In my experience with C++/C# programming, when a client calls an interface, it needs to know what exception types to expect if the operation fails. UrlRewritingError seems to be this agreed exception type.

Therefore my thought was it is not appropriate to leak exceptions solely due to implementation details out to the caller. I could one day change my implementation and throw a completely different set of exceptions. Catching all exception types, wrapping it in UrlRewritingError, and rethrowing allows the caller to only ever expect one type of exception. It seems like the right thing to do.

I don't quite understand the logic behind using different try blocks for different function calls. Yes each one can fail independently, but if all I'm going to do is wrap the exception and throw UrlRewritingError, why is it better for debugging and maintainability for separate try-blocks? We can set breakpoints to figure out which line threw exceptions.

And is having separate try blocks really that much maintainable than a single try with all your implementation in it? I'd almost argue it seems like the code is more readable with a single try-block.

Just that a plugin code is what it is, doesn't mean it's OK. We try to constantly improve the code quality of stuff, especially with new PRs where it's easier (since the plugin dev is still around).

I didn't mean that you shouldn't raise a UrlRewritingError, I meant that if possible, avoiding catching all Exception, since that can swallow up stuff you maybe would like to treat differently. Again, this is more of a global guide line than a rule, and it certainly can be that this fits here.

By having as minimal code as possible in each try, you make sure the outcome is expected. This is especially true since you are catching Exception in the second part, which could be more relevant to the soup part than request part. So if ANY request error that isn't expected we will still raise that error.

I hope this comes off in the manner which I mean it to, which is just trying to improve the global health of the plugin and not deter you from this in any way.

Sure, I certainly can understand wanting to maintain a high level of code quality. I just wanted to make sure I understood everything.

I've updated the PR after reading what Exception type is thrown by the requests library. For the soup library, however, it did not seem to guarantee a specific type of Exception. So I've left it still catching Exception as before.

When making request calls, catch request specific exceptions and log the error before raising UrlRewritingError. For the soup library call, it is not immediately clear what Exception types are guaranteed to throw, therefore it is left unchanged.

liiight · 2017-01-17T16:57:51Z

flexget/plugins/sites/hliang.py

        try:
            page = requests.get(url, headers=txheaders)
+        except requests.exceptions.RequestException as e:
+            log.exception('Cannot open "%s"' % url, e)


this should be:

log.error('Cannot open "%s": %s' ,url, e)

I'll change it, but can you explain why we should prefer log.error over log.exception? I read the documentation that said log.exception is the same as log.error and should be called from the exception handler.

From stackoverflow it seems that log.exception will automatically show the traceback and log.error won't, is this something we want in the log?

Change from log.exception to log.error as requested in the pull request. Also fixed how the arguments are being passed, the message takes a format string and the arguments that follow are the values. The previous implementation was incorrect.

liiight · 2017-01-18T08:13:36Z

flexget/plugins/sites/hliang.py

+        try:
+            soup = get_soup(page.text)
+        except Exception as e:
+            raise UrlRewritingError(e)


Last thing, you need to use str(e) or e.args[0] since you need to pass a str

I used e.message instead, I think that gives better information.

Exception.message has been deprecated. Don't use it.

So what should I use? str(e) like @liiight mentioned?

Ah I found the answer, I'll use str(e), thanks.

liiight · 2017-01-18T08:14:03Z

flexget/plugins/sites/hliang.py

+            page = requests.get(url, headers=txheaders)
+        except requests.exceptions.RequestException as e:
+            log.error('Cannot open "%s" : %s', url, e)
+            raise UrlRewritingError(e)


Instead of storing the actual exception, store exception message in UrlRewritingError. Printing the exception as a string shows the traceback, which for log files is a little verbose. Showing the the exception message helps give more meaningful info.

Exception.message is deprecated, use str(e) as recommended in https://stackoverflow.com/questions/1272138/baseexception-message-deprecated-in-python-2-6 .

jonathanou · 2017-01-21T08:14:07Z

@liiight , it's been a couple of days now, is there anything else you'd like me to change? Or is this ready to be merged?

liiight · 2017-01-21T09:21:21Z

Thanks for this!

Jonathan Ouyang added 2 commits January 15, 2017 18:02

Fix regex issue

ba7840e

Forgot to escape '.' and '?' for the "down.php?" regex pattern.

liiight requested changes Jan 15, 2017

View reviewed changes

Resolved pull request feedback.

0968b44

1. Wrapped requests.get() call inside try-except 2. Changed plugin registration to specify "interfaces" instead of "groups" Additional changes: 1. Removed unused urllib2 import. 2. Grouped import statements.

liiight requested changes Jan 15, 2017

View reviewed changes

liiight reviewed Jan 17, 2017

View reviewed changes

Fix logging call.

1a09029

Change from log.exception to log.error as requested in the pull request. Also fixed how the arguments are being passed, the message takes a format string and the arguments that follow are the values. The previous implementation was incorrect.

liiight requested changes Jan 18, 2017

View reviewed changes

Jonathan Ouyang added 2 commits January 18, 2017 22:27

Use str(e) to display exception message.

bea726c

Exception.message is deprecated, use str(e) as recommended in https://stackoverflow.com/questions/1272138/baseexception-message-deprecated-in-python-2-6 .

liiight approved these changes Jan 21, 2017

View reviewed changes

liiight merged commit b28a6ac into Flexget:develop Jan 21, 2017

Conversation

jonathanou commented Jan 15, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liiight Jan 17, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jonathanou commented Jan 21, 2017

Uh oh!

liiight commented Jan 21, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

liiight Jan 17, 2017 •

edited

Loading