Skip to content

Add url rewrite support for bt.hliang.com#1630

Merged
liiight merged 7 commits intoFlexget:developfrom
jonathanou:develop
Jan 21, 2017
Merged

Add url rewrite support for bt.hliang.com#1630
liiight merged 7 commits intoFlexget:developfrom
jonathanou:develop

Conversation

@jonathanou
Copy link
Copy Markdown
Contributor

bt.hliang.com provides asian tv show torrents. However, it does not support rss feeds. To auto download shows, use http plugin to download the homepage of bt.hliang.com, and this url rewrite plugin will do the following:

  1. Go through each entry url and follow the link to open the details page.
  2. In the details page, search for the actual torrent download url (any link that contains "down.php?[unique id]"
  3. Return the url if found, or raise UrlRewriteError.

Jonathan Ouyang added 2 commits January 15, 2017 18:02
bt.hliang.com provides asian tv show torrents.  However, it does not support rss feeds.  To auto download shows, use http plugin to download the homepage of bt.hliang.com, and this url rewrite plugin will do the following:

1.  Go through each entry url and follow the link to open the details page.
2.  In the details page, search for the actual torrent download url (any link that contains "down.php?[unique id]"
3. Return the url if found, or raise UrlRewriteError.
Forgot to escape '.' and '?' for the "down.php?" regex pattern.
@plugin.internet(log)
def parse_download_page(self, url, requests):
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
page = requests.get(url, headers=txheaders)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be in a try except block


@event('plugin.register')
def register_plugin():
plugin.register(UrlRewriteHliang, 'hliang', groups=['urlrewriter'], api_ver=2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

groups was changed interfaces

1.  Wrapped requests.get() call inside try-except
2.  Changed plugin registration to specify "interfaces" instead of "groups"

Additional changes:
1.  Removed unused urllib2 import.
2.  Grouped import statements.
def parse_download_page(self, url, requests):
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
try:
page = requests.get(url, headers=txheaders)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split this to two try except blocks and try avoiding catching Exception if it can be done

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we need 2 try-except blocks instead of one? Also, why is catching Exception not desirable? I'm new to the flexget code, is there some top level catch-all that will handle uncaught exceptions? I think I've seen logs where flexget crashed because Exceptions weren't caught.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there is a catch all but this is irrelevant to my point.
You have two separate operations, request and soup, and each one can fail independently. Having one big block with all poss exceptions is just poor code which is hard to debug and maintain

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the questions, I'm not super experienced in python programming. I appreciate any insight you can give me.

So just to clarify a bit:

  1. Do you have an example somewhere that demonstrates what you're looking for here? I based my code from Flexget/flexget/plugins/sites/deadfrog.py, which doesn't even put the request call in the try block (a bug, I guess). I took a brief look at other py files in the sites folder and they are all very similar.

  2. In my experience with C++/C# programming, when a client calls an interface, it needs to know what exception types to expect if the operation fails. UrlRewritingError seems to be this agreed exception type.

Therefore my thought was it is not appropriate to leak exceptions solely due to implementation details out to the caller. I could one day change my implementation and throw a completely different set of exceptions. Catching all exception types, wrapping it in UrlRewritingError, and rethrowing allows the caller to only ever expect one type of exception. It seems like the right thing to do.

  1. I don't quite understand the logic behind using different try blocks for different function calls. Yes each one can fail independently, but if all I'm going to do is wrap the exception and throw UrlRewritingError, why is it better for debugging and maintainability for separate try-blocks? We can set breakpoints to figure out which line threw exceptions.

And is having separate try blocks really that much maintainable than a single try with all your implementation in it? I'd almost argue it seems like the code is more readable with a single try-block.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Just that a plugin code is what it is, doesn't mean it's OK. We try to constantly improve the code quality of stuff, especially with new PRs where it's easier (since the plugin dev is still around).
  2. I didn't mean that you shouldn't raise a UrlRewritingError, I meant that if possible, avoiding catching all Exception, since that can swallow up stuff you maybe would like to treat differently. Again, this is more of a global guide line than a rule, and it certainly can be that this fits here.
  3. By having as minimal code as possible in each try, you make sure the outcome is expected. This is especially true since you are catching Exception in the second part, which could be more relevant to the soup part than request part. So if ANY request error that isn't expected we will still raise that error.

I hope this comes off in the manner which I mean it to, which is just trying to improve the global health of the plugin and not deter you from this in any way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I certainly can understand wanting to maintain a high level of code quality. I just wanted to make sure I understood everything.

I've updated the PR after reading what Exception type is thrown by the requests library. For the soup library, however, it did not seem to guarantee a specific type of Exception. So I've left it still catching Exception as before.

When making request calls, catch request specific exceptions and log the error before raising UrlRewritingError.

For the soup library call, it is not immediately clear what Exception types are guaranteed to throw, therefore it is left unchanged.
try:
page = requests.get(url, headers=txheaders)
except requests.exceptions.RequestException as e:
log.exception('Cannot open "%s"' % url, e)
Copy link
Copy Markdown
Member

@liiight liiight Jan 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be:

log.error('Cannot open "%s": %s' ,url, e)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it, but can you explain why we should prefer log.error over log.exception? I read the documentation that said log.exception is the same as log.error and should be called from the exception handler.

From stackoverflow it seems that log.exception will automatically show the traceback and log.error won't, is this something we want in the log?

Change from log.exception to log.error as requested in the pull request.  Also fixed how the arguments are being passed, the message takes a format string and the arguments that follow are the values.  The previous implementation was incorrect.
try:
soup = get_soup(page.text)
except Exception as e:
raise UrlRewritingError(e)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last thing, you need to use str(e) or e.args[0] since you need to pass a str

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used e.message instead, I think that gives better information.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception.message has been deprecated. Don't use it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what should I use? str(e) like @liiight mentioned?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I found the answer, I'll use str(e), thanks.

page = requests.get(url, headers=txheaders)
except requests.exceptions.RequestException as e:
log.error('Cannot open "%s" : %s', url, e)
raise UrlRewritingError(e)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Jonathan Ouyang added 2 commits January 18, 2017 22:27
Instead of storing the actual exception, store exception message in UrlRewritingError. Printing the exception as a string shows the traceback, which for log files is a little verbose.  Showing the the exception message helps give more meaningful info.
@jonathanou
Copy link
Copy Markdown
Contributor Author

@liiight , it's been a couple of days now, is there anything else you'd like me to change? Or is this ready to be merged?

@liiight liiight merged commit b28a6ac into Flexget:develop Jan 21, 2017
@liiight
Copy link
Copy Markdown
Member

liiight commented Jan 21, 2017

Thanks for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants