
[added] new search plugin for private tracker torrentday #1597

Merged
liiight merged 8 commits into Flexget:develop from zosky:zosky-td-1 on Jan 6, 2017

@zosky (Contributor) commented Dec 31, 2016

I used torrentleech as a starting point. The main difference is that TL uses a username/password to log in, which generates the cookies needed to access the search pages. TD's login page has a captcha, so instead I made the 3 cookies it needs required config keys. This should be fine, because in my browser they all have an expiry date of 2038. Beyond that, the 2 sites use slightly different CSS, so I look for some different classes and divs. (This is my first PR ever, so please be gentle.)

Motivation for changes:

TD is my primary private tracker and TL my secondary.
I'd like to discover stuff; maybe others will too.

Detailed changes:

  • new private tracker search plugin

Config usage if relevant (new plugin or updated schema):

    discover:
      from:
        - torrentday:
            uid: xxxxxxxxxxxxx     # required. NOT your login; find this in your browser's cookies
            passkey: xxxxxxxxx     # required. NOT your password; the value of the "pass" cookie
            cfduid: xxxxxxxxxx     # required. Again in the cookies: the "__cfduid" cookie
            rss_key: xxxxxxxxx     # required. Get this from your profile page
            category: xxxxxxxx
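Because the login page has a captcha, the plugin skips logging in and sends the three cookie values from the config with every search request. A minimal sketch of that mapping, based on the `cookies["pass"]` and `cookies["__cfduid"]` assignments in the diff; the `uid` cookie name and the `build_cookies` helper are assumptions for illustration, not the plugin's code:

```python
def build_cookies(config):
    """Map the plugin's config keys to TorrentDay cookie names (hypothetical helper)."""
    return {
        'uid': config['uid'],          # assumed cookie name
        'pass': config['passkey'],     # from the diff: cookies["pass"]
        '__cfduid': config['cfduid'],  # from the diff: cookies["__cfduid"]
    }


config = {'uid': '12345', 'passkey': 'abc', 'cfduid': 'def', 'rss_key': 'xyz'}
cookies = build_cookies(config)
print(sorted(cookies))  # ['__cfduid', 'pass', 'uid']
```

The resulting dict is what gets passed as `requests.get(url, cookies=cookies)`; `rss_key` is not a cookie and is only used to build download links.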

Log and/or tests output (preferably both):

https://dl.dropboxusercontent.com/u/28529352/flexget-torrentday-test.log

sorted out tabs and spaces
removed 1 garbage line
    if 'url' not in entry:
        log.error("Didn't actually get a URL...")
    else:
        log.debug("Got the URL: %s" % entry['url'])
Contributor

You can pass the args to the logger and let it do the string formatting, i.e. use a comma instead of %.
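With the standard `logging` module, arguments passed after the format string are interpolated only when a record is actually emitted, so a filtered-out `debug` call never does the formatting work. A minimal sketch, assuming `log` is a standard-library logger (the logger name and URL are arbitrary demo values):

```python
import io
import logging

stream = io.StringIO()
log = logging.getLogger('torrentday_demo')  # arbitrary name for this demo
log.setLevel(logging.DEBUG)
log.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))
log.addHandler(handler)

entry = {'url': 'https://example.com/file.torrent'}

# Eager: the string is built with % before logging ever sees it.
log.debug('Got the URL: %s' % entry['url'])

# Lazy (the reviewer's suggestion): pass the value as an argument;
# logging interpolates it only if the record is emitted.
log.debug('Got the URL: %s', entry['url'])


class Expensive:
    """Counts how often it is converted to a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return 'expensive'


# Once DEBUG is filtered out, the lazy form never calls __str__.
log.setLevel(logging.INFO)
log.debug('Value: %s', Expensive())

print(stream.getvalue())
```

Both debug calls produce the same text while DEBUG is enabled; the difference is that only the lazy form skips formatting when the message is suppressed.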

@cvium (Contributor) left a comment

Pass the arguments to the logger instead of doing explicit string formatting and handle the requests exceptions. Seems fine otherwise.

    cookies["pass"] = config['passkey']
    cookies["__cfduid"] = config['cfduid']

    page = requests.get(url, cookies=cookies).content
Contributor

You need more exception handling

exception handling & better debug logging
@zosky changed the title from "new search plugin for private tracker torrentday" to "[added] new search plugin for private tracker torrentday" on Dec 31, 2016
    try:
        page = requests.get(url, cookies=cookies).content
    except RequestException as e:
        raise PluginError('Could not connect to torrentday: %s', str(e))
Contributor

PluginError only takes one argument. You have to do the string formatting here.
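Unlike a logging call, constructing an exception does not interpolate extra positional arguments into the message. With a plain `Exception`, the extra value is just stored in `.args`, so the `%s` is never filled. A minimal sketch using stand-ins: `DemoPluginError` is a hypothetical single-message class (FlexGet's real `PluginError` has its own signature), and `ConnectionError` stands in for the requests exception:

```python
class DemoPluginError(Exception):
    """Hypothetical stand-in for PluginError; takes a single message."""


cause = ConnectionError('timed out')  # stand-in for a requests exception

# Passing the value as a second argument: nothing gets formatted.
wrong = Exception('Could not connect to torrentday: %s', str(cause))
print(str(wrong))  # ('Could not connect to torrentday: %s', 'timed out')

# Formatting first and passing one string, as the reviewer asks.
right = DemoPluginError('Could not connect to torrentday: %s' % cause)
print(str(right))  # Could not connect to torrentday: timed out
```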


    if not isinstance(config, dict):
        config = {}
    # sort = SORT.get(config.get('sort_by', 'seeds'))
Contributor

You should remove any useless comments

    # find the torrent names
    title = tr.find("a", { "class": "torrentName" })
    entry['title'] = title.contents[0]
    log.debug('title: %s' % title.contents[0])
Contributor

String formatting


    # construct download URL
    torrent_url = ( "https://www.torrentday.com/" + torrent_url + '?torrent_pass=' + config['rss_key'] )
    log.debug('RSS-ified download link: %s' % torrent_url)
Contributor

String formatting

    # urllib.quote will crash if the unicode string has non ascii characters, so encode in utf-8 beforehand
    url = ('https://www.torrentday.com/browse.php?search=' +
           quote(query.encode('utf-8')) + filter_url)
    log.debug('Using %s as torrentday search url' % url)
Contributor

String formatting
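The code comment in the search-URL snippet refers to Python 2's `urllib.quote`. On Python 3, `urllib.parse.quote` accepts a `str` and percent-encodes it as UTF-8 by default, so the explicit `.encode('utf-8')` still works but is no longer required. A small check (the search term is an arbitrary example):

```python
from urllib.parse import quote

query = 'Amélie'  # arbitrary non-ASCII search term

from_str = quote(query)                    # str input, encoded as UTF-8 by default
from_bytes = quote(query.encode('utf-8'))  # what the diff does
print(from_str, from_bytes)  # Am%C3%A9lie Am%C3%A9lie
```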

    try:
        page = requests.get(url, cookies=cookies).content
    except RequestException as e:
        raise PluginError('Could not connect to torrentday')
@cvium (Contributor) commented Jan 1, 2017

You could've changed it to raise PluginError('Could not connect to torrentday: %s' % e)

    Search for name from torrentday.
    """

    if not isinstance(config, dict):
Member

No need for this, your schema means config cannot be anything other than a dict

    categories = [categories]
    # If there are any text categories, turn them into their id number
    categories = [c if isinstance(c, int) else CATEGORIES[c] for c in categories]
    filter_url = '&cata=yes&c%s=1&clear-new=1' % ','.join(str(c) for c in categories)
Member

Not mandatory, but I prefer passing URL params to requests as a dict; it makes the code more readable:

    params = {'cata': 'yes', 'c%s' % ','.join(str(c) for c in categories): 1, 'clear-new': 1}

Then add it with params=params in the requests call. Just a suggestion
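requests builds the query string from a `params` dict in essentially the same way as the standard library's `urllib.parse.urlencode`, so the stdlib call can preview the result without making a request. A sketch of the category handling plus the suggested dict; the `CATEGORIES` names and IDs and the `build_params` helper are hypothetical, since the plugin's real table is not shown in this thread:

```python
from urllib.parse import urlencode

# Hypothetical name -> id table; the plugin's real CATEGORIES values are not shown here.
CATEGORIES = {'movies_hd': 11, 'tv_hd': 7}


def build_params(category):
    """Normalize the config value to a list of ids and build the search params."""
    categories = category if isinstance(category, list) else [category]
    # Text categories are turned into their id number, as in the diff.
    categories = [c if isinstance(c, int) else CATEGORIES[c] for c in categories]
    return {
        'cata': 'yes',
        'c%s' % ','.join(str(c) for c in categories): 1,
        'clear-new': 1,
    }


params = build_params('tv_hd')
print(urlencode(params))  # cata=yes&c7=1&clear-new=1
```

In the plugin, the dict would then go into the call as `requests.get(url, params=params, cookies=cookies)`, which is the change the reviewer describes.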

use params rather than putting it all in the url
also removed check for 'config is dict' not necessary, schema mandates it
and fixed crash in scraping seed/leech by stripping number formatting
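One commit message above mentions a crash when scraping seed/leech counts, fixed by stripping number formatting. A sketch of that idea, assuming the site renders large counts with comma thousands separators (e.g. `1,234`), which `int()` rejects; `parse_count` is a hypothetical helper, not the plugin's code:

```python
def parse_count(text):
    """Convert a scraped count such as '1,234' to an int (assumes comma separators)."""
    return int(text.strip().replace(',', ''))


print(parse_count('1,234'))  # 1234
```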
@paranoidi (Member)

Is it possible to get the cookie data by logging into the site? Grabbing them from the cookies manually sucks...

@paranoidi (Member)

Looks fine to me besides cumbersome cookie usage.

@zosky (Contributor, Author) commented Jan 3, 2017

I don't like it either, but it works. Their login page has reCAPTCHA, so I can't go through the front door and catch their cookies. Any suggestions?

@cvium (Contributor) commented Jan 5, 2017

One final change I'd like to see is cleaning up your inconsistent use of quotes regarding strings. Sometimes you use double quotes, other times you use single quotes. It has to be one or the other. Single quotes would probably be more in line with the rest of the code.

as requested to match the rest of the project
@liiight merged commit f0d01df into Flexget:develop on Jan 6, 2017
@zosky deleted the zosky-td-1 branch on January 7, 2017 06:12

4 participants