imdb_watchlist plugin fixed issue #2035 by thawn · Pull Request #2050 · Flexget/Flexget

thawn · 2017-12-30T14:58:46Z

Motivation for changes:

imdb_watchlist plugin did not return any entries any more. See issue #2035 .

Detailed changes:

added new function to parse the new JSON code used by a new React widget. The new widget was introduced by imdb on watchlist pages early December 2017. Interestingly, the react widget uses an openly accessible api that returns JSON code which is easier to parse than the html code
updated the existing code to parse other user lists, where imdb changed the layout (but does not use the react widget)

Addressed issues:

Fixes imdb_watchlist plugin broken #2035 .

Log and/or tests output (preferably both):

new JSON parser:

2017-12-30 15:56 VERBOSE  imdb_watchlist IMPORT_TEST_W   Retrieving imdb list: watchlist
2017-12-30 15:56 DEBUG    imdb_watchlist IMPORT_TEST_W   Requesting: http://www.imdb.com/user/ur25395647/watchlist {u'Accept-Language': u'en-us'}
2017-12-30 15:56 DEBUG    utils.requests IMPORT_TEST_W   Fetching URL http://www.imdb.com/user/ur25395647/watchlist with args () and kwargs {'headers': {u'Accept-Language': u'en-us'}, 'allow_redirects': True, 'params': {u'sort': u'list_order%2Casc', u'title_type': u'tvMiniSeries', u'view': u'detail'}, u'timeout': 30}
2017-12-30 15:56 DEBUG    utils.requests IMPORT_TEST_W   Fetching URL http://www.imdb.com/title/data with args () and kwargs {'allow_redirects': True, 'params': {u'ids': u'tt3959576,tt3847626'}, u'timeout': 30}
2017-12-30 15:56 VERBOSE  imdb_watchlist IMPORT_TEST_W   imdb list contains 2 items
2017-12-30 15:56 DEBUG    imdb_watchlist IMPORT_TEST_W   First entry (imdb id: tt3959576) looks like this: {u'starbar': {u'aggregate': 6.7, u'rating': 0, u'votes': 931, u'id': u'tt3959576', u'auth': u''}, u'ribbon': {u'inWL': False, u'tconst': u'tt3959576'}, u'title': {u'plot': u'A policewoman sets out to discover who murdered her husband, an undercover officer.', u'ratings': {u'rating': 6.7, u'votes': 931, u'canVote': True}, u'movieMeterCurrentRank': 5584, u'primary': {u'href': u'/title/tt3959576', u'year': [u'2015', u''], u'title': u'Black Work'}, u'credits': {u'star': [{u'href': u'/name/nm0809956', u'name': u'Sheridan Smith'}, {u'href': u'/name/nm2439913', u'name': u'Matthew McNulty'}, {u'href': u'/name/nm4359366', u'name': u'Oliver Woollford'}, {u'href': u'/name/nm4835867', u'name': u'Honor Kneafsey'}], u'creator': [{u'href': u'/name/nm4131020', u'name': u'Matt Charman'}]}, u'poster': {u'url': u'https://images-na.ssl-images-amazon.com/images/M/MV5BMDI1YmQ2MGMtMDBkNS00OWViLTlkZTAtY2I4ZmNjZDNlMjAzXkEyXkFqcGdeQXVyMjExMjk0ODk@._V1_.jpg', u'width': 1064, u'height': 1500}, u'type': u'series', u'id': u'tt3959576', u'metadata': {u'release': 1434844800000, u'genres': [u'Crime', u'Drama', u'Thriller'], u'runtime': 2700, u'numberOfEpisodes': 3, u'certificate': u''}}}

updated html parser:

2017-12-30 15:45 VERBOSE  imdb_watchlist IMPORT_TEST     Retrieving imdb list: ls056853267
2017-12-30 15:45 DEBUG    imdb_watchlist IMPORT_TEST     Requesting: http://www.imdb.com/list/ls056853267 {u'Accept-Language': u'en-us'}
2017-12-30 15:45 DEBUG    utils.requests IMPORT_TEST     Fetching URL http://www.imdb.com/list/ls056853267 with args () and kwargs {'headers': {u'Accept-Language': u'en-us'}, 'allow_redirects': True, 'params': {u'sort': u'list_order%2Casc', u'title_type': u'movie', u'page': 1, u'view': u'detail'}, u'timeout': 30}
2017-12-30 15:45 VERBOSE  imdb_watchlist IMPORT_TEST     imdb list contains 123 items
2017-12-30 15:45 DEBUG    imdb_watchlist IMPORT_TEST     100 items found on page 1
2017-12-30 15:45 DEBUG    imdb_watchlist IMPORT_TEST     Requesting: http://www.imdb.com/list/ls056853267 {u'Accept-Language': u'en-us'}
2017-12-30 15:45 DEBUG    utils.requests IMPORT_TEST     Fetching URL http://www.imdb.com/list/ls056853267 with args () and kwargs {'headers': {u'Accept-Language': u'en-us'}, 'allow_redirects': True, 'params': {u'sort': u'list_order%2Casc', u'title_type': u'movie', u'page': 2, u'view': u'detail'}, u'timeout': 30}
2017-12-30 15:45 DEBUG    imdb_watchlist IMPORT_TEST     23 items found on page 2

To Do:

Maybe add the new JSON api to flexget/utils/imdb.py (comments?)

cvium · 2017-12-30T16:20:56Z

flexget/plugins/input/imdb_watchlist.py

+        entries = []
+        for imdb_id in imdb_ids:
+            entry = Entry()
+            if not 'title' in json_data[imdb_id] or not 'primary' in json_data[imdb_id]['title'] or not 'href' in json_data[imdb_id]['title']['primary'] or not 'title' in json_data[imdb_id]['title']['primary']:


120 char limit

cvium · 2017-12-30T16:21:31Z

flexget/plugins/input/imdb_watchlist.py

+            if 'type' in json_data[imdb_id]['title']:
+                imdb_type = json_data[imdb_id]['title']['type']
+            else:
+                imdb_type = ''


You should just initialize this to an empty string at the start of the loop

cvium · 2017-12-30T16:22:52Z

flexget/plugins/input/imdb_watchlist.py

+            total_item_count = len(json_vars['list']['items'])
+        else:
+            total_item_count = 0
+        if total_item_count == 0:


if not total_item_count is more pythonic

cvium · 2017-12-30T16:23:08Z

flexget/plugins/input/imdb_watchlist.py

+        if 'list' in json_vars and 'items' in json_vars['list']:
+            total_item_count = len(json_vars['list']['items'])
+        else:
+            total_item_count = 0


Just initialize this to 0 above the if.

cvium · 2017-12-30T16:23:45Z

flexget/plugins/input/imdb_watchlist.py

+        page = self.fetch_page(task, url, params, headers)
+#         log.debug(page.text)
+        re_result = re.search('IMDbReactInitialState.push\((.+?)\);\n', page.text)
+        json_text = re_result.group(1)


This will crash if it doesn't find a match. Should be wrapped in a try-except and spit out a suitable error message informing the user that the plugin may be broken

Thanks, cvium for the thorough code review!
I left line 121 out of the try-except, since the fetch_page function has its own code that handles HTTPErrors or wrong http status codes.

thawn · 2017-12-30T16:51:04Z

Thanks, cvium for the thorough code review!
I think I managed to implement the suggested changes.

cvium · 2017-12-30T17:22:15Z

flexget/plugins/input/imdb_watchlist.py

+        params = {'ids': ','.join(imdb_ids)}
+        url = 'http://www.imdb.com/title/data'
+        try:
+            page = task.requests.get(url, params=params)


If you give it the argument raise_status=True it will raise an exception when the error code is 4xx, 5xx etc. I think that's what you're going for?

And fyi, Requests raises a RequestException, so catch that instead of HTTPError.

cvium · 2017-12-30T17:23:52Z

flexget/plugins/input/imdb_watchlist.py

+        if page.status_code != 200:
+            raise plugin.PluginError('Unable to get imdb title data. Html status code was: %d.' % page.status_code)
+        json_data = json.loads(page.text)
+        log.verbose('imdb list contains %d items' % len(json_data))


All logging calls should use implicit string interpolation ie. the logger takes an arbitrary number of arguments that it will use for string interpolation. As an example, the above line should be log.verbose('imdb list contains %d items', len(json_data)) (notice the comma).

cvium · 2017-12-30T17:25:27Z

flexget/plugins/input/imdb_watchlist.py


-            items = soup.find_all(attrs={'data-item-id': True, 'class': 'list_item'})
+            items = soup.find_all('div', class_='lister-item')
+            if len(items) == 0:


if not items is the pythonic way

cvium · 2017-12-30T17:26:29Z

flexget/plugins/input/imdb_watchlist.py

+        page = self.fetch_page(task, url, params, headers)
+        try:
+            re_result = re.search('IMDbReactInitialState.push\((.+?)\);\n', page.text)
+            json_text = re_result.group(1)


I don't think there's any need for an explicit variable here. You can just change the line below to json.loads(re_result.group(1)).

cvium · 2017-12-30T17:27:10Z

flexget/plugins/input/imdb_watchlist.py

+            raise plugin.PluginError(
+                'Unable to get imdb list from imdb react widget.' +
+                ' Either the list is empty or the imdb parser plugin is broken.' +
+                ' Original error: %s.' % e.args[0])


I think str(e) is better. You lose some information with e.args[0] (like what kind of exception it is). You will need to add from builtins import * # noqa pylint: disable=unused-import, redefined-builtin to the list of imports though for compatibility.

cvium · 2017-12-30T17:29:19Z

flexget/plugins/input/imdb_watchlist.py

+            return
+        imdb_ids = []
+        for item in json_vars['list']['items']:
+            if is_valid_imdb_title_id(item['const']):


This will crash if const is not present. Better to use dict.get() in this case.

cvium · 2017-12-30T17:32:35Z

flexget/plugins/input/imdb_watchlist.py

+        try:
+            page = task.requests.get(url, params=params)
+        except HTTPError as e:
+            raise plugin.PluginError(e.args[0])


cvium

Overall it's pretty good. Just a few corrections needed.

thawn · 2017-12-30T18:38:03Z

I think I got it now. Also checked entire code again for non-pythonic mishaps ;-).

liiight · 2017-12-31T07:24:41Z

flexget/plugins/input/imdb_watchlist.py

+        if 'all' not in config['type']:
+            title_types = []
+            for title_type in config['type']:
+                title_types.append(TITLE_TYPE_MAP[title_type])


Just a style suggestion, list comprehension fits here well:

title_types = [TITLE_TYPE_MAP[title_type] for title_type in config['type']

liiight · 2017-12-31T07:27:25Z

flexget/plugins/input/imdb_watchlist.py

+        url = 'http://www.imdb.com/title/data'
+        page = self.fetch_page(task, url, params, headers)
+        try:
+            json_data = json.loads(page.text)


You can use .json() on a returned Response

liiight · 2017-12-31T07:27:52Z

flexget/plugins/input/imdb_watchlist.py

+        for imdb_id in imdb_ids:
+            imdb_type = ''
+            entry = Entry()
+            if not ('title' in json_data[imdb_id] and


Redundant parentheses

I deliberately added the parentheses here in order to be able to break the long if statement into several lines (as suggested by cvium).
Also: AFAIK, I would have to change this to if not ... or not ... or not ... : in order to remove the parentheses and still get the same result. Therefore, the parentheses save me several additional negations.
For these reasons, I would like to keep the parentheses if that is o.k.

liiight · 2017-12-31T07:37:44Z

flexget/plugins/input/imdb_watchlist.py

+        log.debug('First entry (imdb id: %s) looks like this: %s', imdb_ids[0], json_data[imdb_ids[0]])
+        entries = []
+        for imdb_id in imdb_ids:
+            imdb_type = ''


Does it makes sense to init this to an empty string?

thawn · 2017-12-31T13:58:11Z

Thanks, @liiight for the thorough code review!
I think I managed to implement the suggested changes (except for the parentheses see my answer to your code comment, why I would like to keep them if that is o.k.).

thawn added 5 commits December 30, 2017 15:31

imdb_watchlist plugin fixed issue Flexget#2035

d6e8bec

imdb_watchlist plugin fixed minor formatting issue

c454a00

imdb_watchlist plugin removed obsolete debug code

9c26c7e

imdb_watchlist plugin added some verbose log output

3e80e72

imdb_watchlist plugin removed unused import datetime

285a7df

cvium reviewed Dec 30, 2017

View reviewed changes

imdb_watchlist plugin: implemented changes suggested by cvium (thanks)

59c9447

cvium reviewed Dec 30, 2017

View reviewed changes

cvium requested changes Dec 30, 2017

View reviewed changes

thawn added 2 commits December 30, 2017 19:13

imdb_watchlist plugin: implemented changes suggested by cvium (thanks)

af2f080

imdb_watchlist plugin: implemented changes suggested by cvium (thanks)

09e86c6

cvium approved these changes Dec 30, 2017

View reviewed changes

liiight reviewed Dec 31, 2017

View reviewed changes

imdb_watchlist plugin: implemented changes suggested by liiight (thanks)

b6d7fb7

liiight merged commit be381aa into Flexget:develop Dec 31, 2017

Conversation

thawn commented Dec 30, 2017

Motivation for changes:

Detailed changes:

Addressed issues:

Log and/or tests output (preferably both):

To Do:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thawn commented Dec 30, 2017

Uh oh!

cvium Dec 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cvium Dec 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cvium left a comment

Choose a reason for hiding this comment

Uh oh!

thawn commented Dec 30, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thawn commented Dec 31, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cvium Dec 30, 2017 •

edited

Loading

cvium Dec 30, 2017 •

edited

Loading