Series performance improvements by stevezau · Pull Request #2020 · Flexget/Flexget

stevezau · 2017-11-21T10:24:45Z

Motivation for changes:

If hundreds of series are configured, FlexGet's performance can suffer when parsing.

Detailed changes:

Prefetch series data
Better parsing logic: right now it loops over every entry for each configured series.

stevezau · 2017-11-21T11:29:39Z

@gazpachoking @liiight can you take a look at this? I'm not sure what other ways there are to improve the speed.

gazpachoking · 2017-11-21T20:35:51Z

Hmm. Is all the manual normalization needed? Series.name is meant to compare automatically against the normalized version of the name. How much of a speed difference does this make?

stevezau · 2017-11-21T22:02:04Z

@gazpachoking it didn't work with name.in_. Do you know how to handle that in https://github.com/Flexget/Flexget/blob/develop/flexget/plugins/filter/series.py#L236-L238

stevezau · 2017-11-21T22:03:22Z

With pre-fetch it cut down the process time by 50% for me.

It's still very slow, though. If you have an RSS feed with 100 entries and 100-200 configured series, it loops over the 100 entries once for every series you have. Can you think of a better way? I couldn't.

gazpachoking · 2017-11-22T05:08:05Z

Perhaps we need to implement this on the comparator? http://docs.sqlalchemy.org/en/latest/orm/internals.html#sqlalchemy.orm.properties.ColumnProperty.Comparator.in_

stevezau · 2017-11-22T05:20:37Z

OK, I'll look into that.

Any other ideas on how we can speed it up? It seems crazy to have to loop over every entry once per series.

With 200 series and a 100-entry RSS feed, that's 20,000 iterations.

In my case it's more like 30,000-40,000
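
For a sense of scale, the nested loop described above can be sketched as follows. This is only an illustration: `matches` is a hypothetical stand-in for the real series parser, and the series and entry names are made up.

```python
from itertools import product


def matches(series_name, entry_title):
    # Hypothetical stand-in for the real parser: a case-insensitive prefix check.
    return entry_title.lower().startswith(series_name.lower())


def count_naive_comparisons(series_names, entry_titles):
    """Examine every (series, entry) pair, as the naive approach does."""
    comparisons = 0
    hits = []
    for series_name, title in product(series_names, entry_titles):
        comparisons += 1
        if matches(series_name, title):
            hits.append((series_name, title))
    return comparisons, hits


series_names = ['series %d' % i for i in range(200)]
entry_titles = ['series %d S01E01 720p' % i for i in range(100)]
comparisons, hits = count_naive_comparisons(series_names, entry_titles)
print(comparisons)  # 20000
```

The work grows with the product of the two list lengths, which is why adding series slows down every task that uses the plugin.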

cvium · 2017-11-22T06:25:12Z

Could we not sort the configured series alphabetically and put them in a dict keyed on first letter like this?

series:
  - a series
  - another series
  - other series
  - third series

-->

series_map = {
    'a': ['a series', 'another series'],
    'o': ['other series'],
    't': ['third series']
}

stevezau · 2017-11-22T06:59:38Z

How will that help, @cvium?

Won't we still need to loop over them all?

cvium · 2017-11-22T07:01:24Z

We can guess the series name from the title, grab its first letter, look that up in the dict, and try to match against the (hopefully much shorter) list.
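
A minimal sketch of this first-letter lookup, assuming non-empty series names. The helper names `build_series_map` and `candidates_for` are hypothetical and not taken from the PR.

```python
from collections import defaultdict


def build_series_map(series_names):
    """Group configured series names by their lowercased first character."""
    series_map = defaultdict(list)
    for name in sorted(series_names):
        # Assumes every configured name is a non-empty string.
        series_map[name[0].lower()].append(name)
    return series_map


def candidates_for(title, series_map):
    """Return only the series whose first character matches the title's."""
    if not title:
        return []
    return series_map.get(title[0].lower(), [])


series_map = build_series_map(
    ['a series', 'another series', 'other series', 'third series'])
print(candidates_for('Another.Series.S01E02.720p', series_map))
# ['a series', 'another series']
```

Only the candidates in the matching bucket are handed to the expensive parser, so the cost per entry depends on the bucket size rather than on the total number of series.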

stevezau · 2017-11-22T07:35:34Z

Hmm is that safe to do? I guess we need to also check alt names..

cvium · 2017-11-22T07:45:47Z

Every possible name, including alternate names, would need its own node in the tree.
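
One way to read this, sketched under assumptions: every known name, main or alternate, is indexed separately and points back to its canonical series. The config shape (canonical name mapped to a list of alternates) and the helper names are hypothetical.

```python
from collections import defaultdict


def build_name_index(series_config):
    """Map first character -> list of (known name, canonical series name).

    `series_config` is a hypothetical dict of canonical name -> alternate names.
    """
    index = defaultdict(list)
    for canonical, alternates in series_config.items():
        for name in [canonical] + list(alternates):
            index[name[0].lower()].append((name.lower(), canonical))
    return index


def find_series(guessed_name, index):
    """Resolve a guessed name to its canonical series, or None if unknown."""
    guessed = guessed_name.lower()
    for name, canonical in index.get(guessed[:1], []):
        if name == guessed:
            return canonical
    return None


index = build_name_index({'other series': ['the other one'], 'third series': []})
print(find_series('The Other One', index))  # other series
```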

stevezau · 2017-11-22T12:41:07Z

@gazpachoking I'm getting lost trying to figure out how to properly implement series.name.in_

Do you understand how Comparator works? I think we need to implement def in_(), but I'm not sure what it should return.

gazpachoking · 2017-11-22T14:59:49Z

@stevezau Maybe try this for the comparator:

def in_(self, other):
    return super().in_([normalize_series_name(e) for e in other])

stevezau · 2017-11-22T20:53:13Z

@gazpachoking that's what i thought yesterday but it still calls operate https://github.com/Flexget/Flexget/blob/series_speedup/flexget/plugins/filter/series.py#L238

I can fix it by checking if it's a list in operate but that does not seem right?

liiight · 2017-11-23T12:52:42Z

flexget/plugins/filter/series.py

+        entries_map = {}
+        for entry in task.entries:
+            parsed = parser.parse_series(entry['title'])
+            if parsed.name:


maybe also check parsed.valid, not sure if it'll have a name property if it isn't valid

liiight · 2017-11-23T12:53:32Z

flexget/plugins/filter/series.py

+
+        # Sort Entries into data model similar to https://en.wikipedia.org/wiki/Trie
+        # Only process series if both the entry title and series title first letter match
+        entries_map = {}


you can set it to defaultdict(list), makes for a neater implementation later
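
A rough sketch of what the suggestion buys: with `defaultdict(list)` there is no need to check whether a key exists before appending. `guess_name` is a hypothetical stand-in for `parser.parse_series(...)`, not the real parser.

```python
import re
from collections import defaultdict

# A separator followed by something like S01E02 marks the end of the name.
EPISODE_ID = re.compile(r'[ ._]S\d+E\d+', re.IGNORECASE)


def guess_name(title):
    # Hypothetical stand-in for parser.parse_series(title).name.
    match = EPISODE_ID.search(title)
    name = title[:match.start()] if match else ''
    return name.replace('.', ' ').replace('_', ' ').strip().lower()


def map_entries(titles):
    """Bucket entry titles by the first character of the guessed series name."""
    entries_map = defaultdict(list)
    for title in titles:
        name = guess_name(title)
        if name:
            # Missing keys are created on first access, so no setdefault() is needed.
            entries_map[name[0]].append(title)
    return entries_map


entries_map = map_entries(
    ['Another.Series.S01E02.720p', 'A.Series.S03E01', 'no episode here'])
print(dict(entries_map))
# {'a': ['Another.Series.S01E02.720p', 'A.Series.S03E01']}
```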

liiight · 2017-11-23T12:54:27Z

flexget/plugins/filter/series.py

+            # str() added to make sure number shows (e.g. 24) are turned into strings
+            series_names = [str(s.keys()[0]) for s in config]
+            existing_db_series = session.query(Series).filter(Series.name.in_(series_names))
+            existing_db_series = dict([(s.name_normalized, s) for s in existing_db_series])


existing_db_series = {s.name_normalized: s for s in existing_db_series}
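
The prefetch pattern under review (one `IN` query, then a dict keyed by normalized name) can be sketched with the standard library. Note the swap: this uses plain `sqlite3` rather than SQLAlchemy, and `normalize`, the table layout, and the sample rows are assumptions made for illustration only.

```python
import sqlite3


def normalize(name):
    # Hypothetical stand-in for FlexGet's series-name normalization.
    return ' '.join(name.lower().split())


def prefetch_series(conn, configured_names):
    """Fetch all configured series in one IN query, keyed by normalized name."""
    # str() so that numeric show names (e.g. 24) become strings.
    wanted = [normalize(str(n)) for n in configured_names]
    if not wanted:
        return {}
    placeholders = ', '.join('?' for _ in wanted)
    rows = conn.execute(
        'SELECT id, name_normalized FROM series '
        'WHERE name_normalized IN (%s)' % placeholders,
        wanted,
    )
    return {name_normalized: series_id for series_id, name_normalized in rows}


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE series (id INTEGER PRIMARY KEY, name_normalized TEXT UNIQUE)')
conn.executemany(
    'INSERT INTO series (name_normalized) VALUES (?)', [('a series',), ('24',)])
existing = prefetch_series(conn, ['A  Series', 24, 'Not In DB'])
print(existing)
```

Later lookups then hit the dict instead of issuing one query per configured series. Very long name lists may need to be split into batches, since databases limit the number of bound parameters per statement.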

stevezau · 2017-11-23T21:42:13Z

@gazpachoking @liiight @cvium tests are passing. Ready for review.

cvium · 2017-11-26T08:37:42Z

flexget/plugins/filter/series.py

                    db_series.alternate_names = [alt for alt in db_series.alternate_names if alt.alt_name in alts]
                    # Add/update the possibly new alternate names
                else:
+                    # TODO: Remove, added for debugging


Forget something @stevezau ?

stevezau added 4 commits November 21, 2017 15:52

unused imports

3f04b55

speedup SeriesDBManager

a4eeb8a

pre_load identified_by from db

733b4c8

better pre-fetch logic

6a2aec5

stevezau requested review from gazpachoking and liiight November 21, 2017 10:27

stevezau added 2 commits November 21, 2017 22:21

more performance improvements

5633176

remove debug statement

c482fe1

add some debugging as getting unique constraint errors

403fd03

stevezau added 4 commits November 23, 2017 12:14

clean up querying for series

33a013e

impl trie logic to speed up parsing series

2b6326c

Added more logic to reduce queries in on_task_filter

c3688e2

Attempt fix some tests

d158903

liiight reviewed Nov 23, 2017

use defaultdict

b611f41

Flexget deleted a comment from liiight Nov 23, 2017

liiight and others added 5 commits November 23, 2017 16:00

fixed identified type not being saved to 'auto' due to redundant expunge

1d94ecb

dict comprehension

78fdca6

fallback to using entry title on failed parsing

e99cf9c

rename char to first_char to remove defined outside scope warning

21e7f2f

if parsing the title fails then map each word to a series. catch-all.

8b385be

stevezau self-assigned this Nov 23, 2017

stevezau changed the title ~~Prefetch Series Data~~ Speedup Series Parsing Nov 23, 2017

stevezau changed the title ~~Speedup Series Parsing~~ Series performance improvements Nov 23, 2017

stevezau merged commit 99ff56c into develop Nov 25, 2017

stevezau deleted the series_speedup branch November 26, 2017 07:26

cvium reviewed Nov 26, 2017

cvium mentioned this pull request Jan 12, 2018

Incorrect handling on parentheses in the latest versions #2057

Closed


5 participants
