@gazpachoking @liiight can you take a look at this? I'm not sure what other ways there are to improve the speed. |
|
Hmm. Is all the manual normalization needed? |
|
@gazpachoking it didn't work with name.in_. Do you know how to handle that in https://github.com/Flexget/Flexget/blob/develop/flexget/plugins/filter/series.py#L236-L238 |
|
With pre-fetch, processing time dropped by 50% for me. It's still very slow, though. If you have an RSS feed with 100 entries and a series config with 100-200 entries, it loops over the 100 feed entries once for every series you have. Can you think of a better way? I couldn't. |
|
Perhaps we need to implement this on the comparator? http://docs.sqlalchemy.org/en/latest/orm/internals.html#sqlalchemy.orm.properties.ColumnProperty.Comparator.in_ |
|
OK, I'll look into that. Any other ideas on how we can speed it up? It seems crazy to have to loop over each entry once per series. With 200 series and a 100-entry RSS feed, that's 20,000 iterations. In my case it's more like 30,000-40,000. |
|
Could we not sort the configured series alphabetically and put them in a dict keyed on first letter like this? --> |
|
How will that help, @cvium? Won’t we still need to loop over them all? |
|
We can guess the series name from the title, grab its first letter, look that up in the dict, and try to match against the (hopefully much shorter) list. |
|
Hmm, is that safe to do? I guess we'd also need to check alt names. |
|
Each possible name would need its own node in the tree. |
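A minimal sketch of the first-letter lookup discussed above, not FlexGet's actual implementation. The `normalize` helper and the `{series name: [alternate names]}` config shape are assumptions made for illustration. Every name, alternates included, gets its own bucket entry, so a guessed name is only compared against series that share its first character.

```python
def normalize(name):
    """Hypothetical normalizer: lowercase and collapse whitespace."""
    return " ".join(str(name).lower().split())


def build_index(series_config):
    """Map first character -> list of (normalized name, series key).

    series_config is an assumed shape: {series name: [alternate names]}.
    """
    index = {}
    for series, alt_names in series_config.items():
        for name in [series, *alt_names]:
            norm = normalize(name)
            if norm:
                index.setdefault(norm[0], []).append((norm, series))
    return index


def candidates(index, guessed_name):
    """Return series keys whose names share the guessed name's first character."""
    norm = normalize(guessed_name)
    if not norm:
        return []
    found = []
    for _, series in index.get(norm[0], []):
        if series not in found:
            found.append(series)
    return found


config = {"The Walking Dead": ["TWD"], 24: [], "Breaking Bad": []}
index = build_index(config)
shortlist = candidates(index, "TWD")
```

The shortlist still has to go through the real series parser; the index only narrows down which series are worth trying.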
|
@gazpachoking I'm getting lost trying to figure out how to properly implement series.name.in_. Do you understand how Comparator works? I think we need to implement def in_(), but I'm not sure what it should return. |
|
@stevezau Maybe try this for the comparator: def in_(self, other):
return super().in_([normalize_series_name(e) for e in other]) |
|
@gazpachoking that's what I thought yesterday, but it still calls operate: https://github.com/Flexget/Flexget/blob/series_speedup/flexget/plugins/filter/series.py#L238 I can fix it by checking whether it's a list in operate, but that doesn't seem right? |
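To illustrate the behaviour described in this comment, here is a small sketch using pure-Python stand-in classes rather than SQLAlchemy itself. It assumes, as reported above, that the base `in_` delegates to `operate`; `BaseComparator`, `NormalizingComparator` and this `normalize_series_name` are hypothetical stand-ins. It shows the list check inside `operate` as one possible workaround, not a recommended design.

```python
def normalize_series_name(name):
    """Stand-in normalizer (assumption): lowercase and strip whitespace."""
    return str(name).lower().strip()


class BaseComparator:
    """Stand-in for a comparator base class; this is not SQLAlchemy."""

    def operate(self, op, other):
        # A real comparator would build a SQL expression here.
        return (op, other)

    def in_(self, other):
        # Assumed behaviour: in_ routes through operate like other operators,
        # so an overridden operate also sees the list operand.
        return self.operate("in", other)


class NormalizingComparator(BaseComparator):
    def operate(self, op, other):
        # Normalize list operands element by element, scalars directly.
        if isinstance(other, (list, tuple, set)):
            other = [normalize_series_name(o) for o in other]
        else:
            other = normalize_series_name(other)
        return super().operate(op, other)


comparator = NormalizingComparator()
in_result = comparator.in_([" Foo ", "BAR"])
eq_result = comparator.operate("eq", " Foo ")
```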
entries_map = {}
for entry in task.entries:
    parsed = parser.parse_series(entry['title'])
    if parsed.name:
Maybe also check parsed.valid; I'm not sure it'll have a name property if it isn't valid.
flexget/plugins/filter/series.py
# Sort Entries into data model similar to https://en.wikipedia.org/wiki/Trie
# Only process series if both the entry title and series title first letter match
entries_map = {}
You can set it to defaultdict(list), which makes for a neater implementation later.
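A short sketch of what this suggestion buys, using plain title strings in place of real task entries (an assumption for illustration). With `defaultdict(list)` the "create the list if the key is missing" branch disappears.

```python
from collections import defaultdict


def group_by_first_letter(titles):
    """Group titles by lowercase first character using defaultdict(list)."""
    entries_map = defaultdict(list)
    for title in titles:
        if title:
            # No need to check whether the key exists before appending.
            entries_map[title[0].lower()].append(title)
    return entries_map


grouped = group_by_first_letter(["Archer S01E01", "arrow s02e03", "Bones S05E01"])
```

One thing to keep in mind: indexing a `defaultdict` with a missing key inserts an empty list, so lookups that should not mutate the map are better done with `.get()`.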
flexget/plugins/filter/series.py
# str() added to make sure number shows (e.g. 24) are turned into strings
series_names = [str(s.keys()[0]) for s in config]
existing_db_series = session.query(Series).filter(Series.name.in_(series_names))
existing_db_series = dict([(s.name_normalized, s) for s in existing_db_series])
existing_db_series = {s.name_normalized: s for s in existing_db_series}
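The pre-fetch pattern under review (one `IN` query, then a dict comprehension for lookups) can be sketched with the standard-library `sqlite3` module. The `series` table, its columns, and the `prefetch_series` helper are hypothetical and much simpler than FlexGet's SQLAlchemy models.

```python
import sqlite3


def prefetch_series(conn, names):
    """Fetch all matching rows with a single IN query, keyed by name."""
    if not names:
        return {}
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        f"SELECT id, name FROM series WHERE name IN ({placeholders})",
        list(names),
    ).fetchall()
    # Dict comprehension instead of dict([(k, v) for ...])
    return {name: series_id for series_id, name in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO series (name) VALUES (?)",
    [("24",), ("archer",), ("bones",)],
)
existing = prefetch_series(conn, ["24", "archer", "missing show"])
```

Each configured series can then be resolved with `existing.get(name)` rather than issuing its own query.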
|
@gazpachoking @liiight @cvium tests are passing. Ready for review. |
db_series.alternate_names = [alt for alt in db_series.alternate_names if alt.alt_name in alts]
# Add/update the possibly new alternate names
else:
# TODO: Remove, added for debugging
Motivation for changes:
If hundreds of series are configured, FlexGet's performance can suffer when parsing.
Detailed changes: