Skip to content

new sort_by_weight plugin#1533

Merged
paranoidi merged 14 commits intoFlexget:developfrom
Andy2244:sort_by_weighted
Dec 19, 2016
Merged

new sort_by_weight plugin#1533
paranoidi merged 14 commits intoFlexget:developfrom
Andy2244:sort_by_weighted

Conversation

@Andy2244
Copy link
Copy Markdown
Contributor

@Andy2244 Andy2244 commented Dec 4, 2016

Motivation for changes:

I wanted the ability to better sort the discover results, so the best release could be picked-up. Basically i wanted to weight different stats/data fields, so the filter plugins can pick the best from the top.

Detailed changes:

  • new sort_by_weight plugin that takes multiple fields as input (at least 2)
  • stores the calculated weighted result in the sort_by_weight_sum and sorts so the highest weight entry is on top of the list
  • can use multiple config parameters to improve default rules

Config usage if relevant (new plugin or updated schema):

Simple

sort_by_weighted:
      - field: content_size
        weight: 80
      - field: newznab_grabs
        weight: 25

Advanced

        sort_by_weighted:
          - field: content_size
            weight: 80              # we want large files mainly = good quality
            delta_distance: 500     # anything within 500 MB gets the same weight
          - field: newznab_pubdate
            weight: 25              # we still like new releases
            delta_distance: 7 days      # anything within 7 days is similar
            upper_limit: 60         # confine results to 0-60 days, anything older 60 days gets 0 weight
            inverse: yes            # inverse weight results for date/age fields (oldest age gets lowest score)
          - field: newznab_grabs
            weight: 25              # we like releases that others already downloaded, aka safeguard against crap
            upper_limit: 100        # anything over 100 grabs is fine and gets maximum weight

Example of my own 720p based movie newznab settings:

    sort_by_weight:
      - field: content_size
        weight: 80
        delta_distance: 500
        upper_limit: 10000
      - field: newznab_hydraindexerscore
        weight: 25
        delta_distance: 1
      - field: newznab_pubdate
        weight: 30
        delta_distance: 30 days
        upper_limit: 500
        inverse: yes
      - field: newznab_grabs
        weight: 25
        upper_limit: 100

@cvium
Copy link
Copy Markdown
Contributor

cvium commented Dec 4, 2016

I think splitting limits_min_max into two min_value and max_value is better for the users. Alternatively, it could be called range, but instead of a list it's just a special string like 0-100. I prefer the former though.

The current format is weird and hackish imo.

field: Name of the sort field
weight: The sort weight used, values between 10-200 are good starts
weight_default: The default weight used if a sort 'field' could not be found or had a invalid entry (default is: 0)
inverse: Use inverse weighting for the field, example: Date/Age fields
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly inverse or weight_default?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line that I'm commenting on. Line 38.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inverse weighting means that the lowest result slot gets the highest score and the highest gets the lowest score. So for Age/Date fields the entries with (0 days) get the maximum weight and older entries start getting lower and lower weights.


log = logging.getLogger('sort_by_weighted')

SUPPORTED_TYPES = (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this exactly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types?
Mainly to safeguard from using unsupported fields. Sorting by string for example would need something like "similarity" between them for expected results.

def on_task_filter(self, task, config):
# [field] = [weight, weight_default, delta, inverse, [min,max]]
settings = {}
for centry in config:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

centry is a weird variable name imo

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aye, will be changed

@Andy2244 Andy2244 changed the title new sort_by_weighted plugin [WIP] new sort_by_weighted plugin Dec 5, 2016
@Andy2244 Andy2244 changed the title [WIP] new sort_by_weighted plugin new sort_by_weighted plugin Dec 5, 2016
@Andy2244
Copy link
Copy Markdown
Contributor Author

Andy2244 commented Dec 5, 2016

I'm happy with it at this point, so consider it for a merge.

'minItems': 2
}

# def on_task_start(self, task, config):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments that serve no purpose should be removed

@cvium
Copy link
Copy Markdown
Contributor

cvium commented Dec 5, 2016

I feel the schema is overly complex, but I'm not sure how to remedy that.

@Andy2244
Copy link
Copy Markdown
Contributor Author

Andy2244 commented Dec 5, 2016

I don't expect this plugin to-be widely used and remember you only need this at a bare minimum, which looks less scary and still delivers usable results.

sort_by_weighted:
      - field: content_size
        weight: 80
      - field: newznab_grabs
        weight: 25

remove unused comment
@paranoidi
Copy link
Copy Markdown
Member

paranoidi commented Dec 6, 2016

Why is it 'sort_by_weighted' instead of 'sort_by_weight'? Could this all be combined into sort plugin. Not a huge fan of multiple plugins doing same things ..

@Andy2244
Copy link
Copy Markdown
Contributor Author

Andy2244 commented Dec 6, 2016

The name was just what came first to mind and coveys the idea of a 'weighted' score sort. As for combining the sorts, this would make the config schema more complex, but could be done. I simply did not feel confident enough to fiddle with a existing plugin and existing configs.
Also keep in mind that sort_by work on all_entries, while weighted uses accepted/undecided by design. I really had a hard time cramping yet another config parameter in the schema to set the states used for the sort.

value = value.days
elif isinstance(value, bool):
value = int(value)
if len(settings[key]) == 6:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic value

if value is None:
continue
if key not in max_values:
if len(settings[key]) == 6:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic value

else:
max_value = value
else:
if len(settings[key]) == 6:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic value

continue
if key not in max_values:
if len(settings[key]) == 6:
max_value = min(value, settings[key][5])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic value?

@paranoidi
Copy link
Copy Markdown
Member

Assuming backwards compatibility it would be fine to combine this into sort.

@Andy2244
Copy link
Copy Markdown
Contributor Author

Andy2244 commented Dec 6, 2016

Can refactor the local settings array to a dict, to remove the magic values. I can try combining the two, but might need help with the schema.

@Andy2244 Andy2244 changed the title new sort_by_weighted plugin [WIP] new sort_by_weighted plugin Dec 6, 2016
@paranoidi
Copy link
Copy Markdown
Member

Just took a fresh look at sort_byvpkugin, not so sure about merging anymore. It seems to be very field specific ..

@Andy2244
Copy link
Copy Markdown
Contributor Author

Andy2244 commented Dec 7, 2016

Yes i also feel it works quite different, so might confuse usage. I'm trying to refactor the algorithm so it can work with any type that has a lt/gt comparator, mainly to support the Quality type.

max value fixes
Quality type support
proper timedelta handling
@Andy2244 Andy2244 changed the title [WIP] new sort_by_weighted plugin [WIP] new sort_by_weight plugin Dec 8, 2016
fixed crash on weight calculation
fixed lower defaults
updated examples
@Andy2244
Copy link
Copy Markdown
Contributor Author

Andy2244 commented Dec 9, 2016

oki, did refactor the code to technically support all types, but the weight results for types like string are somewhat undefined. The mayor types like int/float/bool/Quality/datetime/timedelta work as expected.
I still don't like merging it with the sort_by plugin, since the config gets more complex and its more clear that this is a special type of sort.

return
config = self.prepare_config(config)
log.info('sorting ´undecided´,´accepted´ entries by weight!')
log.info('sorting undecided, accepted entries by weight')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like unnecessary logging. You would expect it to run if you've enabled it...

Copy link
Copy Markdown
Contributor Author

@Andy2244 Andy2244 Dec 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to make clear that the weighting is only calculated for undecided, accepted entries, which is different to how sort_by works. It still sorts all entries at the end, but the other states get the default weight of 0.

@Andy2244 Andy2244 changed the title [WIP] new sort_by_weight plugin new sort_by_weight plugin Dec 9, 2016
@Andy2244
Copy link
Copy Markdown
Contributor Author

Just wondering do i have to add a test for every new plugin i write, to-be accepted for a merge? So far none of my plugins have been merged yet or is there a minimum time that has to lapse without new changes?
What exactly are the requirements to get something merged? I see some pull request are like 6 months old, so i'm a little confused.

@paranoidi
Copy link
Copy Markdown
Member

@Andy2244 Process is slow sometimes , but your contributions are greatly appreciated! Join IRC / Gitter for more hands on involvement :)

Unit tests are greatly appreciated as we are going to maintain this plugin long term. Only other issue that I can spot now is the too long docstring which should not prevent merging :)

Can you add some unit test and we'll get this one merged right away! It's not required but I would feel much better if it had some.

cleanup exceptions, logging
@Andy2244
Copy link
Copy Markdown
Contributor Author

Implemented suggested changes.

PS: I know its bad practice, but i want to finish the Anidb lookup plugin first, before i look at the testcode for each my plugins.

@paranoidi
Copy link
Copy Markdown
Member

@Andy2244 Understandable, a lot to take in such short time. I'm merging this in. Please update the wiki page to include this plugin! :)

@paranoidi paranoidi merged commit a6cf081 into Flexget:develop Dec 19, 2016
@Andy2244 Andy2244 deleted the sort_by_weighted branch January 6, 2017 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants