Skip to content

Duplicates filter: Allow multiple fields and treat first field differently#1231

Closed
BackSlasher wants to merge 2 commits intoFlexget:developfrom
BackSlasher:duplicates-first
Closed

Duplicates filter: Allow multiple fields and treat first field differently#1231
BackSlasher wants to merge 2 commits intoFlexget:developfrom
BackSlasher:duplicates-first

Conversation

@BackSlasher
Copy link
Copy Markdown
Contributor

Motivation for changes:

Allow accepting "unique" entires using the "duplicates" plugin

Detailed changes:

  • field can now be a list of multiple fields. All fields must match to trigger "duplicate" behaviour.
    Entire None values are ignored
  • First result can be "ignored", leading to a "unique" selection (reject all duplicates except first match)
  • Entries are enumerated to a list before iteration. Helps me with positioning.

Addressed issues:

(no tickets)
Personal usecase - select only one entry of each kind (e.g. only one torrent for a movie)

Config usage if relevant (new plugin or updated schema):

None

Log and/or tests output (preferably both):

None

@liiight
Copy link
Copy Markdown
Member

liiight commented Jun 11, 2016

How is this different than seen plugin?

@BackSlasher
Copy link
Copy Markdown
Contributor Author

BackSlasher commented Jun 11, 2016

Seen is cross task and persists between runs.

@liiight
Copy link
Copy Markdown
Member

liiight commented Jun 11, 2016

Only by default, it can be used locally

@BackSlasher
Copy link
Copy Markdown
Contributor Author

@liiight does the "seen" plugin apply for the current entries, or only for entries seen before this run?

@liiight
Copy link
Copy Markdown
Member

liiight commented Jun 12, 2016

If you mean for entires in the same task, I don't think so, but possibly.
But why would you receive the same entries several time during the same task?

@BackSlasher
Copy link
Copy Markdown
Contributor Author

Usage example:

  1. I'm searching for a torrent of MOVIENAME
  2. I get 30 results
  3. I run lookup_trakt
  4. I use the "duplicates" plugin with the movie_name field, only passing one torrent belonging to MOVIENAME

@cvium
Copy link
Copy Markdown
Contributor

cvium commented Jun 12, 2016

Why? Why not use movie_list or something?

@BackSlasher
Copy link
Copy Markdown
Contributor Author

I don't want to use movie_list. I don't its assumption that a movie is "done" as soon as it's torrent is accepted.
I'm working on a different approach that moves the persistent state out of Flexget's DB.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use enumerate instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no answer to that. What are the benefits of it?
I only moved to list because I had some weird behavior when directly accessing items (task.entries[i])

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: I'll happily modify if anyone has a better idea for this "loop from this item onward"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.python.org/2/library/functions.html#enumerate

It let's you do something like

for idx, val in enumerate(list(task.entries)):
  bla bla

@cvium
Copy link
Copy Markdown
Contributor

cvium commented Jun 12, 2016

I'm not sure I fully agree with this PR but if the old behaviour still works without having to change config, then we can see about merging it when issues have been worked out.

At a glance it seems highly inefficient. You iterate through the list of entries for every entry ie. O(n^2) -- my brain is pretty fried so there's probably a tighter upper bound than O(n^2). Is there no way to optimize this?

@BackSlasher
Copy link
Copy Markdown
Contributor Author

If there is a better way to achieve this without using duplicates I'll be happy to know about it. Ideally I'd have an only_one plugin that gets a list of fields, and rejects every item that shares these fields with an already-accepted item.
I basically played around with the existing duplicates plugin code - I guess it was largely unused if it's so under-optimized (it did a full n*n iteration)

@gazpachoking
Copy link
Copy Markdown
Member

For the movie situation, there is the seen_movies plugin already

@BackSlasher
Copy link
Copy Markdown
Contributor Author

seen_movies is based on seen (which has the side effect of persisting), and cannot be generalized towards other uses

@gazpachoking
Copy link
Copy Markdown
Member

You can put seen_movies in local mode, to have it apply to just one task. That being said, I'm not necessarily against a better general way, I just worry about things being even more confusing with multiple ways to do the same thing.

@BackSlasher
Copy link
Copy Markdown
Contributor Author

@gazpachoking I toyed with the idea of modifying seen to immediately reject entries based on earlier entries in the same run, but figured it will break more stuff than adding functionality to duplicates. If the will of the Flexget masters is to modify seen, I can look into this

@gazpachoking
Copy link
Copy Markdown
Member

@BackSlasher This change might be fine, haven't thought about it fully. @paranoidi You made duplicates plugin, right? Any input?

@BackSlasher
Copy link
Copy Markdown
Contributor Author

@paranoidi Ping
Reminder: we can merge this or enrich seen, I think it's a good idea to merge this for starters.

@BackSlasher
Copy link
Copy Markdown
Contributor Author

@gazpachoking ping

Serves a use case when you want to accept the first item but
reject any duplicates following
All fields must match to be treated as duplicates
@paranoidi
Copy link
Copy Markdown
Member

It is semantically very weird to use duplicates plugin to get first unique match. I would make a separate plugin for that purpose which is more suitably named (ie. unique). I was hesitant to even add the action support for this plugin originally. I'm declining this due that concern. I would be willing to accept this refined as a separate plugin though :)

@paranoidi paranoidi closed this Jul 25, 2016
@paranoidi
Copy link
Copy Markdown
Member

@BackSlasher Consider doing another PR. Contributions are appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants