Skip to content

Allow identical group names#7558

Merged
koppor merged 14 commits into
JabRef:mainfrom
sauliusg:saulius-allow-identical-group-names
Apr 12, 2021
Merged

Allow identical group names#7558
koppor merged 14 commits into
JabRef:mainfrom
sauliusg:saulius-allow-identical-group-names

Conversation

@sauliusg

@sauliusg sauliusg commented Mar 21, 2021

Copy link
Copy Markdown
Contributor

Allow creation of groups with names that already exist somewhere in the group hierarchy.

This in the essence fixes #7554 by making repeated group names "first class citizens" in JabRef, allowing to create them in the GUI (with a warning, of course). It seems suitable for many users (as judged by group interface discussions) and does not bring any undesired consequences (I played for a while with it, everything works as expected, assuming you are fine with the fact that two groups with the same names contain always the same entries. For me this seems OK.).

I offer this PR since the fix was very simple (SEH == Sunday Evening Hack ;) ), and brings a lot of useful functionality IMHO. The alternative name suggestions are possible but will need considerably more effort...

Screenshot at 2021-03-21 17-56-22

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable – not yet but will do later -- S.G.)
  • Tests created for changes (if applicable – seems not, all tests pass S.G.)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository (S.G. -- not yet but will do later).

@tobiasdiez

Copy link
Copy Markdown
Member

Thanks for your PR.

Just a few questions I had while thinking about this:

  • What should happen if the user renames one of the groups? If the other group is not renamed as well then it contains no items...
  • I think it's already possible to have to groups with the same entries: Group A is an "assign entries manually" group, and Group B is a keyword-based group, for the field "groups" and value the name of Group A. The only constraint is currently that Group A and Group B have to have different names, but something as "Group A (dup)" as a workaround doesn't seem to bad. Does this work? Is it already sufficient?

@sauliusg

sauliusg commented Mar 22, 2021

Copy link
Copy Markdown
Contributor Author

What should happen if the user renames one of the groups? If the other group is not renamed as well then it contains no items...

A decent and easy strategy would be to detect this case and leave the old group name as well as the new group name in all entries of the original group. Then both the new (renamed) group and the old group will contain the same entries, and from the point of renaming they start their independent life, so to say :)

Amending the rename code should be easy, I can do that.

A more elaborate solution would be to ask if a user wants to reassign the entries to the new group, to the old group(s) or to all of them. Similar to the question that is asked when changing group Hierarchical content. But this is more work, so the first try would be to leave entries in both old and new group, as by default.

@sauliusg

sauliusg commented Mar 22, 2021

Copy link
Copy Markdown
Contributor Author

I think it's already possible to have to groups with the same entries: Group A is an "assign entries manually" group, and Group B is a keyword-based group, for the field "groups" and value the name of Group A.

The only constraint is currently that Group A and Group B have to have different names,

This is exactly what I am trying to avoid! I mean, I try to remove this constraint and allow users to create groups with identical names if they wish so.

but something as "Group A (dup)" as a workaround doesn't seem to bad. Does this work? Is it already sufficient?

Well, for my purposes of using JabRef, no, it is not sufficient.

The primary need is to allow identical group names in different parts of the hierarchy. Without this, I just can not use the groups; of example I can not migrate automatically my bibliographies from Zotero where I do have identical group names in different parts of the tree; and migrating our old JabRef 3.x group assignment is also not possible.

The fact that two groups with the same name always get the same entries is for me an unanticipated feature, which can be put to good use once you understand it. Actually, think about groups not as tree leaves, but as sets in a mathematical sense. A set (group) is defined by its members; and the same set can be a member of any other sets (e.g. branches of a tree).

If I need to define a special group of a different kind, then I'll probably find it too complicated to maintain and will not do it. Think what happens when I rename your primary group (Group A)? All my 'symlink' keyword based groups (Group B and friends) will become invalid, unless JabRef fixes them automatically; and how does it know when to fix keyword groups and when not? Most often when I change a keyword in an entry, this means that it must appear in a different keyword group, not that the original keyword group is retargetted to a new keyword...

@sauliusg

sauliusg commented Mar 22, 2021

Copy link
Copy Markdown
Contributor Author

What should happen if the user renames one of the groups? If the other group is not renamed as well then it contains no items...

Also, removing a group should leave entries assigned to the group if other instances of the same group exist... I can take care of this as well.

@tobiasdiez

tobiasdiez commented Mar 22, 2021

Copy link
Copy Markdown
Member

I'm a bit undecided. On the one hand, I agree that allowing duplicate names would be natural, and useful in some cases. On the other hand, it feels like opening a (small) box of pandora since there are so many things you have to think about: renaming groups, deleting, moving in the tree, manually editing of the groups tree in the bib file. Also it makes it harder to migrate to other solutions (say id-based or full-path-of-group) since the name is no longer unique. Another point is that we were thinking about adding a nice user interface for the groups, and for this a "group name > group in tree" map would be needed. In summary, allowing duplicate names definitely increases the burden for future maintenance.

I'm tending to keep the error. Advanced users such as yourself can then still add groups with the same name by editing the groups tree manually in the bib file.
@JabRef/developers what do you think?

@sauliusg

Copy link
Copy Markdown
Contributor Author

I'm a bit undecided. On the one hand, I agree that allowing duplicate names would be natural, and useful in some cases.

Very much so!

On the other hand, it feels like opening a (small) box of pandora since there are so many things you have to think about: renaming groups, deleting, moving in the tree, manually editing of the groups tree in the bib file.

It seems to be not that bad, actually less dangerous than having full paths in group names...

The mentioned actions seem to make a "Closed Complete Set of Functions" (CRUD?):

  1. Renaming a group is pretty much as it is now. Extension will be: if another group with the same name exists, leave the old group name in all entries, and add the new group name. If the group was unique: remove old group name from entries, add new group name to these entries. Very natural. No information is lost; the action is actually reversible! The user can later rename the new group again, move it in the tree or "weed out" some entries (without affecting the original group);
  2. Deleting the group is very similar to renaming: if another group with the same name exists, no entries need to be changed, otherwise group name is removed from all entries.

I'm now experimenting with this functionality in my branch and will add modifications for 1) and 2) to the PR.

  1. moving in the tree: trivial, as it is now: just change the position in the list and the indentation number. Can be done both in GUI and by editing/sed-editing the BibTeX file.
  2. manually editing of the groups tree: trivial – just delete one row and insert it in another position if yo want it elsewhere. This is how I produced the test cases with identical groups before the patch of this PR was implemented; tree editing is independent from entry editing, unless you want to rename the group manually and retain all entries in it.
  3. editing groups in entries: exactly as it is now, just add a group name you want and you have added the entry to that named group. It will appear in all places of the tree where this group is mentioned. The entry editing is independent from the tree.

No dangers seem to be lurking so far.

Also it makes it harder to migrate to other solutions (say id-based or full-path-of-group) since the name is no longer unique. Another point is that we were thinking about adding a nice user interface for the groups, and for this a "group name > group in tree" map would be needed. In summary, allowing duplicate names definitely increases the burden for future maintenance.

The most painful migration was from groups 3.x to groups 4, where a lot of people (including me :( ) lost their group assignment data... Well, my day was saved by a Subversion repo where all my databases reside and a bunch of Unix scripts that restored the group assignments. Others might have been less lucky...

I see the group assignment by people to be a very valuable information – a lot of manual work went into assigning and classifying entries, and people rely on them when finding information they need quickly.

So an information-preserving migration strategy will be needed anyway.

A possible migration would be:

  1. When migrating from the current PR Allow identical group names #7558 to a tree where groups in different tree branches are different:
    -- if the new format supports the "linked" groups, i.e. groups that all get the same entries when an entry is added to one of them, convert synonymous groups to such entries;
    -- if the new format does not support such functionality, just retain groups with identical names and identical sets of entries, and warn the user that the behaviour of JabRef has changed.

  2. When migrating from the old group format (3.x) to the new one as in this PR:
    -- If all groups have unique names, nothing needs to be changed;
    -- If some groups in different places of the tree have the same name, make only those groups unique by either appending a group parent names (as in 'Treatment (Medicine > Asthma)' and 'Treatment (Biocompatibility > Surfaces)', or by appending just a serial number, and warn the user about renaming. The user can then adjust group names if they wish, but, must importantly, their precious division of entries into groups is not lost.

I'm tending to keep the error. Advanced users such as yourself can then still add groups with the same name by editing the groups tree manually in the bib file.

Well, even if I go for this, I (and my colleagues) need some reassurance that JabRef will keep the present functionality in the future, so that our work of adding these groups and assigning entries to them does not go down the drain. And keeping such functionality in sight is probably as much work for the JabRef team as keeping it with the suggested changes added... :)

Also, if I need to use external scripts/editors for managing my BibTeX file... well, its doable, but I lose some integration and the nice JabRef features like auto-fetch of bibliographies and OA papers... Would be a pity. Also, it does not seem like a good advertisement for a GUI program to say "well, you of course can have this feature but you need to save data, open a text editor, edit groups by hand (watch out the syntax!), restart/reread the data into Jabref..."

@JabRef/developers what do you think?

Keen to hear the answers!

@sauliusg

sauliusg commented Mar 22, 2021

Copy link
Copy Markdown
Contributor Author

Update in commits leading to 6f5b119 : when a group with duplicated name is deleted, its name is removed from the entries only when there is no other group with the same name left. Works for "delete group and subgroups" command.

Gradle or IDE) and to attach a debugger to it.
@sauliusg

This comment has been minimized.

@Siedlerchr

Copy link
Copy Markdown
Member

I don't use Intellij, but did you configure it as stated here? https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace#configure-your-ide

@sauliusg

This comment has been minimized.

@sauliusg

This comment has been minimized.

Saulius Grazulis added 2 commits March 28, 2021 18:52
other groups that had the same name as the old (original) renamed group.
@sauliusg

Copy link
Copy Markdown
Contributor Author

As discussed in #7551:

I have just noticed that when I rename one of the dupliactes, its entries are reassigned to the new group and vanish from all old group duplicates. This is not what I (and probably most other users) would expect; the fix should be easy – if the group(s) with the original name exist somewhere in the tree, the old group name should be retained in the entries. The entries will thus belong to two groups if one of the duplicates is renamed. Only of the group name vanishes from the tree, it should be removed from the entries as well.

I'll amend the PR to fix this.

The latest commit fixes this issue. Now entries are not removed from the remaining synonymous groups if one of them is renamed; in that case, entries become assigned to both groups, the new and the old. Seems like a very natural behaviour to me.

How do you find it?

@tobiasdiez

Copy link
Copy Markdown
Member

The more I think about this the more I'm convinced that it was a bad idea on my side to use the group name as the identifier. I propose we drop the requirement that the "groups" field should be human-readable, and instead use a unique identifier for each group. The "groups" field in the entry editor would then use a chips view to give a nice UI to manage group memberships.

@JabRef/developers input very much appreciated!

@sauliusg

Copy link
Copy Markdown
Contributor Author

As a side note: the proposed PR seems to also fix the #7556 :)

@sauliusg

sauliusg commented Mar 28, 2021

Copy link
Copy Markdown
Contributor Author

The more I think about this the more I'm convinced that it was a bad idea on my side to use the group name as the identifier. I propose we drop the requirement that the "groups" field should be human-readable, and instead use a unique identifier for each group. The "groups" field in the entry editor would then use a chips view to give a nice UI to manage group memberships.

If I may chime in:

Sure, if you want to have unique keys identifying your internal objects, it is a bad idea to rely on external names.

That said, I would not say that your idea to use user-provided names as identifiers is so bad. It depends on what you want to do. If you take users' provided group names at their face value, as names that users want to use to identify their groups, then you current implementation logically follows, and also a behaviour suggested in PR #7558 logically follows.

After a bit of testing and playing around, I think I now see the full rationale of placing group names into entries, and the emerging behaviour is quite useful – provided that you frankly admit that users can create groups with identical names (and GUI checks will not stop them ;) ).

Note that when we have intermediate groups, such as 'Language' in 'Computer/Language/SQL' and 'Data/Language/SQL' or 'Human/Language/Esperanto', the 'Language' group, if it is a union group and has no entries of its own, behaves exactly as it used to behave in the old system.

Moreover, if you have some entries in the 'Language' group, it makes perfect sense to make a subgroup 'Language/others', where 'others' will contain all "unassigned" languages, and move all these entries there. Then the 'Language' union group will continue to display all entries as it used before, and all other 'Language' groups elsewhere in the tree would continue to show the entries in their subgroups; a user is free to include groups 'others' into those 'Language' groups or not. To my taste, this is perfectly usable, quite transparent and rather useful.

@koppor

koppor commented Mar 29, 2021

Copy link
Copy Markdown
Member

We discussed this at our dev call. We are not sure about the possibly overlooked consequences. Since we are software engineers, we like empirical software engineering ^^. We do the shortcut and started a survey on Twitter - hopefully many persons will answer it: https://twitter.com/JabRef_org/status/1376602527832158215

@sauliusg

Copy link
Copy Markdown
Contributor Author

We discussed this at our dev call. We are not sure about the possibly overlooked consequences. Since we are software engineers, we like empirical software engineering ^^. We do the shortcut and started a survey on Twitter - hopefully many persons will answer it: https://twitter.com/JabRef_org/status/1376602527832158215

Good idea. General user feedback is very important, and users should know what to expect.

@sauliusg

Copy link
Copy Markdown
Contributor Author

We are not sure about the possibly overlooked consequences. Since we are software engineers, we like empirical software engineering ^^.

Why not running the https://github.com/sauliusg/jabref/tree/saulius-allow-identical-group-names branch in a sandbox and see how it behaves? Bugs and usability issues will surface (empirically) after some time of playing around :).

@sauliusg

Copy link
Copy Markdown
Contributor Author

@tobiasdiez @koppor Any more thoughts on group implementation(s)?

Should I go on to implemented dashboard-like info as suggested by @AEgit in #7554?

…roup-names

Updating my branch so that it is in sync with the JabRef master.
@koppor

koppor commented Apr 10, 2021

Copy link
Copy Markdown
Member

Note to myself for the devcall tomorrow: I think, I agree with #7554 (comment).

@sauliusg

Copy link
Copy Markdown
Contributor Author

Note to myself for the devcall tomorrow: I think, I agree with #7554 (comment).

Thanks for the reply; waiting for the devcall outcome.

@koppor

koppor commented Apr 12, 2021

Copy link
Copy Markdown
Member

DevCall: We accept the risk of some unforseen side effects with the benefit having identical groups working.

@koppor koppor merged commit 6903e0e into JabRef:main Apr 12, 2021
@sauliusg

Copy link
Copy Markdown
Contributor Author

DevCall: We accept the risk of some unforseen side effects with the benefit having identical groups working.

Thank you very much for accepting this change! You save my year – I start to migrate my groups immediately, and will offer the JavRef dev version to my co-workes as soon as the stability issues are resolved.

In case there are unforeseen problems, for sure I feel obliged to invest my effort to help resolving them, as much as I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate groups in an input file are not detected

4 participants