Skip to content

Add plain citation parsing with LLM#11831

Merged
koppor merged 14 commits into
JabRef:mainfrom
InAnYan:add-llm-citation-parsing
Oct 6, 2024
Merged

Add plain citation parsing with LLM#11831
koppor merged 14 commits into
JabRef:mainfrom
InAnYan:add-llm-citation-parsing

Conversation

@InAnYan

@InAnYan InAnYan commented Sep 25, 2024

Copy link
Copy Markdown
Member

Closes #11825
Closes #11805
Implements #11825 (comment).

Updated dialog for online parsing:
image

Updated dialog for offline parsing:
image

New setting:
image

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
    - [ ] Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@koppor

koppor commented Sep 25, 2024

Copy link
Copy Markdown
Member

"Parser choice" is nice.

  • always display that dropdown.
  • Remove "offline" / "online" in title
  • Add "offline" to dropdown
  • Adapt text ("Parser choice")
  • Add "Grobid" to Parser choice if enabled (GROBID enable din preference)
  • Add "LLM" to Parser choice if enabled (AI enabled in preferences)
  • Remove Dialog asking to enable Grobid

@Siedlerchr

Copy link
Copy Markdown
Member

I would propose a tab layout (e.g. like in OO style selection dialog) between Grobid/OpenAI

@koppor

koppor commented Sep 26, 2024

Copy link
Copy Markdown
Member

I would propose a tab layout (e.g. like in OO style selection dialog) between Grobid/OpenAI

The selection of the service is done very raRely. Thus. I Like the approach. The selection should persist across sessions.

@ThiloteE

Copy link
Copy Markdown
Member

Haven't tested the PR yet, but the screenshots show a divergence between the header of the window and the warning.

"Plain references parser (online)" vs "plain citation parsing"

It should not differ.

@ThiloteE

Copy link
Copy Markdown
Member
  • Also, I am not sure, if this preference should be in "Web search"
  • And if we use LLM's it can also be "online", right?

@koppor

koppor commented Sep 27, 2024

Copy link
Copy Markdown
Member
  • Also, I am not sure, if this preference should be in "Web search"

What do you mean by "this"?

Current state:

  • User can enable Grobid
  • Grobid preference is in "Web search"
    image
  • User can enable AI
  • This functionality makes use of Grobid and/or AI.
  • AI is used at other places.
  • Grobid functionality is also used at other places (e.g., importing a PDF into JabRef)
* And if we use LLM's it can also be "online", right?

Grobid is also online.

Therefore, I proposed to remove both "online" and "offline" and move that decision to the dropdown.

@ThiloteE

Copy link
Copy Markdown
Member

With "this" I meant the new preference that was showing in #11831 (comment) under web-search.

@ThiloteE

Copy link
Copy Markdown
Member

AI (LLMs) can be both online and offline.

@koppor

koppor commented Sep 29, 2024

Copy link
Copy Markdown
Member

AI (LLMs) can be both online and offline.

Therefore "online" and "offline" are requested to be removed. See #11831 (comment)

@koppor

koppor commented Sep 29, 2024

Copy link
Copy Markdown
Member

With "this" I meant the new preference that was showing in #11831 (comment) under web-search.

Since we don't have a tab "Import" and I don't think, this is so important, I propose it too keep it in the tab. However (!), please move it to the "Remote services" part. There, we already have Grobid-related stuff.

image

@InAnYan

InAnYan commented Oct 3, 2024

Copy link
Copy Markdown
Member Author

image

This is not GUI.

I also propose to change the name to RuleBasedBibtexExtractor.

This aligns very well with the history of natural lanuguage processing (NLP):

  • Rule-based methods (old BibtexExtractor).
  • Statistical methods (like GROBID, however this is not right, they can also use neural networks).
  • Neural-network methods (like LLM).

@InAnYan

InAnYan commented Oct 3, 2024

Copy link
Copy Markdown
Member Author

I also propose to change all names to "Plain Citation Parsing". Synonyms are:

  • "BibTex extractor" - it's too general
  • "Extract BibTex from plain text" - "plain text" - really? I think you meant a citation, not just an arbitrary text.
  • "Plain references parser" - another name. This is good actually, because it correlates to "References" section in papers. But I still propose to name it "citations", because that section could also be named "Bibliography". Citation is a very specific term and it depicts better what JabRef does.

@InAnYan

InAnYan commented Oct 3, 2024

Copy link
Copy Markdown
Member Author

WAIT

All of this time offline parsing did not support parsing of several entries (separated by two blank lines)? WTF

@InAnYan

InAnYan commented Oct 3, 2024

Copy link
Copy Markdown
Member Author

image

WTF? "JabRef server" - I thought it connected to some kind of GROBID server

UPDATE: Yes, it really connects to JabRef server. But I still renamed that to be more general

@InAnYan

InAnYan commented Oct 3, 2024

Copy link
Copy Markdown
Member Author
 Add "Grobid" to Parser choice if enabled (GROBID enable din preference)
Add "LLM" to Parser choice if enabled (AI enabled in preferences)

Ah, this is not that easy

@Siedlerchr

Copy link
Copy Markdown
Member

We had an instance of grobid running on azure but currently I cannot deploy it again due to some weird errors from docker

…dd-llm-citation-parsing

# Conflicts:
#	src/test/java/org/jabref/logic/importer/WebFetchersTest.java
@InAnYan

InAnYan commented Oct 3, 2024

Copy link
Copy Markdown
Member Author

Did some crazy refactoring.

Hope you like it

@koppor koppor left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small nitpicks.

And

  • When opening the dialog, it should check whether the current value is still available. Example: GROBID enabled, dialog opened, GROBID selected, then preferences: GROBID disabled. Opening dialog again: GROBID should not be selected. (Note that GROBID is not in the dropdown, which is OK. However, it is selected)
    • I think, there should be a preference disabled-dependency: grafik. GROBID disabled and default GROBID: try AI. If AI not enabled, set default to rule-based. - similar to AI: If AI get disabled -> if default AI then check if GROBID enabled. Then set GROBID - else set local.
  • Link org.jabref.logic.importer.plaincitation.RuleBasedPlainCitationParser and org.jabref.logic.importer.fileformat.BibliographyFromPdfImporter#parseReference with a TODO (at both sides, with {@link}. They are semantically equal, but the code is different. Someone should unify the functionality.
    • Similar: Link org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser and org.jabref.logic.importer.fileformat.BibliographyFromPdfImporter#getIntermediateData

Note: for reviewing this PR, refactoring miner is better.

E.g., instead of thinking a file was deleted, there is detection of major refactorings

grafik

Comment thread src/main/java/org/jabref/gui/preferences/websearch/WebSearchTab.fxml Outdated
Comment thread src/main/java/org/jabref/logic/preferences/JabRefCliPreferences.java Outdated
Comment thread src/main/java/org/jabref/logic/preferences/JabRefCliPreferences.java Outdated
Comment thread src/main/resources/l10n/JabRef_en.properties Outdated
Comment thread src/main/resources/l10n/JabRef_en.properties Outdated
Comment thread src/main/resources/l10n/JabRef_en.properties Outdated
@InAnYan

InAnYan commented Oct 6, 2024

Copy link
Copy Markdown
Member Author

@koppor, I thought it will be easy to filter out parser choices

But we can't (currently, or without a hack) link preferences from different tabs.

At first, I used preferences.getAiPreferences().useAiProperty().addListener(...) (or not useAi, some other name), but when we edit preferences, we don't set them in CliPreferences...

@InAnYan

InAnYan commented Oct 6, 2024

Copy link
Copy Markdown
Member Author

Okay, GROBID is on the same page, but AI is not.

Is a hack worth it?

@InAnYan InAnYan requested a review from koppor October 6, 2024 11:07
@koppor

koppor commented Oct 6, 2024

Copy link
Copy Markdown
Member

Okay, GROBID is on the same page, but AI is not.

Is a hack worth it?

I don't fully understand ^^. - I just guess: It is OK for me if the dropdown for default in the web search contains an entry which is available at another setting...

@koppor

koppor commented Oct 6, 2024

Copy link
Copy Markdown
Member

@koppor, I thought it will be easy to filter out parser choices
But we can't (currently, or without a hack) link preferences from different tabs.

It should be possible to get the AIPreferences passed.

At first, I used preferences.getAiPreferences().useAiProperty().addListener(...)

Sounds good!

(or not useAi, some other name), but when we edit preferences, we don't set them in CliPreferences...

I don't understand. Its a hiearchy - and the class for GuiPreferences is a superset of the CliPreferences

grafik

@koppor koppor left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very small improvement. I think, this is NOT a dirty hack, but a good solution.

Comment thread src/main/java/org/jabref/gui/preferences/PreferencesDialogViewModel.java Outdated
…Model.java

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
@koppor koppor enabled auto-merge October 6, 2024 11:37
@koppor koppor added this pull request to the merge queue Oct 6, 2024
Merged via the queue into JabRef:main with commit daf6736 Oct 6, 2024
@koppor koppor deleted the add-llm-citation-parsing branch October 6, 2024 11:51
tsantalis added a commit to tsantalis/RefactoringMiner that referenced this pull request Apr 20, 2025
JabRef/jabref#11831
Move And Rename Method public performSearch(searchQuery String) : List<BibEntry> from class org.jabref.logic.importer.fetcher.GrobidCitationFetcher to public parseSeveralPlainCitations(text String) : List<BibEntry> from class org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser

Extract Method public splitCitations(text String) : Stream<String> extracted from public performSearch(searchQuery String) : List<BibEntry> in class org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser

Rename Parameter searchQuery : String to text : String in method public parseSeveralPlainCitations(text String) : List<BibEntry> from class org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser
maorethians pushed a commit to maorethians/RefactoringMiner that referenced this pull request Jun 8, 2026
JabRef/jabref#11831
Move And Rename Method public performSearch(searchQuery String) : List<BibEntry> from class org.jabref.logic.importer.fetcher.GrobidCitationFetcher to public parseSeveralPlainCitations(text String) : List<BibEntry> from class org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser

Extract Method public splitCitations(text String) : Stream<String> extracted from public performSearch(searchQuery String) : List<BibEntry> in class org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser

Rename Parameter searchQuery : String to text : String in method public parseSeveralPlainCitations(text String) : List<BibEntry> from class org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add warning about confabulations to "New entry from plain text" "New entry from plain text" gives strange results

4 participants