In science, authors write papers and relate their own work to other papers. This related-work text is very valuable, as it contains two things:
- pointers to other interesting papers
- descriptions of those papers
Example: https://github.com/JabRef/jabref-demo-libraries/blob/main/chocolate/pdfs/LunaOstos_2024%20-%20Social%20Life%20Cycle%20Assessment%20in%20the%20Chocolate%20Industry%20-%20A%20Colombian%20Case%20Study%20with%20Luker%20Chocolate.pdf
Colombia is a middle-income country with a population
of approximately 50 million (CIA 2021), with at least 11
million people living in rural areas (DANE 2018). It is the
third most biodiverse country globally, following Brazil
and Indonesia (Nash 2022).
JabRef should do the following:
For each reference:
- Look up the reference in the references list
- Add it to the library, or update the entry if it already exists
- Find the descriptive text for the referenced paper in the text
- Add the descriptive text to comment-{username}, prefixed with [{citation-key}]: ([LunaOstos_2024] in our example). If comment-{username} already has content, just append the new text, separated by an empty line.
Example result:
@Misc{Agency2021,
author = {{Central Intelligence Agency}},
note = {Accessed 4 Mar 2023},
title = {The world factbook: Colombia},
year = {2021},
comment-koppor = {[LunaOstos_2024]: Colombia is a middle-income country with a population of approximately 50 million.},
url = {https://www.cia.gov/the-world-factbook/countries/colombia/},
}
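The append rule above could be sketched as follows. This is a minimal illustration, not JabRef code: the class `RelatedWorkComment` and its method are hypothetical, and a plain `Map` stands in for the fields of a `BibEntry`.

```java
import java.util.Map;

// Hypothetical helper; a plain Map stands in for the fields of a JabRef BibEntry.
class RelatedWorkComment {

    /** Appends "[sourceKey]: text" to comment-{username}, separated from existing content by an empty line. */
    static void append(Map<String, String> fields, String username, String sourceKey, String text) {
        String field = "comment-" + username;
        String addition = "[" + sourceKey + "]: " + text.strip();
        // merge() stores the addition if the field is absent, otherwise appends it after an empty line
        fields.merge(field, addition, (existing, added) -> existing.stripTrailing() + "\n\n" + added);
    }
}
```

Calling `append(fields, "koppor", "LunaOstos_2024", "Colombia is a middle-income country ...")` on an entry without a comment produces the `comment-koppor` value shown in the example result; a second call adds a new paragraph below it.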
UI
Add a new tab "Related work text"
When text is pasted here, it is parsed and entries are created or updated accordingly.
Related: Citation relations. However, they do not have the full text.
Screenshot from the linked PDF:
Fuller context:
Hint: It is perfectly OK to use langchain4j's AI interface for parsing etc.
This is NOT citation relations, because this issue is about harvesting knowledge from a PDF.
This requires software engineering, not simply generating a wall of code.
As a consequence, you need to split the work into parts and proceed one step at a time. Open a new PR for each step, with UML diagrams and documentation. No rush.
We need to stay closer to the initial requirement.
First PR:
Setting
- User selects reference text in their PDF reader
- User copies text to clipboard
- User selects "Tools" -> "Insert related work text"
- JabRef opens dialog with current clipboard content
- Dialog shows whether the reference is matched in the current library. If not, JabRef offers to add the reference. This is a sub-part of the dialog.
- User clicks "Insert" (only enabled if the reference is found in the bib file)
- JabRef adds the reference text to the linked entry as "- {citationkey}: {text}"
- JabRef displays notification "reference summary added"
- JabRef closes dialog
Note: The reference text needs to be added as an item of a Markdown list. This makes it easier to separate references.
You can even split up the PR further and work on the logic part first. Maybe even
- "only" extracting the text + the reference "text" --> correct data structures and tests. Example: A study on the Colombian [...] [1]. - finds A study on the Colombian [...] and [1]
- Finding the reference "text" in the references section. E.g., [1] is turned into a "pointer" to BibEntry LunaOstos_2024
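The two logic steps above could start from something like the following sketch. All names (`RelatedWorkExtractor`, `Snippet`, `resolve`) are hypothetical, only the numeric `[1]` format is handled, and the lookup assumes the references section has already been parsed into a map from marker to citation key.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; not part of JabRef's API.
class RelatedWorkExtractor {

    /** Descriptive text plus the numeric marker that follows it, e.g. "[1]". */
    record Snippet(String text, String marker) { }

    // Text, then a numeric marker such as [1], then an optional full stop at the very end.
    private static final Pattern NUMERIC =
            Pattern.compile("^(.*\\S)\\s*(\\[\\d+\\])\\s*\\.?\\s*$", Pattern.DOTALL);

    /** Step 1: split the pasted text into the description and its reference marker. */
    static Optional<Snippet> extract(String pasted) {
        Matcher matcher = NUMERIC.matcher(pasted.strip());
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Snippet(matcher.group(1), matcher.group(2)));
    }

    /** Step 2: resolve the marker to a citation key via an already parsed references section. */
    static Optional<String> resolve(String marker, Map<String, String> markerToCitationKey) {
        return Optional.ofNullable(markerToCitationKey.get(marker));
    }
}
```

Because `[...]` contains no digits, it stays part of the text and only the trailing `[1]` is treated as the marker. Returning `Optional` keeps "no marker found" and "marker not in the references" explicit, which is what the dialog needs to decide whether "Insert" can be enabled.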
Then the UI part
Then continue with the reference format [1, 2] - separate PR
Then use one (!) other reference format. Create a test PDF for that.
Then work on another reference format. Create a test PDF for that.
Then support for multiple references, e.g.,
A study on the Colombian chocolate industry represents the
first application of a social life cycle assessment (S-LCA) to
cover both cocoa cultivation and chocolate manufacturing [1].
A study on selective, hedonic deprivation found that restricting
chocolate intake for two weeks increased state chocolate craving,
but only in individuals who already had high trait chocolate
craving [2].
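A rough sketch of how such a pasted block might be split into one snippet per reference. The names are hypothetical, only numeric markers such as `[1]` and `[1, 2]` are covered, and the code assumes that a hyphen directly before a line break is a PDF hyphenation artifact (which would also join a genuinely hyphenated word that happens to break at the line end).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names; a sketch for numeric markers only.
class MultiReferenceSplitter {

    record Snippet(String text, List<Integer> referenceNumbers) { }

    // Shortest run of text followed by a marker like [1] or [1, 2] and an optional full stop.
    private static final Pattern SNIPPET =
            Pattern.compile("(.+?)\\s*\\[(\\d+(?:\\s*,\\s*\\d+)*)\\]\\s*\\.?", Pattern.DOTALL);

    static List<Snippet> split(String pasted) {
        // Assumption: "crav-<line break>ing" is a hyphenation artifact and becomes "craving".
        String joined = pasted.replaceAll("(\\p{L})-\\R(\\p{L})", "$1$2")
                              .replaceAll("\\s+", " ")
                              .strip();
        List<Snippet> result = new ArrayList<>();
        Matcher matcher = SNIPPET.matcher(joined);
        while (matcher.find()) {
            List<Integer> numbers = new ArrayList<>();
            for (String number : matcher.group(2).split("\\s*,\\s*")) {
                numbers.add(Integer.parseInt(number));
            }
            result.add(new Snippet(matcher.group(1).strip(), numbers));
        }
        return result;
    }
}
```

For the example above, this yields two snippets, one with reference number 1 and one with reference number 2; each can then be resolved and inserted separately.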