Extract text about papers from "related work" sections

In science, authors write papers. They relate their paper to other papers. This text is very interesting, as it contains two aspects:

- interesting other papers
- descriptions of those papers

Example: https://github.com/JabRef/jabref-demo-libraries/blob/main/chocolate/pdfs/LunaOstos_2024%20-%20Social%20Life%20Cycle%20Assessment%20in%20the%20Chocolate%20Industry%20-%20A%20Colombian%20Case%20Study%20with%20Luker%20Chocolate.pdf

```
Colombia is a middle-income country with a population
of approximately 50 million (CIA 2021), with at least 11
million people living in rural areas (DANE 2018). It is the
third most biodiverse country globally, following Brazil
and Indonesia (Nash 2022). 
```

JabRef should do the following:

For each reference:

- Look it up in the references
- Add it to the library, or update it if it already exists
- Find the descriptive text for the paper in the text
- Add the descriptive text to `comment-{username}`, prefixed with `[{citation-key}]: ` (`[LunaOstos_2024]: ` in our example). If `comment-{username}` already has content, append the new text, separated by an empty line.

Example result:


```bibtex
@Misc{Agency2021,
  author         = {{Central Intelligence Agency}},
  note           = {Accessed 4 Mar 2023},
  title          = {The world factbook: Colombia},
  year           = {2021},
  comment-koppor = {[LunaOstos_2024]: Colombia is a middle-income country with a population of approximately 50 million.},
  url            = {https://www.cia.gov/the-world-factbook/countries/colombia/},
}
```
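The append rule for the comment field described above can be sketched as follows. This is only a minimal illustration, not JabRef code: the class `RelatedWorkComment` and its method `append` are hypothetical names, and the field content is treated as a plain string.

```java
// Hypothetical helper (not part of JabRef): builds the new content of the comment field.
final class RelatedWorkComment {

    private RelatedWorkComment() {
    }

    /**
     * Prefixes the text with "[citationKey]: " and appends it to the existing
     * comment, separated by an empty line. An empty or missing comment is replaced.
     */
    static String append(String existingComment, String citationKey, String text) {
        String addition = "[" + citationKey + "]: " + text.strip();
        if (existingComment == null || existingComment.isBlank()) {
            return addition;
        }
        // "\n\n" produces exactly one empty line between old and new content
        return existingComment.stripTrailing() + "\n\n" + addition;
    }
}
```

Called with an empty comment, the key `LunaOstos_2024`, and the sentence from the example, this yields the `comment-koppor` value shown in the BibTeX entry.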

---

## UI

Add a new tab "Related work text"

<img width="952" height="675" alt="grafik" src="https://github.com/user-attachments/assets/78014317-383e-4627-bd3e-79647baba5b4" />

When text is pasted here, it is parsed, and entries are created or updated accordingly.

---

Related: Citation relations. However, they do not have the full text.

Screenshot from the linked PDF:

<img width="576" height="178" alt="Image" src="https://github.com/user-attachments/assets/d927583f-3b66-44b2-9ab3-ff1e0255b24c" />

Fuller context:

<img width="612" height="843" alt="Image" src="https://github.com/user-attachments/assets/e4377863-d352-4e3d-a3c3-0700dd99deff" />

---

Hint: It is perfectly OK to use langchain4j's AI interface for parsing etc.

---

This is NOT citation relations, because this issue is about harvesting knowledge from a PDF.

---

**This requires Software Engineering**, not simply generating a wall of code.

As a consequence, you need to split the work into parts, one step at a time. Do a new PR for each step. With UML diagrams and documentation. No rush.

We need to stay closer to the initial requirement.

First PR:

Setting

- User has opened an IEEE paper in their PDF reader. E.g., the one at https://github.com/JabRef/jabref/pull/15069.
- User has selected the respective BibEntry in JabRef

1. User selects reference text in their PDF reader
2. User copies text to clipboard
3. User selects "Tools" -> "Insert related work text"
4. JabRef opens dialog with current clipboard content
5. Dialog shows whether the reference is matched in the current library. If not, JabRef offers to add the reference. This is a sub-part of the dialog.
6. User clicks "Insert" (only enabled if the reference is found in the bib file)
7. JabRef adds the reference text to the matched reference entry as "- {citationkey}: {text}"
8. JabRef displays notification "reference summary added"
9. JabRef closes dialog

Note: The reference text needs to be added as an item of a Markdown list. This eases separating references.

You can even split the PR up further and work on the logic part first, maybe even:

- "only" extracting text + reference "text" --> correct data structures and tests. Example: `A study on the Colombian [...] [1].` - finds `A study on the Colombian [...]` and `[1]`
- Finding the reference "text" in the references section. E.g., `[1]` is turned into a "pointer" to BibEntry `LunaOstos_2024`
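The two logic steps could start from something like the sketch below. It assumes the IEEE numeric style with a single marker at the end of the sentence. All names (`CitedSentenceParser`, `CitedSentence`, `parse`, `resolve`) are hypothetical, and the map from marker to citation key stands in for whatever the references-section lookup will produce.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not JabRef code.
final class CitedSentenceParser {

    /** Descriptive text and the raw reference marker, e.g. "[1]". */
    record CitedSentence(String text, String marker) {
    }

    // Text, then a numeric marker such as [1], then an optional sentence-ending period.
    // "[...]" is not matched as a marker because it contains no digits.
    private static final Pattern NUMERIC =
            Pattern.compile("^(.*\\S)\\s*(\\[\\d+\\])\\s*\\.?\\s*$", Pattern.DOTALL);

    private CitedSentenceParser() {
    }

    static Optional<CitedSentence> parse(String sentence) {
        Matcher matcher = NUMERIC.matcher(sentence);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(new CitedSentence(matcher.group(1), matcher.group(2)));
    }

    /** Turns a marker into a "pointer" (here: a citation key) via a lookup table. */
    static Optional<String> resolve(String marker, Map<String, String> markerToCitationKey) {
        return Optional.ofNullable(markerToCitationKey.get(marker));
    }
}
```

For the example sentence, `parse` returns the text `A study on the Colombian [...]` and the marker `[1]`; `resolve("[1]", Map.of("[1]", "LunaOstos_2024"))` then yields `LunaOstos_2024`.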

Then the UI part

Then continue with the reference format `[1, 2]` in a separate PR
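For the `[1, 2]` format, the marker has to be expanded into the individual reference numbers. A rough sketch, with the hypothetical names `NumericMarkerParser` and `referenceNumbers`, and assuming comma-separated numbers only (no ranges such as `[1-3]`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not JabRef code.
final class NumericMarkerParser {

    // One or more numbers separated by commas inside square brackets.
    // Numbers are limited to four digits so that Integer.parseInt cannot overflow.
    private static final Pattern MARKER =
            Pattern.compile("\\[\\s*(\\d{1,4}(?:\\s*,\\s*\\d{1,4})*)\\s*\\]");

    private NumericMarkerParser() {
    }

    /** Returns the reference numbers in the marker, or an empty list if it is not a numeric marker. */
    static List<Integer> referenceNumbers(String marker) {
        Matcher matcher = MARKER.matcher(marker.strip());
        if (!matcher.matches()) {
            return List.of();
        }
        List<Integer> numbers = new ArrayList<>();
        for (String part : matcher.group(1).split(",")) {
            numbers.add(Integer.parseInt(part.strip()));
        }
        return numbers;
    }
}
```

Each returned number can then be resolved to a BibEntry in the same way as a single `[1]` marker.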

Then use one (!) other reference format. Create a test PDF for that.

Then work on another reference format. Create a test PDF for that.

Then support for multiple references, e.g.,

```markdown
A study on the Colombian chocolate industry represents the
first application of a social life cycle assessment (S-LCA) to
cover both cocoa cultivation and chocolate manufacturing [1].
A study on selective, hedonic deprivation found that restricting
chocolate intake for two weeks increased state chocolate crav-
ing, but only in individuals who already had high trait choco-
late craving [2]. 
```
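A passage like this one has to be split into one snippet per reference. The sketch below is one naive way to do that for numeric markers. The names `RelatedWorkPassageSplitter`, `Snippet`, and `split` are hypothetical. It assumes that every cited sentence ends with `[n].` and that a hyphen directly before a line break is a word split by the PDF layout (which would wrongly join genuine hyphenated compounds that happen to break at the hyphen).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not JabRef code.
final class RelatedWorkPassageSplitter {

    /** Descriptive text (without the marker) and the number of the cited reference. */
    record Snippet(String text, int referenceNumber) {
    }

    // Shortest possible text followed by a numeric marker and the sentence-ending period
    private static final Pattern CITED_SENTENCE =
            Pattern.compile("(.+?)\\s*\\[(\\d{1,4})\\]\\s*\\.");

    private RelatedWorkPassageSplitter() {
    }

    static List<Snippet> split(String passage) {
        String normalized = passage
                .replaceAll("(\\p{L})-\\R(\\p{L})", "$1$2") // re-join "crav-\ning" to "craving"
                .replaceAll("\\s+", " ")                    // remaining line breaks become spaces
                .strip();
        List<Snippet> snippets = new ArrayList<>();
        Matcher matcher = CITED_SENTENCE.matcher(normalized);
        while (matcher.find()) {
            String text = matcher.group(1).strip() + ".";
            snippets.add(new Snippet(text, Integer.parseInt(matcher.group(2))));
        }
        return snippets;
    }
}
```

Applied to the passage above, this gives two snippets, one for reference 1 and one for reference 2, with `craving` and `chocolate` re-joined in the second.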





