
Extract text about papers from "related work" sections #14085

@koppor


In science, authors write papers. They relate their paper to other papers. This text is very interesting, as it covers two aspects:

  • interesting other papers
  • description of other papers

Example: https://github.com/JabRef/jabref-demo-libraries/blob/main/chocolate/pdfs/LunaOstos_2024%20-%20Social%20Life%20Cycle%20Assessment%20in%20the%20Chocolate%20Industry%20-%20A%20Colombian%20Case%20Study%20with%20Luker%20Chocolate.pdf

Colombia is a middle-income country with a population
of approximately 50 million (CIA 2021), with at least 11
million people living in rural areas (DANE 2018). It is the
third most biodiverse country globally, following Brazil
and Indonesia (Nash 2022). 

JabRef should do the following:

For each reference:

  • Look it up in the references
  • Add it to the library, or update the entry if it already exists
  • Find the descriptive text for the paper in the text
  • Add the descriptive text to comment-{username}, prefixed with [{citation-key}]: ([LunaOstos_2024]: in our example). If comment-{username} already has content, append the new text, separated by an empty line.
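The last step above can be sketched in a few lines. This is only a minimal sketch, not JabRef code: a plain `Map` stands in for the fields of a BibTeX entry, and `CommentAppender`, `fieldName`, and `appendDescription` are hypothetical names. The field name `comment-{username}` follows the example entry in this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; a plain Map stands in for the fields of a BibTeX entry. */
final class CommentAppender {

    /** Builds the per-user comment field name, e.g. "comment-koppor" (assumed naming). */
    static String fieldName(String username) {
        return "comment-" + username;
    }

    /**
     * Adds "[sourceKey]: text" to the user's comment field.
     * Existing content is kept; the new text is appended after an empty line.
     */
    static void appendDescription(Map<String, String> fields, String username,
                                  String sourceKey, String text) {
        String addition = "[" + sourceKey + "]: " + text.strip();
        fields.merge(fieldName(username), addition,
                (old, add) -> old.isBlank() ? add : old.stripTrailing() + "\n\n" + add);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        appendDescription(fields, "koppor", "LunaOstos_2024",
                "Colombia is a middle-income country with a population of approximately 50 million.");
        System.out.println(fields.get("comment-koppor"));
    }
}
```

Using `Map.merge` keeps the "create or append" decision in one place; a real implementation would work on JabRef's own entry type instead of a map.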

Example result:

@Misc{Agency2021,
  author         = {{Central Intelligence Agency}},
  note           = {Accessed 4 Mar 2023},
  title          = {The world factbook: Colombia},
  year           = {2021},
  comment-koppor = {[LunaOstos_2024]: Colombia is a middle-income country with a population of approximately 50 million.},
  url            = {https://www.cia.gov/the-world-factbook/countries/colombia/},
}

UI

Add a new tab "Related work text"

Image

When text is pasted here, it is parsed and entries are created or updated accordingly.


Related: Citation relations. However, they do not have the full text.

Screenshot from the linked PDF:

Image

Fuller context:

Image

Hint: It is perfectly OK to use langchain4j's AI interface for parsing etc.


This is NOT citation relations, because this issue is about harvesting knowledge from a PDF.


This requires software engineering, not simply generating a wall of code.

As a consequence, you need to split the work into parts. One step at a time. Do a new PR. One step after another. With UML diagrams and documentation. No rush.

We need to be closer to the initial requirement.

First PR:

Setting

  1. User selects reference text in their PDF reader
  2. User copies text to clipboard
  3. User selects "Tools" -> "Insert related work text"
  4. JabRef opens dialog with current clipboard content
  5. Dialog shows whether the reference is matched in the current library - if not, JabRef offers to add the reference. This is a sub-part of the dialog.
  6. User clicks "Insert" (only enabled if the reference is found in the bib file)
  7. JabRef adds the text to the linked reference as "- {citationkey}: {text}"
  8. JabRef displays notification "reference summary added"
  9. JabRef closes dialog

Note: The reference text needs to be added as an item of a Markdown list. This eases separating references.
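A minimal sketch of why the Markdown list helps, assuming each item is kept on a single line: one item is added per inserted text, and the items can later be separated again by looking for lines that start with "- ". `RelatedWorkList` and its methods are hypothetical names, not existing JabRef classes.

```java
import java.util.List;

/** Hypothetical helper for the "- {citationkey}: {text}" list format. */
final class RelatedWorkList {

    /** Formats one list item; line breaks from the PDF are collapsed so the item stays on one line. */
    static String toListItem(String citationKey, String text) {
        return "- " + citationKey + ": " + text.replaceAll("\\s+", " ").strip();
    }

    /** Appends a new item to the existing field content (which may be null or empty). */
    static String append(String existing, String citationKey, String text) {
        String item = toListItem(citationKey, text);
        if (existing == null || existing.isBlank()) {
            return item;
        }
        return existing.stripTrailing() + "\n" + item;
    }

    /** Separates the items again: every line starting with "- " is one item. */
    static List<String> items(String content) {
        return content.lines()
                      .filter(line -> line.startsWith("- "))
                      .map(line -> line.substring(2))
                      .toList();
    }
}
```

For example, `RelatedWorkList.append(null, "LunaOstos_2024", "Some copied\ntext")` yields `- LunaOstos_2024: Some copied text`, and `items` returns one element per previously inserted text.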

You can even split up the PR more and first work on the logic part. Maybe even

  • "only" extracting text + reference "text" --> correct data structures and tests. Example: "A study on the Colombian [...] [1]." - finds "A study on the Colombian [...]" and "[1]"
  • Finding the reference "text" in the references section. E.g., [1] is turned into a "pointer" to BibEntry LunaOstos_2024
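The two logic steps above could start from something like the following sketch. It only handles a single numeric marker such as `[1]` at the end of the copied text, and it assumes a references section with one reference per line, each starting with its marker. `CitedText`, `CitedTextExtractor`, and `ReferenceListIndex` are hypothetical names; resolving the reference text to an actual BibEntry is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical data structure: the descriptive text and the raw reference marker, e.g. "[1]". */
record CitedText(String text, String marker) {}

final class CitedTextExtractor {
    // Descriptive text, then one numeric marker in brackets, then an optional full stop.
    private static final Pattern NUMERIC =
            Pattern.compile("^(.*\\S)\\s*(\\[\\d+\\])\\s*\\.?\\s*$", Pattern.DOTALL);

    /** Returns the text and its marker, or empty if the input does not end with a marker like "[1]". */
    static Optional<CitedText> extract(String input) {
        Matcher m = NUMERIC.matcher(input.strip());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new CitedText(m.group(1), m.group(2)));
    }
}

final class ReferenceListIndex {
    // Assumption: one reference per line, e.g. "[1] Author. Title. Year."
    private static final Pattern ENTRY =
            Pattern.compile("^[ \\t]*(\\[\\d+\\])[ \\t]+(.+)$", Pattern.MULTILINE);

    /** Maps each marker ("[1]") to the raw reference text found in the references section. */
    static Map<String, String> index(String referencesSection) {
        Map<String, String> byMarker = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(referencesSection);
        while (m.find()) {
            byMarker.put(m.group(1), m.group(2).strip());
        }
        return byMarker;
    }
}
```

With these two pieces, the marker returned by `extract` can be used as the key into the map built by `index`. Keeping extraction and lookup separate makes each part testable without a PDF.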

Then the UI part

Then continue with reference format [1, 2] - separate PR
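For the `[1, 2]` format, a marker has to be turned into several reference numbers. A minimal sketch, assuming only comma-separated numbers inside one pair of brackets (no ranges such as `[1-3]`); `NumericMarkerParser` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for markers such as "[1]" or "[1, 2]". */
final class NumericMarkerParser {
    // One or more numbers (at most four digits each), separated by commas, inside brackets.
    private static final Pattern MARKER =
            Pattern.compile("\\[\\s*(\\d{1,4}(?:\\s*,\\s*\\d{1,4})*)\\s*\\]");

    /** Returns the reference numbers in the marker, or an empty list if it is not a numeric marker. */
    static List<Integer> parse(String marker) {
        Matcher m = MARKER.matcher(marker.strip());
        if (!m.matches()) {
            return List.of();
        }
        List<Integer> numbers = new ArrayList<>();
        for (String part : m.group(1).split(",")) {
            numbers.add(Integer.parseInt(part.strip()));
        }
        return numbers;
    }
}
```

Returning an empty list for anything else, for example an author-year marker like `(CIA 2021)`, leaves room for other formats to be handled by their own parsers in later PRs.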

Then use one (!) other reference format. Create a test PDF for that.

Then work on another reference format. Create a test PDF for that.

Then support for multiple references, e.g.,

A study on the Colombian chocolate industry represents the
first application of a social life cycle assessment (S-LCA) to
cover both cocoa cultivation and chocolate manufacturing [1].
A study on selective, hedonic deprivation found that restricting
chocolate intake for two weeks increased state chocolate crav-
ing, but only in individuals who already had high trait choco-
late craving [2]. 
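A passage like this one could be split as sketched below. The sketch assumes that every sentence ends with a single numeric marker followed by a full stop, and it joins words that were hyphenated across a line break in the PDF (which would also wrongly join a genuine hyphen that happens to fall at a line end). `MultiReferenceSplitter` and `Snippet` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical splitter for passages that describe several references. */
final class MultiReferenceSplitter {

    /** One descriptive text together with its raw marker, e.g. "[2]". */
    record Snippet(String text, String marker) {}

    // Shortest possible text, then a numeric marker, then the closing full stop.
    private static final Pattern SENTENCE =
            Pattern.compile("(.+?)\\s*(\\[\\d+\\])\\s*\\.", Pattern.DOTALL);

    static List<Snippet> split(String passage) {
        String normalized = passage
                .replaceAll("(\\p{L})-\\R(\\p{L})", "$1$2") // "crav-" + line break + "ing" -> "craving"
                .replaceAll("\\s+", " ")                    // remaining line breaks become spaces
                .strip();
        List<Snippet> snippets = new ArrayList<>();
        Matcher m = SENTENCE.matcher(normalized);
        while (m.find()) {
            snippets.add(new Snippet(m.group(1).strip(), m.group(2)));
        }
        return snippets;
    }
}
```

Applied to the passage above, this yields two snippets, one for `[1]` and one for `[2]`, with "craving" and "chocolate" restored as whole words.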
