[WIP] Extract PDF References#10437

Merged
calixtus merged 14 commits into JabRef:main from aqurilla:fix-for-issue-10200 on Mar 12, 2024

Conversation

@aqurilla aqurilla commented Oct 1, 2023

Contributor

This fixes #10200 by implementing reference extraction from PDF files.

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Comment thread src/main/java/org/jabref/logic/importer/util/GrobidService.java

koppor commented Oct 24, 2023

Member

I added an initial test PDF at aqurilla#1.

BBC-Esq commented Dec 23, 2023

Interested in this as an attorney and extracting numerous legal citations...

koppor commented Dec 24, 2023

Member

> Interested in this as an attorney and extracting numerous legal citations...

Can you share example PDFs that we can license under the MIT license, or that are at least shareable under a different license? 😅

koppor commented Mar 11, 2024

Member

Screenshots:

  1. Activate functionality

image

  2. Result

image


For completeness, the PDF part with references:

image

@koppor koppor marked this pull request as ready for review March 11, 2024 23:48
@calixtus calixtus enabled auto-merge March 11, 2024 23:49

koppor commented Mar 11, 2024

Member

@aqurilla Very nice work! Thank you for working on this! Sorry that we did not give feedback earlier.

@calixtus calixtus added this pull request to the merge queue Mar 11, 2024
Merged via the queue into JabRef:main with commit 4c64706 Mar 12, 2024
Siedlerchr added a commit to Frequinzy/jabref that referenced this pull request Mar 13, 2024
* upstream/main: (36 commits)
  chore: remove repetitive words (JabRef#11015)
  Fix test names (JabRef#11014)
  Remove obsolete "Comments" tab configuration (JabRef#11011)
  Fix "Other fields" tab respecting custom tabs (JabRef#11012)
  [WIP] Extract PDF References (JabRef#10437)
  Fixed jump to entry from crossref (JabRef#11009)
  fix suggestion provider for crossref field (JabRef#10962)
  Use SequencedSet for required and optional fields (JabRef#11007)
  Bump io.github.classgraph:classgraph from 4.8.165 to 4.8.168 (JabRef#11005)
  Bump org.glassfish.hk2:hk2-api from 3.0.6 to 3.1.0 (JabRef#11006)
  Bump org.apache.logging.log4j:log4j-to-slf4j from 2.23.0 to 2.23.1 (JabRef#11003)
  Bump org.javamodularity.moduleplugin from 1.8.14 to 1.8.15 (JabRef#11002)
  Bump jakarta.xml.bind:jakarta.xml.bind-api from 4.0.1 to 4.0.2 (JabRef#11004)
  Bump softprops/action-gh-release from 1 to 2 (JabRef#11000)
  Bump gittools/actions from 0.13.2 to 0.13.4 (JabRef#11001)
  Update custom-svg-icons.md (JabRef#10999)
  Update Texworks icon (JabRef#10998)
  Use tags editor for auto completion preferences (JabRef#10990)
  Enable auto merge of CHANGELOG.md (JabRef#10986)
  Enhance DOI parser to deal with special characters (JabRef#10989)
  ...

# Conflicts:
#	build.gradle
Siedlerchr added a commit that referenced this pull request Mar 17, 2024
* upstream/main: (26 commits)
  Speed up failure reporting (#11030)
  Importing of BibDesk Groups and Linked Files (#10968)
  Convert RemoveBracesFormatterTest to @ParameterizedTest (#11033)
  Update teaching.md
  Remove non-existing recipe (#11029)
  Update CSL styles (#11031)
  Clean up defintions of entry types (#11013)
  Fix log file path on Windows (#11028)
  Change to rolling logs (#11023)
  chore: remove repetitive words (#11015)
  Fix test names (#11014)
  Remove obsolete "Comments" tab configuration (#11011)
  Fix "Other fields" tab respecting custom tabs (#11012)
  [WIP] Extract PDF References (#10437)
  Fixed jump to entry from crossref (#11009)
  fix suggestion provider for crossref field (#10962)
  Use SequencedSet for required and optional fields (#11007)
  Bump io.github.classgraph:classgraph from 4.8.165 to 4.8.168 (#11005)
  Bump org.glassfish.hk2:hk2-api from 3.0.6 to 3.1.0 (#11006)
  Bump org.apache.logging.log4j:log4j-to-slf4j from 2.23.0 to 2.23.1 (#11003)
  ...

# Conflicts:
#	src/main/resources/csl-styles

aqurilla commented Apr 6, 2024

Contributor Author

@koppor no problem, thank you!

koppor commented Apr 7, 2024

Member

> @koppor no problem, thank you!

@aqurilla Just as a side note: I implemented the offline parsing in #11156. Thanks to your "Framework", I could focus on the logic part!

aqurilla commented Apr 7, 2024

Contributor Author

@koppor that is great to hear! Thanks for adding the offline functionality for this feature.

Development

Successfully merging this pull request may close these issues.

feature request: extract pdf references

7 participants