Skip to content

Search in PDF Files#2838

Merged
calixtus merged 122 commits into
mainfrom
lucene
Jul 14, 2021
Merged

Search in PDF Files#2838
calixtus merged 122 commits into
mainfrom
lucene

Conversation

@LinusDietz

@LinusDietz LinusDietz commented May 12, 2017

Copy link
Copy Markdown
Member

This implements a fulltext-search feature for linked PDF files.

search

All linked pdf files are indexed using Apache Lucene. Search-results are displayed in a google-esque fashion in an additional tab in the entry editor (visible only if a fulltext search is active).
Users can enable/disable fulltext-search with a toggle-button in the search-bar.

  • Change in CHANGELOG.md described
  • Tests created for changes
  • Screenshots added (for bigger UI changes)
  • Manually tested changed features in running JabRef
  • Check documentation status (Issue created for outdated help page at help.jabref.org?)
  • If you changed the localization: Did you run gradle localizationUpdate?

@LinusDietz LinusDietz added component: external-files status: waiting-for-feedback The submitter or other users need to provide more information about the issue labels May 12, 2017

@tobiasdiez tobiasdiez left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wooohoooo! Thanks for you efforts to get the PDF search woring!

Just a few initial remarks for now; I will continue reviewing the next days.

Comment thread build.gradle Outdated
Comment thread gradle/wrapper/gradle-wrapper.properties Outdated
Comment thread src/main/java/org/jabref/logic/pdf/search/indexing/DocumentReader.java Outdated
Comment thread src/main/java/org/jabref/logic/pdf/search/indexing/DocumentReader.java Outdated
Comment thread src/main/java/org/jabref/logic/pdf/search/indexing/DocumentReader.java Outdated
Comment thread src/main/java/org/jabref/logic/pdf/search/indexing/DocumentReader.java Outdated
Comment thread build.gradle Outdated
Comment thread src/main/java/org/jabref/logic/pdf/search/retrieval/PdfSearcher.java Outdated
Comment thread src/test/java/org/jabref/logic/pdf/search/indexing/DocumentReaderTest.java Outdated
Comment thread src/test/java/org/jabref/logic/pdf/search/indexing/DocumentReaderTest.java Outdated
Comment thread src/test/java/org/jabref/logic/pdf/search/indexing/PdfIndexerTest.java Outdated

@Siedlerchr Siedlerchr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments, some unnecessary mocking and some little improvements

@LinusDietz LinusDietz changed the title Search in PDF Files [WIP] Search in PDF Files May 14, 2017
@LinusDietz LinusDietz removed the status: waiting-for-feedback The submitter or other users need to provide more information about the issue label May 14, 2017
Comment thread src/main/java/org/jabref/gui/JabRefMain.java Outdated

@koppor koppor left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nitpicks, it is good to go otherwise

Comment thread src/main/java/org/jabref/gui/JabRefMain.java Outdated
Comment thread src/main/java/org/jabref/logic/pdf/search/indexing/DocumentReader.java Outdated
Comment thread src/main/java/org/jabref/logic/pdf/search/indexing/DocumentReader.java Outdated
Comment thread src/main/java/org/jabref/logic/pdf/search/indexing/PdfIndexer.java Outdated
Comment thread src/main/java/org/jabref/gui/JabRefMain.java Outdated
}

public Path getFulltextIndexPath() {
Path appData = Path.of(AppDirsFactory.getInstance().getUserDataDir(JabRefFrame.FRAME_TITLE, SearchFieldConstants.VERSION, "org.jabref"));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the documentation at https://github.com/harawata/appdirs. Magic string jabref is OK for me here.

@koppor

koppor commented Jul 13, 2021

Copy link
Copy Markdown
Member

This fixes #6188

btut and others added 3 commits July 14, 2021 13:28
Removed empty line and fixed logger format

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
Removed unnecessary import
@calixtus

Copy link
Copy Markdown
Member

Im still not convinced with using the literal JabRef for index path and would like to talk about that in the call tonight. I wonder, if there isn't a better, more generic solution.

@btut

btut commented Jul 14, 2021

Copy link
Copy Markdown
Contributor

Im still not convinced with using the literal JabRef for index path and would like to talk about that in the call tonight. I wonder, if there isn't a better, more generic solution.

I agree, especially because it is used in two distinct files and needs to be identical. However, I see how it is also problematic to use the one stored in JabRefFrame. Maybe we should define a new static variable in PdfIndexer?

btut added 3 commits July 14, 2021 13:51
For now, lets use highlighted and text annotations only.

Fixes 4654
Comment thread src/main/java/org/jabref/logic/pdf/search/indexing/DocumentReader.java Outdated
@koppor

koppor commented Jul 14, 2021

Copy link
Copy Markdown
Member

Result of call - we looked up https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/store/NIOFSDirectory.html

NOTE: NIOFSDirectory is not recommended on Windows because of a bug in how FileChannel.read is implemented in Sun's JRE. Inside of the implementation the position is apparently synchronized. See here for details.

For now, we accept that and think, no issues occur.

@calixtus calixtus merged commit ddce573 into main Jul 14, 2021
@calixtus calixtus deleted the lucene branch July 14, 2021 19:49
@Siedlerchr

Copy link
Copy Markdown
Member

FYI: mac index file is located under /Users/christophs/Library/Application Support/JabRef/0.3a

@koppor

koppor commented Jul 14, 2021

Copy link
Copy Markdown
Member

Directory is determined using the appdirs library: https://github.com/harawata/appdirs#appdirs

Siedlerchr added a commit that referenced this pull request Jul 15, 2021
* upstream/main: (45 commits)
  Squashed 'buildres/csl/csl-styles/' changes from ec4a4c0..176997d (#7910)
  Update citeproc-java to 3.0.0-alpha.2 (#7911)
  Search in PDF Files (#2838)
  Removed references to apache commons logging (#7907)
  Oobranch c : ootext and rangesort (#7788)
  Bump jackson-datatype-jsr310 from 2.12.3 to 2.12.4 (#7901)
  fix markdownlint
  Bump jackson-dataformat-yaml from 2.12.3 to 2.12.4 (#7899)
  Bump postgresql from 42.2.22 to 42.2.23 (#7902)
  Bump classgraph from 4.8.109 to 4.8.110 (#7900)
  Bump gittools/actions from 0.9.9 to 0.9.10 (#7898)
  Bump flowless from 0.6.3 to 0.6.4 (#7903)
  Bump jsoup from 1.13.1 to 1.14.1 (#7904)
  Bump tika-core from 1.26 to 1.27 (#7897)
  Try even more empty lines to provoke conflicts in CHANGELOG.md after a release
  Fix position in CHANGELOG.md
  Add corporate proxy workaround for Version.getAllAvailableVersions() (#7890)
  Update to Java 16 in build.gradle (#7892)
  Preparing Changelog for the next release cycle
  New development cycle
  ...

# Conflicts:
#	gradle/wrapper/gradle-wrapper.properties
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants