Rewrite google scholar fetcher to new infrastructure#2101
Conversation
Siedlerchr
left a comment
There was a problem hiding this comment.
Looks good but i am just wondering if There is no longer a method retuning the help page name from that enum
47a75e4 to
cd57adc
Compare
|
@Siedlerchr Thanks for the hint! I've overridden the method now. |
66c730e to
65615d9
Compare
lenhard
left a comment
There was a problem hiding this comment.
The code looks fine (some nitpicking of course). Could you please merge master into this PR, so that we can really test that the fetcher is functional?
| String content = URLDownload.createURLDownloadWithBrowserUserAgent(queryURL) | ||
| .downloadToString(StandardCharsets.UTF_8); | ||
|
|
||
| Matcher m = LINK_TO_BIB_PATTERN.matcher(content); |
There was a problem hiding this comment.
No more one letter variable names, please :-)
| } | ||
|
|
||
| private BibEntry downloadEntry(String link) throws IOException, FetcherException { | ||
| String s = URLDownload.createURLDownloadWithBrowserUserAgent(link).downloadToString(StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Same here, please improve naming in this method.
|
@lenhard Done. Branch already has been rebased. |
tobiasdiez
left a comment
There was a problem hiding this comment.
In general looks good to me, just a few minor remarks.
| } | ||
|
|
||
| String url = String.format(SEARCH_URL, | ||
| String url = String.format(SEARCH_IN_TITLE_URL, |
There was a problem hiding this comment.
Could you please use the UriBuilder instead of replacing strings. It is more readable and you don't need to take care of manually encoding the input string.
|
|
||
| List<BibEntry> foundEntries = new ArrayList<>(10); | ||
|
|
||
| String queryURL = String.format(BASIC_SEARCH_URL, URLEncoder.encode(query, StandardCharsets.UTF_8.name())); |
|
|
||
| String queryURL = String.format(BASIC_SEARCH_URL, URLEncoder.encode(query, StandardCharsets.UTF_8.name())); | ||
| addHitsFromQuery(foundEntries, queryURL); | ||
| String content; |
There was a problem hiding this comment.
content variable is not needed, right?
|
|
||
| return foundEntries; | ||
| } catch (IOException e) { | ||
| throw new FetcherException("Fetching entries from Google Scholar failed: ", e); |
There was a problem hiding this comment.
The error message should probably localized (applies also to the code below). I think there is similar error handling in SearchBasedParserFetcher.
| throw new FetcherException("Parsing entries from Google Scholar bib file failed."); | ||
| } else { | ||
| BibEntry entry = entries.iterator().next(); | ||
| entry.clearField(BibEntry.KEY_FIELD); |
There was a problem hiding this comment.
The other fetcher keep the bibtexkey. Not sure what the best approach is, but we should be consistent.
|
Hm, for me this still freezes on second search. Also, I am still getting this weird exception on the command line: What about that? |
|
Are you sure that you've pulled the rebased branch @lenhard? GeneralFetcher.java is exactly the line which I reverted... and it works for me here. |
936405d to
9da5cc1
Compare
|
Fixed the aspects @tobiasdiez has mentioned. Exception: Exception localization - this should be discussed more generally... @lenhard Can you please retest whether the freezing problem persists for you? |
|
Well, JabRef doesn't freeze, but I seem to be blocked from Google Scholar now with the fetcher: Unfortunately, filling out the captcha does not help. Of course, I cannot guarantee that the freezing does not return when I manage to get unblocked ;-) Oh, and to be sure I executed a hard reset and pull for this branch, so I should be up to date. |
| pdfLink = Optional.of(new URL(s)); | ||
| break; | ||
| try { | ||
| URIBuilder uriBuilder = new URIBuilder(SEARCH_IN_TITLE_URL); |
There was a problem hiding this comment.
Why is this not covered by tests? Or is just my Firefox plugin failing?
There was a problem hiding this comment.
This the "old" part of the fetcher responsible for fulltext downloads. The tests are currently not executed at travis as they led to errors if the IP got blocked by google (See https://github.com/JabRef/jabref/blob/master/src/test/java/net/sf/jabref/logic/importer/fetcher/GoogleScholarTest.java).
However, as my new tests (currently) are running through I'll check whether this still persists.
|
Besides minor comments: Fetcher works for me without exceptions. I just tested the search string "bpel4chor". |
* upstream/master: (24 commits) hotfix NPE in the main table (#2158) Enhance side pane toggle (#1605) Added gray background text to authors field to assist newcomers (#2147) Rewrite google scholar fetcher to new infrastructure (#2101) Found entries will be shown when dropping a pdf with xmp meta data (#2150) Japanese translation (#2155) Add shortcuts to context menu (#2131) [WIP] Doi resolution ignore (#2154) Fix DuplicationChecker and key generator (#2151) just close popup on first ESC keep entry in sight Fix isSharedDatabaseAlreadyPresent check. don't change entry after search if editing an entry (#2129) Change download URL to downloads.jabref.org (#2145) fix switching edited textfield in the entry editor with TAB (#2138) Update me.champeau.gradle.jmh' version from 0.3.0 to 0.3.1 Update install4j plugin from 6.1.2 to 6.1.3 Remove gradle plugin com.github.youribonnaffe.gradle.format as it is deprecated and the config uses a non-existing file Update gradle from 3.0 to 3.1 Fix changing the font size not working when importing preferences (#2069) ... # Conflicts: # src/main/java/net/sf/jabref/gui/externalfiles/DroppedFileHandler.java # src/main/java/net/sf/jabref/gui/importer/fetcher/GoogleScholarFetcher.java
Follow up of #2082:
Google Scholar search fetcher now use the new infrastructure - old class
GoogleScholarFetcher.javawith its excessive interweaving with the UI has been removed.New implementation uses the Cookie approach @tobiasdiez proposed in #2082 (comment)
As I had no problems with Captchas during development I'll check whether getting more than the 10 hits of the first result page is possible...
Change in CHANGELOG.md describedScreenshots added (for bigger UI changes)gradle localizationUpdate?