Concurrent fetchers #3881
tobiasdiez
left a comment
In general, I like the idea. However, to some extent the order of the fulltext fetchers is also important. For example, we probably prefer a published paper over just the preprint.
|
@tobiasdiez If that really matters, we could use invokeAll and assign priorities to the fetchers, or something like that. I'm not sure whether (or how often) the preprint really differs from the published paper. |
|
Priorities, or clustering into authorities, journals and preprints, would be a good solution in my opinion. I know a few instances where authors didn't update their arXiv preprint with the revised and published version. Since even the slightest change could shift the equation or theorem numbering, having the published PDF is in general desirable. |
|
I thought about this for a moment.
What really annoys me is that the download takes so much time now. Not sure which way to go here. |
|
At the moment we try the original publisher for 10 seconds via the DOIResolution. |
|
Maybe we can offer a switch? E.g. Prefer Official papers over Preprints? |
|
I implemented a possible solution in #3882 |
|
Ok, these are good points. What do you think about combining both approaches: we cluster the fetchers by trust level and run all fetchers in a cluster in parallel. Thus the performance is still good. Something like:
for (TrustLevel trustLevel : TrustLevel.values()) {
    // Assumes fetch(entry) returns Optional<URL> and throws if nothing is found
    List<Callable<Optional<URL>>> tasks = fetchers.stream()
            .filter(fetcher -> fetcher.getTrustLevel() == trustLevel)
            .map(fetcher -> (Callable<Optional<URL>>) () -> fetcher.fetch(entry))
            .collect(Collectors.toList());
    if (tasks.isEmpty()) {
        continue; // invokeAny rejects an empty task list
    }
    try {
        return executor.invokeAny(tasks);
    } catch (ExecutionException e) {
        // No fetcher successful, continue in next trust level bracket
    }
}
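One detail worth spelling out for the snippet above: `invokeAny` counts any task that returns normally as a success, so a fetcher that merely returns an empty result would win its bracket. A minimal self-contained sketch of the clustering idea follows, with that case handled. The `Fetcher` interface, the two `TrustLevel` values and the string results are hypothetical stand-ins for illustration, not JabRef's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ClusteredFetch {

    // Hypothetical trust levels, declared from most to least trusted.
    enum TrustLevel { PUBLISHER, PREPRINT }

    // Hypothetical stand-in for a fulltext fetcher.
    interface Fetcher {
        TrustLevel getTrustLevel();
        Optional<String> fetch(String entry);
    }

    // Builds a fake fetcher that always answers with the given result (null = nothing found).
    static Fetcher fetcher(TrustLevel level, String result) {
        return new Fetcher() {
            public TrustLevel getTrustLevel() { return level; }
            public Optional<String> fetch(String entry) { return Optional.ofNullable(result); }
        };
    }

    static Optional<String> findFullText(List<Fetcher> fetchers, String entry, ExecutorService executor) {
        for (TrustLevel level : TrustLevel.values()) {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Fetcher f : fetchers) {
                if (f.getTrustLevel() == level) {
                    // Turn "nothing found" into an exception so invokeAny keeps waiting
                    // for another fetcher in the same bracket.
                    tasks.add(() -> f.fetch(entry).orElseThrow(
                            () -> new IllegalStateException("nothing found")));
                }
            }
            if (tasks.isEmpty()) {
                continue; // invokeAny throws IllegalArgumentException for an empty list
            }
            try {
                return Optional.of(executor.invokeAny(tasks));
            } catch (ExecutionException e) {
                // No fetcher in this bracket succeeded; fall through to the next one.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    static Optional<String> demo() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Fetcher> fetchers = Arrays.asList(
                    fetcher(TrustLevel.PREPRINT, "preprint.pdf"),
                    fetcher(TrustLevel.PUBLISHER, null));
            return findFullText(fetchers, "some entry", executor);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Optional[preprint.pdf]"
    }
}
```

Here the publisher bracket is tried first and fails, so the result comes from the preprint bracket. The cost of this design is that a slow, unsuccessful bracket delays every bracket after it.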
|
It's probably easier and clearer to just run all fetchers as it is now and then select the best authority. I don't see much benefit in running them one after another, apart from the additional loops in the code. |
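For comparison, a rough sketch of that alternative: start every fetcher at once with `invokeAll`, then keep the result with the highest trust level. The `Result` class, the `TrustLevel` values and the 10 second overall timeout are assumptions made for this example only and are not taken from the actual change in #3882.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BestAuthorityFetch {

    // Hypothetical trust levels, declared from least to most trusted.
    enum TrustLevel { PREPRINT, PUBLISHER }

    // Hypothetical result type: where the PDF was found and how much we trust the source.
    static final class Result {
        final String url;
        final TrustLevel trustLevel;

        Result(String url, TrustLevel trustLevel) {
            this.url = url;
            this.trustLevel = trustLevel;
        }
    }

    static Optional<Result> findBest(List<Callable<Optional<Result>>> tasks, ExecutorService executor) {
        try {
            // Runs all tasks in parallel; tasks still running after the timeout are cancelled.
            List<Future<Optional<Result>>> futures = executor.invokeAll(tasks, 10, TimeUnit.SECONDS);
            Result best = null;
            for (Future<Optional<Result>> future : futures) {
                try {
                    Optional<Result> found = future.get();
                    if (found.isPresent()
                            && (best == null || found.get().trustLevel.compareTo(best.trustLevel) > 0)) {
                        best = found.get();
                    }
                } catch (ExecutionException | CancellationException e) {
                    // This fetcher failed or timed out; ignore it.
                }
            }
            return Optional.ofNullable(best);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    static String demo() {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            List<Callable<Optional<Result>>> tasks = new ArrayList<>();
            tasks.add(() -> Optional.of(new Result("preprint.pdf", TrustLevel.PREPRINT)));
            tasks.add(() -> Optional.of(new Result("published.pdf", TrustLevel.PUBLISHER)));
            tasks.add(() -> { throw new IllegalStateException("fetcher failed"); });
            return findBest(tasks, executor).map(r -> r.url).orElse("none");
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "published.pdf"
    }
}
```

The trade-off against a first-result-wins approach is that the total wait is bounded by the slowest fetcher (or the timeout) rather than the fastest one.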
|
Closed in favor of #3882 |
* Parallel fetchers and first wins
* Trust level implementation #3881
* Fix ordering
* Add tests
* Code style
* Trust levels
* Google refactoring
* Syntax error
* Reduce calls by one as mimeType is already known for fulltext as PDF #3879
* Fix test
* Unued imports
* Remove test
* Refactoring
* Feedback
* Graceful shutdown and force shutdown for non-terminating tasks
* 60 seconds
* Revert test
* Add Getters
* Mock tests
* Refactor to lambda
* Revert "60 seconds" This reverts commit 27fa0e8.
* Revert "Graceful shutdown and force shutdown for non-terminating tasks" This reverts commit f59a3c6.
* Remove unused method
Trying to improve the speed of the fulltext fetcher:
@JabRef/developers WDYT?