I am currently running into the issue described in #712 (comment)
I got around the problem by manually creating
package org.grobid.core.engines;

import org.apache.commons.lang3.tuple.Pair;
import org.grobid.core.analyzers.GrobidAnalyzer;
There are quite a few unused imports. It might be good to tidy them up (and to remove the commented-out code; perhaps mark the test as ignored instead).
GrobidFactory.reset();
}

public DocumentPiece getWholeDocumentPiece(Document doc) {
Maybe getWholeDocumentPiece and getWholeDocumentParts could be moved to a central place, if one doesn't exist already.
Other parsers are potentially affected as well. There appears to be a lot of duplication, which could probably be refactored.

Raised a suggestion for refactoring the features: #718
lfoppiano left a comment
I'm having trouble understanding the whole picture here, so I'm not sure I can really review this part.
There is an example in issue #712. I'm happy to add more information if needed.
I have regenerated the fulltext model feature files with the fix, but there is actually no difference from before with respect to the starting block. I also retrained the model (before checking the difference in features) in the branch https://github.com/kermitt2/grobid/tree/fix-712. A difference might appear in some new files, as in your example, but the benchmark on the PMC set only changes at the second decimal, so the effect should be minor. We could therefore simply merge the fix; there is no need to update the training files and model for this.
resolves #712