Fix affiliation missing when using DL affiliation-address model #1166
|
After a few iterations over it, I think I understand the principle, which is to separate blocks of affiliations that are at different offset differences. My fix just avoids adding |
|
@kermitt2 I tried to fix this in a bit of a rush, at least to mitigate the issue on the Docker image. Sorry, I might need a quick review on your side. I've pushed this fix on the branch |
|
Hi @lfoppiano the fix works fine, no problem. It is surprising that the starting "\n" has such an effect on the DL processing. There's nothing else to change; the segmentation then proceeds normally, including parallel processing. I changed this part last December and it seems I only tested with the CRF model :) |
|
Thanks! |
This PR proposes a fix for affiliations that are lost when they are processed with a DL model.
The issue seems to be in the method `getAffiliationBlocksFromSegments()`, where new `\n` are added. In general they should be added only when there is a misalignment, but they are always added at the beginning. https://github.com/kermitt2/grobid/blob/a95d2533f1019e900b49ea5c39a5afe355dbb4a3/grobid-core/src/main/java/org/grobid/core/engines/AffiliationAddressParser.java#L81
I patched this quickly by checking that `end` is not zero. However, this `\n` does not work well with the DL models, in contrast to the CRF models, which ignore it.
I've left two tests that show the problem with both CRF and DL: https://github.com/kermitt2/grobid/blob/bd93a61f4542f218299e2c34a82c37b75bc727ef/grobid-core/src/test/java/org/grobid/core/engines/AffiliationAddressParserTest.java#L262
The DL test is still failing, as I'm not really sure where to fix the issue.
After this is fixed, we would need to rebuild the grobid-full image.
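For reviewers, here is a minimal sketch of the guard described above. It is not the actual `AffiliationAddressParser` code: the `Token` class, the `buildBlocks` method and the sample offsets are all hypothetical, and it assumes that `end` tracks the offset just past the previous token. It only illustrates the idea of emitting a `\n` on an offset misalignment while skipping it before the very first token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and structure are assumptions, not Grobid's API.
class AffiliationBlockSketch {

    /** Minimal stand-in for a layout token: its text and its start offset. */
    static final class Token {
        final String text;
        final int offset;

        Token(String text, int offset) {
            this.text = text;
            this.offset = offset;
        }
    }

    /**
     * Emits one entry per token and inserts a "\n" separator whenever a token
     * does not start where the previous one ended. The `end != 0` check skips
     * the separator before the first token, which is the guard from this PR.
     */
    static List<String> buildBlocks(List<Token> tokens) {
        List<String> blocks = new ArrayList<>();
        int end = 0; // offset just past the previous token; 0 means no token seen yet
        for (Token token : tokens) {
            if (end != 0 && token.offset != end) {
                blocks.add("\n"); // misalignment: start a new affiliation block
            }
            blocks.add(token.text);
            end = token.offset + token.text.length();
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Two affiliations at non-contiguous offsets (made-up values).
        List<Token> tokens = Arrays.asList(
                new Token("University", 120),
                new Token(" ", 130),
                new Token("A", 131),
                new Token("Institute", 200),
                new Token(" ", 209),
                new Token("B", 210));
        System.out.println(buildBlocks(tokens));
    }
}
```

Without the `end != 0` check, the first token (offset 120 in this made-up input) would be treated as misaligned against the initial `end = 0`, and the output would start with a `\n`.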