Hello, I am using Grobid for my project, working with PDF drug labels. I have noticed a few problems when a PDF is extracted into XML:
- It often fails to extract the text that comes right after an image.
- It sometimes merges a new section heading into the preceding one. For example, after extracting section 12.3, it extracts the heading of section 12.4 as a continuation of the preceding heading.
Could this be looked at please?