I am using the official Docker image (docker pull lfoppiano/grobid:0.8.0).
v0.7.3 was also tested.
- What is your Java version (java --version)?
I just used the official Docker image: lfoppiano/grobid
- In case of build or run errors, please submit the error while running gradlew with --stacktrace and --info for better log traces (e.g. ./gradlew run --stacktrace --info) or attach the log file logs/grobid-service.log.
There is no such file, as I am using Docker.
Problem
- General paragraph text that does not belong to a figure is wrongly recognized as a figDesc.
- Part of the text wrongly recognized as figDesc also appears in the general paragraph text, "body/div/p".
- This means it is repeated in two parts of the TEI XML: "body/figure/figDesc/div/p" and "body/div/p".
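
To make the duplication easier to check on other files, here is a minimal sketch that looks for long runs of text shared between a figDesc and the body paragraphs of a TEI result. It is not part of GROBID. The function name find_figdesc_overlaps and the min_chars threshold are made up for illustration, and it assumes the standard TEI namespace and the element paths described above.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Standard TEI namespace (assumed to be the one used in the result file).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _flat_text(elem):
    """Return all text under elem with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def find_figdesc_overlaps(tei_xml, min_chars=40):
    """Hypothetical helper: list text runs of at least min_chars characters
    that appear both in a body/div/p paragraph and in a figDesc."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return []

    # General paragraph text: p elements directly under body/div.
    paragraphs = [_flat_text(p) for p in body.findall("./tei:div/tei:p", TEI_NS)]

    overlaps = []
    for fig_desc in body.findall(".//tei:figure/tei:figDesc", TEI_NS):
        desc = _flat_text(fig_desc)
        for para in paragraphs:
            # Longest common substring between the paragraph and the figDesc.
            matcher = SequenceMatcher(None, para, desc, autojunk=False)
            match = matcher.find_longest_match(0, len(para), 0, len(desc))
            if match.size >= min_chars:
                overlaps.append(para[match.a:match.a + match.size])
    return overlaps
```

It could be run on the attached result by reading 176_liu2010.pdf.tei.xml.txt into a string and passing it to find_figdesc_overlaps. Each returned string is a candidate for text that was emitted twice.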
original pdf area

extracted xml

Reference materials
Used pdf
176_liu2010.pdf
Result tei xml
Note: GitHub does not accept .xml files, so I changed the extension to .txt.
176_liu2010.pdf.tei.xml.txt