While trying to get Pub2TEI working on the GROBID gold standard data from PMC, I ran into the DTD issues mentioned in the README. After some research, I found that DTD loading can be disabled with the following switch:
--parserFeature?uri=http%3A//apache.org/xml/features/nonvalidating/load-external-dtd:false
References: the Saxon --feature switch and the Xerces load-external-dtd feature.
I've attached a file from the grobid PMC gold standard data I was having trouble with. The new switch allows the conversion to proceed.
sample.zip
The sample command in the README could be updated to:
java -jar Samples/saxon9he.jar \
--parserFeature?uri=http%3A//apache.org/xml/features/nonvalidating/load-external-dtd:false \
-a:off \
-dtd:off \
-expand:off \
-o:out.tei.xml \
-s:Samples/TestPubInput/BMJ/bmj_sample.xml \
-t \
-xsl:Stylesheets/Publishers.xsl
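Since the gold standard data contains many files, the same switch could be applied in a loop. The following is only a rough sketch: the function name convert_all, the variables JAVA, SAXON_JAR, XSL and NO_DTD, and the assumption that the inputs end in .xml are all hypothetical and not part of Pub2TEI. The switch is single-quoted so that shells which treat "?" as a glob character pass it through unchanged.

```shell
#!/usr/bin/env bash
# Sketch only: convert every *.xml file in a directory using the
# DTD-loading switch. All names below are illustrative assumptions.
JAVA="${JAVA:-java}"
SAXON_JAR="${SAXON_JAR:-Samples/saxon9he.jar}"
XSL="${XSL:-Stylesheets/Publishers.xsl}"

# Quoted so the shell does not try to expand "?" as a wildcard.
NO_DTD='--parserFeature?uri=http%3A//apache.org/xml/features/nonvalidating/load-external-dtd:false'

convert_all() {
  local in_dir="$1" out_dir="$2" src name
  mkdir -p "$out_dir"
  for src in "$in_dir"/*.xml; do
    # Skip the unexpanded pattern when the directory has no matches.
    [ -e "$src" ] || continue
    name="$(basename "$src" .xml)"
    "$JAVA" -jar "$SAXON_JAR" "$NO_DTD" \
      -a:off -dtd:off -expand:off \
      -o:"$out_dir/$name.tei.xml" \
      -s:"$src" \
      -t \
      -xsl:"$XSL" || echo "conversion failed: $src" >&2
  done
}

# Example call (paths are placeholders):
# convert_all path/to/pmc_input path/to/tei_output
```

A failed conversion is reported on stderr and the loop moves on to the next file, so one problematic document does not stop the whole batch.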