Add MathML support when importing PubMed #9963
Thanks for tackling this issue! I checked the license/README of the XSLT library and, if I see this right, it is released under the MIT license. So yes, we definitely need to keep this README together with the XSLT files.
    public class MathMLParser {
        private static final Logger LOGGER = LoggerFactory.getLogger(MathMLParser.class);
        private static final String XSLT_FILE_PATH = "src/main/resources/xslt/mathml_latex/mmltex.xsl";
This could lead to problems when JabRef is packaged, because the file then lives inside the jar and is no longer reachable as a file-system path.
Better:
    - private static final String XSLT_FILE_PATH = "src/main/resources/xslt/mathml_latex/mmltex.xsl";
    + private static final String XSLT_FILE_PATH = "/xslt/mathml_latex/mmltex.xsl";
Thanks for your feedback! I'll make these changes
    // convert to LaTeX using XSLT file
    Source xmlSource = new StreamSource(new StringReader(xmlContent));
    Source xsltSource = new StreamSource(new File(XSLT_FILE_PATH));
See above: when JabRef is packaged as a modularized app, resource loading works differently, so I would rather use something like this. The second argument is important; otherwise the file cannot be found.
    - Source xsltSource = new StreamSource(new File(XSLT_FILE_PATH));
    + URL xsltResource = MathMLParser.class.getResource(XSLT_FILE_PATH);
    + xsltSource = new StreamSource(xsltResource.openStream(), xsltResource.toURI().toASCIIString());
      }
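The suggestion above can be sketched as a small standalone helper. This is only an illustration under stated assumptions, not JabRef's actual code: the class `XsltResourceLoading` and its method names are hypothetical, and the inline stylesheet in `main` merely stands in for a real classpath resource. The second `StreamSource` argument is the system ID; a likely reason it matters here (an assumption, the thread does not spell it out) is that relative `xsl:include` or `xsl:import` references inside the stylesheet are resolved against it.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URL;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper for illustration; not part of JabRef.
class XsltResourceLoading {

    // Resolves the stylesheet through the class loader, so it works both
    // from an IDE (plain directories) and from a packaged jar.
    static Source loadStylesheet(Class<?> anchor, String resourcePath) throws IOException {
        URL resource = anchor.getResource(resourcePath);
        if (resource == null) {
            throw new FileNotFoundException("Resource not found: " + resourcePath);
        }
        // The second argument is the system ID of the stylesheet.
        // toExternalForm() is used here instead of toURI().toASCIIString()
        // only to avoid the checked URISyntaxException in this sketch.
        return new StreamSource(resource.openStream(), resource.toExternalForm());
    }

    // Applies a stylesheet to an XML string and returns the result as text.
    static String transform(String xml, Source stylesheet) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesheet);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Inline stand-in stylesheet; a real caller would use
        // loadStylesheet(SomeClass.class, "/xslt/mathml_latex/mmltex.xsl").
        String xsl = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\"><xsl:value-of select=\"a\"/></xsl:template>"
                + "</xsl:stylesheet>";
        System.out.println(transform("<a>hi</a>", new StreamSource(new StringReader(xsl))));
    }
}
```

Failing fast on a `null` URL gives a clearer error than the `NullPointerException` that `openStream()` would otherwise raise when the resource path is wrong.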
    private static String getXMLCData(XMLStreamReader reader) {
        return "<![CDATA[" + reader.getText() + "]]>";
Honestly, I don't understand this class. What is the purpose of building the XML tags manually again?
This class was added because the MedlineImporter class uses a StAX parser, which does not load the full XML data into memory. Instead, we have a stream and can only move forward through the tag elements. I could not find any built-in method on the StAX parser to easily extract the content between two enclosing parent tags (<math> in this case), so this custom logic was required. The extracted XML string is then used as the input for the transformation. I hope that clarifies things!
Ah yes, I see, thanks for the explanation. I searched around a bit, but it seems the StAX parser is only meant for single documents. So this is fine for me then!
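The forward-only extraction discussed in this thread can be sketched roughly as follows. This is a simplified illustration, not the PR's MathMLParser: the class `StaxSubtreeExtractor` and its methods are hypothetical, it writes local names only (namespace prefixes and declarations are dropped for brevity), and it re-escapes text instead of wrapping it in CDATA as the PR does.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; not the PR's implementation.
class StaxSubtreeExtractor {

    // Rebuilds the element the reader is currently positioned on as a string.
    // The reader ends up on the matching end tag of that element.
    static String readElementAsString(XMLStreamReader reader) throws XMLStreamException {
        if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("Reader must be positioned on a start tag");
        }
        StringBuilder xml = new StringBuilder();
        int depth = 0;
        int event = reader.getEventType();
        while (true) {
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                xml.append('<').append(reader.getLocalName());
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    xml.append(' ').append(reader.getAttributeLocalName(i))
                       .append("=\"").append(escape(reader.getAttributeValue(i))).append('"');
                }
                xml.append('>');
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
                xml.append("</").append(reader.getLocalName()).append('>');
                if (depth == 0) {
                    return xml.toString(); // closed the element we started on
                }
            } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                xml.append(escape(reader.getText()));
            }
            // Comments and processing instructions are skipped in this sketch.
            event = reader.next();
        }
    }

    // Re-escapes characters that the parser has already unescaped.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws XMLStreamException {
        String doc = "<root><math><mi>x</mi><mo>&lt;</mo><mn>1</mn></math></root>";
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(doc));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "math".equals(reader.getLocalName())) {
                System.out.println(readElementAsString(reader));
            }
        }
    }
}
```

Tracking the nesting depth keeps the loop correct if the target element contains nested elements of the same name. A real importer would probably also need to carry the MathML namespace through so that the stylesheet's templates still match.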
Thank you for your contribution. We would be happy to see more contributions from your side. 😍
This fixes #4273 and fixes #6302 by adding a MathML parser that handles <math> elements in the imported XML file. The parser uses an XSLT transformation file to perform the conversion from MathML to LaTeX.
I tried out a couple of different XSLT files, and the one at https://xsltml.sourceforge.net/ works best for string output. This library contains a README file, which I have included. Please let me know if we need to remove it or reorganize its contents elsewhere.
Mandatory checks
Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)