Add MathML support when importing PubMed by aqurilla · Pull Request #9963 · JabRef/jabref

aqurilla · 2023-05-30T03:26:32Z

This fixes #4273 and fixes #6302 by adding a MathML parser that handles <math> elements in the imported XML file. The parser uses an XSLT transformation file to convert MathML to LaTeX.

I tried a couple of different XSLT files, and the one at https://xsltml.sourceforge.net/ works best for string output. This library contains a README file, which I have included; please let me know if we need to remove it or reorganize its contents elsewhere.

Mandatory checks

Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Siedlerchr · 2023-05-30T12:15:22Z

Thanks for tackling this issue! I checked the license/README of the XSLT library, and if I read it correctly, it is released under the MIT license. So yes, we definitely need to keep this README together with the XSLT files.
I'll take a look at your implementation later.

Siedlerchr · 2023-05-30T15:19:33Z

+
+public class MathMLParser {
+    private static final Logger LOGGER = LoggerFactory.getLogger(MathMLParser.class);
+    private static final String XSLT_FILE_PATH = "src/main/resources/xslt/mathml_latex/mmltex.xsl";


This could lead to problems once JabRef is packaged, because the file then lives inside the JAR and a source-tree path no longer points to it.
Better:

Suggested change

-    private static final String XSLT_FILE_PATH = "src/main/resources/xslt/mathml_latex/mmltex.xsl";
+    private static final String XSLT_FILE_PATH = "/xslt/mathml_latex/mmltex.xsl";

aqurilla · 2023-05-31

Thanks for your feedback! I'll make these changes.

Siedlerchr · 2023-05-30T15:26:47Z

+
+            // convert to LaTeX using XSLT file
+            Source xmlSource = new StreamSource(new StringReader(xmlContent));
+            Source xsltSource = new StreamSource(new File(XSLT_FILE_PATH));


See above: when JabRef is packaged as a modularized app, resource loading works differently, so I would rather use something like this. The second argument is important; otherwise the file cannot be found.

Suggested change

-            Source xsltSource = new StreamSource(new File(XSLT_FILE_PATH));
+            URL xsltResource = MathMLParser.class.getResource(XSLT_FILE_PATH);
+            xsltSource = new StreamSource(xsltResource.openStream(), xsltResource.toURI().toASCIIString());
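
For readers following along, here is a minimal, self-contained sketch of why the second `StreamSource` argument (the system ID) matters. It assumes the stylesheet pulls in sibling files through relative `xsl:include` references; without a system ID, the processor has no base URI to resolve them against. The class name `XsltSystemIdDemo` and the files `main.xsl` and `helper.xsl` are hypothetical stand-ins, and a temp-directory URL takes the place of the URL that `Class.getResource` would return in JabRef. This is not the code from the PR.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSystemIdDemo {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Hypothetical main stylesheet that includes a sibling file by relative path.
    static final String MAIN_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSL_NS + "\">"
            + "<xsl:include href=\"helper.xsl\"/>"
            + "<xsl:output method=\"text\"/>"
            + "</xsl:stylesheet>";

    // Hypothetical included stylesheet holding the actual template.
    static final String HELPER_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSL_NS + "\">"
            + "<xsl:template match=\"/greeting\">Hello, <xsl:value-of select=\".\"/>!</xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            // Stand-in for MathMLParser.class.getResource(...): any URL works the same way.
            Path dir = Files.createTempDirectory("xslt-demo");
            Files.writeString(dir.resolve("main.xsl"), MAIN_XSL);
            Files.writeString(dir.resolve("helper.xsl"), HELPER_XSL);
            URL xsltResource = dir.resolve("main.xsl").toUri().toURL();

            try (InputStream in = xsltResource.openStream()) {
                // The system ID gives the processor a base URI for resolving "helper.xsl".
                Source xsltSource = new StreamSource(in, xsltResource.toURI().toASCIIString());
                Transformer transformer = TransformerFactory.newInstance().newTransformer(xsltSource);
                StringWriter out = new StringWriter();
                transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
                return out.toString();
            }
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<greeting>JabRef</greeting>"));
    }
}
```

Passing only the stream (the one-argument constructor) would leave the relative `href` without a base URI, which is presumably the "file cannot be found" failure described above.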

Siedlerchr · 2023-05-30T16:12:12Z

+    }
+
+    private static String getXMLCData(XMLStreamReader reader) {
+        return "<![CDATA[" + reader.getText() + "]]>";


Honestly, I don't understand this class. What is the purpose of building xml tags manually again?

aqurilla · 2023-05-31

This class was added because the MedlineImporter class uses a StAX parser, which does not load the full XML data into memory. Instead, we have a stream and can only move forward through the tag elements. I was not able to find a built-in method on the StAX parser itself to easily extract the content between two parent tags (<math> in this case), so this custom logic was required. The extracted XML string is then used to carry out the transformation. I hope that clarifies things!

Siedlerchr · 2023-06-01

Ah yes, I see, thanks for the explanation. I searched around a bit, but it seems the StAX parser is only meant for single documents. So this is fine with me then!
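
To illustrate the approach discussed in this thread, the following is a rough sketch of rebuilding an element's markup from a forward-only StAX stream. It is not the PR's `MathMLParser`: the class `StaxSubtreeDemo` and its methods are hypothetical, it escapes text instead of wrapping it in CDATA, and it ignores namespace prefixes, comments, and processing instructions for brevity.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSubtreeDemo {

    // Escapes the characters that would otherwise break the rebuilt markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Rebuilds the element the reader is positioned on; the reader ends on its END_ELEMENT.
    static String readSubtree(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        for (int event = reader.getEventType(); ; event = reader.next()) {
            if (event == XMLStreamConstants.START_ELEMENT) {
                sb.append('<').append(reader.getLocalName());
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    sb.append(' ').append(reader.getAttributeLocalName(i))
                      .append("=\"").append(escape(reader.getAttributeValue(i))).append('"');
                }
                sb.append('>');
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                sb.append("</").append(reader.getLocalName()).append('>');
                if (--depth == 0) {
                    return sb.toString();
                }
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA
                    || event == XMLStreamConstants.SPACE) {
                sb.append(escape(reader.getText()));
            }
        }
    }

    // Scans forward to the first element with the given local name and returns its markup.
    static String extractFirst(String xml, String localName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    return readSubtree(reader);
                }
            }
            return ""; // no matching element found
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Abstract>Energy <math><mi>E</mi><mo>=</mo><mi>m</mi></math> rest</Abstract>";
        System.out.println(extractFirst(xml, "math"));
    }
}
```

The depth counter is what lets the loop stop at the matching closing tag even when elements of the same name are nested, since the stream cannot be rewound.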

calixtus · 2023-06-01T23:09:25Z

Thank you for your contribution. We would be happy to see more contributions from your side. 😍

aqurilla added 2 commits May 29, 2023 20:01

add mathml support for medline importer

d571141

add changelog entry

4f27f03

aqurilla marked this pull request as draft May 30, 2023 03:40

fix checkstyle issues

4922724

aqurilla marked this pull request as ready for review May 30, 2023 04:30

Siedlerchr requested changes May 30, 2023

aqurilla added 2 commits May 30, 2023 20:37

remove remaining tabs

f1d01e5

update resource loading method

9400208

calixtus reviewed Jun 1, 2023

Comment thread src/main/java/org/jabref/logic/importer/util/MathMLParser.java

Siedlerchr approved these changes Jun 1, 2023

Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Jun 1, 2023

calixtus approved these changes Jun 1, 2023

calixtus merged commit c7ada34 into JabRef:main Jun 1, 2023

koppor mentioned this pull request Sep 7, 2023

Remove obsolete class StaxParser #10346

Closed

6 tasks

calixtus merged 5 commits into JabRef:main from aqurilla:fix-for-issue-4273

3 participants
