JabRef version 5.0 on Ubuntu 19.10
Steps to reproduce the behavior
- In Pubmed, search for a publication which has a superscript or italics in its title.
- For example, copy and paste the following text into Pubmed's search bar, then hit Search:
  Predicting Locally Advanced Rectal Cancer Response to Neoadjuvant Therapy With 18 F-FDG PET and MRI Radiomics Features
- Pubmed should find the publication with this title.
- Notice, on Pubmed's results web page, how this publication has the number 18 as a superscript in the title.
- Copy the publication's PMID number. You can find it in the lower left corner. In this case it is:
  30637502
- Download the XML results file from Pubmed for this result.
  Depending on whether you are using the old Pubmed website or the new one, do as follows:
- If using the old Pubmed website, with the results displayed, click on:
  - Send to
  - File
  - XML
  - Create file
  - save the file
- If using the new Pubmed website, there is no XML download, so use this other method instead, which downloads the XML result from Pubmed using the "wget" utility. The wget command line looks like this:
  wget -O pubmed_result.xml 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=30637502&retmode=xml'
  where the PMID number you found above is placed after id= in the URL used by wget.
- Open Jabref
- Menu:
  - File
  - Import into new library
- Open the XML results file from Pubmed you saved above.
  Jabref imports the publication into a new library.
- In Jabref, double click on the publication's row.
  This opens the details panel.
- In the details panel, click on the left-most tab called:
  Required fields
- Copy and paste the Title field's value.
  You get:
  Predicting locally advanced rectal cancer response to neoadjuvant therapy with , javax.xml.bind.JAXBElement@4e4ed55f, F-FDG PET and MRI radiomics features.
- Notice the problem in the title: The superscript with the number 18 has been replaced with the java-related string:
  javax.xml.bind.JAXBElement@4e4ed55f
- Open the XML results file in a text editor. See the XML file inside the attached zip file.
- Search in the XML file for the text:
  <ArticleTitle>
- You see that the contents of the <ArticleTitle> XML tag is:
  Predicting locally advanced rectal cancer response to neoadjuvant therapy with <sup>18</sup>F-FDG PET and MRI radiomics features.
- Notice the <sup> tag in the XML.
- This <sup> tag is valid, according to Pubmed's DTD file, used to define what is valid inside Pubmed's XML output. See the DTD file inside the attached zip file.
- Open the DTD file in a text editor.
- In the DTD file, near the beginning, see the line:
  <!ENTITY % text "#PCDATA | b | i | sup | sub | u" >
  This says that the following XML / HTML tags are allowed in text entities. The allowed tags are: b, i, sup, sub, u.
  Elsewhere in the DTD it says that the article title is allowed to be a text entity.
  When Jabref encounters these tags inside the value of the title, Jabref produces a text like:
  javax.xml.bind.JAXBElement@4e4ed55f
  instead of producing the text that was inside the tag.
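To make the expected result concrete, here is a minimal Python sketch of the steps above. It is not Jabref's code and says nothing about how Jabref's importer is written; it only shows the output I would expect when inline tags such as <sup> are flattened to their text. The function names `efetch_url` and `article_title_text` are made up for this illustration, and `SAMPLE` is a trimmed-down stand-in for the real Pubmed XML file, keeping only the title element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL taken from the wget command in the steps above.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid: str) -> str:
    """Build the efetch URL for one PMID (hypothetical helper)."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return EFETCH_BASE + "?" + query


def article_title_text(xml_text: str) -> str:
    """Return the first ArticleTitle with inline markup flattened to text."""
    root = ET.fromstring(xml_text)
    title = root.find(".//ArticleTitle")
    if title is None:
        raise ValueError("no <ArticleTitle> element found")
    # itertext() walks the element's own text, the text of child
    # elements such as <sup>, and the text following each child.
    return "".join(title.itertext())


# Assumption: a cut-down document; the real file has many more elements.
SAMPLE = (
    "<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>"
    "<ArticleTitle>Predicting locally advanced rectal cancer response to "
    "neoadjuvant therapy with <sup>18</sup>F-FDG PET and MRI radiomics "
    "features.</ArticleTitle>"
    "</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"
)

if __name__ == "__main__":
    print(efetch_url("30637502"))
    print(article_title_text(SAMPLE))
```

With this approach the title comes out with "18F-FDG" in place of the JAXBElement string. Whether Jabref should flatten the superscript like this or convert it to some other markup is a separate design question.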
Other fields besides the Title
I believe there is a similar problem with superscript and italics in the:
field as well, when importing from Pubmed's XML.
Jabref's Log was empty
Jabref's error console was empty, after importing the above XML file from Pubmed.
Checked XML validity against its DTD
I checked the XML file against its DTD using the first three or four online DTD checkers that I found by googling for:
xml dtd validator online
All of the validators I tried replied that the XML is valid against its DTD.
Attachment
I attach a zip file:
Pubmed XML import superscript tag in title problem.zip
which contains two files:
- An XML results file for the above Pubmed search, downloaded from Pubmed.
- Pubmed's DTD file, used to define valid Pubmed XML output, downloaded from Pubmed here:
  https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd
The above url of the DTD can be found inside the XML file, near the top of the file.
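In case it helps with checking the two attached files, here is a rough Python sketch that pulls the DTD URL out of the XML file's DOCTYPE line and lists the inline tags permitted by the DTD's text entity. Both helper names are invented for this example. The `DOCTYPE_SAMPLE` string is an assumption about what the header looks like: only the URL is taken from this report, and the public identifier is a placeholder. `ENTITY_SAMPLE` is the DTD line quoted above.

```python
import re


def doctype_system_url(xml_text):
    """Return the DTD URL named in the DOCTYPE declaration, or None."""
    match = re.search(
        r'<!DOCTYPE\s+\S+\s+(?:PUBLIC\s+"[^"]*"|SYSTEM)\s+"([^"]+)"',
        xml_text,
    )
    return match.group(1) if match else None


def allowed_inline_tags(dtd_text, entity="text"):
    """List the element names allowed by a parameter entity like %text;."""
    pattern = r'<!ENTITY\s+%\s+' + re.escape(entity) + r'\s+"([^"]*)"'
    match = re.search(pattern, dtd_text)
    if match is None:
        return []
    parts = [part.strip() for part in match.group(1).split("|")]
    # Drop #PCDATA, which stands for plain character data, not a tag.
    return [part for part in parts if part and not part.startswith("#")]


# Assumption: illustrative header; the public identifier is a placeholder.
DOCTYPE_SAMPLE = (
    '<?xml version="1.0" ?>\n'
    '<!DOCTYPE PubmedArticleSet PUBLIC "-//EXAMPLE//PLACEHOLDER//EN" '
    '"https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd">'
)

# The entity line quoted from the DTD in the steps above.
ENTITY_SAMPLE = '<!ENTITY % text "#PCDATA | b | i | sup | sub | u" >'

if __name__ == "__main__":
    print(doctype_system_url(DOCTYPE_SAMPLE))
    print(allowed_inline_tags(ENTITY_SAMPLE))
```

This is plain pattern matching, not a DTD parser, so it only handles a simple one-line entity definition like the one shown.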
Thank you.