Switch to Latex2unicode#2532
Conversation
# Conflicts: # CHANGELOG.md
|
Please remove the old latex2unicode formatter code, otherwise good to go. |
tobiasdiez
left a comment
There was a problem hiding this comment.
In general, LGTM....just some minor remarks.
| */ | ||
| public class LatexToUnicodeAdapter { | ||
|
|
||
| public static String format(String inField) { |
There was a problem hiding this comment.
I would merge this class into the LatexToUnicodeFormatter. I don't see any advantage of having two classes that do essentially the same.
There was a problem hiding this comment.
This is not possible, since we need the unicode conversion in the model to internally convert fields of BibEntries. If we just had the formatter, we would have a dependency from model to logic, which is forbidden in the architecture.
| - Redesigned journal abbreviations dialog. | ||
| - Redesigned error console. | ||
| - All file dialogs now use the native file selector of the OS. [#1711](https://github.com/JabRef/jabref/issues/1711) | ||
| - Switch to the latex2unicode library https://github.com/tomtung/latex2unicode for converting LaTeX to unicode |
| public void testFormatTextit() { | ||
| // See #1464 | ||
| assertEquals("text", formatter.format("\\textit{text}")); | ||
| assertEquals("\uD835\uDC61\uD835\uDC52\uD835\uDC65\uD835\uDC61", formatter.format("\\textit{text}")); |
There was a problem hiding this comment.
d835 is the symbol for an invalid character. Is this conversation a bug or indeed the expected outcome?
There was a problem hiding this comment.
It seems to be used as an escaping symbol for the following hex. I arrived at this sequence by copying the italicized text in Intellij into the code which transformed it into these hexes.
If you remove the first hex, then the first letter of the expected output turns into a ? for Intellij, so this really seems to be the escape sequence.
I have to admit, though, that my knowledge on the correct hexes of italicized text in unicode is slim.
There was a problem hiding this comment.
I don't expect anyone to ever use italicized text in their entries - after all, why would you? But isn't it cool to know that with the converter we could theoretically support it? :-)
There was a problem hiding this comment.
Some more details here https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates :)
|
Since the build is passing and there are two approved reviews, I will merge this into master now and close the related issues. |
* upstream/master: Switch to Latex2unicode (#2532)

Motivation is discussed in #2465. Fixes many conversion problems, such as those discussed in #2063 and #207.