Support Unicode relations #933

Merged
k4b7 merged 4 commits into KaTeX:master from ronkok:master
Oct 17, 2017

Collaborator

ronkok commented Oct 14, 2017

This is the first in a series of PRs to give KaTeX the ability to recognize Unicode character input. The code in this PR follows the style of PR #410.

All the characters in this PR will produce rel atoms. I’ll submit PRs for other atom types later.
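
As a rough illustration of what recognizing Unicode input could look like, the sketch below maps a few relation characters to TeX commands and tags each as a rel atom. The three pairs are taken from the mapping table later in this thread; the names `unicodeRelations` and `classify` are hypothetical and are not KaTeX APIs.

```javascript
// Hypothetical lookup table: Unicode relation character -> TeX command.
// The three pairs come from the mapping table in this thread.
const unicodeRelations = new Map([
    ["\u2208", "\\in"],     // U+2208
    ["\u2264", "\\leq"],    // U+2264
    ["\u2282", "\\subset"], // U+2282
]);

// Classify one input character. Anything missing from the table is
// reported as unknown in this sketch.
function classify(ch) {
    const name = unicodeRelations.get(ch);
    return name ? {atom: "rel", name} : {atom: "unknown", name: null};
}

console.log(classify("\u2264"));
```

A plain map keeps the sketch short; the actual change is made in symbols.js.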

Collaborator Author

ronkok commented Oct 14, 2017

I’ve omitted some characters for now, because they are HTML reserved characters, because they need stackrel macros, or because their atom type or name is ambiguous. Omissions include: <, >, :, |, all triangles, etc. Follow-up on these omissions will come later, as will arrows and negated relations.

Collaborator Author

ronkok commented Oct 14, 2017

Earlier KaTeX efforts toward Unicode input have stalled for fear of breaking content in the future, so I’ve collected some data for use in vetting this work. The following table shows: (1) the KaTeX function to which each character is mapped, (2) whether unicode math and/or John Cook’s conversion page map it the same way, and (3) possible alternates that I did not use.

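| Character | Code point | Maps to | unicode math / Cook | Not used |
|---|---|---|---|---|
| ∈ | U+2208 | \in | both | \varin |
| ∋ | U+220B | \ni | \ni, \varni | \owns, \backepsilon, \varni |
| ∝ | U+221D | \propto | both | \varpropto |
| ∼ | U+223C | \sim | all | \thicksim, \difference |
| ∽ | U+223D | \backsim | | |
| ≂ | U+2242 | \eqsim | | |
| ≃ | U+2243 | \simeq | | \sime |
| ≅ | U+2245 | \cong | | |
| ≈ | U+2248 | \approx | both | \thickapprox |
| ≊ | U+224A | \approxeq | | |
| ≍ | U+224D | \asymp | | |
| ≎ | U+224E | \Bumpeq | | |
| ≏ | U+224F | \bumpeq | | |
| ≐ | U+2250 | \doteq | | |
| ≑ | U+2251 | \Doteq | | \doteqdot |
| ≒ | U+2252 | \fallingdotseq | | |
| ≓ | U+2253 | \risingdotseq | | |
| ≖ | U+2256 | \eqcirc | | |
| ≗ | U+2257 | \circeq | | |
| ≜ | U+225C | \triangleq | | |
| ≡ | U+2261 | \equiv | | |
| ≤ | U+2264 | \leq | | \le |
| ≥ | U+2265 | \geq | | \ge |
| ≦ | U+2266 | \leqq | | |
| ≧ | U+2267 | \geqq | | |
| ≫ | U+226B | \gg | | |
| ≬ | U+226C | \between | | |
| ≳ | U+2273 | \gtrsim | | |
| ≷ | U+2277 | \gtrless | | |
| ≺ | U+227A | \prec | | |
| ≻ | U+227B | \succ | | |
| ≼ | U+227C | \preccurlyeq | | |
| ≽ | U+227D | \succcurlyeq | | |
| ≾ | U+227E | \precsim | | |
| ≿ | U+227F | \succsim | | |
| ⊂ | U+2282 | \subset | | |
| ⊃ | U+2283 | \supset | | |
| ⊆ | U+2286 | \subseteq | | |
| ⊇ | U+2287 | \supseteq | | |
| ⊏ | U+228F | \sqsubset | | |
| ⊐ | U+2290 | \sqsupset | | |
| ⊑ | U+2291 | \sqsubseteq | | |
| ⊒ | U+2292 | \sqsupseteq | | |
| ⊢ | U+22A2 | \vdash | | |
| ⊣ | U+22A3 | \dashv | | |
| ⊩ | U+22A9 | \Vdash | | |
| ⊪ | U+22AA | \Vvdash | | |
| ⊸ | U+22B8 | \multimap | | |
| ⋈ | U+22C8 | \bowtie | | \Join |
| ⋍ | U+22CD | \backsimeq | | |
| ⋐ | U+22D0 | \Subset | | |
| ⋑ | U+22D1 | \Supset | | |
| ⋔ | U+22D4 | \pitchfork | | |
| ⋙ | U+22D9 | \ggg | | \gggtr |
| ⋛ | U+22DB | \gtreqless | | |
| ⋞ | U+22DE | \curlyeqprec | | |
| ⋟ | U+22DF | \curlyeqsucc | | |
| ⌢ | U+2322 | \frown | \frown, \smallfrown | \smallfrown, \overparen |
| ⌣ | U+2323 | \smile | \smile, \smallsmile | \smallsmile, \underparen |
| ⩾ | U+2A7E | \geqslant | | |
| ⪆ | U+2A86 | \gtrapprox | | |
| ⪌ | U+2A8C | \gtreqqless | | |
| ⪕ | U+2A95 | \eqslantless | | |
| ⪖ | U+2A96 | \eqslantgtr | | |
| ⪯ | U+2AAF | \preceq | | |
| ⪰ | U+2AB0 | \succeq | | |
| ⪷ | U+2AB7 | \precapprox | | |
| ⪸ | U+2AB8 | \succapprox | | |
| ⫅ | U+2AC5 | \subseteqq | | |
| ⫆ | U+2AC6 | \supseteqq | | |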
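
When vetting a table like this one, it can help to turn each code point label back into the character it names. The helper below is a minimal sketch; `fromCodePointLabel` is a hypothetical name, and the label format (`U+` followed by 4 to 6 hex digits) is assumed from the rows above.

```javascript
// Convert "U+2208"-style notation to the character it names.
function fromCodePointLabel(label) {
    const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(label);
    if (!match) {
        throw new Error(`Not a code point label: ${label}`);
    }
    // parseInt with radix 16 reads the hex digits; fromCodePoint also
    // handles code points above U+FFFF.
    return String.fromCodePoint(parseInt(match[1], 16));
}

console.log(fromCodePointLabel("U+2208") === "\u2208");
```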

Member

k4b7 commented Oct 14, 2017

@ronkok does "unicode math" in the header of the table mean that you used the unicode-math package? (The answer is "yes"; I should read comments more carefully 😅.) As we start adding support for more characters, I'd like to add a way for us to track which package each mapping is coming from. The reason is that eventually I'd like there to be a way to configure which "packages" are active.
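
One way the package-tracking idea could be sketched is shown below. Everything here is hypothetical: the `source` field, the `mappings` array, and `buildLookup` are illustrative names, not part of KaTeX, and the thread does not settle on a design.

```javascript
// Hypothetical registry: each mapping records the package it came from.
const mappings = [
    {char: "\u2208", name: "\\in", source: "unicode-math"},
    {char: "\u2264", name: "\\leq", source: "unicode-math"},
];

// Build a character -> command lookup that only includes mappings
// whose source package is in the active list.
function buildLookup(entries, activePackages) {
    const active = new Set(activePackages);
    const lookup = new Map();
    for (const {char, name, source} of entries) {
        if (active.has(source)) {
            lookup.set(char, name);
        }
    }
    return lookup;
}

const lookup = buildLookup(mappings, ["unicode-math"]);
console.log(lookup.get("\u2208"));
```

Recording the source per mapping, rather than per file, would let a later configuration step switch packages on or off without editing the table itself.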

Member

k4b7 commented Oct 14, 2017

I see that you added , true); to various defineSymbol calls. Does this mean that our existing mapping just happened to line up with unicode-math?
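
For readers unfamiliar with the pattern, here is a simplified sketch of how a trailing true flag on a defineSymbol-style helper could work. It assumes the flag means "also accept the Unicode character itself as input"; the parameter name `acceptUnicodeChar` and the shape of the `symbols` table are assumptions for illustration, not a copy of symbols.js.

```javascript
// Simplified stand-in for a symbol table keyed by mode, then by input token.
const symbols = {math: {}, text: {}};

// Sketch of a defineSymbol-style helper. The last parameter is assumed to
// mean "register the replacement character as an input token too".
function defineSymbol(mode, font, group, replace, name, acceptUnicodeChar) {
    symbols[mode][name] = {font, group, replace};
    if (acceptUnicodeChar) {
        // The character becomes an alias for the named command.
        symbols[mode][replace] = symbols[mode][name];
    }
}

defineSymbol("math", "main", "rel", "\u2208", "\\in", true);
defineSymbol("math", "main", "rel", "\u2265", "\\geq"); // command only

console.log(symbols.math["\u2208"] === symbols.math["\\in"]);
```

Under this reading, adding the flag to an existing call changes nothing about the command and only adds a second key for the character.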

Member

k4b7 commented Oct 14, 2017

The unicode math link in the third comment is really useful.

Collaborator Author

ronkok commented Oct 14, 2017

This particular PR does have a one-to-one match between unicode math and the KaTeX symbols.js file. In the PRs to come that will often be true but not always. There are some conflicts to come. I omitted any conflicts from this PR. I am trying to submit the easy cases first. After they are out of the way, we can focus attention on the hard cases.

Member

k4b7 commented Oct 14, 2017

In terms of conflicts, I think we should defer to unicode-math for now. That way, whenever we introduce the concept of "packages" in the future, we'll have a consistent situation where all of our Unicode chars came from that one package.

Collaborator Author

ronkok commented Oct 14, 2017

When you see all the unicode math choices, I predict that you will disagree with a few of them.

  1. A few of their function names conflict with names from other packages, like mathtools.
  2. A couple of their names conflict with AMS!

Again, we don't have to settle the hard cases now. That can come later.

Member

k4b7 commented Oct 14, 2017

Well that's not cool. 😞

Collaborator Author

ronkok commented Oct 14, 2017

Most of the unicode math mappings are good ones. There are just a few head scratchers in there.

Collaborator Author

ronkok commented Oct 14, 2017

Also, here's the data behind John Cook's conversion page.

Member

edemaine commented Oct 15, 2017

Wow, thanks for taking this on, @ronkok!

We can add unicode-math to #793 (which I or one of us should turn into a contrib file so that relevant PRs can easily add to it).

This commit fixes a brain cramp of mine.

Collaborator Author

ronkok commented Oct 15, 2017

The most recent commit fixes a brain cramp of mine. After going to the trouble to create a table that documents a good mapping, the script I used to edit symbols.js then made its selection from column 1 instead of from column 3 of the table.

Fixed now.
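
The mix-up described above is easy to reproduce when a script reads table cells by position. The sketch below names the column indices to make that kind of slip harder; the row layout (character, code point, maps to) follows the table header in this thread, while `CHARACTER`, `MAPS_TO`, `characterFor`, and `commandFor` are hypothetical names.

```javascript
// One row of the mapping table as an array of cells:
// [character, code point, maps to].
const row = ["\u2208", "U+2208", "\\in"];

const CHARACTER = 0; // column 1
const MAPS_TO = 2;   // column 3

// Return the Unicode character for a row.
function characterFor(cells) {
    return cells[CHARACTER];
}

// Return the command name for a row. Reading CHARACTER here instead of
// MAPS_TO would select from column 1 rather than column 3.
function commandFor(cells) {
    return cells[MAPS_TO];
}

console.log(commandFor(row) === "\\in");
```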

Member

k4b7 left a comment

LGTM. It's exciting to see support for so many unicode characters being added.

k4b7 merged commit bd4e146 into KaTeX:master Oct 17, 2017

Member

k4b7 commented Oct 17, 2017

@ronkok thanks for the PR.
