Description
When converting native -> markdown (with tex_math_dollars), DisplayMath that contains leading/trailing newlines is now emitted as a single-line $$...$$ in pandoc 3.7+, whereas pandoc 3.6.4 preserved the internal newlines:
- pandoc 3.6.4 (previous behavior)
    $$
    e = mc^2
    $$
- pandoc 3.7+ (actual)
    $$e = mc^2$$
This breaks use-cases where consumers expect a multi-line display-math block (for example: producing Markdown for tools like remark/remark-math + KaTeX or Docusaurus MDX that treat $$\n...\n$$ as a block).
Here is how to reproduce
- Start pandoc in native mode to build the test Inline element (or use a file containing display math with surrounding blank lines):
    ❯ pandoc -f native -t markdown
    [ Para [ Math DisplayMath "\ne = mc^2\n" ] ]
    ^D
- Observe output in 3.6.4:
    $$
    e = mc^2
    $$
- Observe output in 3.7:
    $$e = mc^2$$
Looking at the commit history, I found the change responsible.
Before 3.7 the Markdown writer used simple concatenation for $$ display math (effectively "$$" <> literal str <> "$$"). In 3.7 the writer now uses the new delimited helper which pulls leading/trailing whitespace (including newlines) outside the opener/closer.
pandoc/src/Text/Pandoc/Writers/Shared.hs
Lines 845 to 865 in 8a75c07
    -- | Add an opener and closer to a Doc. If the Doc begins or ends
    -- with whitespace, export this outside the opener or closer.
    -- This is used for formats, like Markdown, which don't allow spaces
    -- after opening or before closing delimiters.
    delimited :: Doc Text -> Doc Text -> Doc Text -> Doc Text
    delimited opener closer content =
      mconcat initialWS <> opener <> mconcat middle <> closer <> mconcat finalWS
     where
      contents = toList content
      (initialWS, rest) = span isWS contents
      (reverseFinalWS, reverseMiddle) = span isWS (reverse rest)
      finalWS = reverse reverseFinalWS
      middle = reverse reverseMiddle
      isWS NewLine = True
      isWS CarriageReturn = True
      isWS BreakingSpace = True
      isWS BlankLines{} = True
      isWS _ = False
      toList (Concat (Concat a b) c) = toList (Concat a (Concat b c))
      toList (Concat a b) = a : toList b
      toList x = [x]
pandoc/src/Text/Pandoc/Writers/Markdown/Inline.hs
Lines 524 to 550 in 8a75c07
    inlineToMarkdown opts (Math DisplayMath str) = do
      variant <- asks envVariant
      case () of
        _ | variant == Markua -> do
              let attributes = attrsToMarkua opts (addKeyValueToAttr ("",[],[])
                                                     ("format", "latex"))
              return $ blankline <> attributes <> cr <> literal "```" <> cr
                <> literal str <> cr <> literal "```" <> blankline
          | otherwise -> case writerHTMLMathMethod opts of
              WebTeX url ->
                let str' = T.strip str
                in  (\x -> blankline <> x <> blankline) `fmap`
                      inlineToMarkdown opts (Image nullAttr [Str str']
                        (url <> urlEncode str', str'))
              _ | isEnabled Ext_tex_math_gfm opts ->
                    return $ cr <> (literal "``` math"
                        $$ literal (T.dropAround (=='\n') str)
                        $$ literal "```") <> cr
                | isEnabled Ext_tex_math_dollars opts ->
                    return $ delimited "$$" "$$" (literal str)
                | isEnabled Ext_tex_math_single_backslash opts ->
                    return $ "\\[" <> literal str <> "\\]"
                | isEnabled Ext_tex_math_double_backslash opts ->
                    return $ "\\\\[" <> literal str <> "\\\\]"
                | otherwise -> (\x -> cr <> x <> cr) `fmap`
                    (texMathToInlines DisplayMath str >>= inlineListToMarkdown opts)
The delimited function intentionally extracts leading and trailing whitespace (NewLine, CarriageReturn, BreakingSpace, BlankLines) and places it outside the opener/closer. That behavior is useful for inline delimiters (emphasis, code markers, etc.), but for display math it has a side effect: when the tex_math_dollars path calls delimited "$$" "$$" (literal str), the newlines that sat just inside the $$ delimiters are moved outside them, so the math ends up on a single line. Is this an intended change for display math blocks?
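The whitespace hoisting described above can be modelled without pandoc or doclayout. The following is only a sketch under stated assumptions: Atom, Doc, literalModel, concatDelims, delimitedModel, renderDoc and demo are hypothetical names invented for illustration, the flat list stands in for the real Doc tree, and the final trimming step only approximates how whitespace at the edges of a paragraph is lost when the document is rendered.

```haskell
import Data.List (dropWhileEnd)

-- Toy stand-in for a layout document: a flat list of atoms.
-- (Assumption: the real Doc type is a tree with more constructors.)
data Atom = Txt String | NewLine | BreakingSpace
  deriving (Eq, Show)

type Doc = [Atom]

-- Split a string on '\n' into text runs separated by NewLine atoms,
-- which is how `literal` is assumed to behave for this illustration.
literalModel :: String -> Doc
literalModel s = case break (== '\n') s of
  (t, [])       -> txt t
  (t, _ : rest) -> txt t ++ NewLine : literalModel rest
  where
    txt "" = []
    txt t  = [Txt t]

isWS :: Atom -> Bool
isWS NewLine       = True
isWS BreakingSpace = True
isWS _             = False

-- Pre-3.7 style: plain concatenation, whitespace stays inside the delimiters.
concatDelims :: String -> String -> Doc -> Doc
concatDelims opener closer content = [Txt opener] ++ content ++ [Txt closer]

-- 3.7 style: leading and trailing whitespace is moved outside the delimiters.
delimitedModel :: String -> String -> Doc -> Doc
delimitedModel opener closer content =
  initialWS ++ [Txt opener] ++ middle ++ [Txt closer] ++ finalWS
  where
    (initialWS, rest) = span isWS content
    middle            = dropWhileEnd isWS rest
    finalWS           = drop (length middle) rest

-- Render atoms to a string, then trim outer spaces and newlines
-- (a rough approximation of what happens at paragraph edges).
renderDoc :: Doc -> String
renderDoc = trim . concatMap atom
  where
    atom (Txt t)       = t
    atom NewLine       = "\n"
    atom BreakingSpace = " "
    trim               = dropWhileEnd (`elem` " \n") . dropWhile (`elem` " \n")

-- Print both renderings of the math string from the reproduction.
demo :: IO ()
demo = do
  let sample = literalModel "\ne = mc^2\n"
  putStrLn (renderDoc (concatDelims "$$" "$$" sample))
  putStrLn "---"
  putStrLn (renderDoc (delimitedModel "$$" "$$" sample))
```

Loaded in GHCi, demo prints the three-line form for the concatenation model and the single-line form for the delimited model, matching the 3.6.4 and 3.7 outputs reported above.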
This is a problem for some Markdown toolchains that distinguish between:
- block display math: $$\n<math>\n$$ (treated as a block)
- inline math: $<math>$ or single-line $$<math>$$ (treated inline)
Converting a display math block to $$<math>$$ can change rendering from block-centered math to inline math, breaking layout.
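Until the writer behavior is settled, a downstream consumer could restore the block form by post-processing the generated Markdown. The sketch below is a hypothetical workaround, not part of pandoc: expandLine and expandDisplayMath are invented names, and the code assumes the display math occupies a whole line by itself with no indentation. It does not know about fenced code blocks, so a matching line inside one would be rewritten too.

```haskell
import Data.List (isInfixOf, isPrefixOf, isSuffixOf)

-- Rewrite a line of the exact form "$$<math>$$" into three lines.
-- Lines with several $$ pairs, or with $$ only in the middle, are kept as-is.
expandLine :: String -> [String]
expandLine line
  | length line > 4
  , "$$" `isPrefixOf` line
  , "$$" `isSuffixOf` line
  , not ("$$" `isInfixOf` body) = ["$$", body, "$$"]
  | otherwise = [line]
  where
    -- Everything between the opening and closing "$$".
    body = take (length line - 4) (drop 2 line)

-- Apply the rewrite to every line of a Markdown document.
expandDisplayMath :: String -> String
expandDisplayMath = unlines . concatMap expandLine . lines
```

For example, expandDisplayMath "text\n$$e = mc^2$$\nmore\n" yields "text\n$$\ne = mc^2\n$$\nmore\n", while a line such as "$$a$$ and $$b$$" is left unchanged.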
Context
- I originally noticed this while generating Markdown for Docusaurus / MDX with remark-math + KaTeX. The single-line $$...$$ is interpreted as inline math by the toolchain and not displayed as a centered block.
- I believe previous behavior in 3.6.4 preserved the internal newlines for DisplayMath; the change to delimited in 3.7 appears to have caused this regression.