tats-u/markdown-escaped-space

CommonMark Escaped Space Extension

A CommonMark extension that allows escaped spaces around emphasis markers and GFM autolinks in CJK (Chinese, Japanese, and Korean) text.

Packages

Problem

CommonMark has limitations when dealing with CJK text:

Emphasis Markers Adjacent to CJK Punctuation

Emphasis markers like ** may not be recognized as emphasis marks when they are adjacent to CJK punctuation marks or when soft line breaks exist between CJK characters.

For example, without this extension:

太郎は**「こんにちわ」**といった。

张三说**「你好」**的时候。

김철수는**「안녕하세요」**라고 말했습니다.

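The ** markers here are not recognized as emphasis: under CommonMark's flanking rules, a ** that sits between a letter and a punctuation mark (as in は**「 or 」**と) can neither open nor close emphasis.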
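A minimal TypeScript sketch of CommonMark's left- and right-flanking tests shows why the markers above are rejected. It is an illustration of the rule as written in the CommonMark specification, not code taken from this extension. The helper names (isLeftFlanking, isRightFlanking) are hypothetical, and Unicode punctuation is approximated here by general categories P and S.

```typescript
// Unicode whitespace per CommonMark: category Zs, tab, line feed, form feed, carriage return.
// The start or end of a line (undefined here) also counts as whitespace.
const WHITESPACE = new RegExp("^[\\p{Zs}\\t\\n\\f\\r]$", "u");
// Unicode punctuation, approximated as general categories P and S (an assumption of this sketch).
const PUNCTUATION = new RegExp("^[\\p{P}\\p{S}]$", "u");

function isWhitespace(ch: string | undefined): boolean {
  return ch === undefined || WHITESPACE.test(ch);
}

function isPunctuation(ch: string | undefined): boolean {
  return ch !== undefined && PUNCTUATION.test(ch);
}

// A delimiter run such as ** can open emphasis only if it is left-flanking.
function isLeftFlanking(before: string | undefined, after: string | undefined): boolean {
  if (isWhitespace(after)) return false;
  if (!isPunctuation(after)) return true;
  // Followed by punctuation: must be preceded by whitespace or punctuation.
  return isWhitespace(before) || isPunctuation(before);
}

// A run of * can close emphasis only if it is right-flanking.
function isRightFlanking(before: string | undefined, after: string | undefined): boolean {
  if (isWhitespace(before)) return false;
  if (!isPunctuation(before)) return true;
  // Preceded by punctuation: must be followed by whitespace or punctuation.
  return isWhitespace(after) || isPunctuation(after);
}

console.log(isLeftFlanking("は", "「"));  // false: は**「 cannot open emphasis
console.log(isLeftFlanking(" ", "「"));   // true: a preceding space makes it left-flanking
console.log(isRightFlanking("」", "と")); // false: 」**と cannot close emphasis
console.log(isRightFlanking("」", " "));  // true: a following space makes it right-flanking
```

The second and fourth calls show why a space next to the marker fixes recognition; the cost is that an ordinary space is also rendered in the output.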

GFM Autolinks in CJK Text

GFM's autolink feature also has compatibility issues with CJK text. Without proper handling, GFM autolinks may incorrectly extend into following CJK content or cause unwanted breaks.

For example:

詳しくはhttps://example.comをご覧ください。

Renders as:

<p>詳しくは<a href="https://example.comをご覧ください。">https://example.comをご覧ください。</a></p>

The autolink incorrectly includes the CJK text after the URL, breaking the intended link.
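The over-extension can be reproduced with a deliberately simplified model of autolink scanning. The sketch below assumes that, once a URL starts, it runs until whitespace or "<". Real GFM implementations also validate the domain and trim some trailing punctuation, so the function (naiveAutolink, a hypothetical name) illustrates the failure mode rather than any particular parser.

```typescript
// Simplified model (an assumption of this sketch): the link starts at
// "http://" or "https://" and continues until whitespace or "<".
function naiveAutolink(text: string): string | null {
  const start = text.search(/https?:\/\//);
  if (start < 0) return null;
  let end = start;
  while (
    end < text.length &&
    text.charAt(end) !== "<" &&
    !/\s/.test(text.charAt(end))
  ) {
    end++;
  }
  return text.slice(start, end);
}

// No space after the URL: the CJK text is swallowed into the link.
console.log(naiveAutolink("詳しくはhttps://example.comをご覧ください。"));
// https://example.comをご覧ください。

// A space after the URL ends the link where intended.
console.log(naiveAutolink("詳しくはhttps://example.com をご覧ください。"));
// https://example.com
```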

Common Workaround and Its Problems

A common workaround is to add a space after the URL:

詳しくはhttps://example.com をご覧ください。

This renders correctly:

<p>詳しくは<a href="https://example.com">https://example.com</a> をご覧ください。</p>

However, this workaround has a visual drawback: the space between the URL and the following CJK text is rendered as a visible space in HTML, which may look unnatural or create unexpected line breaks in some contexts. This is especially problematic for professional documents where precise spacing is important.

Solution

This extension allows you to use escaped spaces (backslash-space) to fix both emphasis recognition and autolink delineation in CJK contexts:

太郎は\ **「こんにちわ」**\ といった。

张三说\ **「你好」**\ 的时候。

김철수는\ **「안녕하세요」**\ 라고 말했습니다.

詳しくはhttps://example.com\ をご覧ください。

The escaped spaces are not rendered in the output. They exist only so that the parser can recognize the emphasis markers and the end of the URL, and they keep the source readable.

reStructuredText, the documentation format widely used in the Python community, takes the same approach.

Features

Escaped Space for Emphasis & GFM Autolinks

Escaped spaces around emphasis markers and autolinks help with CJK text processing, ensuring proper recognition even when adjacent to CJK punctuation.

For Emphasis:

太郎は\ **「こんにちわ」**\ といった。

Renders as:

<p>太郎は<strong>「こんにちわ」</strong>といった。</p>

For Autolinks:

詳しくはhttps://example.com\ をご覧ください。

Renders as:

<p>詳しくは<a href="https://example.com">https://example.com</a>をご覧ください。</p>

Note how the escaped space delineates the autolink without introducing a visible space in the output.

Non-Breaking Space Feature (Generic)

In addition to CJK support, this extension includes a custom feature for general typography: backslash followed by two spaces is converted to a non-breaking space (U+00A0).

Use case: Preventing line breaks between related words in European languages.

Example:

The\  author is John\  Doe.

Renders as:

<p>The&nbsp;author is John&nbsp;Doe.</p>

This is useful for:

  • Keeping titles and names together on a single line
  • Preventing separation of numbers from their units (e.g., "5 cm")
  • Professional typography where specific spacing rules apply

Note

This feature is experimental and its syntax may be changed to \ \ in the future.

Specification

See specification.md for detailed technical specifications.

Who should use this extension?

You should use this extension if you:

  1. Need to handle Chinese, Japanese, or Korean content with emphasis markers adjacent to CJK punctuation
  2. Want better GFM autolink handling in CJK contexts
  3. Need non-breaking space support for professional typography
  4. Cannot work around these issues by adding ordinary spaces to the source, because they would be rendered in the output
  5. Need emphasis to work correctly without relying on HTML tags like <strong>
  6. Are creating Markdown-related software or services targeting CJK users or requiring precise spacing control

Usage

This extension is designed to be used in conjunction with your Markdown parser. The behavior is:

  • An escaped space (\ ) next to an emphasis marker or autolink is not rendered in the output
  • It lets emphasis markers be recognized even when they are adjacent to CJK punctuation
  • It delineates autolinks in CJK text without introducing a visible space
  • Backslash + two spaces (\  ) is converted to a non-breaking space (U+00A0)
  • The source stays readable, and the escapes leave no visible artifacts in the output
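The text-level effect of the two escapes can be sketched as a small scanner. This is only an illustration under stated assumptions: the function name (renderEscapedSpaces) is hypothetical, and the sketch handles just the character mapping. The extension itself has to act inside the Markdown parser so that the escaped space also influences emphasis and autolink detection.

```typescript
// Maps the two escapes described above onto their rendered form:
//   backslash + two spaces -> U+00A0 (non-breaking space)
//   backslash + one space  -> nothing
// An escaped backslash (two backslashes) is copied through untouched,
// so a space after it is not mistaken for an escaped space.
function renderEscapedSpaces(src: string): string {
  let out = "";
  let i = 0;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (ch === "\\") {
      const next = src.charAt(i + 1);
      if (next === " ") {
        if (src.charAt(i + 2) === " ") {
          out += "\u00A0"; // check the two-space form first
          i += 3;
        } else {
          i += 2; // single escaped space: dropped from the output
        }
        continue;
      }
      if (next === "\\") {
        out += "\\\\"; // escaped backslash: keep both characters
        i += 2;
        continue;
      }
    }
    out += ch;
    i += 1;
  }
  return out;
}

console.log(renderEscapedSpaces("太郎は\\ **「こんにちわ」**\\ といった。"));
// 太郎は**「こんにちわ」**といった。
console.log(renderEscapedSpaces("The\\  quick brown fox") === "The\u00A0quick brown fox");
// true
```

Checking the two-space form before the one-space form matters: testing for a single escaped space first would leave a stray ordinary space behind instead of producing the non-breaking space.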

Examples

Escaped Space for Emphasis

Input:

太郎は\ **「こんにちわ」**\ といった。

Output:

<p>太郎は<strong>「こんにちわ」</strong>といった。</p>

Escaped Space for GFM Autolinks

Input:

詳しくはhttps://example.com\ をご覧ください。

Output:

<p>詳しくは<a href="https://example.com">https://example.com</a>をご覧ください。</p>

Non-Breaking Space

Input:

The\  quick brown fox

Output:

<p>The&nbsp;quick brown fox</p>

Related Projects

This extension is a port of the WithEscapedSpace option from goldmark's CJK extension.

A more user-friendly approach, which handles CJK text in Markdown without requiring extra symbols such as \, is the CJK Friendly Emphasis Extension.

The two extensions are not mutually exclusive, and you can use both together if needed. In particular, you can use this Escaped Space extension as a fallback for corner cases where the CJK Friendly Emphasis extension does not work.

Compatibility

For emphasis handling, this extension does not change the behavior of non-CJK text. Content in English, other European languages, and other non-CJK languages renders identically with or without the escaped space emphasis feature.

The non-breaking space feature is language-agnostic and works universally.

Contributing

Please submit issues and pull requests in English or Japanese.

License

MIT

About

A markdown-it extension that allows escaped spaces around emphasis markers and GFM autolinks in CJK (Chinese, Japanese, and Korean) text.