
markdown: MarkdownEscaper escapes non-ASCII characters due to lossy as u8 cast #55704

@itrous

Description


Reproduction steps

  1. Use any LSP server that emits Cyrillic text in Diagnostic.message.
  2. Open a file that triggers a diagnostic whose message contains both:
    • a Cyrillic letter from the ranges U+043A–U+0440 (к л м н о п р) or U+0421–U+042F (С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я), and
    • any markdown-special character (', ", (, ), #, /, …).
  3. Inspect the diagnostic message in the editor.
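The letter sets in step 2 follow directly from each code point's low byte. This short Rust sketch filters the basic Cyrillic alphabet with the same c as u8 truncation plus is_ascii_punctuation check described under Root cause. The helper name misread_as_punctuation is hypothetical and not part of Zed.

```rust
/// Hypothetical helper: true when `c` is not ASCII, yet truncating it to its
/// low byte (what `c as u8` does) produces an ASCII punctuation byte.
fn misread_as_punctuation(c: char) -> bool {
    !c.is_ascii() && (c as u8).is_ascii_punctuation()
}

fn main() {
    // Basic Cyrillic lowercase а..=я (U+0430..=U+044F).
    let lower: String = ('а'..='я').filter(|&c| misread_as_punctuation(c)).collect();
    // Basic Cyrillic uppercase А..=Я (U+0410..=U+042F).
    let upper: String = ('А'..='Я').filter(|&c| misread_as_punctuation(c)).collect();
    println!("lowercase: {}", lower);
    println!("uppercase: {}", upper);
}
```

It prints клмнопр and СТУФХЦЧШЩЪЫЬЭЮЯ, the same two sets listed in step 2.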

Current behavior

Spurious \ characters appear before specific Cyrillic letters:

Отсутствует \п\р\обе\л с\п\рава \от ','

В\мест\о уста\ревшег\о св\ойства "\Эта\Ф\о\р\ма" с\ледует ис\п\о\льз\овать "\Эт\отОбъе\кт"

П\р\оцеду\ра \нах\одится в\не \об\ласти (#Об\ласть/#Region). Весь \к\од \м\оду\ля д\о\лже\н быть \о\рга\низ\ова\н в \об\ласти д\ля \лучшей ст\ру\кту\ры.

Copying the message yields the clean text, so the backslashes are inserted by the renderer.
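The displayed strings can be reproduced with a simplified model. This sketch rests on one stated assumption: escape_model (hypothetical) backslash-escapes every character it classifies as ASCII punctuation, while the real MarkdownEscaper is stateful and may escape fewer characters. render_escapes applies only the CommonMark backslash-escape rule, not full Markdown rendering.

```rust
/// HYPOTHETICAL simplified escaper, not Zed's MarkdownEscaper.
/// `lossy = true` mimics the `c as u8` cast; `false` inspects the char itself.
fn escape_model(s: &str, lossy: bool) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        let needs_escape = if lossy {
            // Only the low 8 bits of the code point are examined.
            (c as u8).is_ascii_punctuation()
        } else {
            // Non-ASCII chars are never classified as ASCII punctuation.
            c.is_ascii_punctuation()
        };
        if needs_escape {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// CommonMark backslash escapes: `\X` becomes `X` only when X is ASCII
/// punctuation; any other backslash is kept as a literal character.
fn render_escapes(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if next.is_ascii_punctuation() {
                    out.push(next);
                    chars.next(); // consume the escaped character
                    continue;
                }
            }
        }
        out.push(c);
    }
    out
}

fn main() {
    let message = "Отсутствует пробел справа от ','";
    println!("{}", render_escapes(&escape_model(message, true)));
    println!("{}", render_escapes(&escape_model(message, false)));
}
```

With lossy set to true, the first message comes out exactly as shown above (Отсутствует \п\р\обе\л с\п\рава \от ','); with false, it round-trips to the original text.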

Expected behavior

Display the message exactly as the LSP server sent it, e.g.:

Отсутствует пробел справа от ','

Root cause

crates/markdown/src/markdown.rs, pub fn escape (~line 457):

for c in s.chars() {
    escaper.next(c as u8).write_to(c, &mut output);
}

MarkdownEscaper::next decides whether to prefix a backslash via byte.is_ascii_punctuation(). For non-ASCII code points, c as u8 truncates to the low 8 bits, which can land inside the ASCII-punctuation ranges, e.g. р (U+0440) → 0x40 ('@'), о (U+043E) → 0x3E ('>'), Э (U+042D) → 0x2D ('-'). Under the CommonMark spec's backslash-escape rule, \X is consumed only when X is ASCII punctuation, so a backslash before a non-ASCII character is rendered literally.
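The three casts quoted above can be checked directly. This sketch (truncated is a hypothetical helper) prints each character's code point, the byte produced by c as u8, and whether ASCII punctuation is reported for that byte versus for the char itself. Calling char::is_ascii_punctuation on the char instead of the truncated byte returns false for every non-ASCII character, which suggests one possible direction for a fix; that is an assumption, not a confirmed upstream change.

```rust
/// Hypothetical helper: the byte produced by `c as u8`,
/// i.e. the low 8 bits of the code point.
fn truncated(c: char) -> u8 {
    c as u8
}

fn main() {
    // The three Cyrillic examples from the text, plus ASCII '#' for contrast.
    for c in ['р', 'о', 'Э', '#'] {
        let byte = truncated(c);
        println!(
            "{} U+{:04X} -> 0x{:02X} ({:?}): byte is punctuation = {}, char is punctuation = {}",
            c,
            c as u32,
            byte,
            byte as char,
            byte.is_ascii_punctuation(),
            c.is_ascii_punctuation()
        );
    }
}
```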

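The earlier fix in #51766 addressed visible escapes inside indented code blocks but did not touch this lossy-cast path. The bug affects any non-ASCII character whose low byte lands in one of the ASCII-punctuation ranges (0x21–0x2F, 0x3A–0x40, 0x5B–0x60, 0x7B–0x7E), i.e. 32 out of every 256 code points. That includes many letters in Cyrillic, Arabic, CJK, and Greek Extended; basic Hebrew letters and most basic Greek letters map to bytes at or above 0x80 and are unaffected.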
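The scope can be estimated by counting, per Unicode block, the code points whose low byte is ASCII punctuation. This sketch counts code points, not only assigned characters, and the function name affected_in_range is hypothetical.

```rust
/// Hypothetical helper: counts code points in `start..=end` that are not ASCII
/// but whose low byte (as produced by `c as u8`) is ASCII punctuation.
fn affected_in_range(start: u32, end: u32) -> usize {
    (start..=end)
        .filter_map(char::from_u32) // skips surrogates, which are not valid chars
        .filter(|&c| !c.is_ascii() && (c as u8).is_ascii_punctuation())
        .count()
}

fn main() {
    let blocks = [
        ("Greek and Coptic", 0x0370, 0x03FF),
        ("Cyrillic", 0x0400, 0x04FF),
        ("Hebrew", 0x0590, 0x05FF),
        ("Arabic", 0x0600, 0x06FF),
        ("Greek Extended", 0x1F00, 0x1FFF),
        ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ];
    for (name, start, end) in blocks {
        println!("{}: {} affected code points", name, affected_in_range(start, end));
    }
}
```

Any block aligned to a multiple of 256 contains exactly 32 such code points. That is why Cyrillic, Arabic, Greek Extended, and the CJK ideographs are hit broadly, while the Hebrew block (U+0590–U+05FF) has none and Greek and Coptic has only 4.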

Zed version and system specs

Zed 1.0.1, macOS 15.7.4

Attach Zed log file

n/a

Relevant Zed settings / Keymap

n/a

(for AI issues) Model provider details

n/a

WSL

No

Metadata


Assignees

No one assigned

    Labels

    area:internationalization: Feedback for human language support, translations, etc
    area:languages/markdown: Markdown markup support
    frequency:uncommon: Bugs that happen for a small subset of users, special configurations, rare circumstances, etc
    meta:awesome: exemplary issue/PR from the community
    priority:P2: Average run-of-the-mill bugs
    state:reproducible: Verified steps to reproduce included and someone on the team managed to reproduce

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
