fix(doc): strip blockquote > prefixes from JSDoc code blocks#34866
fix(doc): strip blockquote > prefixes from JSDoc code blocks#34866LeSingh1 wants to merge 5 commits into
Conversation
bartlomieju
left a comment
There was a problem hiding this comment.
Thanks for the fix and the clear write-up — the blockquote case itself is handled well, and the unit + spec tests are good. I ran test_extract_doc_tests locally and it passes.
There is one correctness issue to address before this can land: the regex strips > from every doc-code line unconditionally, even when the code block is not inside a blockquote. So a legitimate code line that begins with > (e.g. a line inside a multi-line template literal) gets its > silently removed, corrupting the example. This worked correctly before the PR, so it's a regression. Details inline, plus a request for a regression test that would catch it.
| let blocks_regex = lazy_regex::regex!(r"```([^\r\n]*)\r?\n([\S\s]*?)```"); | ||
| let lines_regex = | ||
| lazy_regex::regex!(r"(?:\* ?)((#!+).*)|(?:\* ?)(?:\# ?)?(.*)"); | ||
| lazy_regex::regex!(r"(?:\* ?(?:> ?)*)((#!+).*)|(?:\* ?(?:> ?)*)(?:\# ?)?(.*)"); |
There was a problem hiding this comment.
This strips > markers from every line of every doc-code block, regardless of whether the block is actually inside a blockquote. A legitimate code line that starts with > therefore gets corrupted. For example:
/**
* ```ts
* const s = `
* > real content
* `;
* console.log(s);
* ```
*/The line > real content is template-literal content, but it extracts as real content — the > is dropped, silently changing the string (this was correct before this PR). I confirmed the new regex matches and strips it.
Could you scope the stripping to blocks that are genuinely inside a blockquote? One approach: capture the blockquote prefix from the opening fence line (the > sequence that precedes ```ts) and strip exactly that consistent prefix from each content line, instead of greedily consuming any > on every line. That leaves non-blockquote examples untouched while still fixing #25980.
| // https://github.com/denoland/deno/issues/25980 | ||
| // Code blocks nested inside blockquotes in JSDoc comments should have | ||
| // the blockquote `> ` prefix stripped so the extracted source is valid. | ||
| Test { |
There was a problem hiding this comment.
nit: once the stripping is scoped, please add a regression test for the inverse case — a code block that is NOT inside a blockquote but contains a line starting with > (e.g. a multi-line template literal). That asserts the > is preserved and would have caught the over-stripping above.
|
Thanks for the detailed feedback, Bartek. I've pushed a rework that addresses the scoping issue. The old approach added The new approach detects the blockquote prefix from the opening fence line itself. For each block match, I look at what appears on that line before the triple-backtick in the source string, strip the JSDoc I also added a regression test for the inverse case: a code block not inside a blockquote whose body has a line starting with I was unable to run |
|
Can you please resolve the conflict in |
When a code example is nested inside a blockquote in a JSDoc comment, each line of the code block carries a leading "> " (or "> ") prefix from the markdown blockquote syntax. The lines_regex used when extracting those blocks only stripped the JSDoc "* " prefix, leaving the "> " on every line. That produced source like "> console.log(...)" which the TypeScript parser rejected with a confusing "Expression expected" error pointing at the ">" character. Extend the regex to also consume any number of "> " blockquote markers that follow the "* " JSDoc marker: before: (?:\* ?)((#!+).*)|(?:\* ?)(?:\# ?)?(.*) after: (?:\* ?(?:> ?)*)((#!+).*)|(?:\* ?(?:> ?)*)(?:\# ?)?(.*) This matches the shebang capture group pattern too so shebang lines inside blockquoted code blocks keep working. Fixes denoland#25980
…uote The previous approach added `(?:> ?)*` to the lines_regex, which stripped any leading `> ` from every content line in every code block, regardless of whether the block was actually inside a markdown blockquote. This corrupted legitimate code where a line begins with `> ` -- for example, content inside a template literal. The fix detects the blockquote prefix from the opening fence line itself: look at what appears before the triple-backtick on that line, extract the `> ` token sequence, and strip exactly that prefix from each content line. If the fence line has no `> ` prefix the block is not inside a blockquote, so nothing extra is stripped. Adds a regression test for the inverse case: a code block not inside a blockquote whose body contains a line starting with `> ` (inside a template literal). That line must be preserved as-is.
80ad9e9 to
d18781e
Compare
|
Rebased onto current upstream/main and resolved the conflict in cli/util/extract.rs. The conflict was between upstream's approach (using is_markdown_blockquote + a strip_markdown_blockquote_marker helper that stripped any leading > from every line) and this PR's approach (detecting the exact blockquote prefix from the opening fence line and stripping only that). The PR's logic is strictly more precise, so the resolution keeps the blockquote_prefix detection and the lines_regex.captures_iter loop and drops the upstream helper. The now-dead is_markdown_blockquote variable and strip_markdown_blockquote_marker function were also removed to keep things clean. Ran rustfmt --check on the file and it passes. The regression test for the non-blockquote code block containing a > line is intact. New head is d18781e. |
The blockquote prefix was only detected when the fence line carried a JSDoc star marker, so code blocks quoted in plain markdown kept their leading angle brackets and extracted as invalid source. Detect the blockquote depth from the fence line whether or not a star marker is present, and strip that many markers per line so an empty quoted line written as a bare angle bracket no longer leaks into the output.
|
Pushed a fix for the failing extraction tests. The blockquote prefix was only being detected when the fence line had a JSDoc star marker, so code blocks quoted in plain markdown (like the README example in the issue) kept their leading angle brackets and extracted as invalid TypeScript. I now read the blockquote depth from the fence line whether or not a star marker is present, and strip that many markers per line, so an empty quoted line written as a bare angle bracket no longer leaves a stray marker behind. The non-blockquote case is unchanged since its depth is zero. |
|
Thanks for the fix! The JSDoc case and the nested-depth / template-literal handling look good. One concern about the shared refactor: Possible regression on the markdown (
|
The markdown fence pattern captures the leading blockquote markers in a named group, so the match begins at the angle bracket and the text before the fence is empty. Reading the prefix from before the fence only worked for JSDoc, where the pattern has no such group. Use the captured group when present and fall back to the fence line text for JSDoc, so quoted code blocks in markdown files have their markers stripped and extract as valid source.
|
Follow-up: the markdown doctest was still failing because the markdown fence pattern captures the leading blockquote markers in a named group, so the match starts at the angle bracket and the text before the fence is empty. My first attempt read the prefix from before the fence, which only works for the JSDoc pattern. It now reads the captured group when present and falls back to the fence line for JSDoc, so quoted blocks in markdown files strip correctly and extract as valid source. |
|
Thanks for tackling this, and for the thorough writeup and tests. This is really close to landing! The depth-bounded approach is exactly right: only stripping when the fence is actually inside a blockquote, and only up to the detected depth, is what makes the template-literal case ( A few small things to tighten it up before we merge: 1. The description doesn't match the implementation. It says the fix "extends the regex so it greedily consumes any number of 2. Add a nested-blockquote test. The code loops /**
* > > ```ts
* > > console.log("nested");
* > > ```
*/3. The import reordering looks unrelated. The diff reshuffles the Optional nit: the "consume one Thanks again for the contribution! |
Adds a depth-two nested blockquote test, extracts a strip_blockquote_marker helper so the depth counting and the per-line stripping share one implementation, and restores the import block to its original order since the earlier reshuffle was unrelated to this fix.
|
Thanks, all three addressed in the latest commit. I rewrote the description to match what actually landed (fence-line depth detection plus bounded per-line stripping, no change to lines_regex). I added the two-level nested case you suggested. I pulled the strip-one-marker logic into a small strip_blockquote_marker helper so the depth count and the per-line stripping use the same code. And I restored the import block to its original order, since that reshuffle slipped in during an earlier rebase and was not part of the fix. |
This fixes #25980. Code blocks nested inside a markdown blockquote, both in JSDoc comments and in plain markdown files, were extracted with their leading > markers still attached, so the snippets did not parse as valid TypeScript.
The fix detects whether a fenced block is inside a blockquote by looking at the opening fence line. For markdown the block pattern captures the leading blockquote markers in a named group; for JSDoc the markers are read from the text before the fence with the comment star removed. The count of > markers on the fence line is the blockquote depth. Each content line then has up to that many markers stripped, where one marker is a > followed by an optional single space or tab.
Bounding the stripping to the detected depth keeps two cases working that a naive strip-all-leading-> would break: a normal code line that starts with > inside a template literal is left untouched when the block is not in a blockquote, and a line quoted only at the outer level is not over-stripped inside a nested block.
Tests cover JSDoc blockquotes, plain markdown blockquotes, a two-level nested blockquote, an empty quoted line written as a bare >, and the regression case of a non-blockquote block whose body starts with >.