refactor(markdown-parser): deduplicate link scan/parse with BumpMode by jfmcdowell · Pull Request #9192 · biomejs/biome

jfmcdowell · 2026-02-22T20:03:51Z

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

Deduplicates two scan/parse pairs in crates/biome_markdown_parser/src/syntax/inline/links.rs that shared logic but differed in token bumping behavior.

Added BumpMode (Scan/Parse) with bump_token and bump_separator.
Extracted title_content_core from scan_title_content and parse_title_content.
Extracted destination_tokens_core from scan_inline_link_destination_tokens and parse_inline_link_destination_tokens.
Preserved the two existing structural differences in destination handling (re_lex timing and whitespace-skip position).
No behavior change intended.

Test Plan

cargo test -p biome_markdown_parser — 64 tests pass, zero snapshot changes
cargo insta test -p biome_markdown_parser — no snapshot diffs
just test-markdown-conformance
just f && just l

Docs

N/A

changeset-bot · 2026-02-22T20:03:57Z

⚠️ No Changeset found

Latest commit: fd3003c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2026-02-22T20:07:49Z

Walkthrough

The PR refactors inline markdown link parsing to use mode-aware parsing functions instead of direct token-bump calls. scan_* helpers (e.g. scan_inline_link_destination_tokens, scan_title_content) were removed and replaced by parse_inline_link_destination_tokens and parse_title_content, with bump-based separators such as bump_link_def_separator and bump_textual_link_def. Angle-bracket and raw destination branches remain distinct but now use lookahead validation without generating CST nodes. Link validity and destination parsing outcomes (Valid/Invalid/DepthExceeded) and public signatures are preserved.

Suggested reviewers

ematipico

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarises the main refactoring effort: deduplicating link scan/parse logic using a BumpMode abstraction, which matches the changeset's core objective.
Description check	✅ Passed	The description is directly related to the changeset, explaining the deduplication of scan/parse pairs, the BumpMode mechanism introduced, and the testing performed.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

dyc3 · 2026-02-23T02:46:04Z

+    /// Real parse: wrap each token in MD_TEXTUAL CST nodes.
+    Parse,
+}
+


We already have a concept of speculative parsing. IIRC the CSS parser uses it, you should probably go look at that.

Otherwise, we usually bump, check, and if it's bad, rewind the state of the lexer and parser and try something else. Depends on the situation.

TIL. I changed the approach to use the speculative parsing pattern

Call parse functions directly inside p.lookahead() instead of maintaining separate scan/parse paths. CST nodes are auto-reverted on rewind.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/biome_markdown_parser/src/syntax/inline/links.rs (1)

558-564: get_title_close_char called twice with no state change in between.

Cache the result once and pass it straight to parse_title_content.

♻️ Cache the title close character

-        let has_title = saw_separator && get_title_close_char(p).is_some();
-        ...
-        if has_title {
-            parse_title_content(p, get_title_close_char(p));
-        }
+        let title_close = get_title_close_char(p);
+        let has_title = saw_separator && title_close.is_some();
+        ...
+        if has_title {
+            parse_title_content(p, title_close);
+        }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 558 -
564, Compute and stash the result of get_title_close_char(p) once, e.g. let
title_close = get_title_close_char(p); then set has_title = saw_separator &&
title_close.is_some(); keep the separator-bumping loop as-is, and if has_title
call parse_title_content(p, title_close) instead of calling
get_title_close_char(p) again; this ensures a single, consistent title-close
value is used for both the check and the subsequent parse.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 554-561: The second while loop that calls bump_link_def_separator
is dead because the first loop already advances through all title separator
tokens; remove the second loop entirely. Keep the initial loop that sets
saw_separator and advances via bump_link_def_separator(p), and retain the
has_title computation using get_title_close_char(p).is_some(); verify there are
no leftover assumptions about a later consumer of title-separator tokens in
surrounding code (references: is_title_separator_token, bump_link_def_separator,
get_title_close_char, saw_separator).

---

Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 558-564: Compute and stash the result of get_title_close_char(p)
once, e.g. let title_close = get_title_close_char(p); then set has_title =
saw_separator && title_close.is_some(); keep the separator-bumping loop as-is,
and if has_title call parse_title_content(p, title_close) instead of calling
get_title_close_char(p) again; this ensures a single, consistent title-close
value is used for both the check and the subsequent parse.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d58b4e and 387954d.

📒 Files selected for processing (1)

crates/biome_markdown_parser/src/syntax/inline/links.rs

…_is_valid

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/biome_markdown_parser/src/syntax/inline/links.rs (1)

558-561: Nitpick: get_title_close_char called twice with no state change in between.

🧹 Cache the result

+        let title_close_char = get_title_close_char(p);
-        let has_title = saw_separator && get_title_close_char(p).is_some();
+        let has_title = saw_separator && title_close_char.is_some();

         if has_title {
-            parse_title_content(p, get_title_close_char(p));
+            parse_title_content(p, title_close_char);
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 558 -
561, The code calls get_title_close_char(p) twice; cache its result into a local
(e.g., let title_close = get_title_close_char(p)) after computing has_title so
you reuse title_close for both the has_title check and the parse_title_content
call (replace parse_title_content(p, get_title_close_char(p)) with
parse_title_content(p, title_close)); update uses around has_title and
parse_title_content to reference the cached title_close and avoid redundant
calls.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 536-542: The validation path pre-skips title-separator whitespace
(using is_title_separator_token and bump_link_def_separator) before calling
parse_inline_link_destination_tokens, but the parser does not, causing inputs
like "( <url> )" to fail; move the whitespace-skip logic into
parse_inline_link_destination_tokens (i.e., consume title-separator tokens
before checking p.at(L_ANGLE) and before the angle-bracket branch) so both
validation and parsing behave the same, and then remove the separate pre-skip
loop from inline_link_is_valid.

---

Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 558-561: The code calls get_title_close_char(p) twice; cache its
result into a local (e.g., let title_close = get_title_close_char(p)) after
computing has_title so you reuse title_close for both the has_title check and
the parse_title_content call (replace parse_title_content(p,
get_title_close_char(p)) with parse_title_content(p, title_close)); update uses
around has_title and parse_title_content to reference the cached title_close and
avoid redundant calls.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 387954d and fd3003c.

📒 Files selected for processing (1)

crates/biome_markdown_parser/src/syntax/inline/links.rs

coderabbitai · 2026-02-23T15:54:08Z

+        // Skip leading whitespace before angle bracket check
+        // (parse_inline_link_destination_tokens only skips whitespace in the raw path).
+        while is_title_separator_token(p) {
+            bump_link_def_separator(p);
+        }
+
+        let destination_result = parse_inline_link_destination_tokens(p);


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's confirm the whitespace pre-skip at lines 536-542 sed -n '536,542p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 376

🏁 Script executed:

# Now check the actual parse path at lines 197-200 sed -n '197,200p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 244

🏁 Script executed:

# Search for test cases that might test angle-bracket destinations with leading whitespace rg -n '\(\s*<' crates/biome_markdown_parser/tests/ --type rust -B2 -A2

Repository: biomejs/biome

Length of output: 39

🏁 Script executed:

# Look for inline link tests in general to understand current test coverage fd test crates/biome_markdown_parser/tests/ --type rust | head -10

Repository: biomejs/biome

Length of output: 228

🏁 Script executed:

# Search for CommonMark-related fixtures or test data fd -e md -e txt crates/biome_markdown_parser/tests/ | head -20

Repository: biomejs/biome

Length of output: 502

🏁 Script executed:

# Find test fixture files in markdown parser tests find crates/biome_markdown_parser/tests -type f \( -name "*.md" -o -name "*.txt" -o -name "*.json" \) | head -20

Repository: biomejs/biome

Length of output: 1535

🏁 Script executed:

# Look for inline_link or link-related tests rg -l "inline_link|parse_inline" crates/biome_markdown_parser/ --type rust

Repository: biomejs/biome

Length of output: 387

🏁 Script executed:

# Check what the parse_inline_link_destination_tokens function does with leading whitespace sed -n '588,650p' crates/biome_markdown_parser/src/syntax/inline/links.rs | head -70

Repository: biomejs/biome

Length of output: 2255

🏁 Script executed:

# Let's look at the raw destination path whitespace skip location to verify the comment claim sed -n '625,640p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 443

🏁 Script executed:

# Search for inline link tests with angle brackets find crates/biome_markdown_parser/tests -name "*link*" -type f

Repository: biomejs/biome

Length of output: 1801

🏁 Script executed:

# Check if there are any test fixtures with links containing angle brackets and spaces rg '\(\s*<' crates/biome_markdown_parser/tests/md_test_suite/ -l

Repository: biomejs/biome

Length of output: 231

🏁 Script executed:

# Look at the links test fixture cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_links.md 2>/dev/null || echo "File not found, searching for similar..."

Repository: biomejs/biome

Length of output: 97

🏁 Script executed:

# Find and display link-related fixtures find crates/biome_markdown_parser/tests -name "*link*" -o -name "*url*" 2>/dev/null

Repository: biomejs/biome

Length of output: 1801

🏁 Script executed:

# Let's check what fixtures exist and search for any that might test this pattern ls crates/biome_markdown_parser/tests/md_test_suite/ok/ | grep -i link

Repository: biomejs/biome

Length of output: 657

🏁 Script executed:

# Check the whitespace test file - this is likely where the issue would be tested cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_whitespace.md

Repository: biomejs/biome

Length of output: 356

🏁 Script executed:

# Also check the destination/title test cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_destination_title.md

Repository: biomejs/biome

Length of output: 224

🏁 Script executed:

# Check the destination/title snapshot to see what's expected cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_destination_title.md.snap

Repository: biomejs/biome

Length of output: 17978

🏁 Script executed:

# Let's check the inline_link_whitespace test to see if it covers space before angle brackets cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_whitespace.md.snap | head -100

Repository: biomejs/biome

Length of output: 3318

🏁 Script executed:

# Search for any tests with the specific pattern of space + angle bracket rg '\(\s+<' crates/biome_markdown_parser/tests/ --type md

Repository: biomejs/biome

Length of output: 39

🏁 Script executed:

# Let's verify the exact position of the whitespace skip in parse_inline_link_destination_tokens sed -n '588,600p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 508

🏁 Script executed:

# Create a simple test to confirm the asymmetry: check if [link]( <url> ) is currently rejected cat > /tmp/test_space_angle.md << 'EOF' Space before angle bracket: [link]( <url> ) No space before angle bracket: [link](<url>) EOF cat /tmp/test_space_angle.md

Repository: biomejs/biome

Length of output: 145

🏁 Script executed:

# Check if the destination validation path (inline_link_is_valid) handles this differently # Let's look at where inline_link_is_valid is called from sed -n '528,575p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 1529

🏁 Script executed:

# Verify the exact flow by looking at parse_link_or_image which calls inline_link_is_valid and then parse sed -n '187,220p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 1425

Whitespace handling between validation and parsing diverges for ( <url> ) inputs.

The lookahead path (lines 536–540) pre-skips whitespace before checking for the enclosed destination (<url>), so validation succeeds. The actual parse path makes no such pre-skip, so parse_inline_link_destination_tokens misses the angle bracket (sees whitespace instead), falls into the raw destination branch, and later rejects < because it's disallowed in raw destinations per CommonMark. This causes spec-valid [link]( <url> ) to be silently rejected.

Moving the whitespace skip inside parse_inline_link_destination_tokens before the p.at(L_ANGLE) check would unify both paths and fix this. The pre-skip in inline_link_is_valid can then be removed.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 536 - 542, The validation path pre-skips title-separator whitespace (using is_title_separator_token and bump_link_def_separator) before calling parse_inline_link_destination_tokens, but the parser does not, causing inputs like "( <url> )" to fail; move the whitespace-skip logic into parse_inline_link_destination_tokens (i.e., consume title-separator tokens before checking p.at(L_ANGLE) and before the angle-bracket branch) so both validation and parsing behave the same, and then remove the separate pre-skip loop from inline_link_is_valid.

refactor(markdown-parser): deduplicate link scan/parse with BumpMode

0d58b4e

github-actions Bot added the A-Parser Area: parser label Feb 22, 2026

dyc3 reviewed Feb 23, 2026

View reviewed changes

refactor(markdown-parser): replace BumpMode with speculative parsing

387954d

Call parse functions directly inside p.lookahead() instead of maintaining separate scan/parse paths. CST nodes are auto-reverted on rewind.

coderabbitai Bot reviewed Feb 23, 2026

View reviewed changes

Comment thread crates/biome_markdown_parser/src/syntax/inline/links.rs Outdated

refactor(markdown-parser): remove dead whitespace loop in inline_link…

fd3003c

…_is_valid

coderabbitai Bot reviewed Feb 23, 2026

View reviewed changes

jfmcdowell requested a review from dyc3 February 23, 2026 17:03

dyc3 approved these changes Feb 27, 2026

View reviewed changes

dyc3 merged commit 412a08d into biomejs:main Feb 27, 2026
12 checks passed

coderabbitai Bot mentioned this pull request Mar 12, 2026

refactor(markdown-parser): decompose parse_link_or_image into phase helpers #9452

Merged

coderabbitai Bot mentioned this pull request Mar 28, 2026

fix(markdown-parser): incorrect inline link R_PAREN token range #9642

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(markdown-parser): deduplicate link scan/parse with BumpMode#9192

refactor(markdown-parser): deduplicate link scan/parse with BumpMode#9192
dyc3 merged 3 commits into
biomejs:mainfrom
jfmcdowell:refactor/links-dedup

jfmcdowell commented Feb 22, 2026

Uh oh!

changeset-bot Bot commented Feb 22, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Feb 22, 2026 •

edited

Loading

Uh oh!

dyc3 Feb 23, 2026

Uh oh!

jfmcdowell Feb 23, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Feb 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jfmcdowell commented Feb 22, 2026

Summary

Test Plan

Docs

Uh oh!

changeset-bot Bot commented Feb 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

coderabbitai Bot commented Feb 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Suggested reviewers

Uh oh!

dyc3 Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

jfmcdowell Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

changeset-bot Bot commented Feb 22, 2026 •

edited

Loading

coderabbitai Bot commented Feb 22, 2026 •

edited

Loading