Skip to content

refactor(markdown-parser): deduplicate link scan/parse with BumpMode#9192

Merged
dyc3 merged 3 commits into
biomejs:mainfrom
jfmcdowell:refactor/links-dedup
Feb 27, 2026
Merged

refactor(markdown-parser): deduplicate link scan/parse with BumpMode#9192
dyc3 merged 3 commits into
biomejs:mainfrom
jfmcdowell:refactor/links-dedup

Conversation

@jfmcdowell

Copy link
Copy Markdown
Contributor

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

Deduplicates two scan/parse pairs in crates/biome_markdown_parser/src/syntax/inline/links.rs that shared logic but differed in token bumping behavior.

  • Added BumpMode (Scan/Parse) with bump_token and bump_separator.
  • Extracted title_content_core from scan_title_content and parse_title_content.
  • Extracted destination_tokens_core from scan_inline_link_destination_tokens and parse_inline_link_destination_tokens.
  • Preserved the two existing structural differences in destination handling (re_lex timing and whitespace-skip position).
  • No behavior change intended.

Test Plan

  • cargo test -p biome_markdown_parser — 64 tests pass, zero snapshot changes
  • cargo insta test -p biome_markdown_parser — no snapshot diffs
  • just test-markdown-conformance
  • just f && just l

Docs

N/A

@changeset-bot

changeset-bot Bot commented Feb 22, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: fd3003c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions Bot added the A-Parser Area: parser label Feb 22, 2026
@coderabbitai

coderabbitai Bot commented Feb 22, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

The PR refactors inline markdown link parsing to use mode-aware parsing functions instead of direct token-bump calls. scan_* helpers (e.g. scan_inline_link_destination_tokens, scan_title_content) were removed and replaced by parse_inline_link_destination_tokens and parse_title_content, with bump-based separators such as bump_link_def_separator and bump_textual_link_def. Angle-bracket and raw destination branches remain distinct but now use lookahead validation without generating CST nodes. Link validity and destination parsing outcomes (Valid/Invalid/DepthExceeded) and public signatures are preserved.

Suggested reviewers

  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarises the main refactoring effort: deduplicating link scan/parse logic using a BumpMode abstraction, which matches the changeset's core objective.
Description check ✅ Passed The description is directly related to the changeset, explaining the deduplication of scan/parse pairs, the BumpMode mechanism introduced, and the testing performed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

/// Real parse: wrap each token in MD_TEXTUAL CST nodes.
Parse,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a concept of speculative parsing. IIRC the CSS parser uses it, you should probably go look at that.

Otherwise, we usually bump, check, and if it's bad, rewind the state of the lexer and parser and try something else. Depends on the situation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL. I changed the approach to use the speculative parsing pattern

Call parse functions directly inside p.lookahead() instead of maintaining
separate scan/parse paths. CST nodes are auto-reverted on rewind.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/inline/links.rs (1)

558-564: get_title_close_char called twice with no state change in between.

Cache the result once and pass it straight to parse_title_content.

♻️ Cache the title close character
-        let has_title = saw_separator && get_title_close_char(p).is_some();
-        ...
-        if has_title {
-            parse_title_content(p, get_title_close_char(p));
-        }
+        let title_close = get_title_close_char(p);
+        let has_title = saw_separator && title_close.is_some();
+        ...
+        if has_title {
+            parse_title_content(p, title_close);
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 558 -
564, Compute and stash the result of get_title_close_char(p) once, e.g. let
title_close = get_title_close_char(p); then set has_title = saw_separator &&
title_close.is_some(); keep the separator-bumping loop as-is, and if has_title
call parse_title_content(p, title_close) instead of calling
get_title_close_char(p) again; this ensures a single, consistent title-close
value is used for both the check and the subsequent parse.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 554-561: The second while loop that calls bump_link_def_separator
is dead because the first loop already advances through all title separator
tokens; remove the second loop entirely. Keep the initial loop that sets
saw_separator and advances via bump_link_def_separator(p), and retain the
has_title computation using get_title_close_char(p).is_some(); verify there are
no leftover assumptions about a later consumer of title-separator tokens in
surrounding code (references: is_title_separator_token, bump_link_def_separator,
get_title_close_char, saw_separator).

---

Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 558-564: Compute and stash the result of get_title_close_char(p)
once, e.g. let title_close = get_title_close_char(p); then set has_title =
saw_separator && title_close.is_some(); keep the separator-bumping loop as-is,
and if has_title call parse_title_content(p, title_close) instead of calling
get_title_close_char(p) again; this ensures a single, consistent title-close
value is used for both the check and the subsequent parse.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d58b4e and 387954d.

📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/inline/links.rs

Comment thread crates/biome_markdown_parser/src/syntax/inline/links.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/inline/links.rs (1)

558-561: Nitpick: get_title_close_char called twice with no state change in between.

🧹 Cache the result
+        let title_close_char = get_title_close_char(p);
-        let has_title = saw_separator && get_title_close_char(p).is_some();
+        let has_title = saw_separator && title_close_char.is_some();

         if has_title {
-            parse_title_content(p, get_title_close_char(p));
+            parse_title_content(p, title_close_char);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 558 -
561, The code calls get_title_close_char(p) twice; cache its result into a local
(e.g., let title_close = get_title_close_char(p)) after computing has_title so
you reuse title_close for both the has_title check and the parse_title_content
call (replace parse_title_content(p, get_title_close_char(p)) with
parse_title_content(p, title_close)); update uses around has_title and
parse_title_content to reference the cached title_close and avoid redundant
calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 536-542: The validation path pre-skips title-separator whitespace
(using is_title_separator_token and bump_link_def_separator) before calling
parse_inline_link_destination_tokens, but the parser does not, causing inputs
like "( <url> )" to fail; move the whitespace-skip logic into
parse_inline_link_destination_tokens (i.e., consume title-separator tokens
before checking p.at(L_ANGLE) and before the angle-bracket branch) so both
validation and parsing behave the same, and then remove the separate pre-skip
loop from inline_link_is_valid.

---

Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 558-561: The code calls get_title_close_char(p) twice; cache its
result into a local (e.g., let title_close = get_title_close_char(p)) after
computing has_title so you reuse title_close for both the has_title check and
the parse_title_content call (replace parse_title_content(p,
get_title_close_char(p)) with parse_title_content(p, title_close)); update uses
around has_title and parse_title_content to reference the cached title_close and
avoid redundant calls.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 387954d and fd3003c.

📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/inline/links.rs

Comment on lines +536 to +542
// Skip leading whitespace before angle bracket check
// (parse_inline_link_destination_tokens only skips whitespace in the raw path).
while is_title_separator_token(p) {
bump_link_def_separator(p);
}

let destination_result = parse_inline_link_destination_tokens(p);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's confirm the whitespace pre-skip at lines 536-542
sed -n '536,542p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 376


🏁 Script executed:

# Now check the actual parse path at lines 197-200
sed -n '197,200p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 244


🏁 Script executed:

# Search for test cases that might test angle-bracket destinations with leading whitespace
rg -n '\(\s*<' crates/biome_markdown_parser/tests/ --type rust -B2 -A2

Repository: biomejs/biome

Length of output: 39


🏁 Script executed:

# Look for inline link tests in general to understand current test coverage
fd test crates/biome_markdown_parser/tests/ --type rust | head -10

Repository: biomejs/biome

Length of output: 228


🏁 Script executed:

# Search for CommonMark-related fixtures or test data
fd -e md -e txt crates/biome_markdown_parser/tests/ | head -20

Repository: biomejs/biome

Length of output: 502


🏁 Script executed:

# Find test fixture files in markdown parser tests
find crates/biome_markdown_parser/tests -type f \( -name "*.md" -o -name "*.txt" -o -name "*.json" \) | head -20

Repository: biomejs/biome

Length of output: 1535


🏁 Script executed:

# Look for inline_link or link-related tests
rg -l "inline_link|parse_inline" crates/biome_markdown_parser/ --type rust

Repository: biomejs/biome

Length of output: 387


🏁 Script executed:

# Check what the parse_inline_link_destination_tokens function does with leading whitespace
sed -n '588,650p' crates/biome_markdown_parser/src/syntax/inline/links.rs | head -70

Repository: biomejs/biome

Length of output: 2255


🏁 Script executed:

# Let's look at the raw destination path whitespace skip location to verify the comment claim
sed -n '625,640p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 443


🏁 Script executed:

# Search for inline link tests with angle brackets
find crates/biome_markdown_parser/tests -name "*link*" -type f

Repository: biomejs/biome

Length of output: 1801


🏁 Script executed:

# Check if there are any test fixtures with links containing angle brackets and spaces
rg '\(\s*<' crates/biome_markdown_parser/tests/md_test_suite/ -l

Repository: biomejs/biome

Length of output: 231


🏁 Script executed:

# Look at the links test fixture
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_links.md 2>/dev/null || echo "File not found, searching for similar..."

Repository: biomejs/biome

Length of output: 97


🏁 Script executed:

# Find and display link-related fixtures
find crates/biome_markdown_parser/tests -name "*link*" -o -name "*url*" 2>/dev/null

Repository: biomejs/biome

Length of output: 1801


🏁 Script executed:

# Let's check what fixtures exist and search for any that might test this pattern
ls crates/biome_markdown_parser/tests/md_test_suite/ok/ | grep -i link

Repository: biomejs/biome

Length of output: 657


🏁 Script executed:

# Check the whitespace test file - this is likely where the issue would be tested
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_whitespace.md

Repository: biomejs/biome

Length of output: 356


🏁 Script executed:

# Also check the destination/title test
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_destination_title.md

Repository: biomejs/biome

Length of output: 224


🏁 Script executed:

# Check the destination/title snapshot to see what's expected
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_destination_title.md.snap

Repository: biomejs/biome

Length of output: 17978


🏁 Script executed:

# Let's check the inline_link_whitespace test to see if it covers space before angle brackets
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_whitespace.md.snap | head -100

Repository: biomejs/biome

Length of output: 3318


🏁 Script executed:

# Search for any tests with the specific pattern of space + angle bracket
rg '\(\s+<' crates/biome_markdown_parser/tests/ --type md

Repository: biomejs/biome

Length of output: 39


🏁 Script executed:

# Let's verify the exact position of the whitespace skip in parse_inline_link_destination_tokens
sed -n '588,600p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 508


🏁 Script executed:

# Create a simple test to confirm the asymmetry: check if [link]( <url> ) is currently rejected
cat > /tmp/test_space_angle.md << 'EOF'
Space before angle bracket: [link]( <url> )
No space before angle bracket: [link](<url>)
EOF
cat /tmp/test_space_angle.md

Repository: biomejs/biome

Length of output: 145


🏁 Script executed:

# Check if the destination validation path (inline_link_is_valid) handles this differently
# Let's look at where inline_link_is_valid is called from
sed -n '528,575p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 1529


🏁 Script executed:

# Verify the exact flow by looking at parse_link_or_image which calls inline_link_is_valid and then parse
sed -n '187,220p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 1425


Whitespace handling between validation and parsing diverges for ( <url> ) inputs.

The lookahead path (lines 536–540) pre-skips whitespace before checking for the enclosed destination (<url>), so validation succeeds. The actual parse path makes no such pre-skip, so parse_inline_link_destination_tokens misses the angle bracket (sees whitespace instead), falls into the raw destination branch, and later rejects < because it's disallowed in raw destinations per CommonMark. This causes spec-valid [link]( <url> ) to be silently rejected.

Moving the whitespace skip inside parse_inline_link_destination_tokens before the p.at(L_ANGLE) check would unify both paths and fix this. The pre-skip in inline_link_is_valid can then be removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 536 -
542, The validation path pre-skips title-separator whitespace (using
is_title_separator_token and bump_link_def_separator) before calling
parse_inline_link_destination_tokens, but the parser does not, causing inputs
like "( <url> )" to fail; move the whitespace-skip logic into
parse_inline_link_destination_tokens (i.e., consume title-separator tokens
before checking p.at(L_ANGLE) and before the angle-bracket branch) so both
validation and parsing behave the same, and then remove the separate pre-skip
loop from inline_link_is_valid.

@jfmcdowell jfmcdowell requested a review from dyc3 February 23, 2026 17:03
@dyc3 dyc3 merged commit 412a08d into biomejs:main Feb 27, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Parser Area: parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants