refactor(markdown-parser): deduplicate link scan/parse with BumpMode#9192
Conversation
|
WalkthroughThe PR refactors inline markdown link parsing to use mode-aware parsing functions instead of direct token-bump calls. scan_* helpers (e.g. scan_inline_link_destination_tokens, scan_title_content) were removed and replaced by parse_inline_link_destination_tokens and parse_title_content, with bump-based separators such as bump_link_def_separator and bump_textual_link_def. Angle-bracket and raw destination branches remain distinct but now use lookahead validation without generating CST nodes. Link validity and destination parsing outcomes (Valid/Invalid/DepthExceeded) and public signatures are preserved. Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| /// Real parse: wrap each token in MD_TEXTUAL CST nodes. | ||
| Parse, | ||
| } | ||
|
|
There was a problem hiding this comment.
We already have a concept of speculative parsing. IIRC the CSS parser uses it, you should probably go look at that.
Otherwise, we usually bump, check, and if it's bad, rewind the state of the lexer and parser and try something else. Depends on the situation.
There was a problem hiding this comment.
TIL. I changed the approach to use the speculative parsing pattern
Call parse functions directly inside p.lookahead() instead of maintaining separate scan/parse paths. CST nodes are auto-reverted on rewind.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/inline/links.rs (1)
558-564:get_title_close_charcalled twice with no state change in between.Cache the result once and pass it straight to
parse_title_content.♻️ Cache the title close character
- let has_title = saw_separator && get_title_close_char(p).is_some(); - ... - if has_title { - parse_title_content(p, get_title_close_char(p)); - } + let title_close = get_title_close_char(p); + let has_title = saw_separator && title_close.is_some(); + ... + if has_title { + parse_title_content(p, title_close); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 558 - 564, Compute and stash the result of get_title_close_char(p) once, e.g. let title_close = get_title_close_char(p); then set has_title = saw_separator && title_close.is_some(); keep the separator-bumping loop as-is, and if has_title call parse_title_content(p, title_close) instead of calling get_title_close_char(p) again; this ensures a single, consistent title-close value is used for both the check and the subsequent parse.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 554-561: The second while loop that calls bump_link_def_separator
is dead because the first loop already advances through all title separator
tokens; remove the second loop entirely. Keep the initial loop that sets
saw_separator and advances via bump_link_def_separator(p), and retain the
has_title computation using get_title_close_char(p).is_some(); verify there are
no leftover assumptions about a later consumer of title-separator tokens in
surrounding code (references: is_title_separator_token, bump_link_def_separator,
get_title_close_char, saw_separator).
---
Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 558-564: Compute and stash the result of get_title_close_char(p)
once, e.g. let title_close = get_title_close_char(p); then set has_title =
saw_separator && title_close.is_some(); keep the separator-bumping loop as-is,
and if has_title call parse_title_content(p, title_close) instead of calling
get_title_close_char(p) again; this ensures a single, consistent title-close
value is used for both the check and the subsequent parse.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/biome_markdown_parser/src/syntax/inline/links.rs (1)
558-561: Nitpick:get_title_close_charcalled twice with no state change in between.🧹 Cache the result
+ let title_close_char = get_title_close_char(p); - let has_title = saw_separator && get_title_close_char(p).is_some(); + let has_title = saw_separator && title_close_char.is_some(); if has_title { - parse_title_content(p, get_title_close_char(p)); + parse_title_content(p, title_close_char); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 558 - 561, The code calls get_title_close_char(p) twice; cache its result into a local (e.g., let title_close = get_title_close_char(p)) after computing has_title so you reuse title_close for both the has_title check and the parse_title_content call (replace parse_title_content(p, get_title_close_char(p)) with parse_title_content(p, title_close)); update uses around has_title and parse_title_content to reference the cached title_close and avoid redundant calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 536-542: The validation path pre-skips title-separator whitespace
(using is_title_separator_token and bump_link_def_separator) before calling
parse_inline_link_destination_tokens, but the parser does not, causing inputs
like "( <url> )" to fail; move the whitespace-skip logic into
parse_inline_link_destination_tokens (i.e., consume title-separator tokens
before checking p.at(L_ANGLE) and before the angle-bracket branch) so both
validation and parsing behave the same, and then remove the separate pre-skip
loop from inline_link_is_valid.
---
Nitpick comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 558-561: The code calls get_title_close_char(p) twice; cache its
result into a local (e.g., let title_close = get_title_close_char(p)) after
computing has_title so you reuse title_close for both the has_title check and
the parse_title_content call (replace parse_title_content(p,
get_title_close_char(p)) with parse_title_content(p, title_close)); update uses
around has_title and parse_title_content to reference the cached title_close and
avoid redundant calls.
| // Skip leading whitespace before angle bracket check | ||
| // (parse_inline_link_destination_tokens only skips whitespace in the raw path). | ||
| while is_title_separator_token(p) { | ||
| bump_link_def_separator(p); | ||
| } | ||
|
|
||
| let destination_result = parse_inline_link_destination_tokens(p); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's confirm the whitespace pre-skip at lines 536-542
sed -n '536,542p' crates/biome_markdown_parser/src/syntax/inline/links.rsRepository: biomejs/biome
Length of output: 376
🏁 Script executed:
# Now check the actual parse path at lines 197-200
sed -n '197,200p' crates/biome_markdown_parser/src/syntax/inline/links.rsRepository: biomejs/biome
Length of output: 244
🏁 Script executed:
# Search for test cases that might test angle-bracket destinations with leading whitespace
rg -n '\(\s*<' crates/biome_markdown_parser/tests/ --type rust -B2 -A2Repository: biomejs/biome
Length of output: 39
🏁 Script executed:
# Look for inline link tests in general to understand current test coverage
fd test crates/biome_markdown_parser/tests/ --type rust | head -10Repository: biomejs/biome
Length of output: 228
🏁 Script executed:
# Search for CommonMark-related fixtures or test data
fd -e md -e txt crates/biome_markdown_parser/tests/ | head -20Repository: biomejs/biome
Length of output: 502
🏁 Script executed:
# Find test fixture files in markdown parser tests
find crates/biome_markdown_parser/tests -type f \( -name "*.md" -o -name "*.txt" -o -name "*.json" \) | head -20Repository: biomejs/biome
Length of output: 1535
🏁 Script executed:
# Look for inline_link or link-related tests
rg -l "inline_link|parse_inline" crates/biome_markdown_parser/ --type rustRepository: biomejs/biome
Length of output: 387
🏁 Script executed:
# Check what the parse_inline_link_destination_tokens function does with leading whitespace
sed -n '588,650p' crates/biome_markdown_parser/src/syntax/inline/links.rs | head -70Repository: biomejs/biome
Length of output: 2255
🏁 Script executed:
# Let's look at the raw destination path whitespace skip location to verify the comment claim
sed -n '625,640p' crates/biome_markdown_parser/src/syntax/inline/links.rsRepository: biomejs/biome
Length of output: 443
🏁 Script executed:
# Search for inline link tests with angle brackets
find crates/biome_markdown_parser/tests -name "*link*" -type fRepository: biomejs/biome
Length of output: 1801
🏁 Script executed:
# Check if there are any test fixtures with links containing angle brackets and spaces
rg '\(\s*<' crates/biome_markdown_parser/tests/md_test_suite/ -lRepository: biomejs/biome
Length of output: 231
🏁 Script executed:
# Look at the links test fixture
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_links.md 2>/dev/null || echo "File not found, searching for similar..."Repository: biomejs/biome
Length of output: 97
🏁 Script executed:
# Find and display link-related fixtures
find crates/biome_markdown_parser/tests -name "*link*" -o -name "*url*" 2>/dev/nullRepository: biomejs/biome
Length of output: 1801
🏁 Script executed:
# Let's check what fixtures exist and search for any that might test this pattern
ls crates/biome_markdown_parser/tests/md_test_suite/ok/ | grep -i linkRepository: biomejs/biome
Length of output: 657
🏁 Script executed:
# Check the whitespace test file - this is likely where the issue would be tested
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_whitespace.mdRepository: biomejs/biome
Length of output: 356
🏁 Script executed:
# Also check the destination/title test
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_destination_title.mdRepository: biomejs/biome
Length of output: 224
🏁 Script executed:
# Check the destination/title snapshot to see what's expected
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_destination_title.md.snapRepository: biomejs/biome
Length of output: 17978
🏁 Script executed:
# Let's check the inline_link_whitespace test to see if it covers space before angle brackets
cat crates/biome_markdown_parser/tests/md_test_suite/ok/inline_link_whitespace.md.snap | head -100Repository: biomejs/biome
Length of output: 3318
🏁 Script executed:
# Search for any tests with the specific pattern of space + angle bracket
rg '\(\s+<' crates/biome_markdown_parser/tests/ --type mdRepository: biomejs/biome
Length of output: 39
🏁 Script executed:
# Let's verify the exact position of the whitespace skip in parse_inline_link_destination_tokens
sed -n '588,600p' crates/biome_markdown_parser/src/syntax/inline/links.rsRepository: biomejs/biome
Length of output: 508
🏁 Script executed:
# Create a simple test to confirm the asymmetry: check if [link]( <url> ) is currently rejected
cat > /tmp/test_space_angle.md << 'EOF'
Space before angle bracket: [link]( <url> )
No space before angle bracket: [link](<url>)
EOF
cat /tmp/test_space_angle.mdRepository: biomejs/biome
Length of output: 145
🏁 Script executed:
# Check if the destination validation path (inline_link_is_valid) handles this differently
# Let's look at where inline_link_is_valid is called from
sed -n '528,575p' crates/biome_markdown_parser/src/syntax/inline/links.rsRepository: biomejs/biome
Length of output: 1529
🏁 Script executed:
# Verify the exact flow by looking at parse_link_or_image which calls inline_link_is_valid and then parse
sed -n '187,220p' crates/biome_markdown_parser/src/syntax/inline/links.rsRepository: biomejs/biome
Length of output: 1425
Whitespace handling between validation and parsing diverges for ( <url> ) inputs.
The lookahead path (lines 536–540) pre-skips whitespace before checking for the enclosed destination (<url>), so validation succeeds. The actual parse path makes no such pre-skip, so parse_inline_link_destination_tokens misses the angle bracket (sees whitespace instead), falls into the raw destination branch, and later rejects < because it's disallowed in raw destinations per CommonMark. This causes spec-valid [link]( <url> ) to be silently rejected.
Moving the whitespace skip inside parse_inline_link_destination_tokens before the p.at(L_ANGLE) check would unify both paths and fix this. The pre-skip in inline_link_is_valid can then be removed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 536 -
542, The validation path pre-skips title-separator whitespace (using
is_title_separator_token and bump_link_def_separator) before calling
parse_inline_link_destination_tokens, but the parser does not, causing inputs
like "( <url> )" to fail; move the whitespace-skip logic into
parse_inline_link_destination_tokens (i.e., consume title-separator tokens
before checking p.at(L_ANGLE) and before the angle-bracket branch) so both
validation and parsing behave the same, and then remove the separate pre-skip
loop from inline_link_is_valid.
Note
AI Assistance Disclosure: This PR was developed with assistance from Claude Code.
Summary
Deduplicates two scan/parse pairs in
crates/biome_markdown_parser/src/syntax/inline/links.rsthat shared logic but differed in token bumping behavior.BumpMode(Scan/Parse) withbump_tokenandbump_separator.title_content_corefromscan_title_contentandparse_title_content.destination_tokens_corefromscan_inline_link_destination_tokensandparse_inline_link_destination_tokens.re_lextiming and whitespace-skip position).Test Plan
cargo test -p biome_markdown_parser— 64 tests pass, zero snapshot changescargo insta test -p biome_markdown_parser— no snapshot diffsjust test-markdown-conformancejust f && just lDocs
N/A