Skip to content

refactor(markdown-parser): decompose parse_link_or_image into phase helpers#9452

Merged
ematipico merged 1 commit into
biomejs:mainfrom
jfmcdowell:refactor/parse-link-or-image-extraction
Mar 12, 2026
Merged

refactor(markdown-parser): decompose parse_link_or_image into phase helpers#9452
ematipico merged 1 commit into
biomejs:mainfrom
jfmcdowell:refactor/parse-link-or-image-extraction

Conversation

@jfmcdowell

Copy link
Copy Markdown
Contributor

Note

AI Assistance Disclosure: This PR was developed with assistance from Claude Code.

Summary

Decompose parse_link_or_image into four focused helpers so link/image parsing is easier to follow and review without changing markdown parsing behavior.

  • Extract parse_bracketed_link_text for [text] / ![alt] parsing.
  • Extract parse_inline_link_tail for (destination "title") parsing.
  • Extract parse_reference_link_tail for full, collapsed, and shortcut reference handling.
  • Extract abort_link_parse with RollbackKind enum to centralize all rollback paths (marker abandonment, checkpoint rewind, conditional re-lex).
  • Replace expect_with_context(L_PAREN, ...) with eat_with_context to eliminate panicking code and spurious diagnostics on internal preconditions.

Test Plan

  • just test-crate biome_markdown_parser
  • just test-markdown-conformance (652/652, 100%)
  • just f && just l

Docs

N/A — internal parser refactor with no user-facing behavior change.

…elpers

Extract four helpers from the monolithic `parse_link_or_image` function:

- `parse_bracketed_link_text`: parses `[text]` or `![alt]`, reports facts
- `parse_inline_link_tail`: parses `(destination "title")` after `]`
- `parse_reference_link_tail`: handles full, collapsed, and shortcut references
- `abort_link_parse`: centralized rollback with `RollbackKind` enum

The coordinator (`parse_link_or_image`) now owns orchestration and policy
(nested-link rejection, inline validation dispatch) while helpers report
facts and return success/rollback results.

Key improvements:
- Replaced `expect_with_context(L_PAREN)` with `eat_with_context` to
  eliminate diagnostic emission for internal preconditions
- Centralized rollback ensures consistent state restoration across all
  failure paths (marker abandonment, checkpoint rewind, conditional relex)
- Zero `unwrap`/`expect`/`panic!`/`unreachable!` in the touched flow

Pure structural refactor — no behavioral changes.
@changeset-bot

changeset-bot Bot commented Mar 11, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: f6dad89

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions Bot added A-Parser Area: parser L-Markdown Language: Markdown labels Mar 11, 2026
@jfmcdowell jfmcdowell marked this pull request as ready for review March 12, 2026 00:13
@coderabbitai

coderabbitai Bot commented Mar 12, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

This change refactors the markdown link parsing logic in links.rs by introducing a multi-phase parsing approach. It adds new internal enums (BracketedLinkTextResult, LinkTailResult, RollbackKind) and a centralised abort_link_parse helper to unify rollback behaviour. The refactoring decouples bracket parsing from tail parsing through new phase helpers (parse_bracketed_link_text, parse_inline_link_tail, parse_reference_link_tail) whilst maintaining the existing public API surface.

Possibly related PRs

Suggested reviewers

  • dyc3
  • ematipico
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change—decomposing parse_link_or_image into phase helpers—which matches the core refactoring work in the PR.
Description check ✅ Passed The description clearly relates to the changeset, detailing the four extracted helpers, the centralised rollback logic, and the intent to improve code clarity without behavioural changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/biome_markdown_parser/src/syntax/inline/links.rs`:
- Around line 215-220: The rollback path in the match handling
parse_inline_link_tail incorrectly uses RollbackKind::AfterLexSwitch even when
eat_with_context(L_PAREN, MarkdownLexContext::LinkDefinition) failed (no
lex-context switch occurred); update the LinkTailResult::Rollback branch in the
match (the call to abort_link_parse(p, checkpoint, m,
RollbackKind::AfterLexSwitch)) to use RollbackKind::BeforeLexSwitch instead so
abort_link_parse accurately reflects that no lex switch happened (refer to
parse_inline_link_tail, abort_link_parse, eat_with_context, and the RollbackKind
enum).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 693b12a5-cd1d-4ab9-989a-4539016e9931

📥 Commits

Reviewing files that changed from the base of the PR and between 89bf719 and f6dad89.

📒 Files selected for processing (1)
  • crates/biome_markdown_parser/src/syntax/inline/links.rs

Comment on lines +215 to +220
match parse_inline_link_tail(p, kind, opening_range) {
LinkTailResult::Parsed => Present(m.complete(p, kind.inline_kind())),
LinkTailResult::Rollback => {
abort_link_parse(p, checkpoint, m, RollbackKind::AfterLexSwitch)
}
}

@coderabbitai coderabbitai Bot Mar 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the implementation of eat_with_context to understand context-switching semantics
ast-grep --pattern $'fn eat_with_context($$$) {
  $$$
}'

# Also check bump_with_context for comparison
rg -n -A10 'fn eat_with_context' --type rust

Repository: biomejs/biome

Length of output: 705


🏁 Script executed:

sed -n '376,395p' crates/biome_parser/src/lib.rs

Repository: biomejs/biome

Length of output: 519


🏁 Script executed:

rg -n -A15 'fn bump_with_context' crates/biome_parser/src/lib.rs | head -40

Repository: biomejs/biome

Length of output: 561


🏁 Script executed:

rg -n 'RollbackKind' crates/biome_markdown_parser/src/syntax/ -A2 -B2

Repository: biomejs/biome

Length of output: 3516


🏁 Script executed:

rg -n 'parse_bracketed_link_text' crates/biome_markdown_parser/src/syntax/inline/ -A20 | head -50

Repository: biomejs/biome

Length of output: 4464


🏁 Script executed:

sed -n '240,280p' crates/biome_markdown_parser/src/syntax/inline/links.rs

Repository: biomejs/biome

Length of output: 1604


The use of RollbackKind::AfterLexSwitch when eat_with_context fails is semantically incorrect.

When eat_with_context(L_PAREN, MarkdownLexContext::LinkDefinition) returns false at line 277, no token was consumed and no context switch occurred—the function returns early without calling do_bump_with_context. However, the coordinator uses RollbackKind::AfterLexSwitch at line 218, which causes force_relex_regular() to be called, implying a context switch that never happened.

In practice, this is a no-op (re-lexing in Regular context when already in Regular context changes nothing), but for correctness, consider using RollbackKind::BeforeLexSwitch instead, since no lex-context switch actually occurred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/biome_markdown_parser/src/syntax/inline/links.rs` around lines 215 -
220, The rollback path in the match handling parse_inline_link_tail incorrectly
uses RollbackKind::AfterLexSwitch even when eat_with_context(L_PAREN,
MarkdownLexContext::LinkDefinition) failed (no lex-context switch occurred);
update the LinkTailResult::Rollback branch in the match (the call to
abort_link_parse(p, checkpoint, m, RollbackKind::AfterLexSwitch)) to use
RollbackKind::BeforeLexSwitch instead so abort_link_parse accurately reflects
that no lex switch happened (refer to parse_inline_link_tail, abort_link_parse,
eat_with_context, and the RollbackKind enum).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline tail rollback is conservatively treated as AfterLexSwitch because failures can occur both before and after eat_with_context; the latter requires regular-context re-lex.

@ematipico ematipico Mar 12, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip @jfmcdowell : start the sentence with @coderabbitai and it will pick up your message (you need to at it)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfmcdowell Thanks for the explanation — the conservative AfterLexSwitch approach makes sense. Since parse_inline_link_tail can fail at multiple points (before or after eat_with_context), the coordinator can't distinguish which happened, and calling force_relex_regular() when already in Regular context is indeed a no-op. No change needed here.


🧠 Learnings used
Learnt from: ematipico
Repo: biomejs/biome PR: 8031
File: crates/biome_html_parser/src/syntax/svelte.rs:140-147
Timestamp: 2025-11-09T12:47:46.298Z
Learning: In the Biome HTML parser, `expect` and `expect_with_context` consume the current token and then lex the next token. The context parameter in `expect_with_context` controls how the next token (after the consumed one) is lexed, not the current token being consumed. For example, in Svelte parsing, after `bump_with_context(T!["{:"], HtmlLexContext::Svelte)`, the next token is already lexed in the Svelte context, so `expect(T![else])` is sufficient unless the token after `else` also needs to be lexed in a specific context.

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Learnt from: ematipico
Repo: biomejs/biome PR: 8537
File: crates/biome_js_analyze/src/lint/nursery/no_leaked_render.rs:167-210
Timestamp: 2025-12-22T09:27:13.161Z
Learning: In crates/biome_analyze/**/*analyze/src/**/*.rs, the `fix_kind` field in `declare_lint_rule!` should only be specified when the rule implements the `action` function. Rules that only emit diagnostics without providing code fixes should not include `fix_kind` in their metadata.

Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Learnt from: dyc3
Repo: biomejs/biome PR: 8901
File: crates/biome_analyze_macros/src/group_macro.rs:9-9
Timestamp: 2026-01-29T22:10:30.888Z
Learning: Do not import xtask inside biome crates. In the biomeps repository, ensure that biome crates (e.g., crates/biome_*/...) do not have imports like use xtask_glue::... or other xtask-related crates. The correct dependency direction is for xtask to depend on and import biome crates, not the reverse, since xtask contains heavier utilities (e.g., git2) that can inflate build times when included in biome crates. When reviewing, verify dependency declarations (Cargo.toml) and any use statements to confirm xtask crates are not pulled into biome crates; prefer having xtask depend on biome crates and export needed functionality through stable interfaces.

@ematipico ematipico merged commit dc526a5 into biomejs:main Mar 12, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Parser Area: parser L-Markdown Language: Markdown

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants