Skip to content

fix: parse each language injection region as independent tree#2479

Merged
HerringtonDarkholme merged 3 commits into
ast-grep:mainfrom
Dsaquel:fix/independent-injection-regions
Feb 15, 2026
Merged

fix: parse each language injection region as independent tree#2479
HerringtonDarkholme merged 3 commits into
ast-grep:mainfrom
Dsaquel:fix/independent-injection-regions

Conversation

@Dsaquel

@Dsaquel Dsaquel commented Feb 15, 2026

Copy link
Copy Markdown
Contributor

fix #2478

Summary by CodeRabbit

  • Refactor
    • Reworked how embedded/injected languages are collected and represented so each injected fragment is reported as its own entry.
    • Improves handling of multiple injection occurrences and ensures injected fragments are consistently included in file analysis, yielding more accurate parsing and tooling for files with embedded languages.

@coderabbitai

coderabbitai Bot commented Feb 15, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Changed injection collection from a per-language HashMap to an ordered Vec of (language, ranges) pairs across trait, core, language, and CLI code so each injection match is returned as a separate entry for independent Tree-sitter parsing.

Changes

Cohort / File(s) Summary
Trait & Core
crates/core/src/tree_sitter/mod.rs
LanguageExt::extract_injections signature changed to return Vec<(String, Vec<TSRange>)>; callers updated to iterate (lang_str, ranges) and treat each entry as an independent parse unit.
CLI: injection logic
crates/cli/src/lang/injection.rs
extract_injections / extract_custom_inject signatures changed to use Vec<(String, Vec<TSRange>)>; logic now pushes separate (lang, ranges) entries per match; tests updated and new test added for multiple matches producing separate entries.
CLI: language module & utils
crates/cli/src/lang/mod.rs, crates/cli/src/utils/mod.rs
Call sites updated for the new Vec return; deduplication via HashSet added when resolving injectable languages; utils now iterates injection roots and collects stats per injection root rather than mapping/filtering previous sets.
Language implementations
crates/language/src/html.rs, crates/language/src/lib.rs
Html and SupportLang implementations updated to return Vec<(String, Vec<TSRange>)>; aggregation changed from HashMap insertions to pushing tuple entries; non-HTML variants return empty Vec.
Tests & imports
crates/.../*tests*, crates/...
Tests and call sites adapted to index-ordered Vec entries; removed unused HashMap imports and related aggregation logic.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI (injector)
    participant LangImpl as LanguageImpls (Html, etc.)
    participant Core as Core/TreeSitter
    participant TSDoc as TreeSitter Doc

    CLI->>LangImpl: request extract_injections(root)
    LangImpl-->>CLI: return Vec<(lang, ranges)> (multiple entries allowed)
    loop for each (lang, ranges)
        CLI->>Core: create parse request (lang, ranges)
        Core->>TSDoc: parse ranges as independent document
        TSDoc-->>Core: parsed AST
        Core-->>CLI: parsed results for this entry
    end
    CLI->>CLI: aggregate per-entry results (do not merge ranges)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I hopped through ranges, one by one I peep,
No more bundled naps where matches sleep,
Each tiny match gets its own sunny bed,
Parsers sing softly, duplicates shed,
A carrot for tests — tidy trees ahead!

🚥 Pre-merge checks | ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.94% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: parsing each language injection region as independent trees instead of grouping them.
Linked Issues check ✅ Passed The pull request implements the required objective from issue #2478 by changing return types from HashMap to Vec to parse each injection region independently.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the language injection issue; modifications across multiple files consistently implement independent region parsing.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Feb 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.87500% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.11%. Comparing base (85d13f7) to head (2a06992).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/tree_sitter/mod.rs 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2479      +/-   ##
==========================================
+ Coverage   88.10%   88.11%   +0.01%     
==========================================
  Files         108      108              
  Lines       17825    17843      +18     
==========================================
+ Hits        15705    15723      +18     
  Misses       2120     2120              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/language/src/html.rs (1)

38-67: ⚠️ Potential issue | 🟠 Major

HTML's extract_injections still groups ranges by language — may not achieve fully independent parsing.

The internal HashMap on line 43 continues to aggregate all raw_text ranges for the same language into a single entry. After map.into_iter().collect() on line 66, two <script> tags both defaulting to "js" yield one ("js", [range1, range2]) tuple, which get_injections will parse as a single combined tree.

If independent parsing per injection region is desired for HTML too, each raw_text node should be pushed as its own entry:

♻️ Suggested change to produce independent entries
   fn extract_injections<L: LanguageExt>(
     &self,
     root: Node<StrDoc<L>>,
   ) -> Vec<(String, Vec<TSRange>)> {
     let lang = root.lang();
-    let mut map = HashMap::new();
+    let mut ret = Vec::new();
     let matcher = KindMatcher::new("script_element", lang.clone());
     for script in root.find_all(matcher) {
       let injected = find_lang(&script).unwrap_or_else(|| "js".into());
       let content = script.children().find(|c| c.kind() == "raw_text");
       if let Some(content) = content {
-        map
-          .entry(injected)
-          .or_insert_with(Vec::new)
-          .push(node_to_range(&content));
+        ret.push((injected, vec![node_to_range(&content)]));
       };
     }
     let matcher = KindMatcher::new("style_element", lang.clone());
     for style in root.find_all(matcher) {
       let injected = find_lang(&style).unwrap_or_else(|| "css".into());
       let content = style.children().find(|c| c.kind() == "raw_text");
       if let Some(content) = content {
-        map
-          .entry(injected)
-          .or_insert_with(Vec::new)
-          .push(node_to_range(&content));
+        ret.push((injected, vec![node_to_range(&content)]));
       };
     }
-    map.into_iter().collect()
+    ret
   }

Note: If this change is applied, the test helper extract() and tests test_html_extraction / test_explicit_lang would need updating since they currently collect back into a HashMap.

🤖 Fix all issues with AI agents
In `@crates/language/src/html.rs`:
- Around line 140-143: The test helper fn extract currently returns a HashMap by
collecting Html.extract_injections(root.root()), which merges entries with the
same language key and hides per-entry independence; change extract's signature
to return the raw sequence (e.g., Vec<(String, Vec<TSRange>)> or the exact
collection type produced by Html.extract_injections) instead of HashMap, and
simply return Html.extract_injections(root.root()) (remove
into_iter().collect()), then update any tests/call sites that expected a HashMap
to iterate the returned Vec to preserve separate entries for the same language;
references: extract, Html.ast_grep, Html.extract_injections, TSRange.
🧹 Nitpick comments (1)
crates/cli/src/utils/mod.rs (1)

136-142: Injection processing loop is correct; note the silenced error on collect_file_stats.

The loop correctly iterates all injection roots and pushes each to ret. However, .ok() on line 139 silently discards any error from collect_file_stats, while the same call on line 134 (for the host language) propagates errors via ?. This asymmetry means tracing/stats failures for injected languages will be silently swallowed.

If this is intentional (e.g., injected languages may not always have rules configured), consider adding a brief comment explaining why errors are ignored here.

Comment thread crates/language/src/html.rs Outdated
@Dsaquel Dsaquel force-pushed the fix/independent-injection-regions branch from 956d887 to 2a06992 Compare February 15, 2026 20:24

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/cli/src/lang/injection.rs (1)

170-184: node_to_range is duplicated with crates/language/src/html.rs.

This helper function has identical logic to the one in crates/language/src/html.rs (lines 78-92). Consider extracting it into a shared location (e.g., ast_grep_core) to avoid drift.

Comment thread crates/cli/src/lang/mod.rs
Comment thread crates/cli/src/utils/mod.rs
@HerringtonDarkholme HerringtonDarkholme merged commit b260b96 into ast-grep:main Feb 15, 2026
6 checks passed
@HerringtonDarkholme

Copy link
Copy Markdown
Member

Thanks! The fix looks awesome

@Dsaquel Dsaquel deleted the fix/independent-injection-regions branch February 15, 2026 22:25
nicopauss pushed a commit to Intersec/lib-common that referenced this pull request Apr 1, 2026
##### [\`v0.41.0\`](https://github.com/ast-grep/ast-grep/blob/HEAD/CHANGELOG.md#0410)

- chore(deps): update pyo3/maturin-action action to v1.50.0 [`#2465`](ast-grep/ast-grep#2465)
- fix(deps): update rust-wasm-bindgen monorepo [`#2491`](ast-grep/ast-grep#2491)
- chore(deps): update rust crate toml\_edit to v0.25.3 [`#2469`](ast-grep/ast-grep#2469)
- chore(deps): update dependency web-tree-sitter to ^0.26.0 [`#2492`](ast-grep/ast-grep#2492)
- refactor: addr\_of\_mut to OnceLock [`#2486`](ast-grep/ast-grep#2486)
- feat: support ast-grep-wasm [`#2484`](ast-grep/ast-grep#2484)
- chore(deps): update rust crate clap to v4.5.60 [`#2483`](ast-grep/ast-grep#2483)
- chore(deps): update rust crate anyhow to v1.0.102 [`#2489`](ast-grep/ast-grep#2489)
- chore(deps): update dependency oxlint to v1.49.0 [`#2488`](ast-grep/ast-grep#2488)
- feat: make default rule id to filename [`#2482`](ast-grep/ast-grep#2482)
- chore(deps): update dependency oxlint to v1.48.0 [`#2481`](ast-grep/ast-grep#2481)
- chore(deps): update rust crate napi-derive to v3.5.2 [`#2475`](ast-grep/ast-grep#2475)
- chore(deps): update rust crate napi to v3.8.3 [`#2474`](ast-grep/ast-grep#2474)
- chore(deps): update rust crate futures to v0.3.32 [`#2476`](ast-grep/ast-grep#2476)
- chore(deps): update rust crate clap to v4.5.58 [`#2470`](ast-grep/ast-grep#2470)
- fix: parse each language injection region as independent tree [`#2479`](ast-grep/ast-grep#2479)
- chore(deps): update dependency [@types/node](https://github.com/types/node) to v24.10.13 [`#2467`](ast-grep/ast-grep#2467)
- chore(deps): update rust crate clap\_complete to v4.5.66 [`#2471`](ast-grep/ast-grep#2471)
- chore(deps): update rust crate predicates to v3.1.4 [`#2472`](ast-grep/ast-grep#2472)
- chore(deps): update dependency oxlint to v1.47.0 [`#2468`](ast-grep/ast-grep#2468)
- chore(deps): update rust crate tempfile to v3.25.0 [`#2466`](ast-grep/ast-grep#2466)
- fix(deps): update rust crate toml\_edit to 0.25.0 [`#2473`](ast-grep/ast-grep#2473)
- chore(deps): update rust crate tree-sitter to v0.26.5 [`#2458`](ast-grep/ast-grep#2458)
- chore(deps): update dependency [@types/node](https://github.com/types/node) to v24.10.12 [`#2438`](ast-grep/ast-grep#2438)
- chore(deps): update rust crate schemars to v1.2.1 [`#2457`](ast-grep/ast-grep#2457)
- chore(deps): update rust crate clap to v4.5.57 [`#2455`](ast-grep/ast-grep#2455)
- chore(deps): update rust crate anyhow to v1.0.101 [`#2462`](ast-grep/ast-grep#2462)
- chore(deps): update rust crate inquire to v0.9.3 [`#2463`](ast-grep/ast-grep#2463)
- chore(deps): update dependency oxlint to v1.43.0 [`#2459`](ast-grep/ast-grep#2459)
- chore(deps): update rust crate regex to v1.12.3 [`#2461`](ast-grep/ast-grep#2461)
- chore(deps): update rust crate clap to v4.5.55 [`#2454`](ast-grep/ast-grep#2454)
- chore(deps): update dependency oxlint to v1.42.0 [`#2452`](ast-grep/ast-grep#2452)
- chore(deps): update dependency oxlint to v1.41.0 [`#2447`](ast-grep/ast-grep#2447)
- chore(deps): update rust crate assert\_cmd to v2.1.2 [`#2437`](ast-grep/ast-grep#2437)
- chore(deps): update rust crate thiserror to v2.0.18 [`#2446`](ast-grep/ast-grep#2446)
- chore(deps): update dependency oxlint to v1.39.0 [`#2442`](ast-grep/ast-grep#2442)
- chore(deps): update rust crate inquire to v0.9.2 [`#2445`](ast-grep/ast-grep#2445)
- fix: improve max diagnostic impl to use one atomic op [`781ec81`](ast-grep/ast-grep@781ec81)
- fix: udpate schema [`e645fdf`](ast-grep/ast-grep@e645fdf)
- fix: move dumping to core crate [`0b2acb9`](ast-grep/ast-grep@0b2acb9)

Renovate-Branch: renovate/2024.6-ast-grep-cli-0.x
Change-Id: I88da2bd1f360bebda04663520c30423d0cb7ddd1
Priv-Id: 72123de3a05b89754a6bf040b3bd2708f3d9a9e2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] languageInjections: combined ranges produce wrong matches and duplicates

2 participants