fix: parse each language injection region as independent tree#2479
Conversation
📝 WalkthroughWalkthroughChanged injection collection from a per-language HashMap to an ordered Vec of (language, ranges) pairs across trait, core, language, and CLI code so each injection match is returned as a separate entry for independent Tree-sitter parsing. Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI (injector)
participant LangImpl as LanguageImpls (Html, etc.)
participant Core as Core/TreeSitter
participant TSDoc as TreeSitter Doc
CLI->>LangImpl: request extract_injections(root)
LangImpl-->>CLI: return Vec<(lang, ranges)> (multiple entries allowed)
loop for each (lang, ranges)
CLI->>Core: create parse request (lang, ranges)
Core->>TSDoc: parse ranges as independent document
TSDoc-->>Core: parsed AST
Core-->>CLI: parsed results for this entry
end
CLI->>CLI: aggregate per-entry results (do not merge ranges)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2479 +/- ##
==========================================
+ Coverage 88.10% 88.11% +0.01%
==========================================
Files 108 108
Lines 17825 17843 +18
==========================================
+ Hits 15705 15723 +18
Misses 2120 2120 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/language/src/html.rs (1)
38-67:⚠️ Potential issue | 🟠 MajorHTML's
extract_injectionsstill groups ranges by language — may not achieve fully independent parsing.The internal
HashMapon line 43 continues to aggregate allraw_textranges for the same language into a single entry. Aftermap.into_iter().collect()on line 66, two<script>tags both defaulting to"js"yield one("js", [range1, range2])tuple, whichget_injectionswill parse as a single combined tree.If independent parsing per injection region is desired for HTML too, each
raw_textnode should be pushed as its own entry:♻️ Suggested change to produce independent entries
fn extract_injections<L: LanguageExt>( &self, root: Node<StrDoc<L>>, ) -> Vec<(String, Vec<TSRange>)> { let lang = root.lang(); - let mut map = HashMap::new(); + let mut ret = Vec::new(); let matcher = KindMatcher::new("script_element", lang.clone()); for script in root.find_all(matcher) { let injected = find_lang(&script).unwrap_or_else(|| "js".into()); let content = script.children().find(|c| c.kind() == "raw_text"); if let Some(content) = content { - map - .entry(injected) - .or_insert_with(Vec::new) - .push(node_to_range(&content)); + ret.push((injected, vec![node_to_range(&content)])); }; } let matcher = KindMatcher::new("style_element", lang.clone()); for style in root.find_all(matcher) { let injected = find_lang(&style).unwrap_or_else(|| "css".into()); let content = style.children().find(|c| c.kind() == "raw_text"); if let Some(content) = content { - map - .entry(injected) - .or_insert_with(Vec::new) - .push(node_to_range(&content)); + ret.push((injected, vec![node_to_range(&content)])); }; } - map.into_iter().collect() + ret }Note: If this change is applied, the test helper
extract()and teststest_html_extraction/test_explicit_langwould need updating since they currently collect back into aHashMap.
🤖 Fix all issues with AI agents
In `@crates/language/src/html.rs`:
- Around line 140-143: The test helper fn extract currently returns a HashMap by
collecting Html.extract_injections(root.root()), which merges entries with the
same language key and hides per-entry independence; change extract's signature
to return the raw sequence (e.g., Vec<(String, Vec<TSRange>)> or the exact
collection type produced by Html.extract_injections) instead of HashMap, and
simply return Html.extract_injections(root.root()) (remove
into_iter().collect()), then update any tests/call sites that expected a HashMap
to iterate the returned Vec to preserve separate entries for the same language;
references: extract, Html.ast_grep, Html.extract_injections, TSRange.
🧹 Nitpick comments (1)
crates/cli/src/utils/mod.rs (1)
136-142: Injection processing loop is correct; note the silenced error oncollect_file_stats.The loop correctly iterates all injection roots and pushes each to
ret. However,.ok()on line 139 silently discards any error fromcollect_file_stats, while the same call on line 134 (for the host language) propagates errors via?. This asymmetry means tracing/stats failures for injected languages will be silently swallowed.If this is intentional (e.g., injected languages may not always have rules configured), consider adding a brief comment explaining why errors are ignored here.
956d887 to
2a06992
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/cli/src/lang/injection.rs (1)
170-184:node_to_rangeis duplicated withcrates/language/src/html.rs.This helper function has identical logic to the one in
crates/language/src/html.rs(lines 78-92). Consider extracting it into a shared location (e.g.,ast_grep_core) to avoid drift.
|
Thanks! The fix looks awesome |
##### [\`v0.41.0\`](https://github.com/ast-grep/ast-grep/blob/HEAD/CHANGELOG.md#0410) - chore(deps): update pyo3/maturin-action action to v1.50.0 [`#2465`](ast-grep/ast-grep#2465) - fix(deps): update rust-wasm-bindgen monorepo [`#2491`](ast-grep/ast-grep#2491) - chore(deps): update rust crate toml\_edit to v0.25.3 [`#2469`](ast-grep/ast-grep#2469) - chore(deps): update dependency web-tree-sitter to ^0.26.0 [`#2492`](ast-grep/ast-grep#2492) - refactor: addr\_of\_mut to OnceLock [`#2486`](ast-grep/ast-grep#2486) - feat: support ast-grep-wasm [`#2484`](ast-grep/ast-grep#2484) - chore(deps): update rust crate clap to v4.5.60 [`#2483`](ast-grep/ast-grep#2483) - chore(deps): update rust crate anyhow to v1.0.102 [`#2489`](ast-grep/ast-grep#2489) - chore(deps): update dependency oxlint to v1.49.0 [`#2488`](ast-grep/ast-grep#2488) - feat: make default rule id to filename [`#2482`](ast-grep/ast-grep#2482) - chore(deps): update dependency oxlint to v1.48.0 [`#2481`](ast-grep/ast-grep#2481) - chore(deps): update rust crate napi-derive to v3.5.2 [`#2475`](ast-grep/ast-grep#2475) - chore(deps): update rust crate napi to v3.8.3 [`#2474`](ast-grep/ast-grep#2474) - chore(deps): update rust crate futures to v0.3.32 [`#2476`](ast-grep/ast-grep#2476) - chore(deps): update rust crate clap to v4.5.58 [`#2470`](ast-grep/ast-grep#2470) - fix: parse each language injection region as independent tree [`#2479`](ast-grep/ast-grep#2479) - chore(deps): update dependency [@types/node](https://github.com/types/node) to v24.10.13 [`#2467`](ast-grep/ast-grep#2467) - chore(deps): update rust crate clap\_complete to v4.5.66 [`#2471`](ast-grep/ast-grep#2471) - chore(deps): update rust crate predicates to v3.1.4 [`#2472`](ast-grep/ast-grep#2472) - chore(deps): update dependency oxlint to v1.47.0 [`#2468`](ast-grep/ast-grep#2468) - chore(deps): update rust crate tempfile to v3.25.0 [`#2466`](ast-grep/ast-grep#2466) - fix(deps): update rust crate toml\_edit to 0.25.0 [`#2473`](ast-grep/ast-grep#2473) - chore(deps): update rust crate tree-sitter to v0.26.5 [`#2458`](ast-grep/ast-grep#2458) - chore(deps): update dependency [@types/node](https://github.com/types/node) to v24.10.12 [`#2438`](ast-grep/ast-grep#2438) - chore(deps): update rust crate schemars to v1.2.1 [`#2457`](ast-grep/ast-grep#2457) - chore(deps): update rust crate clap to v4.5.57 [`#2455`](ast-grep/ast-grep#2455) - chore(deps): update rust crate anyhow to v1.0.101 [`#2462`](ast-grep/ast-grep#2462) - chore(deps): update rust crate inquire to v0.9.3 [`#2463`](ast-grep/ast-grep#2463) - chore(deps): update dependency oxlint to v1.43.0 [`#2459`](ast-grep/ast-grep#2459) - chore(deps): update rust crate regex to v1.12.3 [`#2461`](ast-grep/ast-grep#2461) - chore(deps): update rust crate clap to v4.5.55 [`#2454`](ast-grep/ast-grep#2454) - chore(deps): update dependency oxlint to v1.42.0 [`#2452`](ast-grep/ast-grep#2452) - chore(deps): update dependency oxlint to v1.41.0 [`#2447`](ast-grep/ast-grep#2447) - chore(deps): update rust crate assert\_cmd to v2.1.2 [`#2437`](ast-grep/ast-grep#2437) - chore(deps): update rust crate thiserror to v2.0.18 [`#2446`](ast-grep/ast-grep#2446) - chore(deps): update dependency oxlint to v1.39.0 [`#2442`](ast-grep/ast-grep#2442) - chore(deps): update rust crate inquire to v0.9.2 [`#2445`](ast-grep/ast-grep#2445) - fix: improve max diagnostic impl to use one atomic op [`781ec81`](ast-grep/ast-grep@781ec81) - fix: udpate schema [`e645fdf`](ast-grep/ast-grep@e645fdf) - fix: move dumping to core crate [`0b2acb9`](ast-grep/ast-grep@0b2acb9) Renovate-Branch: renovate/2024.6-ast-grep-cli-0.x Change-Id: I88da2bd1f360bebda04663520c30423d0cb7ddd1 Priv-Id: 72123de3a05b89754a6bf040b3bd2708f3d9a9e2
fix #2478
Summary by CodeRabbit