Skip to content

feat: ast-grep Parser Abstratction (Part 1, Node)#1940

Merged
HerringtonDarkholme merged 13 commits intomainfrom
feat-non-ts
Apr 25, 2025
Merged

feat: ast-grep Parser Abstratction (Part 1, Node)#1940
HerringtonDarkholme merged 13 commits intomainfrom
feat-non-ts

Conversation

@HerringtonDarkholme
Copy link
Copy Markdown
Member

@HerringtonDarkholme HerringtonDarkholme commented Apr 20, 2025

#1918

Summary by CodeRabbit

  • Refactor
    • Removed explicit language-type parameters from matcher, rule, fixer, and pattern types across public APIs and internal implementations.
    • Split language trait into core language and extended language traits to separate foundational parsing logic from language-specific features.
    • Simplified trait bounds and type signatures in pattern matching, rule configuration, transformation, and replacement modules.
    • Reorganized trait implementations for language structs to improve modularity and clarity.
    • Introduced a new syntax tree node trait to generalize node handling across modules.
    • Generalized matcher traits and implementations by eliminating language-specific generics.
    • Updated node match and metavariable environment abstractions to support generic syntax tree nodes.
    • Refined replacer and transformation logic to use generalized node and environment types.
  • Bug Fixes
    • Enhanced consistency and robustness of rule matching and pattern processing across languages.
  • Chores
    • Updated internal APIs, macros, and tests to conform with the new non-generic interfaces and trait structures.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 20, 2025

Walkthrough

This set of changes refactors the language abstraction system by splitting the original Language trait into two: CoreLanguage (handling foundational language operations) and a refined Language trait (focused on Tree-sitter integration and advanced features). The refactor removes the generic language parameter from core types such as Pattern, Rule, Fixer, RuleCore, RuleRegistration, and associated functions, making them language-agnostic. All relevant method signatures, trait implementations, and struct fields are updated to reflect this change, eliminating the need for phantom type markers and simplifying type signatures across the codebase. The logic and control flow of the affected methods remain unchanged.

Changes

File(s) / Path(s) Change Summary
crates/core/src/language.rs, crates/cli/src/lang/mod.rs, crates/dynamic/src/lib.rs, crates/language/src/html.rs, crates/language/src/lib.rs, crates/napi/src/napi_lang.rs, crates/pyo3/src/py_lang.rs Split Language trait into CoreLanguage and Language; reorganized implementations accordingly; added or updated CoreLanguage impls for each language struct.
crates/core/src/matcher.rs, crates/core/src/matcher/kind.rs, crates/core/src/matcher/pattern.rs, crates/core/src/matcher/text.rs, crates/core/src/match_tree/mod.rs, crates/core/src/ops.rs Removed generic language parameter from Matcher trait and all matcher structs; updated trait bounds and implementations; all pattern and kind matchers are now language-agnostic.
crates/core/src/meta_var.rs, crates/core/src/node.rs, crates/core/src/traversal.rs, crates/core/src/matcher/node_match.rs Removed language parameter from matcher trait bounds in all relevant methods; updated method signatures accordingly; introduced SgNode trait for node abstraction.
crates/core/src/lib.rs, crates/core/src/replacer.rs, crates/core/src/replacer/structural.rs, crates/core/src/replacer/template.rs Removed generic language parameter from matcher trait bounds and method signatures; updated trait bounds for replacers and fixers.
crates/cli/src/config.rs, crates/cli/src/lang/injection.rs, crates/cli/src/print/interactive_print.rs, crates/cli/src/print/mod.rs, crates/cli/src/run.rs, crates/cli/src/utils/debug_query.rs, crates/cli/src/utils/mod.rs Removed generic language parameters from types such as GlobalRules, RuleCore, Pattern, Fixer, and Matcher in function signatures, struct fields, and implementations.
crates/config/src/check_var.rs, crates/config/src/fixer.rs, crates/config/src/lib.rs, crates/config/src/rule/deserialize_env.rs, crates/config/src/rule/mod.rs, crates/config/src/rule/nth_child.rs, crates/config/src/rule/range.rs, crates/config/src/rule/referent_rule.rs, crates/config/src/rule/relational_rule.rs, crates/config/src/rule/stop_by.rs, crates/config/src/rule_config.rs, crates/config/src/rule_core.rs, crates/config/src/transform/mod.rs, crates/config/src/transform/rewrite.rs Removed generic language parameter from all rule-related types (Rule, RuleCore, RuleRegistration, Fixer, etc.); updated all method signatures, struct fields, and trait implementations to use non-generic versions.
crates/napi/src/doc.rs, crates/napi/src/find_files.rs, crates/pyo3/src/py_node.rs Removed generic language parameter from RuleCore in function signatures and type aliases for FFI and bindings.
crates/core/src/match_tree/strictness.rs Replaced concrete Node<D> types with the new SgNode trait in method signatures for greater abstraction.
xtask/src/schema.rs Added empty CoreLanguage implementation for PlaceholderLang.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI/Config
    participant Pattern/Rule/Fixer
    participant CoreLanguage
    participant Language

    User->>CLI/Config: Provides language, patterns, rules, fixers
    CLI/Config->>Pattern/Rule/Fixer: Instantiates without language generics
    Pattern/Rule/Fixer->>CoreLanguage: Calls core parsing methods (e.g., meta_var_char)
    Pattern/Rule/Fixer->>Language: Calls advanced methods (e.g., get_ts_language)
    CLI/Config->>User: Returns results (matches, rewrites, etc.)
Loading

Possibly related PRs

  • ast-grep/ast-grep#1632: Refactors position handling with a new Position struct, affecting node APIs similar to the matcher and node changes in this PR.
  • ast-grep/ast-grep#1208: Modifies function signatures in match_tree.rs involving pattern matching, which overlaps with this PR's changes to matcher and pattern APIs.
  • ast-grep/ast-grep#1312: Adjusts usage of the Fixer type with language generics, which this PR removes for simplification.

Poem

In the warren of code, a rabbit did hop,
Snipping old generics, letting type bounds drop.
Now Pattern, Rule, and Fixer roam free,
No more language tags—just pure clarity!
With CoreLanguage roots and Language in bloom,
The code runs swift, with plenty of room.
🐇✨ Hooray for a simpler, more hoppy codebase!

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 20, 2025

Codecov Report

Attention: Patch coverage is 95.27687% with 29 lines in your changes missing coverage. Please review.

Project coverage is 87.04%. Comparing base (8265edb) to head (c5a0441).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
crates/dynamic/src/lib.rs 58.82% 7 Missing ⚠️
crates/config/src/rule/relational_rule.rs 89.18% 4 Missing ⚠️
crates/core/src/language.rs 42.85% 4 Missing ⚠️
crates/core/src/replacer.rs 76.47% 4 Missing ⚠️
crates/core/src/node.rs 96.62% 3 Missing ⚠️
crates/core/src/ops.rs 95.45% 2 Missing ⚠️
crates/cli/src/config.rs 75.00% 1 Missing ⚠️
crates/cli/src/lang/mod.rs 92.30% 1 Missing ⚠️
crates/cli/src/utils/debug_query.rs 0.00% 1 Missing ⚠️
crates/config/src/rule/deserialize_env.rs 75.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1940      +/-   ##
==========================================
- Coverage   87.14%   87.04%   -0.10%     
==========================================
  Files          99       98       -1     
  Lines       15884    15749     -135     
==========================================
- Hits        13842    13709     -133     
+ Misses       2042     2040       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (6)
crates/config/src/rule_config.rs (2)

201-205: ⚠️ Potential issue

map_err converts to the wrong error type – compilation will fail

Fixer::parse returns a result with its own error type (likely FixerError).
The current call:

let parsed = Fixer::parse(fix, &env, &self.transform).map_err(RuleCoreError::Fixer)?;

changes that error into RuleCoreError, but the ? operator subsequently tries to bubble up a RuleCoreError even though get_fixer is declared to return Result<_, RuleConfigError>. This is a type mismatch and will not compile.

Proposed fix:

-      let parsed = Fixer::parse(fix, &env, &self.transform).map_err(RuleCoreError::Fixer)?;
+      let parsed = Fixer::parse(fix, &env, &self.transform)
+        .map_err(|e| RuleConfigError::Core(RuleCoreError::Fixer(e)))?;

This keeps the conversion localized and preserves the declared return type.


131-149: 🛠️ Refactor suggestion

insert_rewriter panics on duplicate IDs instead of reporting an error

reg.insert_rewriter(&val.id, rewriter).expect("should work");

If a user accidentally re‑uses a rewriter ID, the code will panic, aborting the whole run.
Because you already have a dedicated RuleConfigError::DuplicateRule (via ReferentRuleError), consider propagating that error instead of crashing:

-      reg.insert_rewriter(&val.id, rewriter).expect("should work");
+      reg
+        .insert_rewriter(&val.id, rewriter)
+        .map_err(|e| RuleConfigError::Rewriter(e, val.id.clone()))?;

This makes the behaviour consistent with the other insert_* helpers and avoids unexpected panics during configuration loading.

crates/config/src/rule/referent_rule.rs (1)

31-45: ⚠️ Potential issue

Unsafe interior‑mutability on shared Arc<HashMap> risks data races

Registration::write transmutes an &Arc<HashMap<…>> into a mutable &mut HashMap<…> via raw pointer cast.
With the new public GlobalRules::insert function, external callers can now mutate a GlobalRules instance after it has been shared, opening the door to UB if the map is accessed concurrently elsewhere.

Consider replacing this hand‑rolled pattern with a safe concurrent map (dashmap, parking_lot::RwLock<HashMap<…>>, or std::sync::RwLock) to keep mutation fully synchronized and sound.

crates/core/src/ops.rs (1)

209-218: ⚠️ Potential issue

Avoid mutating the caller’s environment inside Not — clone first

Not::match_node_with_env receives env by mutable reference and passes it directly to the inner matcher.
If the inner matcher succeeds, you later return None (because of the XOR), yet all metavariable bindings that were added by the inner matcher remain in env.
These stale bindings “leak” outwards and can pollute subsequent matcher evaluation (e.g. when Not is used inside All, Op::every, etc.).

-    self
-      .not
-      .match_node_with_env(node.clone(), env)
-      .xor(Some(node))
+    // work on a copy so that any bindings produced by `self.not`
+    // are discarded when it matches.
+    let mut shadow_env = Cow::Borrowed(env.as_ref());
+    let matched = self.not.match_node_with_env(node.clone(), &mut shadow_env).is_some();
+    if matched {
+      None
+    } else {
+      Some(node)
+    }

This preserves the intended boolean semantics while guaranteeing that env is only mutated when the Not pattern fails (i.e. when we actually return Some(node)).

crates/config/src/check_var.rs (1)

147-149: ⚠️ Potential issue

Incorrect reference level leads to mismatched‐type compilation error

var is already an &str. Passing &var therefore hands a &&str to HashSet<&str>::contains, which expects &str. This currently compiles only thanks to an (unstable) blanket Borrow implementation and can easily break, or trigger confusing error messages if trait bounds change.

-    if !vars.contains(&var) {
+    if !vars.contains(var) {

A direct reference keeps the type signatures clean and avoids double indirection.

crates/config/src/rule_core.rs (1)

187-192: ⚠️ Potential issue

get_env can fabricate an environment for any language — risks runtime incoherence

After dropping the generic parameter from RuleCore, the method

pub fn get_env<L: Language>(&self, lang: L) -> DeserializeEnv<L>

allows callers to request a DeserializeEnv for any Language, even if the
RuleCore, its Rule, the registered utils/rewriters, and the Fixer
instances were all built for a different language.

That silently breaks the compile‑time guarantee we previously had and can
produce undefined matcher behaviour or panics at runtime when, for example,
Tree‑sitter kind‑ids no longer match.

Recommended approaches:

  1. Re‑introduce a language “marker” inside RuleCore, e.g.
phantom: std::marker::PhantomData<L>

and keep RuleCore generic over L, or

  1. Store the original language’s LanguageId (or equivalent) at construction
    time and assert equality inside get_env, returning an error otherwise.

Either restores soundness and prevents cross‑language misuse.

🧹 Nitpick comments (14)
crates/cli/src/run.rs (1)

115-128: Unnecessary allocation: avoid .clone() by making Strictness Copy

MatchStrictness is an enum and therefore already Copy.
If we derive Copy for the wrapper Strictness, the extra .clone() (heap‑free but still superfluous) disappears and the intent becomes clearer.

-  fn build_pattern(&self, lang: SgLang) -> Result<Pattern> {
...
-    if let Some(strictness) = &self.strictness {
-      Ok(pattern.with_strictness(strictness.0.clone()))
+  fn build_pattern(&self, lang: SgLang) -> Result<Pattern> {
...
+    if let Some(strictness) = &self.strictness {
+      Ok(pattern.with_strictness(strictness.0))

Add the derive:

 #[derive(Clone)]
-struct Strictness(MatchStrictness);
+#[derive(Clone, Copy)]
+struct Strictness(MatchStrictness);

Tiny win, but eliminates an avoidable method call every time the CLI builds a pattern.

crates/config/src/rule/stop_by.rs (1)

121-148: Public method lacks documentation & example usage

StopBy::find is a central control‑flow utility (“monster method” per the TODO) yet still undocumented after this refactor.
Readers now have to reverse‑engineer:

  • expected invariants for once / multi / finder
  • how StopBy variants change traversal semantics
  • lifetime constraints on 't

Please consider adding a doc‑comment with a short example and move the inline TODO to an issue tracker to avoid bit‑rot.

crates/config/src/rule/deserialize_env.rs (1)

175-189: Minor performance tweak: reuse DeserializeEnv instead of creating one per rule

Inside parse_global_utils a fresh DeserializeEnv is built for every utility rule even though all rules for a given lang share the same GlobalRules. Constructing it once and re‑using it avoids repeated RuleRegistration::from_globals allocations:

-    for id in order {
-      let (lang, core) = utils.get(id).expect("must exist");
-      let env = DeserializeEnv::new(lang.clone()).with_globals(&registration);
+    for (lang, group) in &utils_per_language {
+      let env = DeserializeEnv::new(lang.clone()).with_globals(&registration);
+      for (id, core) in group {
         let matcher = core.get_matcher_with_hint(env, CheckHint::Global)?;
         registration
           .insert(id, matcher)
           .map_err(RuleSerializeError::MatchesReference)?;
       }
     }

Not critical, but could save noticeable CPU on large rule‑sets.

crates/core/src/matcher.rs (2)

76-90: Cache the Pattern to avoid recompiling on every call

Every invocation of match_node_with_env / get_match_len on an &str causes a fresh Pattern::str allocation and language‑clone.
If the same pattern is used repeatedly (a common scenario), this churn can become noticeable.

-    let pattern = Pattern::str(self, node.lang().clone());
+    // cache the compiled pattern once per thread
+    thread_local! {
+        static COMPILED: std::cell::RefCell<
+            std::collections::HashMap<(String, std::any::TypeId), Pattern>
+        > = const { /* ... */ };
+    }
+    let pattern = get_or_insert_compiled(self, node.lang());

A simple once_cell::sync::Lazy or thread‑local hashmap keyed by (pattern_string, lang_type_id) would amortise the cost while keeping the public API intact.


131-139: Compute potential_kinds once in FindAllNodes

self.matcher.potential_kinds() is evaluated on every next() call, although the result is immutable for the lifetime of the iterator.

 pub struct FindAllNodes<'tree, D: Doc, M: Matcher> {
   dfs: Pre<'tree, D>,
   matcher: M,
+  kinds: Option<BitSet>,
 }
 
 impl<'tree, D: Doc, M: Matcher> FindAllNodes<'tree, D, M> {
   pub fn new(matcher: M, node: Node<'tree, D>) -> Self {
-    Self {
-      dfs: node.dfs(),
-      matcher,
-    }
+    let kinds = matcher.potential_kinds();
+    Self { dfs: node.dfs(), matcher, kinds }
   }
 }

This micro‑optimisation reduces repeated allocations / calculations during large searches.

crates/config/src/rule/mod.rs (1)

259-275: Redundant Ok() wrapper in verify_util

verify_util already returns Result<(), _>. Wrapping it in another Ok(..) adds an unnecessary layer.

-      Rule::Matches(r) => Ok(r.verify_util()?),
+      Rule::Matches(r) => r.verify_util(),

Same behaviour, cleaner code.

crates/config/src/rule_config.rs (1)

118-127: Minor: redundant clone of DeserializeEnv

self.core.get_matcher(env.clone())? consumes env.clone() while the original env is not reused later.
DeserializeEnv is cheap, but you can remove the extra allocation:

-    let rule = self.core.get_matcher(env.clone())?;
+    let rule = self.core.get_matcher(env)?;

This also removes the need for env to implement Clone here.

crates/config/src/rule/relational_rule.rs (1)

184-195: try_new still generic over L even though the struct is language‑agnostic

Precedes::try_new, Follows::try_new, Has::try_new, and Inside::try_new now only use DeserializeEnv<L> to deserialize the inner rule. The surrounding refactor removes language generics elsewhere; you could finish the clean‑up by changing the signature to accept &DeserializeEnv without <L> and erase the type parameter entirely:

-  pub fn try_new<L: Language>(relation: Relation, env: &DeserializeEnv<L>)
+  pub fn try_new(relation: Relation, env: &DeserializeEnv)

This keeps the public API surface minimal and consistent with other de‑genericised types.

crates/core/src/ops.rs (1)

128-150: Minor: simplify the Any search loop

Inside the Any matcher you recreate new_env for every iteration.
Instead, allocate it once and clone only when a candidate pattern matches; this avoids a needless Cow re‑borrow on every miss.

-    let mut new_env = Cow::Borrowed(env.as_ref());
-    let found = self.patterns.iter().find_map(|p| {
-      new_env = Cow::Borrowed(env.as_ref());
-      p.match_node_with_env(node.clone(), &mut new_env)
-    });
+    let mut scratch = Cow::Borrowed(env.as_ref());
+    let found = self.patterns.iter().find_map(|p| {
+      scratch.to_mut().clear();               // restore to original each round
+      p.match_node_with_env(node.clone(), &mut scratch)
+    });

Not critical, but this gives a tiny perf win and makes the control‑flow clearer.

crates/config/src/rule/nth_child.rs (1)

206-230: Avoid allocating a full Vec when locating the index

find_index currently builds a Vec of all siblings only to call .position().
This creates unnecessary heap traffic when traversing large ASTs.

-    let mut children: Vec<_> = if let Some(rule) = &self.of_rule {
-      /* … */
-    } else {
-      /* … */
-    };
-    if self.reverse {
-      children.reverse()
-    }
-    children
-      .iter()
-      .position(|child| child.node_id() == node.node_id())
+    let iter = parent.children().filter(|n| n.is_named());
+    let iter = if let Some(rule) = &self.of_rule {
+      iter.filter(|child| rule.match_node_with_env(*child, env).is_some())
+    } else {
+      iter
+    };
+    let iter: Box<dyn Iterator<Item = _>> = if self.reverse {
+      Box::new(iter.collect::<Vec<_>>().into_iter().rev())
+    } else {
+      Box::new(iter)
+    };
+    iter.enumerate()
+        .find(|(_, child)| child.node_id() == node.node_id())
+        .map(|(idx, _)| idx)

This keeps the algorithm O(n) but removes the intermediate allocation (and a potential full reverse of the vector).

crates/config/src/fixer.rs (2)

139-150: matcher should be passed by reference to avoid an extra move

get_replaced_range currently takes matcher: impl Matcher by value.
Callers have to pass an owned matcher, causing an extra move (and possibly a clone if they want to reuse it).

-  fn get_replaced_range(&self, nm: &NodeMatch<D>, matcher: impl Matcher) -> Range<usize> {
+  fn get_replaced_range(
+    &self,
+    nm: &NodeMatch<D>,
+    matcher: &impl Matcher,
+  ) -> Range<usize> {

All call‑sites inside the crate already have a reference handy, so this change is API‑compatible but slightly more ergonomic.


41-59: Remove the lingering generic on Expansion::parse

Expansion itself is now language‑agnostic, yet parse still carries <L: Language> only to thread DeserializeEnv<L>.
Since DeserializeEnv already encapsulates the language, the extra type parameter is no longer needed and hurts readability.

-  fn parse<L: Language>(
+  fn parse(
     relation: &Maybe<Relation>,
-    env: &DeserializeEnv<L>,
+    env: &DeserializeEnv<impl Language>,
   ) -> Result<Option<Self>, FixerError> {

Or simply accept a generic E: ?Sized if you want to stay completely agnostic.

crates/config/src/check_var.rs (1)

21-28: Unnecessary double‑indirection on Fixer argument

fixer is passed as &Option<Fixer>.
All call‑sites already own an Option<Fixer> that they borrow, so the extra & buys no benefit and forces every callee to peel two levels of indirection.

Consider changing the signature (and the corresponding callee sites) to accept Option<&Fixer>:

-pub fn check_rule_with_hint<'r>(
-  ...
-  fixer: &Option<Fixer>,
+pub fn check_rule_with_hint<'r>(
+  ...
+  fixer: Option<&'r Fixer>,

This removes one layer of reference and makes intent ( “maybe there is a fixer, and if so borrow it” ) explicit.

crates/config/src/rule_core.rs (1)

140-148: Derive Debug/Clone for easier diagnostics

Now that RuleCore is non‑generic, deriving common traits no longer risks
excessive monomorphisation. Adding at least #[derive(Clone, Debug)] eases
unit‑test authoring and ad‑hoc logging.

- pub struct RuleCore {
+ #[derive(Clone, Debug)]
+ pub struct RuleCore {

Doing so has zero runtime cost and improves maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac00a62 and 1fa5969.

📒 Files selected for processing (37)
  • crates/cli/src/config.rs (3 hunks)
  • crates/cli/src/lang/injection.rs (1 hunks)
  • crates/cli/src/print/interactive_print.rs (3 hunks)
  • crates/cli/src/print/mod.rs (1 hunks)
  • crates/cli/src/run.rs (4 hunks)
  • crates/cli/src/utils/debug_query.rs (1 hunks)
  • crates/cli/src/utils/mod.rs (2 hunks)
  • crates/config/src/check_var.rs (7 hunks)
  • crates/config/src/fixer.rs (8 hunks)
  • crates/config/src/lib.rs (2 hunks)
  • crates/config/src/rule/deserialize_env.rs (5 hunks)
  • crates/config/src/rule/mod.rs (7 hunks)
  • crates/config/src/rule/nth_child.rs (5 hunks)
  • crates/config/src/rule/range.rs (5 hunks)
  • crates/config/src/rule/referent_rule.rs (12 hunks)
  • crates/config/src/rule/relational_rule.rs (9 hunks)
  • crates/config/src/rule/stop_by.rs (5 hunks)
  • crates/config/src/rule_config.rs (5 hunks)
  • crates/config/src/rule_core.rs (12 hunks)
  • crates/config/src/transform/mod.rs (2 hunks)
  • crates/config/src/transform/rewrite.rs (5 hunks)
  • crates/core/src/lib.rs (1 hunks)
  • crates/core/src/match_tree/mod.rs (4 hunks)
  • crates/core/src/matcher.rs (9 hunks)
  • crates/core/src/matcher/kind.rs (3 hunks)
  • crates/core/src/matcher/node_match.rs (1 hunks)
  • crates/core/src/matcher/pattern.rs (12 hunks)
  • crates/core/src/matcher/text.rs (2 hunks)
  • crates/core/src/meta_var.rs (1 hunks)
  • crates/core/src/node.rs (2 hunks)
  • crates/core/src/ops.rs (13 hunks)
  • crates/core/src/replacer.rs (1 hunks)
  • crates/core/src/traversal.rs (3 hunks)
  • crates/napi/src/doc.rs (1 hunks)
  • crates/napi/src/find_files.rs (2 hunks)
  • crates/pyo3/src/py_node.rs (2 hunks)
  • xtask/src/schema.rs (0 hunks)
💤 Files with no reviewable changes (1)
  • xtask/src/schema.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/config/src/transform/rewrite.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/config/src/lib.rs
🧰 Additional context used
🧬 Code Graph Analysis (7)
crates/core/src/replacer.rs (1)
crates/config/src/fixer.rs (1)
  • get_replaced_range (138-150)
crates/core/src/lib.rs (3)
crates/pyo3/src/py_node.rs (1)
  • replace (255-268)
crates/core/src/node.rs (1)
  • replace (523-527)
crates/napi/src/sg_node.rs (1)
  • replace (378-386)
crates/cli/src/print/interactive_print.rs (3)
crates/config/src/lib.rs (1)
  • from_str (26-29)
crates/config/src/fixer.rs (1)
  • from_str (115-122)
crates/napi/src/doc.rs (1)
  • from_str (163-165)
crates/cli/src/utils/mod.rs (3)
crates/pyo3/ast_grep_py/__init__.py (1)
  • Pattern (8-11)
crates/core/src/node.rs (2)
  • lang (84-86)
  • lang (246-248)
crates/core/src/language.rs (1)
  • ast_grep (56-58)
crates/config/src/rule/deserialize_env.rs (1)
crates/config/src/rule/mod.rs (1)
  • deserialize_rule (370-390)
crates/core/src/match_tree/mod.rs (1)
crates/pyo3/ast_grep_py/__init__.py (1)
  • Pattern (8-11)
crates/config/src/rule_config.rs (2)
crates/config/src/rule_core.rs (3)
  • get_matcher (117-119)
  • get_matcher (283-287)
  • get_fixer (90-97)
crates/pyo3/src/py_node.rs (1)
  • get_matcher (338-354)
🔇 Additional comments (52)
crates/cli/src/lang/injection.rs (1)

37-37: Simplification of type signature by removing language generic parameter

The change from Vec<(RuleCore<SgLang>, Option<String>)> to Vec<(RuleCore, Option<String>)> is appropriate and aligns with the broader refactoring effort to make core types language-agnostic. This simplification helps reduce the complexity of the type system while maintaining the same functionality.

crates/core/src/matcher/node_match.rs (1)

55-55: Simplified trait bound by removing language-specific constraint

The change from M: Matcher<D::Lang> to M: Matcher is a good simplification that makes the API more flexible. This relaxation of the trait bound allows the method to accept any matcher implementation without tying it to the specific document language type, which aligns with the overall refactoring goal.

crates/napi/src/doc.rs (1)

32-32: Simplified return type by removing language generic parameter

The change from NapiResult<RuleCore<NapiLang>> to NapiResult<RuleCore> is a good simplification that aligns with the broader refactoring effort to make RuleCore language-agnostic. The function body works correctly with this change since the language-specific information is now handled at a different level.

crates/core/src/meta_var.rs (1)

112-112: Simplified trait bound by removing language-specific constraint

The change from M: Matcher<D::Lang> to M: Matcher appropriately relaxes the trait bound on the generic parameter, allowing any matcher implementation to be used without requiring a specific language type. This change is consistent with the broader refactoring goal of separating language-specific concerns from core functionality.

crates/cli/src/print/interactive_print.rs (3)

369-369: Simplified function signature by removing language generic parameter.

The function signature for make_diffs has been simplified to take a generic impl Matcher rather than impl Matcher<SgLang>, aligning with the broader refactoring effort to remove explicit language generic parameters from the codebase. This makes the API more flexible and less coupled to specific language types.


417-418: Added explicit type annotation for Fixer::from_str.

Now that the Fixer type is no longer generic over language, the from_str method requires an explicit type parameter annotation to specify the language. This change is consistent with the refactoring of separating the core language abstractions.


431-432: Added explicit type annotation for Fixer::from_str.

Similar to the previous instance, the language type parameter is now explicitly specified for Fixer::from_str. This pattern is applied consistently throughout the codebase as part of the refactoring effort.

crates/cli/src/utils/debug_query.rs (1)

20-20: Simplified method signature by making Pattern non-generic.

The debug_pattern method now takes a non-generic &Pattern instead of &Pattern<SgLang>, aligning with the broader effort to simplify type signatures by removing language generics. The language parameter is still passed separately, maintaining the functionality while making the API more flexible.

crates/cli/src/print/mod.rs (1)

74-74: Simplified method signature by removing language generic parameter.

The Diff::generate method now takes &impl Matcher and &Fixer without explicit language type parameters, consistent with the broader refactoring effort to decouple core abstractions from language-specific details. This change makes the code more maintainable and the API more flexible.

crates/core/src/lib.rs (1)

52-52: Simplified method signature by removing language generic constraint.

The replace method on AstGrep now only requires M: Matcher without binding it to a specific language type. This is part of the broader refactoring that separates core language abstractions from tree-sitter-specific functionality, making the API more flexible and easier to extend.

The change maintains the same functionality while reducing type complexity, aligning perfectly with the PR objectives of abstracting the parser implementation.

crates/core/src/traversal.rs (3)

77-77: Simplified matcher trait bound.

The change removes the language-specific generic parameter from the Matcher trait bound, making matchers language-agnostic. This aligns with the broader refactoring goal of separating core functionality from language-specific implementations.


101-101: Consistent trait bound update for Visit struct.

Updated the Matcher trait bound to be language-agnostic, matching the change in the Visitor::visit method. This ensures consistency throughout the codebase.


115-115: Iterator implementation updated with simplified trait bound.

Continued the pattern of removing language specificity from the Matcher trait bound in the Iterator implementation for Visit. This completes the decoupling of traversal functionality from language-specific implementation details.

crates/pyo3/src/py_node.rs (2)

342-342: Simplified RuleCore return type.

Removed the language generic parameter from RuleCore, making it language-agnostic. This simplifies the type signature while maintaining the same functionality, and aligns with the broader refactoring effort across the codebase.


372-372: Consistent type signature update for helper function.

Updated the return type of get_matcher_from_rule to use the non-generic RuleCore type, maintaining consistency with the changes in the SgNode::get_matcher method. This ensures that the Python bindings use the same simplified type structure.

crates/core/src/replacer.rs (1)

22-22: Simplified Matcher trait bound in Replacer trait.

Updated the get_replaced_range method to accept a language-agnostic Matcher implementation rather than one tied to a specific language. This change aligns with the trait bound simplifications throughout the codebase and reduces coupling between replacers and language-specific implementations.

crates/config/src/transform/mod.rs (2)

51-51: Removed language generic parameter from RuleCore.

Updated the function signature to use the language-agnostic RuleCore type in the rewriters map, simplifying the type system while maintaining the same functionality. This is consistent with the broader refactoring effort.


75-75: Updated Ctx struct with simplified RuleCore type.

Changed the type of the rewriters field in the Ctx struct to use the non-generic RuleCore type, maintaining consistency with the changes in the apply_transform method. This ensures that the transformation context uses the same simplified type structure.

crates/core/src/matcher/kind.rs (5)

25-27: Refactoring to make KindMatcher non-generic

The KindMatcher struct has been refactored to remove the generic language parameter, making it non-generic. This simplifies the type signature and aligns with the broader codebase refactoring effort.


29-36: Appropriate method refactoring with explicit language passing

Good adaptation of the constructor method to accept the language parameter explicitly rather than through a generic type parameter. This approach maintains the functionality while improving type simplicity.


38-45: Well-preserved error handling with simplified type signature

The error handling in try_new is properly maintained while adapting to the new non-generic struct design. The function still creates a KindMatcher instance and performs the necessary validation.


47-49: Simplified struct initialization

The from_id method has been correctly updated to remove the phantom data initialization, properly aligning with the non-generic struct definition.


78-89: Updated Matcher trait implementation for non-generic struct

The Matcher trait implementation has been appropriately updated to reflect the non-generic nature of KindMatcher, maintaining the same functionality while simplifying the interface.

crates/cli/src/utils/mod.rs (3)

147-152: Simplified Pattern type by removing language generic parameter

The function signature has been updated to work with non-generic Pattern types rather than Pattern<SgLang>, while maintaining the language context through explicit parameters. This aligns with the broader refactoring effort.


155-165: Adapted closure to use non-generic Pattern

The do_match closure correctly adapts to use the non-generic Pattern type while maintaining all existing functionality with fixed string checks and proper MatchUnit construction.


192-196: Generalized MatchUnit to accept any Matcher implementation

The MatchUnit struct now accepts any type implementing the Matcher trait rather than being tied specifically to Matcher<SgLang>. This improves flexibility and reusability.

crates/core/src/matcher/text.rs (4)

3-4: Simplified imports after removing Language generic parameter

The import statement has been appropriately updated to remove the unused Language import, which is no longer needed after the refactoring to make RegexMatcher non-generic.


18-20: Refactored RegexMatcher to be non-generic

The RegexMatcher struct has been simplified by removing the language generic parameter and associated phantom data, making it a non-generic struct that only stores the regex pattern.


22-28: Simplified implementation block for RegexMatcher

The implementation block for RegexMatcher has been updated to remove the language generic parameter, which is appropriate given the struct is now non-generic.


30-37: Updated Matcher trait implementation for non-generic struct

The Matcher trait implementation has been correctly updated to reflect the non-generic nature of RegexMatcher, with appropriate updates to the method signatures while preserving the matching functionality.

crates/cli/src/config.rs (3)

131-160: Simplified GlobalRules type by removing language generic parameter

The find_util_rules function now returns a non-generic GlobalRules type instead of GlobalRules<SgLang>. The language parameter is now specified explicitly at the call site with DeserializeEnv::<SgLang>::parse_global_utils, which properly preserves the language-specific parsing.


162-166: Updated read_directory_yaml to use non-generic GlobalRules

The global_rules parameter type has been updated to non-generic GlobalRules, which aligns with the broader refactoring effort while maintaining the same functionality.


220-223: Updated read_rule_file to use non-generic GlobalRules

The global_rules parameter type has been simplified to Option<&GlobalRules>, removing the explicit language generic parameter while preserving the function's behavior.

crates/config/src/rule/range.rs (4)

1-1: Appropriate import update for the refactored language traits

The import statement has been updated to reference the core modules without language generic parameters, aligning with the trait separation refactoring.


40-43: Successfully removed language generic parameter

The RangeMatcher struct has been simplified by removing the generic language parameter and the associated PhantomData<L> field. This reduces type complexity while maintaining all functionality.


68-69: Clean adaptation of Matcher trait implementation

The Matcher trait implementation has been correctly updated to match the new non-generic trait definition. The method signature now uses <'tree, D: Doc> directly in the method rather than via a struct generic parameter.


104-110: Test code properly updated

The test code has been correctly updated to use the non-generic RangeMatcher, demonstrating that the refactoring doesn't affect the usage pattern in tests.

crates/napi/src/find_files.rs (2)

143-144: Simplified FindInFiles type alias

The FindInFiles type alias has been correctly updated to use the non-generic RuleCore type, aligning with the broader refactoring of removing language generic parameters from core types.


208-212: Function signature properly updated

The call_sg_node function signature has been correctly updated to use the non-generic RuleCore type in its parameter list, maintaining consistency with the type alias changes.

crates/core/src/matcher/pattern.rs (7)

1-1: Proper import of both language traits

The import statement correctly includes both the new CoreLanguage and the existing Language trait, reflecting the trait hierarchy refactoring where core parsing functionality has been separated.


15-19: Successful conversion to non-generic Pattern struct

The Pattern struct has been refactored to remove the generic language parameter, simplifying the type system. The PhantomData<L> field has been removed while maintaining all necessary functionality.


139-142: Method-level generic parameters appropriately added

The str method now includes a generic parameter <L: Language> at the method level rather than at the struct level. This provides better flexibility and clearer, more targeted type constraints.


197-199: Consistent approach to method-level generics

The try_new method continues the pattern of adding generic parameters at the method level, maintaining consistency across the implementation.


221-225: Contextual pattern creation correctly handled

The contextual method handles the language parameter appropriately, including proper cloning where needed. The generic parameters have been moved from the struct level to the method level without changing the functionality.


257-258: Matcher trait implementation properly updated

The Matcher trait implementation has been correctly updated to match the new non-generic trait definition, with method-level generic parameters instead of struct-level ones.


515-515: Memory size assertion updated

The memory size assertion in the test has been appropriately updated to reflect the structural changes in the Pattern type. This ensures that the size reduction from removing the PhantomData field is properly validated.

crates/core/src/match_tree/mod.rs (4)

47-47: Function signature appropriately updated

The match_end_non_recursive function signature has been updated to use the non-generic Pattern type, maintaining consistency with the Pattern struct changes.


112-116: Clean function signature update

The match_node_non_recursive function signature has been correctly updated to use the non-generic Pattern type, ensuring consistency across the codebase.


153-163: Test helper function properly updated

The find_node_recursive test helper function has been correctly updated to use the non-generic Pattern type, ensuring that tests continue to function as expected with the refactored type system.


300-300: Consistent test helper update

The find_end_recursive test helper function has been correctly updated to match the pattern of using the non-generic Pattern type, maintaining consistency across all related functions.

crates/core/src/node.rs (1)

305-324: Public API clarification: value vs reference for Matcher

matches / inside / has / precedes / follows now take M by value.
Existing user code that previously called:

let pat = Pattern::new("foo", lang);
node.matches(&pat);

continues to compile because &Pattern implements Matcher.
However, passing large matcher objects by value could lead to unintended moves:

let pat = make_heavy_pattern();
if node.matches(pat) {        // pat is moved here
    /* pat is now unusable */
}

Consider documenting this nuance (or switching the parameter to impl Matcher + ?Sized with &M) to avoid breaking expectations.

crates/config/src/rule/nth_child.rs (1)

165-169: Edge case: modulo with negative step_size may violate CSS spec

n % step_size where step_size is negative keeps the sign of the dividend in Rust.
Consequently, for step_size = -2 and n = 1, n % step_size == 1, which means 1 % -2 == 1 (non‑zero) and the match fails.
However, CSS’s definition treats “‐2n+1” as matching the first element.

A safer, sign‑agnostic check is:

let step = *step_size;
let delta = index - offset;
delta.signum() == step.signum() && delta % step == 0

Please cross‑check against the W3C micro‑syntax to ensure negative‑step selectors behave correctly.

@HerringtonDarkholme HerringtonDarkholme force-pushed the feat-non-ts branch 3 times, most recently from 2937132 to d02ddb9 Compare April 20, 2025 17:48
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
crates/core/src/meta_var.rs (1)

37-44: 🛠️ Refactor suggestion

Returning Option<&mut Self> complicates ergonomics

insert currently yields Some(self) on success to enable chaining, but exposes None on failure.
Most callers either (a) ignore the result or (b) only care about the boolean success status.

A simpler, more idiomatic signature would be:

-pub fn insert(&mut self, id: &str, ret: N) -> Option<&mut Self>
+pub fn insert(&mut self, id: &str, ret: N) -> bool

This:

  • avoids exposing a second mutable reference (preventing accidental misuse),
  • mirrors HashMap::insert ergonomics (bool or Option<Old>),
  • still allows chaining via if env.insert(...){ /* ... */ }.

The same observation applies to insert_multi.

🧹 Nitpick comments (4)
crates/core/src/meta_var.rs (4)

20-25: Derive Debug and consider Default for easier diagnostics

SgMetaVarEnv is a core container that frequently travels across APIs. Deriving Debug (and possibly Default) will:

  • allow println!("{:?}", env) during debugging/tests,
  • avoid manual boiler-plate when the struct evolves.
-#[derive(Clone)]
+#[derive(Clone, Debug)]
 pub struct SgMetaVarEnv<'tree, N: SgNode<'tree>, C> {

71-73: API inconsistency between get_labels and get_multiple_matches

After fixing the allocation issue above, get_multiple_matches will return &[N] while get_labels returns Option<&Vec<N>>.

For uniformity and clarity, consider aligning get_labels to the same slice-based signature:

-pub fn get_labels(&self, label: &str) -> Option<&Vec<N>> {
-  self.multi_matched.get(label)
+pub fn get_labels(&self, label: &str) -> Option<&[N]> {
+  self.multi_matched.get(label).map(Vec::as_slice)
 }

This prevents accidental mutations through the shared &Vec reference and reduces cognitive overhead.


151-167: Potential mutable-aliasing in match_constraints

Cow::Borrowed(self) is created while you still iterate over &self.single_matched.
If m.match_node_with_env mutates the environment, Cow will be upgraded to Owned, cloning the maps — safe but potentially expensive.
Two suggestions:

  1. Cheap path optimisation
    Check if var_matchers.is_empty() early and short-circuit.

  2. Avoid cloning when mutation is unlikely
    Replace the Cow strategy with a two-phase approach: first read-only check, then mutate only the variables that passed.

This keeps semantics intact while sidestepping the hidden allocation cost.


126-139: insert_transformation unnecessarily copies slices

formatted_slice(...).to_vec() allocates even when the formatted slice is already correctly aligned.
If the intent is to avoid copying when formatting is a no-op, you can store a Cow<'_, [u8]> instead of an owned Vec<u8> inside transformed_var.
This change:

  • eliminates redundant allocations,
  • is backwards-compatible for immutable readers.

Example diff (simplified):

-  transformed_var: HashMap<MetaVariableID, C>,
+  transformed_var: HashMap<MetaVariableID, Cow<'tree, [u8]>>,

and update call-sites accordingly.

If zero-copy isn’t achievable, at least add a comment clarifying why allocation is unavoidable.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc9619b and 1ab3183.

📒 Files selected for processing (2)
  • crates/core/src/meta_var.rs (6 hunks)
  • crates/core/src/node.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/core/src/node.rs
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: coverage
🔇 Additional comments (1)
crates/core/src/meta_var.rs (1)

16-16: Clarify or remove open question comment

The inline question (“why we need one more content?”) hints at an unresolved design doubt.
Leaving “why” comments in production code is a maintenance smell that quickly becomes stale. Please either:

  1. Convert it into a proper TODO(issue #) with an actionable description, or
  2. Replace it with a short explanation / reference clarifying the extra Content parameter’s purpose.

[ suggest_nitpick ]

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (5)
crates/config/src/transform/transformation.rs (1)

79-83: ⚠️ Potential issue

unwrap() on invalid regex can panic at runtime

Replace::<MetaVariable>::compute will call Regex::new(&self.replace).unwrap().
An end-user typo in the YAML config would therefore crash the whole run instead of producing a graceful error.

- let re = Regex::new(&self.replace).unwrap();
+ let re = match Regex::new(&self.replace) {
+   Ok(r) => r,
+   Err(e) => {
+     log::error!("Invalid regex in replace transform: {e}");
+     return None; // or propagate as a TransformError if you prefer
+   }
+ };

Alternatively, compile the regex once during parse(); that turns a user error into a configuration-time failure and avoids recompiling on every match.

crates/core/src/replacer.rs (1)

21-36: 🛠️ Refactor suggestion

Avoid unnecessary clone() in get_replaced_range

matcher.get_match_len(nm.get_node().clone()) clones the entire node even though get_match_len should only need a reference for read-only access. Cloning large sub-trees can be expensive when replacements are executed in bulk.

-    if let Some(len) = matcher.get_match_len(nm.get_node().clone()) {
+    if let Some(len) = matcher.get_match_len(nm.get_node()) {

You will need to adjust the Matcher::get_match_len signature to take impl Borrow<N> or &N to make this compile.

crates/config/src/transform/rewrite.rs (1)

46-86: ⚠️ Potential issue

Edge-case: returning empty string when no edits produced

If self.join_by is None and the find_and_make_edits call returns an empty vector, make_edit will receive an empty edit list and return a byte-for-byte copy of bytes, so all good.
However, when join_by is present but no edit matches any sub-node, rewritten becomes the empty ret vector created in the if let Some(joiner) arm (lines 62-81). The function then encodes that empty vector and returns Some(String::new()), silently dropping the original content.

Consider short-circuiting when edits.is_empty():

if edits.is_empty() {
  return None; // or Some(original) depending on desired semantics
}

Failing to do so may produce surprising, destructive rewrites.

crates/core/src/matcher/node_match.rs (1)

82-101: ⚠️ Potential issue

Trait impls on the type alias will not compile

impl From<NodeMatch<'tree, D>> … and impl Borrow<…> for NodeMatch<'tree, D> try to implement traits for the alias, which is currently disallowed on stable Rust (E0215).

Replace the impl targets with the concrete type:

-impl<'tree, D: Doc> From<NodeMatch<'tree, D>> for Node<'tree, D> {
+impl<'tree, D: Doc>
+  From<SgNodeMatch<
+        'tree,
+        Node<'tree, D>,
+        Vec<<<D as Doc>::Source as Content>::Underlying>>> for Node<'tree, D>
 {
   fn from(node_match: /* … */) -> Self { node_match.0 }
 }

-impl<'tree, D: Doc> Borrow<Node<'tree, D>> for NodeMatch<'tree, D> {
+impl<'tree, D: Doc>
+  Borrow<Node<'tree, D>> for SgNodeMatch<
+        'tree,
+        Node<'tree, D>,
+        Vec<<<D as Doc>::Source as Content>::Underlying>>
 {
   fn borrow(&self) -> &Node<'tree, D> { &self.0 }
 }

(or, alternatively, resurrect NodeMatch as a new-type wrapper instead of a type alias).

Without this fix the crate will fail to compile on stable/stable-beta channels.

crates/core/src/ops.rs (1)

209-218: ⚠️ Potential issue

Not may leave stray bindings in env

When the inner pattern matches (so Not should fail), any metavariables it bound remain in env, because the original environment is passed directly.
This behaviour diverges from the logical expectation that a failed Not leaves no trace.

Please take the same “speculative env” approach used in Or / Any:

-    self
-      .not
-      .match_node_with_env(node.clone(), env)
-      .xor(Some(node))
+    let mut new_env = Cow::Borrowed(env.as_ref());
+    let matched = self.not.match_node_with_env(node.clone(), &mut new_env);
+    if matched.is_some() {
+      // inner matched → NOT fails, discard changes
+      None
+    } else {
+      // inner failed → NOT succeeds, keep original env
+      Some(node)
+    }
🧹 Nitpick comments (18)
crates/core/src/match_tree/mod.rs (2)

14-23: Pass slices instead of Vec to avoid needless allocations

match_ellipsis receives nodes: Vec<N>, which forces every caller to allocate a fresh vector even when it only needs read-only access. Switching the parameter to &[N] (or an iterator) removes this cost and avoids one more clone when the nodes are already materialised elsewhere.

-  fn match_ellipsis(
-    &mut self,
-    var: Option<&str>,
-    nodes: Vec<N>,
-    skipped_anonymous: usize,
-  ) -> Option<()>;
+  fn match_ellipsis(
+    &mut self,
+    var: Option<&str>,
+    nodes: &[N],
+    skipped_anonymous: usize,
+  ) -> Option<()>;

All three implementors only read the slice (they call last() or drain), so the change is mechanical and backwards-compatible with the public API introduced in this PR.


119-140: Potential stack overflow on extremely deep trees

does_node_match_exactly is implemented recursively without tail-call optimisation. Real-world code normally keeps depth in check, but artificial or adversarial inputs (e.g. deeply nested parentheses) can overflow the stack.

If you want extra robustness, consider an explicit stack (Vec) or a try_fold style loop:

let mut stack = vec![(goal.clone(), candidate.clone())];
while let Some((g, c)) = stack.pop() {
  if g.node_id() == c.node_id() { continue; }
  if g.is_named_leaf() || c.is_named_leaf() {
    if g.text() != c.text() { return false; }
    continue;
  }
  if g.kind_id() != c.kind_id() { return false; }
  let (mut gc, mut cc) = (g.children(), c.children());
  if gc.len() != cc.len() { return false; }
  stack.extend(gc.zip(cc));
}
true

It keeps behaviour identical while capping stack use.

crates/config/src/rule/range.rs (1)

68-88: Minor performance nit: avoid repeated UTF-8 column computations

match_node_with_env calls column(&node) twice (start and end). column walks the UTF-8 string from the start each time, so on large files this adds up. Caching the result once per node is cheaper:

- if self.start.column != node_start_pos.column(&node)
-   || self.end.column != node_end_pos.column(&node)
+ let (start_col, end_col) = (node_start_pos.column(&node), node_end_pos.column(&node));
+ if self.start.column != start_col || self.end.column != end_col

Very small, but essentially free.

crates/config/src/transform/transformation.rs (3)

13-20: Double allocation when converting bytes to String

get_text_from_env first materialises the whole byte slice, then encode_bytes returns a Cow<str> that is immediately turned into an owned String. For large captures this allocates twice.

You can request an owned string directly from the content API (if available) or reuse the existing buffer:

- let bytes = ctx.env.get_var_bytes(var)?;
- Some(<<N::Doc as Doc>::Source as Content>::encode_bytes(bytes).into_owned())
+ let bytes = ctx.env.get_var_bytes(var)?;
+ Some(
+   <<N::Doc as Doc>::Source as Content>::encode_bytes(bytes)
+     .into_owned() // still one allocation; unavoidable without API change
+ )

If encode_bytes already returns an owned String (instead of Cow), drop the into_owned() call altogether.


38-48: Substring::compute creates an intermediate Vec<char> – consider a slice-based approach

Building chars: Vec<_> = text.chars().collect() allocates O(n) memory and copies every Unicode scalar value. You can achieve the same with char_indices() and byte slicing without the extra heap:

let start = resolve_char(&self.start_char, 0, len);
let end = resolve_char(&self.end_char, len, len);
let bytes = text
  .char_indices()
  .map(|(i, _)| i)
  .chain(std::iter::once(text.len()))
  .collect::<Vec<_>>();
let slice = &text[bytes[start]..bytes[end]];
Some(slice.to_owned())

This avoids the allocation and keeps complexity linear.


172-181: Avoid redundant allocation when storing transformed bytes

decode_str(&s).to_vec() copies the entire string into a new Vec<u8>. If the downstream API accepts a Cow<[u8]>, you can save one allocation:

- let bytes = if let Some(s) = opt {
-   <<N::Doc as Doc>::Source as Content>::decode_str(&s).to_vec()
+ let bytes = if let Some(s) = opt {
+   <<N::Doc as Doc>::Source as Content>::decode_str(&s).into_owned()
 } else {
   vec![]
 };

Not critical, but helps when transforms produce large blobs.

crates/core/src/replacer.rs (1)

48-53: Remove dead code or restore structural replacer

The Root<D> implementation is now commented out, but the corresponding mod structural; declaration (line 13) is still present. This leaves an empty module that is compiled but never used and may confuse future readers.

Either:

  1. Delete the commented block and remove the structural module entirely, or
  2. Re-enable the structural replacer and add #[cfg(feature = "structural_replace")] guards if it is planned for later.

Leaving commented code in main branches is generally discouraged.

crates/config/src/transform/rewrite.rs (2)

101-118: Double DFS may cause quadratic behaviour

replace_one performs for child in node.dfs() and invokes every rule on every descendant node. When multiple rewrites are chained (rules.len() * #descendants), this can become quadratic on large trees.

If rules are mutually exclusive or can be indexed by kind/text, consider:

• Pre-filtering descendants (child.kind_id() lookup)
• Short-circuiting the outer dfs() once a rule inside rules matches its ancestor to avoid visiting grandchildren that will be deleted anyway.

While correctness is intact, large files could show noticeable slow-downs.


67-67: Decode joiner once to avoid repeated UTF-8 work

let joiner = <N::Doc as Doc>::Source::decode_str(joiner); is called inside a loop even though joiner is constant. Move it outside the loop to save repeated allocations/decoding.

-        let joiner = <N::Doc as Doc>::Source::decode_str(joiner);
+        let joiner_bytes = <N::Doc as Doc>::Source::decode_str(joiner);
 ...
-          ret.extend_from_slice(&joiner);
+          ret.extend_from_slice(&joiner_bytes);
crates/core/src/replacer/template.rs (4)

41-50: generate_replacement allocates twice per fragment

decode_str is invoked on every fragment and the resulting Vec is immediately copied into ret via extend_from_slice, causing two heap allocations (decode_str → Vec, then extend → copy).

If fragments are sizeable or numerous, cache the decoded value:

-    ret.extend_from_slice(&<N::Doc as Doc>::Source::decode_str(frag));
+    let frag_bytes = <N::Doc as Doc>::Source::decode_str(frag);
+    ret.extend_from_slice(&frag_bytes);

Repeat the same change later in the loop (line 109).


114-155: maybe_get_var can use slice APIs to avoid indexing panics

Accessing nodes[0] and nodes[nodes.len() - 1] requires two bounds checks and will panic if the vector is empty (even though you guard for emptiness). The idiomatic pattern is .first() / .last():

-      let start = nodes[0].range().start;
-      let end = nodes[nodes.len() - 1].range().end;
-      let source = nodes[0].get_doc().get_source();
+      let start = nodes.first().unwrap().range().start;
+      let end = nodes.last().unwrap().range().end;
+      let source = nodes.first().unwrap().get_doc().get_source();

Same semantics, slightly safer & faster.


91-113: Minor: early-return for Textual branch simplifies control flow

Currently the function sets up variables and then immediately returns for the Textual case. An early return keeps the happy path shallow:

-  let template = match fixer {
-    TemplateFix::Textual(n) => return <N::Doc as Doc>::Source::decode_str(n).to_vec(),
-    TemplateFix::WithMetaVar(t) => t,
-  };
+  if let TemplateFix::Textual(src) = fixer {
+    return <N::Doc as Doc>::Source::decode_str(src).to_vec();
+  }
+  let TemplateFix::WithMetaVar(t) = fixer else { unreachable!() };

Purely stylistic, but improves readability.


158-164: Public helper leaks implementation detail

gen_replacement is a thin wrapper around TemplateFix. Exposing it perpetuates two APIs that must stay in sync. Consider marking it #[deprecated] and encouraging callers to create a TemplateFix directly, e.g.:

TemplateFix::try_new(template, nm.lang())?.generate_replacement(nm)

to avoid API surface bloat.

crates/core/src/matcher/pattern.rs (1)

517-518: Size assertion is now brittle after the PhantomData removal

Pattern no longer carries the PhantomData<L> field, so its memory layout has changed.
The ignored test still freezes the previous size (40) which is very likely wrong now and will break when the test is re-enabled.

-assert_eq!(std::mem::size_of::<Pattern>(), 40);
+// ⚠️ Update the expected value or, even better, remove the fragile
+//       size assertion – layout is an implementation detail.
+assert_eq!(std::mem::size_of::<Pattern>(), 32);

Please recompute or drop the assertion to avoid false failures.

crates/core/src/matcher.rs (2)

87-90: Avoid cloning the language object on every match

impl Matcher for str creates a fresh Pattern via Pattern::str(self, node.lang().clone()).
For languages with non-trivial clone() cost this happens for every node examined, which can dominate hot paths such as find_all.

Consider caching the compiled Pattern (e.g. inside a small once_cell::unsync::OnceCell) or switching to Pattern::borrowed(self, node.lang()) if such helper exists.


58-63: Minor: unnecessary env allocation in match_node helper

MatcherExt::match_node always starts with an empty MetaVarEnv, which is then cloned into the NodeMatch even when the pattern fails.
You can save one allocation per failure by constructing NodeMatch only after a successful match:

-    let mut env = Cow::Owned(MetaVarEnv::new());
-    let node = self.match_node_with_env(node, &mut env)?;
-    Some(NodeMatch::new(node, env.into_owned()))
+    let mut env = Cow::Owned(MetaVarEnv::new());
+    self.match_node_with_env(node, &mut env)
+        .map(|n| NodeMatch::new(n, env.into_owned()))
crates/core/src/node.rs (2)

38-41: Position::column now takes SgNode, but work is still O(n)

The generic signature is great, but the call still rescans the source every time.
If callers invoke column in tight loops (e.g. pretty printers) consider memoising the result or passing the pre-computed Source slice instead.


181-251: Potential self-recursion confusion in trait impl

Several methods delegate to identically-named inherent methods, e.g.:

fn node_id(&self) -> usize {
    self.node_id()
}

This works because inherent methods out-prioritise trait methods, but the pattern is easy to misread as infinite recursion.
Adding an explicit fully-qualified call improves clarity and avoids accidental refactor bugs:

-    self.node_id()
+    Node::node_id(self)

(same for is_named, text, etc.)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ab3183 and 1755a65.

📒 Files selected for processing (30)
  • crates/cli/src/print/colored_print.rs (1 hunks)
  • crates/config/src/fixer.rs (9 hunks)
  • crates/config/src/rule/mod.rs (8 hunks)
  • crates/config/src/rule/nth_child.rs (8 hunks)
  • crates/config/src/rule/range.rs (5 hunks)
  • crates/config/src/rule/referent_rule.rs (13 hunks)
  • crates/config/src/rule/relational_rule.rs (9 hunks)
  • crates/config/src/rule/stop_by.rs (5 hunks)
  • crates/config/src/rule_core.rs (13 hunks)
  • crates/config/src/transform/mod.rs (3 hunks)
  • crates/config/src/transform/rewrite.rs (9 hunks)
  • crates/config/src/transform/transformation.rs (5 hunks)
  • crates/core/src/language.rs (4 hunks)
  • crates/core/src/lib.rs (2 hunks)
  • crates/core/src/match_tree/match_node.rs (6 hunks)
  • crates/core/src/match_tree/mod.rs (5 hunks)
  • crates/core/src/matcher.rs (6 hunks)
  • crates/core/src/matcher/kind.rs (4 hunks)
  • crates/core/src/matcher/node_match.rs (5 hunks)
  • crates/core/src/matcher/pattern.rs (15 hunks)
  • crates/core/src/matcher/text.rs (2 hunks)
  • crates/core/src/meta_var.rs (8 hunks)
  • crates/core/src/node.rs (6 hunks)
  • crates/core/src/ops.rs (13 hunks)
  • crates/core/src/replacer.rs (3 hunks)
  • crates/core/src/replacer/structural.rs (1 hunks)
  • crates/core/src/replacer/template.rs (8 hunks)
  • crates/core/src/source.rs (1 hunks)
  • crates/language/src/lib.rs (3 hunks)
  • crates/napi/src/sg_node.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (14)
  • crates/core/src/language.rs
  • crates/core/src/matcher/text.rs
  • crates/core/src/lib.rs
  • crates/core/src/matcher/kind.rs
  • crates/language/src/lib.rs
  • crates/config/src/transform/mod.rs
  • crates/config/src/rule/mod.rs
  • crates/config/src/rule/stop_by.rs
  • crates/config/src/rule/nth_child.rs
  • crates/core/src/meta_var.rs
  • crates/config/src/rule/relational_rule.rs
  • crates/config/src/rule/referent_rule.rs
  • crates/config/src/fixer.rs
  • crates/config/src/rule_core.rs
🧰 Additional context used
🧬 Code Graph Analysis (5)
crates/core/src/match_tree/match_node.rs (2)
crates/core/src/node.rs (1)
  • std (520-532)
crates/core/src/match_tree/mod.rs (3)
  • match_ellipsis (17-22)
  • match_ellipsis (36-40)
  • match_ellipsis (92-105)
crates/core/src/replacer.rs (3)
crates/core/src/replacer/indent.rs (1)
  • formatted_slice (161-168)
crates/config/src/fixer.rs (2)
  • generate_replacement (134-140)
  • get_replaced_range (141-157)
crates/core/src/replacer/template.rs (2)
  • generate_replacement (41-50)
  • gen_replacement (158-164)
crates/config/src/transform/transformation.rs (1)
crates/config/src/transform/rewrite.rs (1)
  • compute (46-86)
crates/core/src/matcher.rs (8)
crates/core/src/node.rs (9)
  • kind (165-165)
  • kind (200-202)
  • kind (306-308)
  • text (168-168)
  • text (197-199)
  • text (341-344)
  • new (27-33)
  • new (72-74)
  • str (57-59)
crates/core/src/matcher/node_match.rs (2)
  • borrow (98-100)
  • new (24-26)
crates/config/src/rule_core.rs (3)
  • match_node_with_env (262-268)
  • new (152-159)
  • potential_kinds (270-272)
crates/core/src/matcher/pattern.rs (5)
  • match_node_with_env (259-277)
  • get_match_len (297-301)
  • new (213-215)
  • str (195-197)
  • potential_kinds (279-295)
crates/config/src/rule/mod.rs (2)
  • match_node_with_env (280-304)
  • potential_kinds (306-326)
crates/config/src/rule/relational_rule.rs (4)
  • match_node_with_env (70-94)
  • match_node_with_env (130-176)
  • match_node_with_env (212-221)
  • match_node_with_env (256-265)
crates/core/src/matcher/kind.rs (3)
  • match_node_with_env (79-89)
  • new (30-36)
  • potential_kinds (91-95)
crates/core/src/matcher/text.rs (1)
  • potential_kinds (39-41)
crates/core/src/ops.rs (7)
crates/core/src/node.rs (6)
  • std (520-532)
  • new (27-33)
  • new (72-74)
  • str (57-59)
  • root (88-93)
  • root (402-404)
crates/core/src/matcher.rs (6)
  • match_node_with_env (35-39)
  • match_node_with_env (78-85)
  • match_node_with_env (97-103)
  • match_node_with_env (150-156)
  • match_node_with_env (166-172)
  • new (122-127)
crates/core/src/matcher/pattern.rs (3)
  • match_node_with_env (259-277)
  • new (213-215)
  • str (195-197)
crates/config/src/rule/mod.rs (1)
  • match_node_with_env (280-304)
crates/config/src/rule/relational_rule.rs (4)
  • match_node_with_env (70-94)
  • match_node_with_env (130-176)
  • match_node_with_env (212-221)
  • match_node_with_env (256-265)
crates/core/src/matcher/kind.rs (2)
  • match_node_with_env (79-89)
  • new (30-36)
crates/core/src/meta_var.rs (1)
  • new (28-35)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: coverage
🔇 Additional comments (8)
crates/core/src/source.rs (1)

80-80:

Details

✅ Verification successful

Added 'static lifetime bound to Doc trait

This change enforces that any type implementing Doc must have a 'static lifetime. This aligns with the broader refactoring effort of removing language generic parameters from core types and replacing them with trait-based abstractions.

Let's verify if all implementors of the Doc trait in the codebase satisfy the new 'static bound:


🏁 Script executed:

#!/bin/bash
# Find all Doc trait implementations
rg "impl.+Doc for" --only-matching

Length of output: 124


🏁 Script executed:

#!/bin/bash
echo "=== Core Doc implementation context ==="
rg -n "impl<L: Language> Doc for" -C3 crates/core/src/source.rs

echo "=== napi/src/doc.rs implementation context ==="
rg -n "impl\s\+Doc\s\+for" -C3 crates/napi/src/doc.rs

Length of output: 465


🏁 Script executed:

#!/bin/bash
echo "=== StrDoc definition ==="
rg -n "struct StrDoc" -C3 crates/core/src/source.rs

echo ""
echo "=== Language trait ==="
rg -n "pub trait Language" -C3 crates/core/src/language.rs

echo ""
echo "=== impl Doc for in napi/src/doc.rs ==="
rg -n "impl Doc for" -C3 crates/napi/src/doc.rs

echo ""
echo "=== struct definitions in napi/src/doc.rs ==="
rg -n "struct" -C3 crates/napi/src/doc.rs

Length of output: 1685


Doc implementors satisfy the new ’static constraint

All existing Doc implementations use only owned data and do not introduce non‐'static references, so they meet the newly added +'static bound:

  • crates/core/src/source.rs:
    impl<L: Language> Doc for StrDoc<L>
    (fields: String, L with no hidden lifetimes)
  • crates/napi/src/doc.rs:
    impl Doc for JsDoc
    (fields: NapiLang, Wrapper, both own their data)

No further changes required—approving this update.

crates/cli/src/print/colored_print.rs (1)

237-237: Updated node access pattern to match new trait-based interface

Changed from direct reference to dereferencing diff.node_match via &* syntax to match the updated column method signature that now expects an SgNode trait object reference instead of a concrete type.

This change aligns with the broader refactoring that introduces trait-based node abstractions across the codebase. The functionality remains the same, but now works with the more generic node interface.

crates/napi/src/sg_node.rs (1)

46-46: Updated position calculation to use the new node trait interface

Changed from passing a reference to self.inner directly to using self.inner.get_node() to obtain a reference to a type implementing the SgNode trait.

This change aligns with the refactoring effort to make the codebase more abstract by accessing nodes through trait methods rather than directly using concrete types.

crates/config/src/rule/range.rs (1)

53-65: Edge-case: zero-length ranges are accepted – is that intended?

try_new rejects start > end but happily accepts start == end, producing a matcher whose start and end point to the same character. The rest of the implementation treats such a range as valid, so find_node will only succeed on zero-width AST nodes (very uncommon).

If zero-length ranges were not meant to be allowed, extend the validation:

- if start_pos.line > end_pos.line
-   || (start_pos.line == end_pos.line && start_pos.column > end_pos.column)
+ if start_pos.line > end_pos.line
+   || (start_pos.line == end_pos.line && start_pos.column >= end_pos.column)

Otherwise, consider documenting the behaviour explicitly so users know a zero-length range is meaningful.

crates/core/src/replacer.rs (1)

68-74: Preserve indentation in raw-node replacement

Node::generate_replacement currently returns the raw byte slice of the matched node. When this replacer is used inside a template, the inserted code may lose the indentation normalisation applied elsewhere (indent_lines). If callers expect consistent formatting, consider re-using formatted_slice or at least documenting that this implementation performs a “verbatim” replacement.

crates/core/src/match_tree/match_node.rs (1)

94-99: Unnecessary lifetime relationship 't: 'p creates extra complexity

The 't: 'p bound forces the tree-node lifetime to outlive the pattern lifetime even though the two lifetimes are unrelated – the pattern slice is read-only and never stored inside the candidate nodes.
Removing the bound simplifies the signature and helps the compiler reason about borrow-checking without introducing any regressions:

-fn may_match_ellipsis_impl<'p, 't: 'p, N: SgNode<'t>>(
+fn may_match_ellipsis_impl<'p, 't, N: SgNode<'t>>(

Unless you deliberately rely on the stricter relationship (I could not find such a use-site), consider dropping it.

crates/core/src/matcher/node_match.rs (1)

20-22: type alias is fine, but watch out for downstream impls

NodeMatch is now a plain type-alias to SgNodeMatch.
That keeps the public signature stable, but remember that you cannot add inherent impls or trait impls directly on a type alias (still an unstable feature).
The alias should only be used for re-exporting – all impl blocks must target the real struct (SgNodeMatch).

crates/core/src/node.rs (1)

148-178: Trait uses impl Trait return types – requires nightly feature

SgNode returns impl Iterator and impl ExactSizeIterator in several methods.
This needs the unstable return_position_impl_trait_in_trait (RPITIT) feature; compiling on stable Rust will fail.

Please ensure the crate is explicitly gated with:

#![cfg_attr(feature = "nightly", feature(return_position_impl_trait_in_trait))]

or provide associated iterator types as a stable fallback.

BREAKING CHANGE: a lot of internal API has been changed to allow new implemenation
@HerringtonDarkholme HerringtonDarkholme changed the title feat: ast-grep Parser Abstratction feat: ast-grep Parser Abstratction (Part 1, Node) Apr 25, 2025
@HerringtonDarkholme
Copy link
Copy Markdown
Member Author

Next PR will bring in part-2. Language

@HerringtonDarkholme HerringtonDarkholme added this pull request to the merge queue Apr 25, 2025
Merged via the queue into main with commit f395bd2 Apr 25, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant