feat: Add support for range matcher by mohebifar · Pull Request #1626 · ast-grep/ast-grep

mohebifar · 2024-11-29T01:07:55Z

This PR introduces a RangeMatcher, enabling users to select nodes based on precise range information such as line and column numbers. This functionality is particularly valuable when working in systems that rely on feedback loops or data derived from language servers which typically communicate in terms of line and column positions.

Motivation:

LSP integration: Many language servers, such as rust-analyzer, provide diagnostic and structural information in row-column format. The RangeMatcher bridges the gap between this data and ast-grep's capabilities, enabling precise node selection for downstream processing.
Advanced workflow support: The matcher allows workflows like:
- Identifying and modifying struct field references based on diagnostics.
- Updating or rewriting code using external feedback from tools like LSPs or compilers.
Complex NAPI workflows: By selecting nodes by range, it becomes easier to manage complex traversal logic.

Summary by CodeRabbit

Release Notes

New Features
- Introduced a range-based matching system, allowing rules to specify a range for node matching.
- Added new interfaces for Position and Range to enhance type definitions related to node positioning.
Bug Fixes
- Improved error handling for invalid ranges during rule serialization.
Tests
- Added a test case for finding nodes by range, enhancing coverage for Unicode and multiline string parsing.

coderabbitai · 2024-11-29T01:08:02Z

Walkthrough

The pull request introduces a new module range within the rule system, adding the SerializableRange type and enhancing the SerializableRule struct to include a range field. The Rule enum is updated with a new variant for range matching, and corresponding methods are modified to accommodate this functionality. Additionally, new error handling is implemented for invalid ranges. The TypeScript declaration file is updated to reflect these changes, introducing new interfaces for Position and Range. A new test case is added to validate the range functionality in the existing test suite.

Changes

File Path	Change Summary
`crates/config/src/rule/mod.rs`	- Added module `range`. - Updated `SerializableRule` to include `pub range: Maybe<SerializableRange>`. - Added variant `Range(RangeMatcher<L>)` to `Rule` enum. - Introduced `InvalidRange` error in `RuleSerializeError`.
`crates/config/src/rule/range.rs`	- Added `SerializablePosition` and `SerializableRange` structs. - Introduced `RangeMatcherError` enum and `RangeMatcher<L>` struct. - Implemented methods for range matching and validation.
`crates/napi/manual.d.ts`	- Added `Position` and `Range` interfaces. - Updated `Rule` interface to include optional `range` property.
`crates/napi/__test__/index.spec.ts`	- Added test case "find node by range" to validate range functionality in parsing.

Possibly related PRs

refactor: expose new struct Position for abstracting column #1632: This PR introduces a new Position struct, which is directly related to the SerializablePosition and SerializableRange types introduced in the main PR, enhancing the handling of positions in the codebase.
feat: support character based column number #1634: This PR modifies the Position struct to support character-based column numbers, which aligns with the changes in the main PR that enhance the functionality of range handling in rules, particularly in how positions are represented and utilized.

🐇 In the land of code where rules now play,
A range was added to brighten the day.
With positions defined, and errors in sight,
Our matching is stronger, our logic more bright.
So hop with delight, let the nodes intertwine,
For in this new range, all will align! 🌟

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

HerringtonDarkholme · 2024-11-29T01:13:03Z

Hi @mohebifar, thanks for the code contribution.
Would you like to elaborate more about the range matcher and submit a feature request issue for it?
Thanks

coderabbitai

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (9)

crates/config/src/rule/range.rs (1)
4-9: Enhance documentation and add validation for position fields.

While the structure is well-designed, consider these improvements:

Expand documentation to explain:

Zero vs. one-based indexing for line/column

Expected range of values

Add validation to prevent invalid positions

Consider adding a constructor with validation:
impl SerializablePosition {
    pub fn new(line: usize, column: usize) -> Result<Self, &'static str> {
        // Assuming 0-based indexing
        if line == 0 && column == 0 {
            return Err("Position cannot be 0,0");
        }
        Ok(Self { line, column })
    }
}
crates/napi/manual.d.ts (2)
36-41: LGTM! Consider enhancing type documentation.

The Position interface is well-defined and aligns with standard LSP implementations. The JSDoc comments clearly indicate 0-based indexing.

Consider adding a brief interface-level JSDoc comment to provide context:
+/**
+ * Represents a position in the source code using 0-based indexing.
+ * This aligns with LSP's position representation.
+ */
 export interface Position {
   /** 0-indexed row number. */
   row: number
   /** 0-indexed column number. */
   column: number
 }
43-46: Add JSDoc documentation for the Range interface.

The Range interface structure is correct, but would benefit from clear documentation about range boundaries (inclusive/exclusive) and usage context.

Consider adding detailed JSDoc documentation:
+/**
+ * Represents a range in the source code.
+ * @property start The position where the range begins (inclusive)
+ * @property end The position where the range ends (exclusive)
+ * 
+ * This interface is used by the RangeMatcher to select nodes within specific boundaries,
+ * particularly useful for LSP integration scenarios.
+ */
 export interface Range {
   start: Position
   end: Position
 }
crates/core/src/matcher/range.rs (2)
80-128: Expand test coverage

The current test suite could be enhanced with additional test cases:

Error cases (invalid ranges)

Edge cases (zero-length ranges, end of file)

Multi-line ranges

Overlapping ranges

Position validation in try_new()

Add these test cases:
#[test]
fn test_invalid_range() {
    let result = RangeMatcher::<Tsx>::try_new(0, 10, 0, 5);
    assert!(matches!(result, Err(RangeMatcherError::InvalidRange)));
}

#[test]
fn test_multi_line_range() {
    let cand = pattern_node("class A {\n  a = 123\n}");
    let cand = cand.root();
    let pattern = RangeMatcher::new(0, 10, 1, 8);
    assert!(pattern.find_node(cand.clone()).is_some());
}
1-128: Consider architectural improvements for better integration

While the basic implementation is sound, consider these architectural improvements:

Add a builder pattern for constructing ranges with validation

Implement Debug and Display traits for better error messages

Consider adding a RangeMatcherConfig for customizable matching behavior (exact vs. overlap)

Add serialization support for integration with the rule system

These improvements would make the RangeMatcher more flexible and easier to integrate with the larger system, especially for LSP integration mentioned in the PR objectives.
crates/core/src/matcher.rs (1)
11-11: Update module documentation to include RangeMatcher

The file's doc comment lists "notable implementations" but should be updated to include the new RangeMatcher implementation, as it's a significant addition for LSP integration.

Apply this diff to update the documentation:
 //! `Matcher` has three notable implementations in this module:
 //! * Pattern: matches against a tree-sitter node based on its tree structure.
 //! * KindMatcher: matches a node based on its `kind`
 //! * RegexMatcher: matches a node based on its textual content using regex.
+//! * RangeMatcher: matches a node based on its position in source code.
crates/core/src/source.rs (1)
Line range hint 143-195: Add tests for the new get_source_text method.

While the implementation is straightforward, it would be beneficial to add tests to verify the behavior and prevent regressions.

Consider adding the following test to the test module:
#[test]
fn test_get_source_text() {
    let src = "var a = 1234".to_string();
    assert_eq!(src.get_source_text(), "var a = 1234");
}
crates/core/src/node.rs (1)
186-222: LGTM! The implementation correctly handles UTF-8 and line endings.

The method successfully converts byte offsets to codepoint positions while properly handling UTF-8 characters and CRLF line endings.

Consider caching the results for frequently accessed nodes to improve performance, as the current implementation needs to iterate through all characters up to the end position.
+use std::cell::RefCell;
+use std::collections::HashMap;
+
 pub struct Node<'r, D: Doc> {
   pub(crate) inner: tree_sitter::Node<'r>,
   pub(crate) root: &'r Root<D>,
+  pos_cache: RefCell<HashMap<usize, std::ops::Range<(usize, usize)>>>,
 }
crates/config/src/rule/mod.rs (1)
61-62: Improve documentation comments for 'range' field

Capitalize the first letter of the sentence and ensure consistent punctuation.

Apply this diff to improve the comments:
 /// `range` accepts a range object.
-/// the target node must exactly appear in the range.
+/// The target node must exactly appear in the range.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 29da2bf and a8a9f0d.

📒 Files selected for processing (7)

crates/config/src/rule/mod.rs (12 hunks)
crates/config/src/rule/range.rs (1 hunks)
crates/core/src/matcher.rs (2 hunks)
crates/core/src/matcher/range.rs (1 hunks)
crates/core/src/node.rs (2 hunks)
crates/core/src/source.rs (2 hunks)
crates/napi/manual.d.ts (1 hunks)

🔇 Additional comments (18)

crates/config/src/rule/range.rs (1)

1-2: LGTM!

The imports are appropriate for serialization and JSON schema generation capabilities.

crates/napi/manual.d.ts (1)

53-54: Enhance documentation and verify matcher combinations.

While the addition of the range property is well-placed, its documentation could be more comprehensive, especially regarding interaction with other matchers.

Enhance the documentation:

-  /** The exact range of the node in the source code. */
+  /**
+   * The exact range of the node in the source code.
+   * When specified, matches nodes that exactly span this range.
+   * Can be combined with other matchers like `pattern` or `kind` for more precise matching.
+   * Particularly useful for LSP integration scenarios where positions are provided in row/column format.
+   */
   range?: Range

Let's verify the integration with other matchers:

✅ Verification successful

Let me try a different search strategy to verify the integration of range matcher with other matchers.

Let's check the test file and range matcher implementation to verify the integration.

Documentation enhancement recommended, but range matcher integration is well-implemented

The range matcher is thoroughly implemented and tested:

It's properly integrated with other matchers as shown in the test cases (e.g., combining with kind matcher and pattern matching)
The implementation in range.rs shows it follows the same matcher interface used by other matchers
Test coverage includes various scenarios: basic matching, non-matching cases, unicode support, and integration with other features

Suggested documentation enhancement:

-  /** The exact range of the node in the source code. */
+  /**
+   * The exact range of the node in the source code.
+   * Matches nodes that precisely span the specified start and end positions.
+   * Can be used standalone or combined with other matchers (pattern, kind).
+   * Positions are zero-based and specified in {line, column} format.
+   */
   range?: Range

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for test cases that combine range matcher with other matchers

# Look for test cases combining range with other matchers
rg -l "range.*pattern|pattern.*range" "test" || echo "No combined matcher tests found"

# Check if there are examples in documentation
fd -e md | xargs rg -l "range.*matcher" || echo "No range matcher documentation found"

Length of output: 287

Script:

#!/bin/bash
# Search for test files in the entire repository
fd -e test.ts -e test.js -e spec.ts -e spec.js

# Search for any files containing 'range' in combination with matching-related terms
rg -l "range.*match|match.*range"

# Look for actual implementation of range matcher to understand integration
ast-grep --pattern 'function $_($$_) {
  $$$
  range
  $$$
}'

Length of output: 435

Script:

#!/bin/bash
# Check the test file for range matcher usage
rg -A 5 -B 5 "range" "crates/napi/__test__/index.spec.ts"

# Check the range matcher implementation
cat "crates/core/src/matcher/range.rs"

# Check node matcher for range integration
rg -A 10 "range" "crates/core/src/matcher/node_match.rs"

Length of output: 7495

crates/core/src/matcher/range.rs (1)

54-78: Consider more flexible position matching

The current implementation has several potential issues:

Exact position matching might be too strict - consider allowing partial overlaps
No handling of multi-byte characters which could affect column positions
No documentation explaining why potential_kinds returns None

Consider implementing a more flexible matching strategy:

Add a method to check if a node's range overlaps with matcher's range
Handle UTF-8 characters properly when comparing column positions
Document the rationale for returning None in potential_kinds

Let's verify how other matchers handle multi-byte characters:

crates/core/src/source.rs (2)

143-143: LGTM: Well-placed trait method addition.

The new get_source_text method is appropriately placed in the Content trait and follows the same pattern as the existing get_text method. The return type Cow<'a, str> is a good choice as it provides flexibility for implementations to return either borrowed or owned strings.

193-195: LGTM: Efficient String implementation.

The implementation for String is correct and efficient, returning a borrowed reference to avoid unnecessary copying. This aligns well with the trait's purpose and the existing implementations.

crates/core/src/node.rs (1)

186-222: Verify edge case handling

Please ensure the implementation handles these edge cases correctly:

Empty nodes (start_byte == end_byte)
End of file scenarios
Mixed line endings (both LF and CRLF)

✅ Verification successful

Let's gather more specific information about the edge cases by examining the test files and implementation details.

Let's search specifically for tests related to empty nodes and line endings in the codebase.

Based on the code review and test results, I can now provide a final response about the edge case handling:

Edge cases are properly handled in the implementation

The codepoint_pos_range() implementation correctly handles the specified edge cases:

Empty nodes (start_byte == end_byte): The implementation handles this implicitly - when start_byte equals end_byte, both start_pos and end_pos will be set to the same position.
End of file scenarios: The implementation doesn't make assumptions about EOF and works with the actual byte positions provided by the node's range.
Mixed line endings: The code explicitly handles both LF ('\n') and CRLF ('\r\n') sequences:

if c == '\r' && source.chars().nth(i + 1) == Some('\n') {
  current_byte += 1; // Skip the '\n' in CRLF
}

The implementation is also tested with multi-byte Unicode characters and line position calculations as shown in test_codepoint_range().

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for edge case tests in the codebase
echo "Searching for empty node tests..."
rg -l "empty.*node" --type rust

echo "Searching for EOF handling tests..."
rg -l "end.*file|eof" --type rust

echo "Searching for line ending tests..."
rg -l "CRLF|\\r\\n|line.*ending" --type rust

Length of output: 690

Script:

#!/bin/bash
# Search for test cases related to node positions and ranges
echo "Searching for node position/range tests..."
rg -l "test.*pos|test.*range" --type rust

# Look for actual test implementations
echo "Checking test implementations..."
rg -A 10 "fn.*test.*pos|fn.*test.*range" --type rust

# Search for specific handling of empty nodes
echo "Checking empty node handling..."
rg -A 5 "start_byte.*==.*end_byte" --type rust

# Check for line ending related code
echo "Checking line ending handling..."
rg -U -A 5 "\\r\\n|\\n|line.*end" --type rust crates/core/src/

Length of output: 152529

Script:

#!/bin/bash
# Search for tests in node.rs that might cover empty nodes and line endings
echo "Checking node.rs tests..."
rg -A 10 "fn test.*" crates/core/src/node.rs

# Look for specific test cases with empty nodes
echo "Checking empty node test cases..."
ast-grep --pattern 'fn test_$_() {
  $$$
  assert_eq!($_, $_, "empty node");
  $$$
}'

# Look for line ending related tests
echo "Checking line ending tests..."
rg -U -A 10 "\\r\\n|CRLF|line.*end" crates/core/src/node.rs

Length of output: 5340

crates/config/src/rule/mod.rs (12)

3-3: Integration of 'range' module looks appropriate

The addition of the mod range; declaration correctly integrates the new range module into the codebase.

9-9: Importing 'SerializableRange' for use in this module

The use range::SerializableRange; statement correctly imports the SerializableRange type for use within this module.

19-21: Addition of 'RangeMatcher' and 'RangeMatcherError' imports

Importing RangeMatcher and RangeMatcherError from ast_grep_core::matcher is appropriate for implementing range matching functionality.

114-114: Properly mapping 'range' field in 'categorized' method

The range field is correctly integrated into the AtomicRule within the categorized method.

137-137: Addition of 'range' field to 'AtomicRule' struct

The range field is appropriately added to the AtomicRule struct to support range-based matching.

218-218: Adding 'Range' variant to 'Rule' enum

The Range(RangeMatcher<L>) variant extends the Rule enum to handle range-based matching rules.

265-265: Handle 'Range' variant in 'defined_vars' method

Correctly returns an empty HashSet for the Range variant, as it does not define any metavariables.

285-285: Include 'Range' variant in 'verify_util' method

The Range variant is properly handled in verify_util, ensuring it passes utility verification.

311-311: Implement 'match_node_with_env' for 'Range' variant

Delegates matching logic to RangeMatcher correctly for the Range variant.

333-333: Include 'Range' variant in 'potential_kinds' method

The Range variant is correctly included in potential_kinds, obtaining potential kinds from RangeMatcher.

380-381: Addition of 'InvalidRange' variant in 'RuleSerializeError'

The InvalidRange error variant properly handles errors related to RangeMatcher during serialization.

500-507: Deserialization of 'range' field in 'deserialze_atomic_rule'

The code correctly handles the range field by creating a new RangeMatcher with the provided range values.

crates/config/src/rule/range.rs

crates/core/src/matcher/range.rs

crates/core/src/matcher.rs

crates/core/src/source.rs

crates/core/src/node.rs

coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)

crates/core/src/matcher/range.rs (1)
16-20: Consider implementing Display trait for Position

While previous comments covered documentation and comparison methods, the Position struct could benefit from implementing the Display trait for better error messages and debugging, especially when used with LSP integration.
 pub struct Position<L: Language> {
     row: usize,
     column: usize,
     lang: PhantomData<L>,
 }
+
+impl<L: Language> std::fmt::Display for Position<L> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "{}:{}", self.row, self.column)
+    }
+}
crates/napi/src/doc.rs (1)
111-114: LGTM! Consider a small optimization.

The implementation is correct and aligns well with the Content trait requirements. However, we could avoid the temporary variable s and directly return the result of encode_bytes.
  fn get_source_text(&self) -> Cow<'_, str> {
-    let s = Self::encode_bytes(&self.inner);
-    s
+    Self::encode_bytes(&self.inner)
  }
crates/core/src/node.rs (1)
186-222: Consider optimizing the implementation for better performance

The current implementation iterates through all characters up to the end position. This could be optimized by:

Using chars().take_while() to stop at end_byte

Pre-calculating the string slice up to end_byte

Using the tree-sitter's built-in position information where possible
 pub fn codepoint_pos_range(&self) -> std::ops::Range<(usize, usize)> {
   let source = self.root.doc.get_source().get_source_text();
   let start_byte = self.range().start;
   let end_byte = self.range().end;
-
   let mut start_pos = (0, 0);
   let mut end_pos = (0, 0);
   let mut current_pos = (0, 0);
   let mut current_byte = 0;
-
-  for (i, c) in source.chars().enumerate() {
+  // Optimize by only processing up to end_byte
+  let relevant_text = &source[..end_byte];
+  for (i, c) in relevant_text.chars().enumerate() {
     if current_byte == start_byte {
       start_pos = current_pos;
     }
     if current_byte == end_byte {
       end_pos = current_pos;
       break;
     }
     if c == '\n' || c == '\r' {
       if c == '\r' && source.chars().nth(i + 1) == Some('\n') {
         current_byte += 1; // Skip the '\n' in CRLF
       }
       current_pos.0 += 1;
       current_pos.1 = 0;
     } else {
       current_pos.1 += 1;
     }
     current_byte += c.len_utf8();
   }
-
-  if current_byte == end_byte {
-    end_pos = current_pos;
-  }
+  // Set end_pos if we haven't found it yet
+  if end_pos == (0, 0) {
+    end_pos = current_pos;
+  }
   start_pos..end_pos
 }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a8a9f0d and 73da6f4.

📒 Files selected for processing (4)

crates/core/src/matcher/range.rs (1 hunks)
crates/core/src/node.rs (2 hunks)
crates/core/src/source.rs (2 hunks)
crates/napi/src/doc.rs (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

crates/core/src/source.rs

🔇 Additional comments (3)

crates/core/src/matcher/range.rs (2)

10-14: Enhance error handling implementation

The error handling implementation needs improvement as previously noted.

54-78: Consider relaxing exact range matching

The current implementation only matches nodes with exact range boundaries. For practical LSP integration, consider:

Adding a tolerance option for fuzzy matching
Supporting partial overlaps
Allowing containment-based matching

Example LSP scenario: When a language server reports an error at position 5:10, the actual AST node might span 5:8-5:12. The current exact matching would miss this node.

Let's verify if this is a real concern in the codebase:

crates/core/src/node.rs (1)

739-749: 🛠️ Refactor suggestion

Add more test cases for comprehensive coverage

The current test only covers a basic case with UTF-8 characters. As suggested in the past review, we should add more test cases.

The previous review comment suggesting additional test cases for CRLF line endings, empty nodes, multi-line nodes, and end of file scenarios is still valid and should be addressed.

 #[test]
 fn test_codepoint_range() {
     let multi_line_code = r#"let a = 123;
 let b = "Übergänge";
 "#;
     let root = Tsx.ast_grep(multi_line_code);
     let node = root.root().find("\"Übergänge\"").expect("should exist");
     let range = node.codepoint_pos_range();
     assert_eq!(range.start, (1, 8));
     assert_eq!(range.end, (1, 19));
+
+    // Test CRLF line endings
+    let crlf_code = "let a = 123;\r\nlet b = 456;";
+    let root = Tsx.ast_grep(crlf_code);
+    let node = root.root().find("456").expect("should exist");
+    let range = node.codepoint_pos_range();
+    assert_eq!(range.start, (1, 8));
+    assert_eq!(range.end, (1, 11));
+
+    // Test empty nodes
+    let empty_code = "let a = '';";
+    let root = Tsx.ast_grep(empty_code);
+    let node = root.root().find("''").expect("should exist");
+    let range = node.codepoint_pos_range();
+    assert_eq!(range.start, (0, 8));
+    assert_eq!(range.end, (0, 10));
+
+    // Test multi-line nodes
+    let multi_line = "let a = `\nline1\nline2\n`;";
+    let root = Tsx.ast_grep(multi_line);
+    let node = root.root().find("`$_`").expect("should exist");
+    let range = node.codepoint_pos_range();
+    assert_eq!(range.start, (0, 8));
+    assert_eq!(range.end, (3, 1));
 }

crates/core/src/matcher/range.rs

coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)

crates/core/src/node.rs (1)
186-222: Consider optimizing the position calculation implementation.

The implementation is correct but could be optimized in a few ways:

Pre-calculate the end position if we know the byte length

Use chars().take_while() to avoid iterating past the end position

Consider caching the results for frequently accessed nodes
 pub fn codepoint_pos_range(&self) -> std::ops::Range<(usize, usize)> {
   let source = self.root.doc.get_source().get_source_text();
   let start_byte = self.range().start;
   let end_byte = self.range().end;

   let mut start_pos = (0, 0);
   let mut end_pos = (0, 0);
   let mut current_pos = (0, 0);
   let mut current_byte = 0;

-  for (i, c) in source.chars().enumerate() {
+  for (i, c) in source.chars()
+    .take_while(|_| current_byte <= end_byte)
+    .enumerate() {
     if current_byte == start_byte {
       start_pos = current_pos;
     }
     if current_byte == end_byte {
       end_pos = current_pos;
       break;
     }
     if c == '\n' || c == '\r' {
       if c == '\r' && source.chars().nth(i + 1) == Some('\n') {
         current_byte += 1; // Skip the '\n' in CRLF
       }
       current_pos.0 += 1;
       current_pos.1 = 0;
     } else {
       current_pos.1 += 1;
     }
     current_byte += c.len_utf8();
   }

   if current_byte == end_byte {
     end_pos = current_pos;
   }

   start_pos..end_pos
 }
crates/config/src/rule/mod.rs (3)
61-64: Consider enhancing the range field documentation

While the documentation is good, it could be more descriptive by including:

Example usage with line/column numbers

Common use cases (e.g., LSP integration)

Relationship with other matchers

500-507: Consider improving range deserialization robustness

The current implementation could be enhanced for better maintainability and validation:

Consider using a builder pattern or constructor method for RangeMatcher to encapsulate the creation logic

Add validation for row/column values before creating the matcher

Example approach:
-  if let Some(range) = atomic.range {
-    rules.push(R::Range(RangeMatcher::try_new(
-      range.start.row,
-      range.start.column,
-      range.end.row,
-      range.end.column,
-    )?));
-  }
+  if let Some(range) = atomic.range {
+    rules.push(R::Range(RangeMatcher::builder()
+      .with_start_position(range.start.row, range.start.column)
+      .with_end_position(range.end.row, range.end.column)
+      .validate()?
+      .build()?));
+  }
Line range hint 511-685: Add test coverage for range matcher

The test module lacks coverage for the new range matcher functionality. Consider adding tests for:

Range matcher serialization/deserialization

Invalid range scenarios

Integration with other matchers

Example test case:
#[test]
fn test_range_matcher() {
    let src = r#"
    range:
      start:
        row: 1
        column: 0
      end:
        row: 2
        column: 10
    "#;
    let rule: SerializableRule = from_str(src).expect("cannot parse rule");
    let env = DeserializeEnv::new(TypeScript::Tsx);
    let rule = deserialize_rule(rule, &env).expect("should deserialize");
    // Add assertions here
}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 73da6f4 and d73d9f2.

📒 Files selected for processing (8)

crates/config/src/rule/mod.rs (12 hunks)
crates/config/src/rule/range.rs (1 hunks)
crates/core/src/matcher.rs (2 hunks)
crates/core/src/matcher/range.rs (1 hunks)
crates/core/src/node.rs (2 hunks)
crates/core/src/source.rs (2 hunks)
crates/napi/manual.d.ts (1 hunks)
crates/napi/src/doc.rs (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

crates/core/src/matcher.rs
crates/core/src/matcher/range.rs
crates/core/src/source.rs
crates/napi/manual.d.ts
crates/napi/src/doc.rs

🔇 Additional comments (7)

crates/config/src/rule/range.rs (3)

1-2: LGTM!

The imports are appropriate for serialization and JSON schema generation.

4-9: ⚠️ Potential issue

Align position representation with LSP specification.

The current implementation might cause compatibility issues with Language Server Protocol (LSP) integration:

Field names should match LSP's Position type:
- row should be line
- column should be character
Types should be u32 instead of usize to match LSP
Documentation should clarify zero-based indexing

Apply this diff:

 /// Represents a position in a document
+/// 
+/// Position in a document expressed as zero-based line and character offset.
+/// A position is between two characters like an 'insert' cursor in a editor.
 #[derive(Serialize, Deserialize, Clone, JsonSchema)]
 pub struct SerializablePosition {
-    pub row: usize,
-    pub column: usize,
+    pub line: u32,
+    pub character: u32,
 }

11-16: 🛠️ Refactor suggestion

Enhance range validation and documentation.

The range structure needs additional safeguards and clarity:

Add validation to ensure start position precedes end position
Clarify range inclusivity in documentation
Add helper methods for common operations
Add Debug derive for better error messages

Apply this diff:

 /// Represents a range rule with a start and end position
+/// 
+/// A range represents a span between two positions.
+/// The start position is inclusive and the end position is exclusive.
- #[derive(Serialize, Deserialize, Clone, JsonSchema)]
+ #[derive(Debug, Serialize, Deserialize, Clone, JsonSchema)]
 pub struct SerializableRange {
     pub start: SerializablePosition,
     pub end: SerializablePosition,
 }
+
+impl SerializableRange {
+    /// Creates a new range after validating that start precedes end
+    pub fn new(start: SerializablePosition, end: SerializablePosition) -> Result<Self, &'static str> {
+        if start.line > end.line || (start.line == end.line && start.character > end.character) {
+            return Err("start position must precede end position");
+        }
+        Ok(Self { start, end })
+    }
+
+    /// Returns true if the given position is within this range
+    pub fn contains(&self, pos: &SerializablePosition) -> bool {
+        (self.start.line < pos.line || 
+         (self.start.line == pos.line && self.start.character <= pos.character))
+        &&
+        (pos.line < self.end.line ||
+         (pos.line == self.end.line && pos.character < self.end.character))
+    }
+}

crates/core/src/node.rs (1)

739-749: Add more test cases for comprehensive coverage.

While the current test case covers UTF-8 characters well, it's missing important scenarios as mentioned in the past review.

As suggested in the previous review, please add test cases for:

CRLF line endings
Empty nodes
Multi-line nodes
End of file scenarios

The test additions were already well-detailed in the past review comment.

crates/config/src/rule/mod.rs (3)

3-3: LGTM: Module organization follows Rust conventions

The new range module and its import are well-organized and follow Rust's module system conventions.

Also applies to: 9-9

265-265: LGTM: Consistent implementation across all Rule methods

The Range matcher is properly integrated into all relevant Rule methods, maintaining consistency with other atomic matchers.

Also applies to: 285-285, 311-311, 333-333

380-381: LGTM: Error handling follows Rust best practices

The error handling for invalid range matchers is properly implemented using thiserror and follows Rust's error handling conventions.

crates/config/src/rule/range.rs

crates/config/src/rule/mod.rs

crates/core/src/node.rs

crates/core/src/source.rs

codecov · 2024-12-02T02:34:17Z

Codecov Report

Attention: Patch coverage is 75.75758% with 24 lines in your changes missing coverage. Please review.

Project coverage is 87.05%. Comparing base (87a7294) to head (9de1b14).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/config/src/rule/range.rs	79.77%	18 Missing ⚠️
crates/config/src/rule/mod.rs	40.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1626      +/-   ##
==========================================
- Coverage   87.12%   87.05%   -0.08%     
==========================================
  Files          95       96       +1     
  Lines       15360    15458      +98     
==========================================
+ Hits        13383    13457      +74     
- Misses       1977     2001      +24

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

crates/config/src/rule/range.rs (3)

12-19: Enhance range documentation for clarity.

The documentation should specify:

Whether the end position is inclusive or exclusive
That start position must precede end position

 /// Represents a position in source code using 0-based row and column numbers
+/// A range represents a span of text in source code. The end position is exclusive.
+/// The start position must precede the end position in the document.
 #[derive(Serialize, Deserialize, Clone, JsonSchema)]
 pub struct SerializableRange {
   /// 0-based row number in the source code
   pub start: SerializablePosition,
-  /// 0-based column number in the source code
+  /// 0-based column number in the source code (exclusive)
   pub end: SerializablePosition,
 }

61-74: Extract range validation logic for reusability.

Consider moving the range validation logic to SerializableRange for reuse:

 impl SerializableRange {
+    pub fn validate(&self) -> Result<(), RangeMatcherError> {
+        if self.start.row > self.end.row
+            || (self.start.row == self.end.row && self.start.column > self.end.column)
+        {
+            return Err(RangeMatcherError::InvalidRange);
+        }
+        Ok(())
+    }
 }

 impl<L: Language> RangeMatcher<L> {
     pub fn try_new(
         start_pos: SerializablePosition,
         end_pos: SerializablePosition,
     ) -> Result<RangeMatcher<L>, RangeMatcherError> {
-        if start_pos.row > end_pos.row
-            || (start_pos.row == end_pos.row && start_pos.column > end_pos.column)
-        {
-            return Err(RangeMatcherError::InvalidRange);
-        }
+        let range = SerializableRange { start: start_pos, end: end_pos };
+        range.validate()?;
         let range = Self::new(start_pos, end_pos);
         Ok(range)
     }
 }

99-161: Add test cases for edge cases.

Consider adding tests for:

Zero-length ranges (where start equals end)
Ranges spanning multiple lines with Unicode characters
Ranges at file boundaries (start of file and end of file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d73d9f2 and f318c96.

📒 Files selected for processing (4)

crates/config/src/rule/mod.rs (13 hunks)
crates/config/src/rule/range.rs (1 hunks)
crates/napi/__test__/index.spec.ts (1 hunks)
crates/napi/manual.d.ts (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

crates/napi/manual.d.ts

🔇 Additional comments (7)

crates/config/src/rule/range.rs (4)

38-44: LGTM!

The position comparison logic is correct and handles the node position appropriately.

46-50: LGTM!

The struct is correctly implemented with proper use of PhantomData for the language type parameter.

76-97: LGTM!

The Matcher trait implementation is correct and properly handles node matching and potential kinds.

5-10: 🛠️ Refactor suggestion

Align position fields with LSP types for better integration.

Consider modifying the struct to match LSP's Position type:

Rename row to line and column to character
Use u32 instead of usize for better cross-platform compatibility with LSP

 #[derive(Serialize, Deserialize, Clone, JsonSchema)]
 pub struct SerializablePosition {
-  pub row: usize,
-  pub column: usize,
+  pub line: u32,
+  pub character: u32,
 }

Likely invalid or redundant comment.

crates/napi/__test__/index.spec.ts (1)

377-389: LGTM!

The test case effectively validates the range matcher with Unicode characters and multiline content.

crates/config/src/rule/mod.rs (2)

59-62: LGTM!

The range field is properly integrated into SerializableRule with appropriate documentation.

501-503: LGTM!

The range deserialization is correctly implemented in the atomic rule handling.

crates/config/src/rule/range.rs

…matcher

coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)

crates/config/src/rule/range.rs (3)
46-74: Add documentation and consider extracting validation logic.

The RangeMatcher implementation could be improved:

Add documentation for the struct explaining its purpose

Consider moving validation logic to SerializableRange for reuse
+/// Matches nodes that exactly appear within the specified range.
+/// The range is inclusive of both start and end positions.
 pub struct RangeMatcher<L: Language> {
   start: SerializablePosition,
   end: SerializablePosition,
   lang: PhantomData<L>,
 }
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 69-72: crates/config/src/rule/range.rs#L69-L72
Added lines #L69 - L72 were not covered by tests

76-97: Consider caching node positions for better performance.

The current implementation calls start_pos() and end_pos() for each node check. Consider caching these values when processing multiple nodes:
 fn match_node_with_env<'tree, D: Doc<Lang = L>>(
   &self,
   node: Node<'tree, D>,
   _env: &mut Cow<MetaVarEnv<'tree, D>>,
 ) -> Option<Node<'tree, D>> {
+  // Cache positions to avoid multiple calls
+  let (node_start_pos, node_end_pos) = (node.start_pos(), node.end_pos());
-  let node_start_pos = node.start_pos();
-  let node_end_pos = node.end_pos();

   if self.start.equals_node_pos(&node_start_pos, &node)
     && self.end.equals_node_pos(&node_end_pos, &node)
   {
     Some(node)
   } else {
     None
   }
 }
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 94-96: crates/config/src/rule/range.rs#L94-L96
Added lines #L94 - L96 were not covered by tests

99-161: Enhance test coverage with edge cases.

Consider adding tests for:

Positions at file boundaries (0,0)

Empty ranges (start = end)

Unicode character handling

Very large position values

Would you like me to help generate these additional test cases?

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 123-125: crates/config/src/rule/range.rs#L123-L125
Added lines #L123 - L125 were not covered by tests

[warning] 139-141: crates/config/src/rule/range.rs#L139-L141
Added lines #L139 - L141 were not covered by tests

[warning] 156-158: crates/config/src/rule/range.rs#L156-L158
Added lines #L156 - L158 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f318c96 and 9de1b14.

📒 Files selected for processing (4)

crates/config/src/rule/mod.rs (13 hunks)
crates/config/src/rule/range.rs (1 hunks)
crates/napi/__test__/index.spec.ts (1 hunks)
crates/napi/manual.d.ts (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

crates/napi/test/index.spec.ts
crates/napi/manual.d.ts

🧰 Additional context used

🪛 GitHub Check: codecov/patch

crates/config/src/rule/mod.rs

[warning] 231-231: crates/config/src/rule/mod.rs#L231
Added line #L231 was not covered by tests

[warning] 266-266: crates/config/src/rule/mod.rs#L266
Added line #L266 was not covered by tests

[warning] 286-286: crates/config/src/rule/mod.rs#L286
Added line #L286 was not covered by tests

[warning] 312-312: crates/config/src/rule/mod.rs#L312
Added line #L312 was not covered by tests

[warning] 334-334: crates/config/src/rule/mod.rs#L334
Added line #L334 was not covered by tests

[warning] 502-502: crates/config/src/rule/mod.rs#L502
Added line #L502 was not covered by tests

crates/config/src/rule/range.rs

[warning] 6-6: crates/config/src/rule/range.rs#L6
Added line #L6 was not covered by tests

[warning] 13-13: crates/config/src/rule/range.rs#L13
Added line #L13 was not covered by tests

[warning] 69-72: crates/config/src/rule/range.rs#L69-L72
Added lines #L69 - L72 were not covered by tests

[warning] 94-96: crates/config/src/rule/range.rs#L94-L96
Added lines #L94 - L96 were not covered by tests

[warning] 123-125: crates/config/src/rule/range.rs#L123-L125
Added lines #L123 - L125 were not covered by tests

[warning] 139-141: crates/config/src/rule/range.rs#L139-L141
Added lines #L139 - L141 were not covered by tests

[warning] 156-158: crates/config/src/rule/range.rs#L156-L158
Added lines #L156 - L158 were not covered by tests

🔇 Additional comments (1)

crates/config/src/rule/mod.rs (1)

501-503: Add tests for range rule deserialization.

The static analysis indicates missing test coverage for range rule deserialization. Consider adding tests for:

Deserializing valid range rules
Error handling for invalid ranges
Integration with other rule types

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 502-502: crates/config/src/rule/mod.rs#L502
Added line #L502 was not covered by tests

crates/config/src/rule/range.rs

mohebifar force-pushed the feat/support-range-matcher branch from cc99316 to a8a9f0d Compare November 29, 2024 04:55

mohebifar marked this pull request as ready for review November 29, 2024 05:01

coderabbitai bot reviewed Nov 29, 2024

View reviewed changes

mohebifar force-pushed the feat/support-range-matcher branch from a8a9f0d to 73da6f4 Compare November 29, 2024 05:08

coderabbitai bot reviewed Nov 29, 2024

View reviewed changes

crates/core/src/matcher/range.rs Outdated Show resolved Hide resolved

crates/core/src/matcher/range.rs Outdated Show resolved Hide resolved

mohebifar force-pushed the feat/support-range-matcher branch 2 times, most recently from ce8c69b to d73d9f2 Compare November 29, 2024 05:36

coderabbitai bot reviewed Nov 29, 2024

View reviewed changes

crates/config/src/rule/range.rs Show resolved Hide resolved

crates/config/src/rule/mod.rs Show resolved Hide resolved

HerringtonDarkholme reviewed Nov 29, 2024

View reviewed changes

crates/core/src/node.rs Outdated Show resolved Hide resolved

HerringtonDarkholme reviewed Nov 29, 2024

View reviewed changes

crates/core/src/source.rs Outdated Show resolved Hide resolved

mohebifar force-pushed the feat/support-range-matcher branch from d73d9f2 to f318c96 Compare December 2, 2024 02:31

coderabbitai bot reviewed Dec 2, 2024

View reviewed changes

HerringtonDarkholme reviewed Dec 2, 2024

View reviewed changes

crates/config/src/rule/range.rs Show resolved Hide resolved

HerringtonDarkholme reviewed Dec 2, 2024

View reviewed changes

crates/config/src/rule/range.rs Show resolved Hide resolved

mohebifar added 2 commits December 1, 2024 23:35

feat: Add support for range matcher

4e76e4e

refactor: Use the utf-corrected row and column methods for the range …

9de1b14

…matcher

HerringtonDarkholme force-pushed the feat/support-range-matcher branch from f318c96 to 9de1b14 Compare December 2, 2024 04:35

coderabbitai bot reviewed Dec 2, 2024

View reviewed changes

crates/config/src/rule/range.rs Show resolved Hide resolved

crates/config/src/rule/range.rs Show resolved Hide resolved

HerringtonDarkholme enabled auto-merge December 2, 2024 05:43

HerringtonDarkholme added this pull request to the merge queue Dec 2, 2024

Merged via the queue into ast-grep:main with commit 2b30f56 Dec 2, 2024

This was referenced Dec 14, 2024

feat: improve type generation #1662

Merged

fix: rename range rule's row to line #1664

Merged

feat: reorganize type files #1678

Merged

coderabbitai bot mentioned this pull request Dec 29, 2025

Proposal: Pluggable Semantic Analysis Factory API for ast-grep (JS/Rust/CLI) #2413

Open

Uh oh!

Conversation

mohebifar commented Nov 29, 2024 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Nov 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

HerringtonDarkholme commented Nov 29, 2024

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mohebifar commented Nov 29, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 29, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

codecov bot commented Dec 2, 2024 •

edited

Loading