Skip to content

feat: Add support for range matcher#1626

Merged
HerringtonDarkholme merged 2 commits intoast-grep:mainfrom
mohebifar:feat/support-range-matcher
Dec 2, 2024
Merged

feat: Add support for range matcher#1626
HerringtonDarkholme merged 2 commits intoast-grep:mainfrom
mohebifar:feat/support-range-matcher

Conversation

@mohebifar
Copy link
Copy Markdown
Member

@mohebifar mohebifar commented Nov 29, 2024

This PR introduces a RangeMatcher, enabling users to select nodes based on precise range information such as line and column numbers. This functionality is particularly valuable when working in systems that rely on feedback loops or data derived from language servers which typically communicate in terms of line and column positions.

Motivation:

  1. LSP integration: Many language servers, such as rust-analyzer, provide diagnostic and structural information in row-column format. The RangeMatcher bridges the gap between this data and ast-grep's capabilities, enabling precise node selection for downstream processing.
  2. Advanced workflow support: The matcher allows workflows like:
    • Identifying and modifying struct field references based on diagnostics.
    • Updating or rewriting code using external feedback from tools like LSPs or compilers.
  3. Complex NAPI workflows: By selecting nodes by range, it becomes easier to manage complex traversal logic.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a range-based matching system, allowing rules to specify a range for node matching.
    • Added new interfaces for Position and Range to enhance type definitions related to node positioning.
  • Bug Fixes

    • Improved error handling for invalid ranges during rule serialization.
  • Tests

    • Added a test case for finding nodes by range, enhancing coverage for Unicode and multiline string parsing.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 29, 2024

Walkthrough

The pull request introduces a new module range within the rule system, adding the SerializableRange type and enhancing the SerializableRule struct to include a range field. The Rule enum is updated with a new variant for range matching, and corresponding methods are modified to accommodate this functionality. Additionally, new error handling is implemented for invalid ranges. The TypeScript declaration file is updated to reflect these changes, introducing new interfaces for Position and Range. A new test case is added to validate the range functionality in the existing test suite.

Changes

File Path Change Summary
crates/config/src/rule/mod.rs - Added module range.
- Updated SerializableRule to include pub range: Maybe<SerializableRange>.
- Added variant Range(RangeMatcher<L>) to Rule enum.
- Introduced InvalidRange error in RuleSerializeError.
crates/config/src/rule/range.rs - Added SerializablePosition and SerializableRange structs.
- Introduced RangeMatcherError enum and RangeMatcher<L> struct.
- Implemented methods for range matching and validation.
crates/napi/manual.d.ts - Added Position and Range interfaces.
- Updated Rule interface to include optional range property.
crates/napi/__test__/index.spec.ts - Added test case "find node by range" to validate range functionality in parsing.

Possibly related PRs

🐇 In the land of code where rules now play,
A range was added to brighten the day.
With positions defined, and errors in sight,
Our matching is stronger, our logic more bright.
So hop with delight, let the nodes intertwine,
For in this new range, all will align! 🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@HerringtonDarkholme
Copy link
Copy Markdown
Member

Hi @mohebifar, thanks for the code contribution.
Would you like to elaborate more about the range matcher and submit a feature request issue for it?
Thanks

@mohebifar mohebifar force-pushed the feat/support-range-matcher branch from cc99316 to a8a9f0d Compare November 29, 2024 04:55
@mohebifar mohebifar marked this pull request as ready for review November 29, 2024 05:01
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (9)
crates/config/src/rule/range.rs (1)

4-9: Enhance documentation and add validation for position fields.

While the structure is well-designed, consider these improvements:

  1. Expand documentation to explain:
    • Zero vs. one-based indexing for line/column
    • Expected range of values
  2. Add validation to prevent invalid positions

Consider adding a constructor with validation:

impl SerializablePosition {
    pub fn new(line: usize, column: usize) -> Result<Self, &'static str> {
        // Assuming 0-based indexing
        if line == 0 && column == 0 {
            return Err("Position cannot be 0,0");
        }
        Ok(Self { line, column })
    }
}
crates/napi/manual.d.ts (2)

36-41: LGTM! Consider enhancing type documentation.

The Position interface is well-defined and aligns with standard LSP implementations. The JSDoc comments clearly indicate 0-based indexing.

Consider adding a brief interface-level JSDoc comment to provide context:

+/**
+ * Represents a position in the source code using 0-based indexing.
+ * This aligns with LSP's position representation.
+ */
 export interface Position {
   /** 0-indexed row number. */
   row: number
   /** 0-indexed column number. */
   column: number
 }

43-46: Add JSDoc documentation for the Range interface.

The Range interface structure is correct, but would benefit from clear documentation about range boundaries (inclusive/exclusive) and usage context.

Consider adding detailed JSDoc documentation:

+/**
+ * Represents a range in the source code.
+ * @property start The position where the range begins (inclusive)
+ * @property end The position where the range ends (exclusive)
+ * 
+ * This interface is used by the RangeMatcher to select nodes within specific boundaries,
+ * particularly useful for LSP integration scenarios.
+ */
 export interface Range {
   start: Position
   end: Position
 }
crates/core/src/matcher/range.rs (2)

80-128: Expand test coverage

The current test suite could be enhanced with additional test cases:

  1. Error cases (invalid ranges)
  2. Edge cases (zero-length ranges, end of file)
  3. Multi-line ranges
  4. Overlapping ranges
  5. Position validation in try_new()

Add these test cases:

#[test]
fn test_invalid_range() {
    let result = RangeMatcher::<Tsx>::try_new(0, 10, 0, 5);
    assert!(matches!(result, Err(RangeMatcherError::InvalidRange)));
}

#[test]
fn test_multi_line_range() {
    let cand = pattern_node("class A {\n  a = 123\n}");
    let cand = cand.root();
    let pattern = RangeMatcher::new(0, 10, 1, 8);
    assert!(pattern.find_node(cand.clone()).is_some());
}

1-128: Consider architectural improvements for better integration

While the basic implementation is sound, consider these architectural improvements:

  1. Add a builder pattern for constructing ranges with validation
  2. Implement Debug and Display traits for better error messages
  3. Consider adding a RangeMatcherConfig for customizable matching behavior (exact vs. overlap)
  4. Add serialization support for integration with the rule system

These improvements would make the RangeMatcher more flexible and easier to integrate with the larger system, especially for LSP integration mentioned in the PR objectives.

crates/core/src/matcher.rs (1)

11-11: Update module documentation to include RangeMatcher

The file's doc comment lists "notable implementations" but should be updated to include the new RangeMatcher implementation, as it's a significant addition for LSP integration.

Apply this diff to update the documentation:

 //! `Matcher` has three notable implementations in this module:
 //! * Pattern: matches against a tree-sitter node based on its tree structure.
 //! * KindMatcher: matches a node based on its `kind`
 //! * RegexMatcher: matches a node based on its textual content using regex.
+//! * RangeMatcher: matches a node based on its position in source code.
crates/core/src/source.rs (1)

Line range hint 143-195: Add tests for the new get_source_text method.

While the implementation is straightforward, it would be beneficial to add tests to verify the behavior and prevent regressions.

Consider adding the following test to the test module:

#[test]
fn test_get_source_text() {
    let src = "var a = 1234".to_string();
    assert_eq!(src.get_source_text(), "var a = 1234");
}
crates/core/src/node.rs (1)

186-222: LGTM! The implementation correctly handles UTF-8 and line endings.

The method successfully converts byte offsets to codepoint positions while properly handling UTF-8 characters and CRLF line endings.

Consider caching the results for frequently accessed nodes to improve performance, as the current implementation needs to iterate through all characters up to the end position.

+use std::cell::RefCell;
+use std::collections::HashMap;
+
 pub struct Node<'r, D: Doc> {
   pub(crate) inner: tree_sitter::Node<'r>,
   pub(crate) root: &'r Root<D>,
+  pos_cache: RefCell<HashMap<usize, std::ops::Range<(usize, usize)>>>,
 }
crates/config/src/rule/mod.rs (1)

61-62: Improve documentation comments for 'range' field

Capitalize the first letter of the sentence and ensure consistent punctuation.

Apply this diff to improve the comments:

 /// `range` accepts a range object.
-/// the target node must exactly appear in the range.
+/// The target node must exactly appear in the range.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 29da2bf and a8a9f0d.

📒 Files selected for processing (7)
  • crates/config/src/rule/mod.rs (12 hunks)
  • crates/config/src/rule/range.rs (1 hunks)
  • crates/core/src/matcher.rs (2 hunks)
  • crates/core/src/matcher/range.rs (1 hunks)
  • crates/core/src/node.rs (2 hunks)
  • crates/core/src/source.rs (2 hunks)
  • crates/napi/manual.d.ts (1 hunks)
🔇 Additional comments (18)
crates/config/src/rule/range.rs (1)

1-2: LGTM!

The imports are appropriate for serialization and JSON schema generation capabilities.

crates/napi/manual.d.ts (1)

53-54: Enhance documentation and verify matcher combinations.

While the addition of the range property is well-placed, its documentation could be more comprehensive, especially regarding interaction with other matchers.

  1. Enhance the documentation:
-  /** The exact range of the node in the source code. */
+  /**
+   * The exact range of the node in the source code.
+   * When specified, matches nodes that exactly span this range.
+   * Can be combined with other matchers like `pattern` or `kind` for more precise matching.
+   * Particularly useful for LSP integration scenarios where positions are provided in row/column format.
+   */
   range?: Range
  1. Let's verify the integration with other matchers:
✅ Verification successful

Let me try a different search strategy to verify the integration of range matcher with other matchers.


Let's check the test file and range matcher implementation to verify the integration.


Documentation enhancement recommended, but range matcher integration is well-implemented

The range matcher is thoroughly implemented and tested:

  • It's properly integrated with other matchers as shown in the test cases (e.g., combining with kind matcher and pattern matching)
  • The implementation in range.rs shows it follows the same matcher interface used by other matchers
  • Test coverage includes various scenarios: basic matching, non-matching cases, unicode support, and integration with other features

Suggested documentation enhancement:

-  /** The exact range of the node in the source code. */
+  /**
+   * The exact range of the node in the source code.
+   * Matches nodes that precisely span the specified start and end positions.
+   * Can be used standalone or combined with other matchers (pattern, kind).
+   * Positions are zero-based and specified in {line, column} format.
+   */
   range?: Range
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for test cases that combine range matcher with other matchers

# Look for test cases combining range with other matchers
rg -l "range.*pattern|pattern.*range" "test" || echo "No combined matcher tests found"

# Check if there are examples in documentation
fd -e md | xargs rg -l "range.*matcher" || echo "No range matcher documentation found"

Length of output: 287


Script:

#!/bin/bash
# Search for test files in the entire repository
fd -e test.ts -e test.js -e spec.ts -e spec.js

# Search for any files containing 'range' in combination with matching-related terms
rg -l "range.*match|match.*range"

# Look for actual implementation of range matcher to understand integration
ast-grep --pattern 'function $_($$_) {
  $$$
  range
  $$$
}'

Length of output: 435


Script:

#!/bin/bash
# Check the test file for range matcher usage
rg -A 5 -B 5 "range" "crates/napi/__test__/index.spec.ts"

# Check the range matcher implementation
cat "crates/core/src/matcher/range.rs"

# Check node matcher for range integration
rg -A 10 "range" "crates/core/src/matcher/node_match.rs"

Length of output: 7495

crates/core/src/matcher/range.rs (1)

54-78: Consider more flexible position matching

The current implementation has several potential issues:

  1. Exact position matching might be too strict - consider allowing partial overlaps
  2. No handling of multi-byte characters which could affect column positions
  3. No documentation explaining why potential_kinds returns None

Consider implementing a more flexible matching strategy:

  1. Add a method to check if a node's range overlaps with matcher's range
  2. Handle UTF-8 characters properly when comparing column positions
  3. Document the rationale for returning None in potential_kinds

Let's verify how other matchers handle multi-byte characters:

crates/core/src/source.rs (2)

143-143: LGTM: Well-placed trait method addition.

The new get_source_text method is appropriately placed in the Content trait and follows the same pattern as the existing get_text method. The return type Cow<'a, str> is a good choice as it provides flexibility for implementations to return either borrowed or owned strings.


193-195: LGTM: Efficient String implementation.

The implementation for String is correct and efficient, returning a borrowed reference to avoid unnecessary copying. This aligns well with the trait's purpose and the existing implementations.

crates/core/src/node.rs (1)

186-222: Verify edge case handling

Please ensure the implementation handles these edge cases correctly:

  1. Empty nodes (start_byte == end_byte)
  2. End of file scenarios
  3. Mixed line endings (both LF and CRLF)
✅ Verification successful

Let's gather more specific information about the edge cases by examining the test files and implementation details.


Let's search specifically for tests related to empty nodes and line endings in the codebase.


Based on the code review and test results, I can now provide a final response about the edge case handling:

Edge cases are properly handled in the implementation

The codepoint_pos_range() implementation correctly handles the specified edge cases:

  1. Empty nodes (start_byte == end_byte): The implementation handles this implicitly - when start_byte equals end_byte, both start_pos and end_pos will be set to the same position.

  2. End of file scenarios: The implementation doesn't make assumptions about EOF and works with the actual byte positions provided by the node's range.

  3. Mixed line endings: The code explicitly handles both LF ('\n') and CRLF ('\r\n') sequences:

if c == '\r' && source.chars().nth(i + 1) == Some('\n') {
  current_byte += 1; // Skip the '\n' in CRLF
}

The implementation is also tested with multi-byte Unicode characters and line position calculations as shown in test_codepoint_range().

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for edge case tests in the codebase
echo "Searching for empty node tests..."
rg -l "empty.*node" --type rust

echo "Searching for EOF handling tests..."
rg -l "end.*file|eof" --type rust

echo "Searching for line ending tests..."
rg -l "CRLF|\\r\\n|line.*ending" --type rust

Length of output: 690


Script:

#!/bin/bash
# Search for test cases related to node positions and ranges
echo "Searching for node position/range tests..."
rg -l "test.*pos|test.*range" --type rust

# Look for actual test implementations
echo "Checking test implementations..."
rg -A 10 "fn.*test.*pos|fn.*test.*range" --type rust

# Search for specific handling of empty nodes
echo "Checking empty node handling..."
rg -A 5 "start_byte.*==.*end_byte" --type rust

# Check for line ending related code
echo "Checking line ending handling..."
rg -U -A 5 "\\r\\n|\\n|line.*end" --type rust crates/core/src/

Length of output: 152529


Script:

#!/bin/bash
# Search for tests in node.rs that might cover empty nodes and line endings
echo "Checking node.rs tests..."
rg -A 10 "fn test.*" crates/core/src/node.rs

# Look for specific test cases with empty nodes
echo "Checking empty node test cases..."
ast-grep --pattern 'fn test_$_() {
  $$$
  assert_eq!($_, $_, "empty node");
  $$$
}'

# Look for line ending related tests
echo "Checking line ending tests..."
rg -U -A 10 "\\r\\n|CRLF|line.*end" crates/core/src/node.rs

Length of output: 5340

crates/config/src/rule/mod.rs (12)

3-3: Integration of 'range' module looks appropriate

The addition of the mod range; declaration correctly integrates the new range module into the codebase.


9-9: Importing 'SerializableRange' for use in this module

The use range::SerializableRange; statement correctly imports the SerializableRange type for use within this module.


19-21: Addition of 'RangeMatcher' and 'RangeMatcherError' imports

Importing RangeMatcher and RangeMatcherError from ast_grep_core::matcher is appropriate for implementing range matching functionality.


114-114: Properly mapping 'range' field in 'categorized' method

The range field is correctly integrated into the AtomicRule within the categorized method.


137-137: Addition of 'range' field to 'AtomicRule' struct

The range field is appropriately added to the AtomicRule struct to support range-based matching.


218-218: Adding 'Range' variant to 'Rule' enum

The Range(RangeMatcher<L>) variant extends the Rule enum to handle range-based matching rules.


265-265: Handle 'Range' variant in 'defined_vars' method

Correctly returns an empty HashSet for the Range variant, as it does not define any metavariables.


285-285: Include 'Range' variant in 'verify_util' method

The Range variant is properly handled in verify_util, ensuring it passes utility verification.


311-311: Implement 'match_node_with_env' for 'Range' variant

Delegates matching logic to RangeMatcher correctly for the Range variant.


333-333: Include 'Range' variant in 'potential_kinds' method

The Range variant is correctly included in potential_kinds, obtaining potential kinds from RangeMatcher.


380-381: Addition of 'InvalidRange' variant in 'RuleSerializeError'

The InvalidRange error variant properly handles errors related to RangeMatcher during serialization.


500-507: Deserialization of 'range' field in 'deserialze_atomic_rule'

The code correctly handles the range field by creating a new RangeMatcher with the provided range values.

@mohebifar mohebifar force-pushed the feat/support-range-matcher branch from a8a9f0d to 73da6f4 Compare November 29, 2024 05:08
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
crates/core/src/matcher/range.rs (1)

16-20: Consider implementing Display trait for Position

While previous comments covered documentation and comparison methods, the Position struct could benefit from implementing the Display trait for better error messages and debugging, especially when used with LSP integration.

 pub struct Position<L: Language> {
     row: usize,
     column: usize,
     lang: PhantomData<L>,
 }
+
+impl<L: Language> std::fmt::Display for Position<L> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "{}:{}", self.row, self.column)
+    }
+}
crates/napi/src/doc.rs (1)

111-114: LGTM! Consider a small optimization.

The implementation is correct and aligns well with the Content trait requirements. However, we could avoid the temporary variable s and directly return the result of encode_bytes.

  fn get_source_text(&self) -> Cow<'_, str> {
-    let s = Self::encode_bytes(&self.inner);
-    s
+    Self::encode_bytes(&self.inner)
  }
crates/core/src/node.rs (1)

186-222: Consider optimizing the implementation for better performance

The current implementation iterates through all characters up to the end position. This could be optimized by:

  1. Using chars().take_while() to stop at end_byte
  2. Pre-calculating the string slice up to end_byte
  3. Using the tree-sitter's built-in position information where possible
 pub fn codepoint_pos_range(&self) -> std::ops::Range<(usize, usize)> {
   let source = self.root.doc.get_source().get_source_text();
   let start_byte = self.range().start;
   let end_byte = self.range().end;
-
   let mut start_pos = (0, 0);
   let mut end_pos = (0, 0);
   let mut current_pos = (0, 0);
   let mut current_byte = 0;
-
-  for (i, c) in source.chars().enumerate() {
+  // Optimize by only processing up to end_byte
+  let relevant_text = &source[..end_byte];
+  for (i, c) in relevant_text.chars().enumerate() {
     if current_byte == start_byte {
       start_pos = current_pos;
     }
     if current_byte == end_byte {
       end_pos = current_pos;
       break;
     }
     if c == '\n' || c == '\r' {
       if c == '\r' && source.chars().nth(i + 1) == Some('\n') {
         current_byte += 1; // Skip the '\n' in CRLF
       }
       current_pos.0 += 1;
       current_pos.1 = 0;
     } else {
       current_pos.1 += 1;
     }
     current_byte += c.len_utf8();
   }
-
-  if current_byte == end_byte {
-    end_pos = current_pos;
-  }
+  // Set end_pos if we haven't found it yet
+  if end_pos == (0, 0) {
+    end_pos = current_pos;
+  }
   start_pos..end_pos
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a8a9f0d and 73da6f4.

📒 Files selected for processing (4)
  • crates/core/src/matcher/range.rs (1 hunks)
  • crates/core/src/node.rs (2 hunks)
  • crates/core/src/source.rs (2 hunks)
  • crates/napi/src/doc.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/core/src/source.rs
🔇 Additional comments (3)
crates/core/src/matcher/range.rs (2)

10-14: Enhance error handling implementation

The error handling implementation needs improvement as previously noted.


54-78: Consider relaxing exact range matching

The current implementation only matches nodes with exact range boundaries. For practical LSP integration, consider:

  1. Adding a tolerance option for fuzzy matching
  2. Supporting partial overlaps
  3. Allowing containment-based matching

Example LSP scenario: When a language server reports an error at position 5:10, the actual AST node might span 5:8-5:12. The current exact matching would miss this node.

Let's verify if this is a real concern in the codebase:

crates/core/src/node.rs (1)

739-749: 🛠️ Refactor suggestion

Add more test cases for comprehensive coverage

The current test only covers a basic case with UTF-8 characters. As suggested in the past review, we should add more test cases.

The previous review comment suggesting additional test cases for CRLF line endings, empty nodes, multi-line nodes, and end of file scenarios is still valid and should be addressed.

 #[test]
 fn test_codepoint_range() {
     let multi_line_code = r#"let a = 123;
 let b = "Übergänge";
 "#;
     let root = Tsx.ast_grep(multi_line_code);
     let node = root.root().find("\"Übergänge\"").expect("should exist");
     let range = node.codepoint_pos_range();
     assert_eq!(range.start, (1, 8));
     assert_eq!(range.end, (1, 19));
+
+    // Test CRLF line endings
+    let crlf_code = "let a = 123;\r\nlet b = 456;";
+    let root = Tsx.ast_grep(crlf_code);
+    let node = root.root().find("456").expect("should exist");
+    let range = node.codepoint_pos_range();
+    assert_eq!(range.start, (1, 8));
+    assert_eq!(range.end, (1, 11));
+
+    // Test empty nodes
+    let empty_code = "let a = '';";
+    let root = Tsx.ast_grep(empty_code);
+    let node = root.root().find("''").expect("should exist");
+    let range = node.codepoint_pos_range();
+    assert_eq!(range.start, (0, 8));
+    assert_eq!(range.end, (0, 10));
+
+    // Test multi-line nodes
+    let multi_line = "let a = `\nline1\nline2\n`;";
+    let root = Tsx.ast_grep(multi_line);
+    let node = root.root().find("`$_`").expect("should exist");
+    let range = node.codepoint_pos_range();
+    assert_eq!(range.start, (0, 8));
+    assert_eq!(range.end, (3, 1));
 }

@mohebifar mohebifar force-pushed the feat/support-range-matcher branch 2 times, most recently from ce8c69b to d73d9f2 Compare November 29, 2024 05:36
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
crates/core/src/node.rs (1)

186-222: Consider optimizing the position calculation implementation.

The implementation is correct but could be optimized in a few ways:

  1. Pre-calculate the end position if we know the byte length
  2. Use chars().take_while() to avoid iterating past the end position
  3. Consider caching the results for frequently accessed nodes
 pub fn codepoint_pos_range(&self) -> std::ops::Range<(usize, usize)> {
   let source = self.root.doc.get_source().get_source_text();
   let start_byte = self.range().start;
   let end_byte = self.range().end;

   let mut start_pos = (0, 0);
   let mut end_pos = (0, 0);
   let mut current_pos = (0, 0);
   let mut current_byte = 0;

-  for (i, c) in source.chars().enumerate() {
+  for (i, c) in source.chars()
+    .take_while(|_| current_byte <= end_byte)
+    .enumerate() {
     if current_byte == start_byte {
       start_pos = current_pos;
     }
     if current_byte == end_byte {
       end_pos = current_pos;
       break;
     }
     if c == '\n' || c == '\r' {
       if c == '\r' && source.chars().nth(i + 1) == Some('\n') {
         current_byte += 1; // Skip the '\n' in CRLF
       }
       current_pos.0 += 1;
       current_pos.1 = 0;
     } else {
       current_pos.1 += 1;
     }
     current_byte += c.len_utf8();
   }

   if current_byte == end_byte {
     end_pos = current_pos;
   }

   start_pos..end_pos
 }
crates/config/src/rule/mod.rs (3)

61-64: Consider enhancing the range field documentation

While the documentation is good, it could be more descriptive by including:

  • Example usage with line/column numbers
  • Common use cases (e.g., LSP integration)
  • Relationship with other matchers

500-507: Consider improving range deserialization robustness

The current implementation could be enhanced for better maintainability and validation:

  1. Consider using a builder pattern or constructor method for RangeMatcher to encapsulate the creation logic
  2. Add validation for row/column values before creating the matcher

Example approach:

-  if let Some(range) = atomic.range {
-    rules.push(R::Range(RangeMatcher::try_new(
-      range.start.row,
-      range.start.column,
-      range.end.row,
-      range.end.column,
-    )?));
-  }
+  if let Some(range) = atomic.range {
+    rules.push(R::Range(RangeMatcher::builder()
+      .with_start_position(range.start.row, range.start.column)
+      .with_end_position(range.end.row, range.end.column)
+      .validate()?
+      .build()?));
+  }

Line range hint 511-685: Add test coverage for range matcher

The test module lacks coverage for the new range matcher functionality. Consider adding tests for:

  • Range matcher serialization/deserialization
  • Invalid range scenarios
  • Integration with other matchers

Example test case:

#[test]
fn test_range_matcher() {
    let src = r#"
    range:
      start:
        row: 1
        column: 0
      end:
        row: 2
        column: 10
    "#;
    let rule: SerializableRule = from_str(src).expect("cannot parse rule");
    let env = DeserializeEnv::new(TypeScript::Tsx);
    let rule = deserialize_rule(rule, &env).expect("should deserialize");
    // Add assertions here
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 73da6f4 and d73d9f2.

📒 Files selected for processing (8)
  • crates/config/src/rule/mod.rs (12 hunks)
  • crates/config/src/rule/range.rs (1 hunks)
  • crates/core/src/matcher.rs (2 hunks)
  • crates/core/src/matcher/range.rs (1 hunks)
  • crates/core/src/node.rs (2 hunks)
  • crates/core/src/source.rs (2 hunks)
  • crates/napi/manual.d.ts (1 hunks)
  • crates/napi/src/doc.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • crates/core/src/matcher.rs
  • crates/core/src/matcher/range.rs
  • crates/core/src/source.rs
  • crates/napi/manual.d.ts
  • crates/napi/src/doc.rs
🔇 Additional comments (7)
crates/config/src/rule/range.rs (3)

1-2: LGTM!

The imports are appropriate for serialization and JSON schema generation.


4-9: ⚠️ Potential issue

Align position representation with LSP specification.

The current implementation might cause compatibility issues with Language Server Protocol (LSP) integration:

  1. Field names should match LSP's Position type:
    • row should be line
    • column should be character
  2. Types should be u32 instead of usize to match LSP
  3. Documentation should clarify zero-based indexing

Apply this diff:

 /// Represents a position in a document
+/// 
+/// Position in a document expressed as zero-based line and character offset.
+/// A position is between two characters like an 'insert' cursor in a editor.
 #[derive(Serialize, Deserialize, Clone, JsonSchema)]
 pub struct SerializablePosition {
-    pub row: usize,
-    pub column: usize,
+    pub line: u32,
+    pub character: u32,
 }

11-16: 🛠️ Refactor suggestion

Enhance range validation and documentation.

The range structure needs additional safeguards and clarity:

  1. Add validation to ensure start position precedes end position
  2. Clarify range inclusivity in documentation
  3. Add helper methods for common operations
  4. Add Debug derive for better error messages

Apply this diff:

 /// Represents a range rule with a start and end position
+/// 
+/// A range represents a span between two positions.
+/// The start position is inclusive and the end position is exclusive.
- #[derive(Serialize, Deserialize, Clone, JsonSchema)]
+ #[derive(Debug, Serialize, Deserialize, Clone, JsonSchema)]
 pub struct SerializableRange {
     pub start: SerializablePosition,
     pub end: SerializablePosition,
 }
+
+impl SerializableRange {
+    /// Creates a new range after validating that start precedes end
+    pub fn new(start: SerializablePosition, end: SerializablePosition) -> Result<Self, &'static str> {
+        if start.line > end.line || (start.line == end.line && start.character > end.character) {
+            return Err("start position must precede end position");
+        }
+        Ok(Self { start, end })
+    }
+
+    /// Returns true if the given position is within this range
+    pub fn contains(&self, pos: &SerializablePosition) -> bool {
+        (self.start.line < pos.line || 
+         (self.start.line == pos.line && self.start.character <= pos.character))
+        &&
+        (pos.line < self.end.line ||
+         (pos.line == self.end.line && pos.character < self.end.character))
+    }
+}
crates/core/src/node.rs (1)

739-749: Add more test cases for comprehensive coverage.

While the current test case covers UTF-8 characters well, it's missing important scenarios as mentioned in the past review.

As suggested in the previous review, please add test cases for:

  1. CRLF line endings
  2. Empty nodes
  3. Multi-line nodes
  4. End of file scenarios

The test additions were already well-detailed in the past review comment.

crates/config/src/rule/mod.rs (3)

3-3: LGTM: Module organization follows Rust conventions

The new range module and its import are well-organized and follow Rust's module system conventions.

Also applies to: 9-9


265-265: LGTM: Consistent implementation across all Rule methods

The Range matcher is properly integrated into all relevant Rule methods, maintaining consistency with other atomic matchers.

Also applies to: 285-285, 311-311, 333-333


380-381: LGTM: Error handling follows Rust best practices

The error handling for invalid range matchers is properly implemented using thiserror and follows Rust's error handling conventions.

@mohebifar mohebifar force-pushed the feat/support-range-matcher branch from d73d9f2 to f318c96 Compare December 2, 2024 02:31
@codecov
Copy link
Copy Markdown

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 75.75758% with 24 lines in your changes missing coverage. Please review.

Project coverage is 87.05%. Comparing base (87a7294) to head (9de1b14).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
crates/config/src/rule/range.rs 79.77% 18 Missing ⚠️
crates/config/src/rule/mod.rs 40.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1626      +/-   ##
==========================================
- Coverage   87.12%   87.05%   -0.08%     
==========================================
  Files          95       96       +1     
  Lines       15360    15458      +98     
==========================================
+ Hits        13383    13457      +74     
- Misses       1977     2001      +24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
crates/config/src/rule/range.rs (3)

12-19: Enhance range documentation for clarity.

The documentation should specify:

  1. Whether the end position is inclusive or exclusive
  2. That start position must precede end position
 /// Represents a position in source code using 0-based row and column numbers
+/// A range represents a span of text in source code. The end position is exclusive.
+/// The start position must precede the end position in the document.
 #[derive(Serialize, Deserialize, Clone, JsonSchema)]
 pub struct SerializableRange {
   /// 0-based row number in the source code
   pub start: SerializablePosition,
-  /// 0-based column number in the source code
+  /// 0-based column number in the source code (exclusive)
   pub end: SerializablePosition,
 }

61-74: Extract range validation logic for reusability.

Consider moving the range validation logic to SerializableRange for reuse:

 impl SerializableRange {
+    pub fn validate(&self) -> Result<(), RangeMatcherError> {
+        if self.start.row > self.end.row
+            || (self.start.row == self.end.row && self.start.column > self.end.column)
+        {
+            return Err(RangeMatcherError::InvalidRange);
+        }
+        Ok(())
+    }
 }

 impl<L: Language> RangeMatcher<L> {
     pub fn try_new(
         start_pos: SerializablePosition,
         end_pos: SerializablePosition,
     ) -> Result<RangeMatcher<L>, RangeMatcherError> {
-        if start_pos.row > end_pos.row
-            || (start_pos.row == end_pos.row && start_pos.column > end_pos.column)
-        {
-            return Err(RangeMatcherError::InvalidRange);
-        }
+        let range = SerializableRange { start: start_pos, end: end_pos };
+        range.validate()?;
         let range = Self::new(start_pos, end_pos);
         Ok(range)
     }
 }

99-161: Add test cases for edge cases.

Consider adding tests for:

  1. Zero-length ranges (where start equals end)
  2. Ranges spanning multiple lines with Unicode characters
  3. Ranges at file boundaries (start of file and end of file)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d73d9f2 and f318c96.

📒 Files selected for processing (4)
  • crates/config/src/rule/mod.rs (13 hunks)
  • crates/config/src/rule/range.rs (1 hunks)
  • crates/napi/__test__/index.spec.ts (1 hunks)
  • crates/napi/manual.d.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/napi/manual.d.ts
🔇 Additional comments (7)
crates/config/src/rule/range.rs (4)

38-44: LGTM!

The position comparison logic is correct and handles the node position appropriately.


46-50: LGTM!

The struct is correctly implemented with proper use of PhantomData for the language type parameter.


76-97: LGTM!

The Matcher trait implementation is correct and properly handles node matching and potential kinds.


5-10: 🛠️ Refactor suggestion

Align position fields with LSP types for better integration.

Consider modifying the struct to match LSP's Position type:

  1. Rename row to line and column to character
  2. Use u32 instead of usize for better cross-platform compatibility with LSP
 #[derive(Serialize, Deserialize, Clone, JsonSchema)]
 pub struct SerializablePosition {
-  pub row: usize,
-  pub column: usize,
+  pub line: u32,
+  pub character: u32,
 }

Likely invalid or redundant comment.

crates/napi/__test__/index.spec.ts (1)

377-389: LGTM!

The test case effectively validates the range matcher with Unicode characters and multiline content.

crates/config/src/rule/mod.rs (2)

59-62: LGTM!

The range field is properly integrated into SerializableRule with appropriate documentation.


501-503: LGTM!

The range deserialization is correctly implemented in the atomic rule handling.

@HerringtonDarkholme HerringtonDarkholme force-pushed the feat/support-range-matcher branch from f318c96 to 9de1b14 Compare December 2, 2024 04:35
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
crates/config/src/rule/range.rs (3)

46-74: Add documentation and consider extracting validation logic.

The RangeMatcher implementation could be improved:

  1. Add documentation for the struct explaining its purpose
  2. Consider moving validation logic to SerializableRange for reuse
+/// Matches nodes that exactly appear within the specified range.
+/// The range is inclusive of both start and end positions.
 pub struct RangeMatcher<L: Language> {
   start: SerializablePosition,
   end: SerializablePosition,
   lang: PhantomData<L>,
 }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 69-72: crates/config/src/rule/range.rs#L69-L72
Added lines #L69 - L72 were not covered by tests


76-97: Consider caching node positions for better performance.

The current implementation calls start_pos() and end_pos() for each node check. Consider caching these values when processing multiple nodes:

 fn match_node_with_env<'tree, D: Doc<Lang = L>>(
   &self,
   node: Node<'tree, D>,
   _env: &mut Cow<MetaVarEnv<'tree, D>>,
 ) -> Option<Node<'tree, D>> {
+  // Cache positions to avoid multiple calls
+  let (node_start_pos, node_end_pos) = (node.start_pos(), node.end_pos());
-  let node_start_pos = node.start_pos();
-  let node_end_pos = node.end_pos();

   if self.start.equals_node_pos(&node_start_pos, &node)
     && self.end.equals_node_pos(&node_end_pos, &node)
   {
     Some(node)
   } else {
     None
   }
 }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 94-96: crates/config/src/rule/range.rs#L94-L96
Added lines #L94 - L96 were not covered by tests


99-161: Enhance test coverage with edge cases.

Consider adding tests for:

  1. Positions at file boundaries (0,0)
  2. Empty ranges (start = end)
  3. Unicode character handling
  4. Very large position values

Would you like me to help generate these additional test cases?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 123-125: crates/config/src/rule/range.rs#L123-L125
Added lines #L123 - L125 were not covered by tests


[warning] 139-141: crates/config/src/rule/range.rs#L139-L141
Added lines #L139 - L141 were not covered by tests


[warning] 156-158: crates/config/src/rule/range.rs#L156-L158
Added lines #L156 - L158 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f318c96 and 9de1b14.

📒 Files selected for processing (4)
  • crates/config/src/rule/mod.rs (13 hunks)
  • crates/config/src/rule/range.rs (1 hunks)
  • crates/napi/__test__/index.spec.ts (1 hunks)
  • crates/napi/manual.d.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/napi/test/index.spec.ts
  • crates/napi/manual.d.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
crates/config/src/rule/mod.rs

[warning] 231-231: crates/config/src/rule/mod.rs#L231
Added line #L231 was not covered by tests


[warning] 266-266: crates/config/src/rule/mod.rs#L266
Added line #L266 was not covered by tests


[warning] 286-286: crates/config/src/rule/mod.rs#L286
Added line #L286 was not covered by tests


[warning] 312-312: crates/config/src/rule/mod.rs#L312
Added line #L312 was not covered by tests


[warning] 334-334: crates/config/src/rule/mod.rs#L334
Added line #L334 was not covered by tests


[warning] 502-502: crates/config/src/rule/mod.rs#L502
Added line #L502 was not covered by tests

crates/config/src/rule/range.rs

[warning] 6-6: crates/config/src/rule/range.rs#L6
Added line #L6 was not covered by tests


[warning] 13-13: crates/config/src/rule/range.rs#L13
Added line #L13 was not covered by tests


[warning] 69-72: crates/config/src/rule/range.rs#L69-L72
Added lines #L69 - L72 were not covered by tests


[warning] 94-96: crates/config/src/rule/range.rs#L94-L96
Added lines #L94 - L96 were not covered by tests


[warning] 123-125: crates/config/src/rule/range.rs#L123-L125
Added lines #L123 - L125 were not covered by tests


[warning] 139-141: crates/config/src/rule/range.rs#L139-L141
Added lines #L139 - L141 were not covered by tests


[warning] 156-158: crates/config/src/rule/range.rs#L156-L158
Added lines #L156 - L158 were not covered by tests

🔇 Additional comments (1)
crates/config/src/rule/mod.rs (1)

501-503: Add tests for range rule deserialization.

The static analysis indicates missing test coverage for range rule deserialization. Consider adding tests for:

  1. Deserializing valid range rules
  2. Error handling for invalid ranges
  3. Integration with other rule types
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 502-502: crates/config/src/rule/mod.rs#L502
Added line #L502 was not covered by tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants