feat: Add support for range matcher#1626
Conversation
WalkthroughThe pull request introduces a new module Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Hi @mohebifar, thanks for the code contribution. |
cc99316 to
a8a9f0d
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (9)
crates/config/src/rule/range.rs (1)
4-9: Enhance documentation and add validation for position fields.While the structure is well-designed, consider these improvements:
- Expand documentation to explain:
- Zero vs. one-based indexing for line/column
- Expected range of values
- Add validation to prevent invalid positions
Consider adding a constructor with validation:
impl SerializablePosition { pub fn new(line: usize, column: usize) -> Result<Self, &'static str> { // Assuming 0-based indexing if line == 0 && column == 0 { return Err("Position cannot be 0,0"); } Ok(Self { line, column }) } }crates/napi/manual.d.ts (2)
36-41: LGTM! Consider enhancing type documentation.The
Positioninterface is well-defined and aligns with standard LSP implementations. The JSDoc comments clearly indicate 0-based indexing.Consider adding a brief interface-level JSDoc comment to provide context:
+/** + * Represents a position in the source code using 0-based indexing. + * This aligns with LSP's position representation. + */ export interface Position { /** 0-indexed row number. */ row: number /** 0-indexed column number. */ column: number }
43-46: Add JSDoc documentation for the Range interface.The
Rangeinterface structure is correct, but would benefit from clear documentation about range boundaries (inclusive/exclusive) and usage context.Consider adding detailed JSDoc documentation:
+/** + * Represents a range in the source code. + * @property start The position where the range begins (inclusive) + * @property end The position where the range ends (exclusive) + * + * This interface is used by the RangeMatcher to select nodes within specific boundaries, + * particularly useful for LSP integration scenarios. + */ export interface Range { start: Position end: Position }crates/core/src/matcher/range.rs (2)
80-128: Expand test coverageThe current test suite could be enhanced with additional test cases:
- Error cases (invalid ranges)
- Edge cases (zero-length ranges, end of file)
- Multi-line ranges
- Overlapping ranges
- Position validation in try_new()
Add these test cases:
#[test] fn test_invalid_range() { let result = RangeMatcher::<Tsx>::try_new(0, 10, 0, 5); assert!(matches!(result, Err(RangeMatcherError::InvalidRange))); } #[test] fn test_multi_line_range() { let cand = pattern_node("class A {\n a = 123\n}"); let cand = cand.root(); let pattern = RangeMatcher::new(0, 10, 1, 8); assert!(pattern.find_node(cand.clone()).is_some()); }
1-128: Consider architectural improvements for better integrationWhile the basic implementation is sound, consider these architectural improvements:
- Add a builder pattern for constructing ranges with validation
- Implement Debug and Display traits for better error messages
- Consider adding a RangeMatcherConfig for customizable matching behavior (exact vs. overlap)
- Add serialization support for integration with the rule system
These improvements would make the RangeMatcher more flexible and easier to integrate with the larger system, especially for LSP integration mentioned in the PR objectives.
crates/core/src/matcher.rs (1)
11-11: Update module documentation to include RangeMatcherThe file's doc comment lists "notable implementations" but should be updated to include the new
RangeMatcherimplementation, as it's a significant addition for LSP integration.Apply this diff to update the documentation:
//! `Matcher` has three notable implementations in this module: //! * Pattern: matches against a tree-sitter node based on its tree structure. //! * KindMatcher: matches a node based on its `kind` //! * RegexMatcher: matches a node based on its textual content using regex. +//! * RangeMatcher: matches a node based on its position in source code.crates/core/src/source.rs (1)
Line range hint
143-195: Add tests for the new get_source_text method.While the implementation is straightforward, it would be beneficial to add tests to verify the behavior and prevent regressions.
Consider adding the following test to the test module:
#[test] fn test_get_source_text() { let src = "var a = 1234".to_string(); assert_eq!(src.get_source_text(), "var a = 1234"); }crates/core/src/node.rs (1)
186-222: LGTM! The implementation correctly handles UTF-8 and line endings.The method successfully converts byte offsets to codepoint positions while properly handling UTF-8 characters and CRLF line endings.
Consider caching the results for frequently accessed nodes to improve performance, as the current implementation needs to iterate through all characters up to the end position.
+use std::cell::RefCell; +use std::collections::HashMap; + pub struct Node<'r, D: Doc> { pub(crate) inner: tree_sitter::Node<'r>, pub(crate) root: &'r Root<D>, + pos_cache: RefCell<HashMap<usize, std::ops::Range<(usize, usize)>>>, }crates/config/src/rule/mod.rs (1)
61-62: Improve documentation comments for 'range' fieldCapitalize the first letter of the sentence and ensure consistent punctuation.
Apply this diff to improve the comments:
/// `range` accepts a range object. -/// the target node must exactly appear in the range. +/// The target node must exactly appear in the range.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
crates/config/src/rule/mod.rs(12 hunks)crates/config/src/rule/range.rs(1 hunks)crates/core/src/matcher.rs(2 hunks)crates/core/src/matcher/range.rs(1 hunks)crates/core/src/node.rs(2 hunks)crates/core/src/source.rs(2 hunks)crates/napi/manual.d.ts(1 hunks)
🔇 Additional comments (18)
crates/config/src/rule/range.rs (1)
1-2: LGTM!
The imports are appropriate for serialization and JSON schema generation capabilities.
crates/napi/manual.d.ts (1)
53-54: Enhance documentation and verify matcher combinations.
While the addition of the range property is well-placed, its documentation could be more comprehensive, especially regarding interaction with other matchers.
- Enhance the documentation:
- /** The exact range of the node in the source code. */
+ /**
+ * The exact range of the node in the source code.
+ * When specified, matches nodes that exactly span this range.
+ * Can be combined with other matchers like `pattern` or `kind` for more precise matching.
+ * Particularly useful for LSP integration scenarios where positions are provided in row/column format.
+ */
range?: Range- Let's verify the integration with other matchers:
✅ Verification successful
Let me try a different search strategy to verify the integration of range matcher with other matchers.
Let's check the test file and range matcher implementation to verify the integration.
Documentation enhancement recommended, but range matcher integration is well-implemented
The range matcher is thoroughly implemented and tested:
- It's properly integrated with other matchers as shown in the test cases (e.g., combining with
kindmatcher and pattern matching) - The implementation in
range.rsshows it follows the same matcher interface used by other matchers - Test coverage includes various scenarios: basic matching, non-matching cases, unicode support, and integration with other features
Suggested documentation enhancement:
- /** The exact range of the node in the source code. */
+ /**
+ * The exact range of the node in the source code.
+ * Matches nodes that precisely span the specified start and end positions.
+ * Can be used standalone or combined with other matchers (pattern, kind).
+ * Positions are zero-based and specified in {line, column} format.
+ */
range?: Range🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for test cases that combine range matcher with other matchers
# Look for test cases combining range with other matchers
rg -l "range.*pattern|pattern.*range" "test" || echo "No combined matcher tests found"
# Check if there are examples in documentation
fd -e md | xargs rg -l "range.*matcher" || echo "No range matcher documentation found"
Length of output: 287
Script:
#!/bin/bash
# Search for test files in the entire repository
fd -e test.ts -e test.js -e spec.ts -e spec.js
# Search for any files containing 'range' in combination with matching-related terms
rg -l "range.*match|match.*range"
# Look for actual implementation of range matcher to understand integration
ast-grep --pattern 'function $_($$_) {
$$$
range
$$$
}'
Length of output: 435
Script:
#!/bin/bash
# Check the test file for range matcher usage
rg -A 5 -B 5 "range" "crates/napi/__test__/index.spec.ts"
# Check the range matcher implementation
cat "crates/core/src/matcher/range.rs"
# Check node matcher for range integration
rg -A 10 "range" "crates/core/src/matcher/node_match.rs"
Length of output: 7495
crates/core/src/matcher/range.rs (1)
54-78: Consider more flexible position matching
The current implementation has several potential issues:
- Exact position matching might be too strict - consider allowing partial overlaps
- No handling of multi-byte characters which could affect column positions
- No documentation explaining why
potential_kindsreturns None
Consider implementing a more flexible matching strategy:
- Add a method to check if a node's range overlaps with matcher's range
- Handle UTF-8 characters properly when comparing column positions
- Document the rationale for returning None in potential_kinds
Let's verify how other matchers handle multi-byte characters:
crates/core/src/source.rs (2)
143-143: LGTM: Well-placed trait method addition.
The new get_source_text method is appropriately placed in the Content trait and follows the same pattern as the existing get_text method. The return type Cow<'a, str> is a good choice as it provides flexibility for implementations to return either borrowed or owned strings.
193-195: LGTM: Efficient String implementation.
The implementation for String is correct and efficient, returning a borrowed reference to avoid unnecessary copying. This aligns well with the trait's purpose and the existing implementations.
crates/core/src/node.rs (1)
186-222: Verify edge case handling
Please ensure the implementation handles these edge cases correctly:
- Empty nodes (start_byte == end_byte)
- End of file scenarios
- Mixed line endings (both LF and CRLF)
✅ Verification successful
Let's gather more specific information about the edge cases by examining the test files and implementation details.
Let's search specifically for tests related to empty nodes and line endings in the codebase.
Based on the code review and test results, I can now provide a final response about the edge case handling:
Edge cases are properly handled in the implementation
The codepoint_pos_range() implementation correctly handles the specified edge cases:
-
Empty nodes (start_byte == end_byte): The implementation handles this implicitly - when start_byte equals end_byte, both start_pos and end_pos will be set to the same position.
-
End of file scenarios: The implementation doesn't make assumptions about EOF and works with the actual byte positions provided by the node's range.
-
Mixed line endings: The code explicitly handles both LF ('\n') and CRLF ('\r\n') sequences:
if c == '\r' && source.chars().nth(i + 1) == Some('\n') {
current_byte += 1; // Skip the '\n' in CRLF
}The implementation is also tested with multi-byte Unicode characters and line position calculations as shown in test_codepoint_range().
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for edge case tests in the codebase
echo "Searching for empty node tests..."
rg -l "empty.*node" --type rust
echo "Searching for EOF handling tests..."
rg -l "end.*file|eof" --type rust
echo "Searching for line ending tests..."
rg -l "CRLF|\\r\\n|line.*ending" --type rust
Length of output: 690
Script:
#!/bin/bash
# Search for test cases related to node positions and ranges
echo "Searching for node position/range tests..."
rg -l "test.*pos|test.*range" --type rust
# Look for actual test implementations
echo "Checking test implementations..."
rg -A 10 "fn.*test.*pos|fn.*test.*range" --type rust
# Search for specific handling of empty nodes
echo "Checking empty node handling..."
rg -A 5 "start_byte.*==.*end_byte" --type rust
# Check for line ending related code
echo "Checking line ending handling..."
rg -U -A 5 "\\r\\n|\\n|line.*end" --type rust crates/core/src/
Length of output: 152529
Script:
#!/bin/bash
# Search for tests in node.rs that might cover empty nodes and line endings
echo "Checking node.rs tests..."
rg -A 10 "fn test.*" crates/core/src/node.rs
# Look for specific test cases with empty nodes
echo "Checking empty node test cases..."
ast-grep --pattern 'fn test_$_() {
$$$
assert_eq!($_, $_, "empty node");
$$$
}'
# Look for line ending related tests
echo "Checking line ending tests..."
rg -U -A 10 "\\r\\n|CRLF|line.*end" crates/core/src/node.rs
Length of output: 5340
crates/config/src/rule/mod.rs (12)
3-3: Integration of 'range' module looks appropriate
The addition of the mod range; declaration correctly integrates the new range module into the codebase.
9-9: Importing 'SerializableRange' for use in this module
The use range::SerializableRange; statement correctly imports the SerializableRange type for use within this module.
19-21: Addition of 'RangeMatcher' and 'RangeMatcherError' imports
Importing RangeMatcher and RangeMatcherError from ast_grep_core::matcher is appropriate for implementing range matching functionality.
114-114: Properly mapping 'range' field in 'categorized' method
The range field is correctly integrated into the AtomicRule within the categorized method.
137-137: Addition of 'range' field to 'AtomicRule' struct
The range field is appropriately added to the AtomicRule struct to support range-based matching.
218-218: Adding 'Range' variant to 'Rule' enum
The Range(RangeMatcher<L>) variant extends the Rule enum to handle range-based matching rules.
265-265: Handle 'Range' variant in 'defined_vars' method
Correctly returns an empty HashSet for the Range variant, as it does not define any metavariables.
285-285: Include 'Range' variant in 'verify_util' method
The Range variant is properly handled in verify_util, ensuring it passes utility verification.
311-311: Implement 'match_node_with_env' for 'Range' variant
Delegates matching logic to RangeMatcher correctly for the Range variant.
333-333: Include 'Range' variant in 'potential_kinds' method
The Range variant is correctly included in potential_kinds, obtaining potential kinds from RangeMatcher.
380-381: Addition of 'InvalidRange' variant in 'RuleSerializeError'
The InvalidRange error variant properly handles errors related to RangeMatcher during serialization.
500-507: Deserialization of 'range' field in 'deserialze_atomic_rule'
The code correctly handles the range field by creating a new RangeMatcher with the provided range values.
a8a9f0d to
73da6f4
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
crates/core/src/matcher/range.rs (1)
16-20: Consider implementing Display trait for PositionWhile previous comments covered documentation and comparison methods, the Position struct could benefit from implementing the Display trait for better error messages and debugging, especially when used with LSP integration.
pub struct Position<L: Language> { row: usize, column: usize, lang: PhantomData<L>, } + +impl<L: Language> std::fmt::Display for Position<L> { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!(f, "{}:{}", self.row, self.column) + } +}crates/napi/src/doc.rs (1)
111-114: LGTM! Consider a small optimization.The implementation is correct and aligns well with the
Contenttrait requirements. However, we could avoid the temporary variablesand directly return the result ofencode_bytes.fn get_source_text(&self) -> Cow<'_, str> { - let s = Self::encode_bytes(&self.inner); - s + Self::encode_bytes(&self.inner) }crates/core/src/node.rs (1)
186-222: Consider optimizing the implementation for better performanceThe current implementation iterates through all characters up to the end position. This could be optimized by:
- Using
chars().take_while()to stop at end_byte- Pre-calculating the string slice up to end_byte
- Using the tree-sitter's built-in position information where possible
pub fn codepoint_pos_range(&self) -> std::ops::Range<(usize, usize)> { let source = self.root.doc.get_source().get_source_text(); let start_byte = self.range().start; let end_byte = self.range().end; - let mut start_pos = (0, 0); let mut end_pos = (0, 0); let mut current_pos = (0, 0); let mut current_byte = 0; - - for (i, c) in source.chars().enumerate() { + // Optimize by only processing up to end_byte + let relevant_text = &source[..end_byte]; + for (i, c) in relevant_text.chars().enumerate() { if current_byte == start_byte { start_pos = current_pos; } if current_byte == end_byte { end_pos = current_pos; break; } if c == '\n' || c == '\r' { if c == '\r' && source.chars().nth(i + 1) == Some('\n') { current_byte += 1; // Skip the '\n' in CRLF } current_pos.0 += 1; current_pos.1 = 0; } else { current_pos.1 += 1; } current_byte += c.len_utf8(); } - - if current_byte == end_byte { - end_pos = current_pos; - } + // Set end_pos if we haven't found it yet + if end_pos == (0, 0) { + end_pos = current_pos; + } start_pos..end_pos }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
crates/core/src/matcher/range.rs(1 hunks)crates/core/src/node.rs(2 hunks)crates/core/src/source.rs(2 hunks)crates/napi/src/doc.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/core/src/source.rs
🔇 Additional comments (3)
crates/core/src/matcher/range.rs (2)
10-14: Enhance error handling implementation
The error handling implementation needs improvement as previously noted.
54-78: Consider relaxing exact range matching
The current implementation only matches nodes with exact range boundaries. For practical LSP integration, consider:
- Adding a tolerance option for fuzzy matching
- Supporting partial overlaps
- Allowing containment-based matching
Example LSP scenario: When a language server reports an error at position 5:10, the actual AST node might span 5:8-5:12. The current exact matching would miss this node.
Let's verify if this is a real concern in the codebase:
crates/core/src/node.rs (1)
739-749: 🛠️ Refactor suggestion
Add more test cases for comprehensive coverage
The current test only covers a basic case with UTF-8 characters. As suggested in the past review, we should add more test cases.
The previous review comment suggesting additional test cases for CRLF line endings, empty nodes, multi-line nodes, and end of file scenarios is still valid and should be addressed.
#[test]
fn test_codepoint_range() {
let multi_line_code = r#"let a = 123;
let b = "Übergänge";
"#;
let root = Tsx.ast_grep(multi_line_code);
let node = root.root().find("\"Übergänge\"").expect("should exist");
let range = node.codepoint_pos_range();
assert_eq!(range.start, (1, 8));
assert_eq!(range.end, (1, 19));
+
+ // Test CRLF line endings
+ let crlf_code = "let a = 123;\r\nlet b = 456;";
+ let root = Tsx.ast_grep(crlf_code);
+ let node = root.root().find("456").expect("should exist");
+ let range = node.codepoint_pos_range();
+ assert_eq!(range.start, (1, 8));
+ assert_eq!(range.end, (1, 11));
+
+ // Test empty nodes
+ let empty_code = "let a = '';";
+ let root = Tsx.ast_grep(empty_code);
+ let node = root.root().find("''").expect("should exist");
+ let range = node.codepoint_pos_range();
+ assert_eq!(range.start, (0, 8));
+ assert_eq!(range.end, (0, 10));
+
+ // Test multi-line nodes
+ let multi_line = "let a = `\nline1\nline2\n`;";
+ let root = Tsx.ast_grep(multi_line);
+ let node = root.root().find("`$_`").expect("should exist");
+ let range = node.codepoint_pos_range();
+ assert_eq!(range.start, (0, 8));
+ assert_eq!(range.end, (3, 1));
}ce8c69b to
d73d9f2
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (4)
crates/core/src/node.rs (1)
186-222: Consider optimizing the position calculation implementation.The implementation is correct but could be optimized in a few ways:
- Pre-calculate the end position if we know the byte length
- Use
chars().take_while()to avoid iterating past the end position- Consider caching the results for frequently accessed nodes
pub fn codepoint_pos_range(&self) -> std::ops::Range<(usize, usize)> { let source = self.root.doc.get_source().get_source_text(); let start_byte = self.range().start; let end_byte = self.range().end; let mut start_pos = (0, 0); let mut end_pos = (0, 0); let mut current_pos = (0, 0); let mut current_byte = 0; - for (i, c) in source.chars().enumerate() { + for (i, c) in source.chars() + .take_while(|_| current_byte <= end_byte) + .enumerate() { if current_byte == start_byte { start_pos = current_pos; } if current_byte == end_byte { end_pos = current_pos; break; } if c == '\n' || c == '\r' { if c == '\r' && source.chars().nth(i + 1) == Some('\n') { current_byte += 1; // Skip the '\n' in CRLF } current_pos.0 += 1; current_pos.1 = 0; } else { current_pos.1 += 1; } current_byte += c.len_utf8(); } if current_byte == end_byte { end_pos = current_pos; } start_pos..end_pos }crates/config/src/rule/mod.rs (3)
61-64: Consider enhancing the range field documentationWhile the documentation is good, it could be more descriptive by including:
- Example usage with line/column numbers
- Common use cases (e.g., LSP integration)
- Relationship with other matchers
500-507: Consider improving range deserialization robustnessThe current implementation could be enhanced for better maintainability and validation:
- Consider using a builder pattern or constructor method for RangeMatcher to encapsulate the creation logic
- Add validation for row/column values before creating the matcher
Example approach:
- if let Some(range) = atomic.range { - rules.push(R::Range(RangeMatcher::try_new( - range.start.row, - range.start.column, - range.end.row, - range.end.column, - )?)); - } + if let Some(range) = atomic.range { + rules.push(R::Range(RangeMatcher::builder() + .with_start_position(range.start.row, range.start.column) + .with_end_position(range.end.row, range.end.column) + .validate()? + .build()?)); + }
Line range hint
511-685: Add test coverage for range matcherThe test module lacks coverage for the new range matcher functionality. Consider adding tests for:
- Range matcher serialization/deserialization
- Invalid range scenarios
- Integration with other matchers
Example test case:
#[test] fn test_range_matcher() { let src = r#" range: start: row: 1 column: 0 end: row: 2 column: 10 "#; let rule: SerializableRule = from_str(src).expect("cannot parse rule"); let env = DeserializeEnv::new(TypeScript::Tsx); let rule = deserialize_rule(rule, &env).expect("should deserialize"); // Add assertions here }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
crates/config/src/rule/mod.rs(12 hunks)crates/config/src/rule/range.rs(1 hunks)crates/core/src/matcher.rs(2 hunks)crates/core/src/matcher/range.rs(1 hunks)crates/core/src/node.rs(2 hunks)crates/core/src/source.rs(2 hunks)crates/napi/manual.d.ts(1 hunks)crates/napi/src/doc.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- crates/core/src/matcher.rs
- crates/core/src/matcher/range.rs
- crates/core/src/source.rs
- crates/napi/manual.d.ts
- crates/napi/src/doc.rs
🔇 Additional comments (7)
crates/config/src/rule/range.rs (3)
1-2: LGTM!
The imports are appropriate for serialization and JSON schema generation.
4-9:
Align position representation with LSP specification.
The current implementation might cause compatibility issues with Language Server Protocol (LSP) integration:
- Field names should match LSP's Position type:
rowshould belinecolumnshould becharacter
- Types should be
u32instead ofusizeto match LSP - Documentation should clarify zero-based indexing
Apply this diff:
/// Represents a position in a document
+///
+/// Position in a document expressed as zero-based line and character offset.
+/// A position is between two characters like an 'insert' cursor in a editor.
#[derive(Serialize, Deserialize, Clone, JsonSchema)]
pub struct SerializablePosition {
- pub row: usize,
- pub column: usize,
+ pub line: u32,
+ pub character: u32,
}11-16: 🛠️ Refactor suggestion
Enhance range validation and documentation.
The range structure needs additional safeguards and clarity:
- Add validation to ensure start position precedes end position
- Clarify range inclusivity in documentation
- Add helper methods for common operations
- Add Debug derive for better error messages
Apply this diff:
/// Represents a range rule with a start and end position
+///
+/// A range represents a span between two positions.
+/// The start position is inclusive and the end position is exclusive.
- #[derive(Serialize, Deserialize, Clone, JsonSchema)]
+ #[derive(Debug, Serialize, Deserialize, Clone, JsonSchema)]
pub struct SerializableRange {
pub start: SerializablePosition,
pub end: SerializablePosition,
}
+
+impl SerializableRange {
+ /// Creates a new range after validating that start precedes end
+ pub fn new(start: SerializablePosition, end: SerializablePosition) -> Result<Self, &'static str> {
+ if start.line > end.line || (start.line == end.line && start.character > end.character) {
+ return Err("start position must precede end position");
+ }
+ Ok(Self { start, end })
+ }
+
+ /// Returns true if the given position is within this range
+ pub fn contains(&self, pos: &SerializablePosition) -> bool {
+ (self.start.line < pos.line ||
+ (self.start.line == pos.line && self.start.character <= pos.character))
+ &&
+ (pos.line < self.end.line ||
+ (pos.line == self.end.line && pos.character < self.end.character))
+ }
+}crates/core/src/node.rs (1)
739-749: Add more test cases for comprehensive coverage.
While the current test case covers UTF-8 characters well, it's missing important scenarios as mentioned in the past review.
As suggested in the previous review, please add test cases for:
- CRLF line endings
- Empty nodes
- Multi-line nodes
- End of file scenarios
The test additions were already well-detailed in the past review comment.
crates/config/src/rule/mod.rs (3)
3-3: LGTM: Module organization follows Rust conventions
The new range module and its import are well-organized and follow Rust's module system conventions.
Also applies to: 9-9
265-265: LGTM: Consistent implementation across all Rule methods
The Range matcher is properly integrated into all relevant Rule methods, maintaining consistency with other atomic matchers.
Also applies to: 285-285, 311-311, 333-333
380-381: LGTM: Error handling follows Rust best practices
The error handling for invalid range matchers is properly implemented using thiserror and follows Rust's error handling conventions.
d73d9f2 to
f318c96
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1626 +/- ##
==========================================
- Coverage 87.12% 87.05% -0.08%
==========================================
Files 95 96 +1
Lines 15360 15458 +98
==========================================
+ Hits 13383 13457 +74
- Misses 1977 2001 +24 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
crates/config/src/rule/range.rs (3)
12-19: Enhance range documentation for clarity.The documentation should specify:
- Whether the end position is inclusive or exclusive
- That start position must precede end position
/// Represents a position in source code using 0-based row and column numbers +/// A range represents a span of text in source code. The end position is exclusive. +/// The start position must precede the end position in the document. #[derive(Serialize, Deserialize, Clone, JsonSchema)] pub struct SerializableRange { /// 0-based row number in the source code pub start: SerializablePosition, - /// 0-based column number in the source code + /// 0-based column number in the source code (exclusive) pub end: SerializablePosition, }
61-74: Extract range validation logic for reusability.Consider moving the range validation logic to SerializableRange for reuse:
impl SerializableRange { + pub fn validate(&self) -> Result<(), RangeMatcherError> { + if self.start.row > self.end.row + || (self.start.row == self.end.row && self.start.column > self.end.column) + { + return Err(RangeMatcherError::InvalidRange); + } + Ok(()) + } } impl<L: Language> RangeMatcher<L> { pub fn try_new( start_pos: SerializablePosition, end_pos: SerializablePosition, ) -> Result<RangeMatcher<L>, RangeMatcherError> { - if start_pos.row > end_pos.row - || (start_pos.row == end_pos.row && start_pos.column > end_pos.column) - { - return Err(RangeMatcherError::InvalidRange); - } + let range = SerializableRange { start: start_pos, end: end_pos }; + range.validate()?; let range = Self::new(start_pos, end_pos); Ok(range) } }
99-161: Add test cases for edge cases.Consider adding tests for:
- Zero-length ranges (where start equals end)
- Ranges spanning multiple lines with Unicode characters
- Ranges at file boundaries (start of file and end of file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
crates/config/src/rule/mod.rs(13 hunks)crates/config/src/rule/range.rs(1 hunks)crates/napi/__test__/index.spec.ts(1 hunks)crates/napi/manual.d.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/napi/manual.d.ts
🔇 Additional comments (7)
crates/config/src/rule/range.rs (4)
38-44: LGTM!
The position comparison logic is correct and handles the node position appropriately.
46-50: LGTM!
The struct is correctly implemented with proper use of PhantomData for the language type parameter.
76-97: LGTM!
The Matcher trait implementation is correct and properly handles node matching and potential kinds.
5-10: 🛠️ Refactor suggestion
Align position fields with LSP types for better integration.
Consider modifying the struct to match LSP's Position type:
- Rename
rowtolineandcolumntocharacter - Use
u32instead ofusizefor better cross-platform compatibility with LSP
#[derive(Serialize, Deserialize, Clone, JsonSchema)]
pub struct SerializablePosition {
- pub row: usize,
- pub column: usize,
+ pub line: u32,
+ pub character: u32,
}Likely invalid or redundant comment.
crates/napi/__test__/index.spec.ts (1)
377-389: LGTM!
The test case effectively validates the range matcher with Unicode characters and multiline content.
crates/config/src/rule/mod.rs (2)
59-62: LGTM!
The range field is properly integrated into SerializableRule with appropriate documentation.
501-503: LGTM!
The range deserialization is correctly implemented in the atomic rule handling.
f318c96 to
9de1b14
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
crates/config/src/rule/range.rs (3)
46-74: Add documentation and consider extracting validation logic.The RangeMatcher implementation could be improved:
- Add documentation for the struct explaining its purpose
- Consider moving validation logic to SerializableRange for reuse
+/// Matches nodes that exactly appear within the specified range. +/// The range is inclusive of both start and end positions. pub struct RangeMatcher<L: Language> { start: SerializablePosition, end: SerializablePosition, lang: PhantomData<L>, }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 69-72: crates/config/src/rule/range.rs#L69-L72
Added lines #L69 - L72 were not covered by tests
76-97: Consider caching node positions for better performance.The current implementation calls
start_pos()andend_pos()for each node check. Consider caching these values when processing multiple nodes:fn match_node_with_env<'tree, D: Doc<Lang = L>>( &self, node: Node<'tree, D>, _env: &mut Cow<MetaVarEnv<'tree, D>>, ) -> Option<Node<'tree, D>> { + // Cache positions to avoid multiple calls + let (node_start_pos, node_end_pos) = (node.start_pos(), node.end_pos()); - let node_start_pos = node.start_pos(); - let node_end_pos = node.end_pos(); if self.start.equals_node_pos(&node_start_pos, &node) && self.end.equals_node_pos(&node_end_pos, &node) { Some(node) } else { None } }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 94-96: crates/config/src/rule/range.rs#L94-L96
Added lines #L94 - L96 were not covered by tests
99-161: Enhance test coverage with edge cases.Consider adding tests for:
- Positions at file boundaries (0,0)
- Empty ranges (start = end)
- Unicode character handling
- Very large position values
Would you like me to help generate these additional test cases?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 123-125: crates/config/src/rule/range.rs#L123-L125
Added lines #L123 - L125 were not covered by tests
[warning] 139-141: crates/config/src/rule/range.rs#L139-L141
Added lines #L139 - L141 were not covered by tests
[warning] 156-158: crates/config/src/rule/range.rs#L156-L158
Added lines #L156 - L158 were not covered by tests
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
crates/config/src/rule/mod.rs(13 hunks)crates/config/src/rule/range.rs(1 hunks)crates/napi/__test__/index.spec.ts(1 hunks)crates/napi/manual.d.ts(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/napi/test/index.spec.ts
- crates/napi/manual.d.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
crates/config/src/rule/mod.rs
[warning] 231-231: crates/config/src/rule/mod.rs#L231
Added line #L231 was not covered by tests
[warning] 266-266: crates/config/src/rule/mod.rs#L266
Added line #L266 was not covered by tests
[warning] 286-286: crates/config/src/rule/mod.rs#L286
Added line #L286 was not covered by tests
[warning] 312-312: crates/config/src/rule/mod.rs#L312
Added line #L312 was not covered by tests
[warning] 334-334: crates/config/src/rule/mod.rs#L334
Added line #L334 was not covered by tests
[warning] 502-502: crates/config/src/rule/mod.rs#L502
Added line #L502 was not covered by tests
crates/config/src/rule/range.rs
[warning] 6-6: crates/config/src/rule/range.rs#L6
Added line #L6 was not covered by tests
[warning] 13-13: crates/config/src/rule/range.rs#L13
Added line #L13 was not covered by tests
[warning] 69-72: crates/config/src/rule/range.rs#L69-L72
Added lines #L69 - L72 were not covered by tests
[warning] 94-96: crates/config/src/rule/range.rs#L94-L96
Added lines #L94 - L96 were not covered by tests
[warning] 123-125: crates/config/src/rule/range.rs#L123-L125
Added lines #L123 - L125 were not covered by tests
[warning] 139-141: crates/config/src/rule/range.rs#L139-L141
Added lines #L139 - L141 were not covered by tests
[warning] 156-158: crates/config/src/rule/range.rs#L156-L158
Added lines #L156 - L158 were not covered by tests
🔇 Additional comments (1)
crates/config/src/rule/mod.rs (1)
501-503: Add tests for range rule deserialization.
The static analysis indicates missing test coverage for range rule deserialization. Consider adding tests for:
- Deserializing valid range rules
- Error handling for invalid ranges
- Integration with other rule types
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 502-502: crates/config/src/rule/mod.rs#L502
Added line #L502 was not covered by tests
This PR introduces a
RangeMatcher, enabling users to select nodes based on precise range information such as line and column numbers. This functionality is particularly valuable when working in systems that rely on feedback loops or data derived from language servers which typically communicate in terms of line and column positions.Motivation:
rust-analyzer, provide diagnostic and structural information in row-column format. TheRangeMatcherbridges the gap between this data andast-grep's capabilities, enabling precise node selection for downstream processing.Summary by CodeRabbit
Release Notes
New Features
PositionandRangeto enhance type definitions related to node positioning.Bug Fixes
Tests