Syntactic indexing non local refs java#63822
Conversation
kritzcreek
left a comment
There was a problem hiding this comment.
Just a few drive-by comments
There was a problem hiding this comment.
This is a first step towards unifying the analysis into a single pass, with the ability to enable/disable certain features.
By default everything is enabled, and we can choose to disable certain features (like disabling global references for highlighting code).
This necessarily leads to --no-<feature> flags in the CLI. I don't have a way not to have them, nor do I think they're a problem (this came up in the past).
There was a problem hiding this comment.
What about the Nothing case here? No visibility modifier can still produce a local reference
There was a problem hiding this comment.
I'd prefer to use scip_strict here (means you'd need to move it from the scip-syntax crate into syntax-analysis).
There was a problem hiding this comment.
scip_strict this as well while we're at it.
There was a problem hiding this comment.
Huh, this one's weird. Overlapping def and ref might be a bug in scope tree construction that existed before this PR.
There was a problem hiding this comment.
Frankly at this point I'm not sure how to fix it without extra logic
There was a problem hiding this comment.
I've attempted to identify overlapping nodes in the simplest possible way, and while this does help, it breaks Perl:
11 11 │ ref @$self (8, 13)-(8, 19)
12 │- ref $self (8, 14)-(8, 19)
7 7 │ if (@_ == 2) {
8 8 │ my $self = shift;
9 9 │ // ^^^^^ definition local 2
10 10 │ push(@$self, shift);
11 │-// ^^^^^ reference local 2In the queries we extract both $self and @$self as references, but @$self comes first and contains within it $self. Problem is - @$self cannot be resolved to a local definition.
There was a problem hiding this comment.
Not the GOAT when it comes to Perl, but I think the reference and definition we should actually be targeting there is self. See https://perldoc.perl.org/perlintro#Arrays where you can declare an array as @animals and refer to it via $animals[0].
(Not saying go fix the Perl locals, just don't let them stop you from implementing a simpler solution if you've found one)
There was a problem hiding this comment.
The simple solution for choosing the longest reference node and ignore all matches contained within:
diff --git a/docker-images/syntax-highlighter/crates/syntax-analysis/src/locals.rs b/docker-images/syntax-highlighter/crates/syntax-analysis/src/locals.rs
index ba64c9362aa..4c57e85b56f 100644
--- a/docker-images/syntax-highlighter/crates/syntax-analysis/src/locals.rs
+++ b/docker-images/syntax-highlighter/crates/syntax-analysis/src/locals.rs
@@ -587,6 +587,7 @@ impl<'a> LocalResolver<'a> {
let mut definitions: Vec<DefCapture> = vec![];
let mut references: Vec<RefCapture<'a>> = vec![];
let mut skip_references_at_offsets: HashSet<usize> = HashSet::new();
+ let mut min_reference_start: usize = 0;
for match_ in cursor.matches(&config.query, tree.root_node(), source_bytes) {
let properties = config.query.property_settings(match_.pattern_index);
@@ -622,7 +623,11 @@ impl<'a> LocalResolver<'a> {
} else if capture_name.eq_ignore_ascii_case("reference") {
let offset = capture.node.start_byte();
- if skip_references_at_offsets.contains(&offset) {
+ if offset < min_reference_start {
+ continue;
+ }
+
+ if self.skip_occurrences_at_offsets.contains(&offset) {
continue;
}
@@ -642,11 +647,13 @@ impl<'a> LocalResolver<'a> {
other => other,
};
- if self.skip_occurrences_at_offsets.contains(&offset) {
- continue;
- }
+ eprintln!(
+ "{:?} {offset} {min_reference_start} {:?}",
+ capture.node.utf8_text(source_bytes),
+ capture.node.kind()
+ );
- skip_references_at_offsets.insert(offset);
+ min_reference_start = capture.node.end_byte();
let descriptor = match visibility {
Some(Visibility::Global) => kind_propertyThis fixes Java imports and is somewhat generic.
re:Perl - indeed the reference should be to self, but our current grammar does a very crude selection of entire (array) node. I'm exploring switching back to upstream or syncing at least part of the changes into our fork to make this possible
also adds special handling for imports
e1e27f4 to
3b094e2
Compare
There was a problem hiding this comment.
I think this is in good enough shape to merge. The extra reference to lang. is not going to matter, because search is going to give us identifiers on word boundaries anyway. (and having a few false positives is not the end of the world, it's certainly better than false negatives)
I'll open an issue for some follow-up scip_strict usage, and any simplification I think we could make in the future.
If there are other issues we know of, please open issues for those as well, and we can look at them independent of this PR.
|
For future reference, here's the latest attempt to prefer a longer reference in case there are referenced subparts contained entirely within it. It causes issues in perl because we cast a wide net when marking identifiers as reveferences in the grammar, producing I attempted to update the grammar we have (sourcegraph/tree-sitter-perl#4) but it causes other issues because of how it's structured. diff --git a/docker-images/syntax-highlighter/Cargo.lock b/docker-images/syntax-highlighter/Cargo.lock
index 7f33669d4d5..985bc58e8ae 100644
--- a/docker-images/syntax-highlighter/Cargo.lock
+++ b/docker-images/syntax-highlighter/Cargo.lock
@@ -2633,7 +2633,7 @@ dependencies = [
[[package]]
name = "tree-sitter-perl"
version = "0.0.1"
-source = "git+https://github.com/sourcegraph/tree-sitter-perl?rev=e1b4844afd17b7dc019a436b1ac890568d79a1f2#e1b4844afd17b7dc019a436b1ac890568d79a1f2"
+source = "git+https://github.com/sourcegraph/tree-sitter-perl?rev=25473ee75c2fc0757b019d5be54cba0cbe412437#25473ee75c2fc0757b019d5be54cba0cbe412437"
dependencies = [
"cc",
"tree-sitter",
diff --git a/docker-images/syntax-highlighter/crates/syntax-analysis/queries/java/scip-locals.scm b/docker-images/syntax-highlighter/crates/syntax-analysis/queries/java/scip-locals.scm
index fcf8e3c67d8..1a91b868483 100644
--- a/docker-images/syntax-highlighter/crates/syntax-analysis/queries/java/scip-locals.scm
+++ b/docker-images/syntax-highlighter/crates/syntax-analysis/queries/java/scip-locals.scm
@@ -91,18 +91,28 @@
; REFERENCES
-
+; NOTE: commented out because current implementation produces overlapping
+; references (in the example below, java.util and util will be produced)
; import java.util.HashSet
; ^^^^^^^^^ namespace
; ^^^^^^^ type (could also be a constant, but type is more common)
+; (import_declaration
+; (scoped_identifier
+; scope: ((_) @reference (#set! "kind" "global.namespace"))
+; ))
+
+; (import_declaration
+; (scoped_identifier
+; name: (_) @reference (#set! "kind" "global.type")
+; ))
(import_declaration
(scoped_identifier
- scope: ((_) @reference (#set! "kind" "global.namespace"))
+ scope: (_) @occurrence.skip
))
(import_declaration
(scoped_identifier
- name: (_) @reference (#set! "kind" "global.type")
+ name: (_) @occurrence.skip
))
(field_access object: (identifier) @reference)
diff --git a/docker-images/syntax-highlighter/crates/syntax-analysis/queries/perl/scip-locals.scm b/docker-images/syntax-highlighter/crates/syntax-analysis/queries/perl/scip-locals.scm
index 80fd51e2eaf..3d0c04bd8ee 100644
--- a/docker-images/syntax-highlighter/crates/syntax-analysis/queries/perl/scip-locals.scm
+++ b/docker-images/syntax-highlighter/crates/syntax-analysis/queries/perl/scip-locals.scm
@@ -5,11 +5,56 @@
;; TODO: Add `state` variables once we've updated the grammar. Only
;; `our` variables are non-local, so we must not include them here
-(variable_declaration "my" (_) @definition.term)
-(for_statement my_var: (_) @definition.term)
-
-((scalar) @reference (#set! "kind" "local"))
-((array) @reference (#set! "kind" "local"))
-((arraylen) @reference (#set! "kind" "local"))
-((hash) @reference (#set! "kind" "local"))
-((glob) @reference (#set! "kind" "local"))
+(variable_declaration
+ "my"
+ variable: [
+ (scalar (varname) @definition.term)
+ (array (varname) @definition.term)
+ (arraylen (varname) @definition.term)
+ (hash (varname) @definition.term)
+ (glob (varname) @definition.term)
+ ]
+ ; (_) @definition.term
+)
+
+(variable_declaration
+ "my"
+ variables: ([
+ (scalar ((varname) @definition.term))
+ (array ((varname) @definition.term))
+ (arraylen ((varname) @definition.term))
+ (hash ((varname) @definition.term))
+ (glob ((varname) @definition.term))
+ ])*
+)
+
+(for_statement
+ my_var: [
+ (scalar (varname) @definition.term)
+ (array (varname) @definition.term)
+ (arraylen (varname) @definition.term)
+ (hash (varname) @definition.term)
+ (glob (varname) @definition.term)
+ ]
+)
+
+(scalar
+ ((varname) @reference (#set! "kind" "local"))
+)
+
+(array
+ (varname) @reference (#set! "kind" "local")
+)
+
+(arraylen
+ (varname) @reference (#set! "kind" "local")
+)
+
+(hash
+ (varname) @reference (#set! "kind" "local")
+)
+
+(glob
+ (varname) @reference (#set! "kind" "local")
+)
+
diff --git a/docker-images/syntax-highlighter/crates/syntax-analysis/src/locals.rs b/docker-images/syntax-highlighter/crates/syntax-analysis/src/locals.rs
index ba64c9362aa..a5cf70b6e1c 100644
--- a/docker-images/syntax-highlighter/crates/syntax-analysis/src/locals.rs
+++ b/docker-images/syntax-highlighter/crates/syntax-analysis/src/locals.rs
@@ -587,6 +587,7 @@ impl<'a> LocalResolver<'a> {
let mut definitions: Vec<DefCapture> = vec![];
let mut references: Vec<RefCapture<'a>> = vec![];
let mut skip_references_at_offsets: HashSet<usize> = HashSet::new();
+ let mut skip_references_until: Option<usize> = None;
for match_ in cursor.matches(&config.query, tree.root_node(), source_bytes) {
let properties = config.query.property_settings(match_.pattern_index);
@@ -626,6 +627,11 @@ impl<'a> LocalResolver<'a> {
continue;
}
+ match skip_references_until {
+ Some(end_offset) if offset < end_offset => continue,
+ _ => (),
+ }
+
let kind_property = properties
.iter()
.find(|p| p.key.as_ref() == "kind")
@@ -647,6 +653,7 @@ impl<'a> LocalResolver<'a> {
}
skip_references_at_offsets.insert(offset);
+ skip_references_until = Some(capture.node.end_byte());
let descriptor = match visibility {
Some(Visibility::Global) => kind_property
@@ -666,6 +673,7 @@ impl<'a> LocalResolver<'a> {
} else if capture_name == "occurrence.skip" {
let offset = capture.node.start_byte();
self.skip_occurrences_at_offsets.insert(offset);
+ skip_references_until = Some(capture.node.end_byte());
} else {
debug_assert!(false, "Discarded capture: {capture_name}")
}
diff --git a/docker-images/syntax-highlighter/crates/syntax-analysis/src/snapshots/syntax_analysis__locals__test__java_occurrences.snap b/docker-images/syntax-highlighter/crates/syntax-analysis/src/snapshots/syntax_analysis__locals__test__java_occurrences.snap
index 924113e1d29..f58a8ac1117 100644
--- a/docker-images/syntax-highlighter/crates/syntax-analysis/src/snapshots/syntax_analysis__locals__test__java_occurrences.snap
+++ b/docker-images/syntax-highlighter/crates/syntax-analysis/src/snapshots/syntax_analysis__locals__test__java_occurrences.snap
@@ -5,16 +5,8 @@ expression: dumped
package com.hello;
import java.lang.AutoCloseable;
-// ^^^^^^^^^ reference scip-syntax `java.lang`/
-// ^^^^ reference scip-syntax lang.
-// ^^^^^^^^^^^^^ reference scip-syntax AutoCloseable#
import java.util.*;
-// ^^^^ reference scip-syntax java/
-// ^^^^ reference scip-syntax util#
import java.util.stream.*;
-// ^^^^^^^^^ reference scip-syntax `java.util`/
-// ^^^^ reference scip-syntax util.
-// ^^^^^^ reference scip-syntax stream#
@Deprecated
// ^^^^^^^^^^ reference scip-syntax Deprecated#
diff --git a/docker-images/syntax-highlighter/crates/syntax-analysis/src/snapshots/syntax_analysis__locals__test__java_scopes.snap b/docker-images/syntax-highlighter/crates/syntax-analysis/src/snapshots/syntax_analysis__locals__test__java_scopes.snap
index 9c6eba09de3..457d5d7317e 100644
--- a/docker-images/syntax-highlighter/crates/syntax-analysis/src/snapshots/syntax_analysis__locals__test__java_scopes.snap
+++ b/docker-images/syntax-highlighter/crates/syntax-analysis/src/snapshots/syntax_analysis__locals__test__java_scopes.snap
@@ -3,14 +3,6 @@ source: crates/syntax-analysis/src/locals.rs
expression: scope_tree
---
scope global (0, 0)-(121, 0)
- ref java.lang (2, 7)-(2, 16)
- ref lang (2, 12)-(2, 16)
- ref AutoCloseable (2, 17)-(2, 30)
- ref java (3, 7)-(3, 11)
- ref util (3, 12)-(3, 16)
- ref java.util (4, 7)-(4, 16)
- ref util (4, 12)-(4, 16)
- ref stream (4, 17)-(4, 23)
scope scope (6, 0)-(120, 1)
ref Deprecated (6, 1)-(6, 11)
def Container (7, 20)-(7, 29)
diff --git a/docker-images/syntax-highlighter/crates/tree-sitter-all-languages/Cargo.toml b/docker-images/syntax-highlighter/crates/tree-sitter-all-languages/Cargo.toml
index 19d7905ab28..c99a0f5f9c8 100644
--- a/docker-images/syntax-highlighter/crates/tree-sitter-all-languages/Cargo.toml
+++ b/docker-images/syntax-highlighter/crates/tree-sitter-all-languages/Cargo.toml
@@ -26,7 +26,7 @@ tree-sitter-kotlin = { git = "https://github.com/fwcd/tree-sitter-kotlin", rev =
tree-sitter-magik = { git = "https://github.com/sourcegraph/tree-sitter-magik", rev = "dfabbd4e70ff311939cf48fdf90d340f07758d37" }
tree-sitter-matlab = { git = "https://github.com/acristoffers/tree-sitter-matlab", rev = "6071891a8c39600203eba20513666cf93b4d650a" }
tree-sitter-nickel = { git = "https://github.com/nickel-lang/tree-sitter-nickel", rev = "d6c7eeb751038f934b5b1aa7ff236376d0235c56" }
-tree-sitter-perl = { git = "https://github.com/sourcegraph/tree-sitter-perl", rev = "e1b4844afd17b7dc019a436b1ac890568d79a1f2" }
+tree-sitter-perl = { git = "https://github.com/sourcegraph/tree-sitter-perl", rev = "25473ee75c2fc0757b019d5be54cba0cbe412437" }
tree-sitter-pkl = { git = "https://github.com/apple/tree-sitter-pkl", rev = "b79c8c4d2419e82324d9aca31e9de47ed8304f1f" }
tree-sitter-pod = { git = "https://github.com/sourcegraph/tree-sitter-pod", rev = "f422a0dca6847c692e811f06fd92c6a75d647222" }
tree-sitter-xlsg = { git = "https://github.com/sourcegraph/tree-sitter-xlsg", rev = "d956b54ea151b12f19c945f7be421c3dcd3a77ba" } |
Fixes GRAPH-649
This PR adds non-local references support to Java (and more generally to our locals handling code).
The main idea is to cherrypick nodes that can contain global or resolved (see below) references, handle them first, and then mark everything else as locals.
Pure Global references are formed from types of definitions that can't be locals in our code. Currently it's only methods in Java that we treat as always global references
Pure local references - these references should only be resolved against the locals scope tree and definitions within it - never emitting a non-local references
Resolved references - these are first resolved as locals, and if that doesn't succeed, a global reference is emitted. These will be used most frequently, as TS grammars don't carry enough information for us to identify more pure global references. For example, in Java's type bounds
class Hello<N extends Number>, Number can refer to both the local type parameter of an enclosing class (in which case we should emit a local reference), or it can refer tojava.lang.Number(in which case we should emit a global reference)Test plan
Changelog