Conversation
a7c1853 to
8793075
Compare
15e4243 to
27324fb
Compare
src/call_graph/Call_graph.ml
Outdated
| (* Call Graph Types - core types for call graph construction and analysis *) | ||
|
|
||
| (** Position in a file (line/column) *) | ||
| type position = { line : int; column : int [@key "character"] } |
There was a problem hiding this comment.
Why a new type? Can't we simply use Pos.t?
src/call_graph/Call_graph.ml
Outdated
| [@@deriving yojson, eq, show, ord] | ||
|
|
||
| (** Range in a file (start/end positions) *) | ||
| type range = { start : position; end_ : position [@key "end"] } |
There was a problem hiding this comment.
As above, maybe reuse somethign we already have? (Tok_range.t maybe?)
src/call_graph/Call_graph.ml
Outdated
| [@@deriving yojson, eq, show, ord] | ||
|
|
||
| (** Function identifier - IL.name contains name string + token with file/position *) | ||
| type function_id = IL.name option |
There was a problem hiding this comment.
IL.name seems to have some irrelevant information, like sid, which is not stable across runs. Maybe use Tok.location?
src/call_graph/Call_graph.ml
Outdated
| (* Helper to extract normalized key for node comparison. | ||
| Resolves file paths to canonical form to handle different representations | ||
| of the same file (e.g., /foo/bar vs /foo/baz/../bar). *) | ||
| let node_compare_key (v : node) = |
There was a problem hiding this comment.
Code duplication: hash_function_id vs node_compare_key
src/call_graph/Call_graph.ml
Outdated
| Hashtbl.hash (fst n.IL.ident, filename, line, col) | ||
|
|
||
| (** Call graph node - just a function identifier (IL.name) *) | ||
| type node = IL.name |
There was a problem hiding this comment.
type node = function_id?
I guess most of the code is oblivious to what the type function_id actually is, so let's not break the abstraction
src/call_graph/Call_graph.ml
Outdated
|
|
||
| (** Call graph edge - represents a call from one function to another *) | ||
| type edge = { | ||
| callee_fn_id : IL.name; |
There was a problem hiding this comment.
callee_fn_id : function_id; ?
src/call_graph/Call_graph.ml
Outdated
| module Dot = Graph.Graphviz.Dot (Display) | ||
|
|
||
| (** Node tracking for incremental updates *) | ||
| let removed_node_keys : (string, unit) Hashtbl.t = Hashtbl.create 1000 |
There was a problem hiding this comment.
Hashtable on the toplevel. Not thread-safe
There was a problem hiding this comment.
Also a starting size of 1000 is maybe not an optimal choice... start smaller?
There was a problem hiding this comment.
Ideal solution: no top level hashtable, can this be embedded where it's used?
Second-best: DLS assuming of course that our concurrency architecture has not changed in any way.
| close_out oc | ||
|
|
||
| (* Load graph from marshal file *) | ||
| let load_graph (path : string) : G.t = |
There was a problem hiding this comment.
Don't we need user-friendly error messages in case of read/write error?
| type fun_info = { | ||
| fn_id : Shape_and_sig.fn_id; | ||
| opt_name : IL.name option; | ||
| name : IL.name; |
There was a problem hiding this comment.
Don't we prefer name : Call_grap_types.function_id?
As in the comments in Call_graph_types.ml: I think we should keep name an abstract type, as in most of the codebase we don't care about what it is exactly
| (*****************************************************************************) | ||
|
|
||
| let check_fundef (taint_inst : Taint_rule_inst.t) name ctx ?glob_env ?class_name | ||
| let check_fundef (taint_inst : Taint_rule_inst.t) (name : IL.name) ctx ?glob_env ?class_name |
| type fun_info = { | ||
| fn_id : Shape_and_sig.fn_id; | ||
| opt_name : IL.name option; | ||
| name : IL.name; |
There was a problem hiding this comment.
Maybe not relevant to this commit, but what is the exact purpose of class_name_str and method_properties in this type?
| val check_fundef : | ||
| Taint_rule_inst.t -> | ||
| Shape_and_sig.fn_id (** entity being analyzed *) -> | ||
| IL.name (** entity being analyzed *) -> |
There was a problem hiding this comment.
Again: IL.name or func_id?
src/call_graph/Call_graph.ml
Outdated
| [@@deriving yojson, eq, show, ord] | ||
|
|
||
| (** Function identifier - IL.name contains name string + token with file/position *) | ||
| type function_id = IL.name option |
There was a problem hiding this comment.
Also, if you want to move parts of the call graph to lib, I don't think it should depend on anything in IL
There was a problem hiding this comment.
Also, if we want to have a single type we use to identify functions (at the moment the use of IL.name is quite confusing for me), which serves as the type of keys in the signature db and can be obtained by querying the LSP, we need to keep it in a more profound and specific place. Especially that it is a cross-file thing, so I would stress it strongly in the code that it as a different kind of being that leave outside the AST/IL of the current target. I propose that we add a separate module for this
There was a problem hiding this comment.
By "this" I mean a type that identifies a function within the entire project (not a single target). This module would also be a good place to keep functions that query LSP to convert a given location in the current target to the global location, etc.
src/call_graph/Call_graph.ml
Outdated
| type range = { start : position; end_ : position [@key "end"] } | ||
| [@@deriving yojson, eq, show, ord] | ||
|
|
||
| (** Function identifier - IL.name contains name string + token with file/position *) |
There was a problem hiding this comment.
I'm pretty sure I saw a comment that something file names in tokens are not 100% trustworthy, but I can't find it now. I'm also not sure if it was a global thing, or only at that given place. We need to make sure token is enough to identify a file
src/tainting/Shape_and_sig.ml
Outdated
|
|
||
| (* Function key for the signature database - uses just the function name (last element of fn_id). | ||
| This matches the graph vertex type in Function_call_graph.ml. *) | ||
| type func_key = IL.name |
There was a problem hiding this comment.
This is another place where we would benefit from having a proper type for global function identifiers instead of IL.name
maciejpirog
left a comment
There was a problem hiding this comment.
I think it is time to do some major clean up about types, identifiers etc we use in the codebase. I left some comments in particular files, but my major concern is the growing complexity and size of the code without introducing any abstraction over what we're doing. I propose that:
-
We are more precise about how functions are identified. At the moment we have types like
Shape_and_sig.fn_id,Shape_and_sig.func_key,Call_grap_types.function_id. and on top of that in most places of the code, we useIL.name. I propose we make a separate module, saysrc/tainting/Signature_db, in which we:-
Explicitly define the one and only type to identify functions, as they are used cross-file. If possible, I would like to keep this type abstract,
-
Move the signature db code here,
-
Add code to deal with LSPs here as well,
-
Reason for the two above: I guess these are the only two things that need to know the internal structure of these identifiers.
-
Problems with the above: Does naming want to know the structure of these (also: we cannot keep two separate mechanism for naming and we need to kill off one of them)
-
-
We don't put Call_graph_types.ml in lib, since it depends on things in src. Maybe we can make it a functor and parameterize with things from src inside src?
a3d5806 to
0cd71c5
Compare
4af7de3 to
c31bfc8
Compare
src/call_graph/Call_graph.ml
Outdated
| (** Call graph node - just a function identifier (IL.name) *) | ||
| type node = Function_id.t | ||
|
|
||
| (* Helper to extract normalized key for node comparison. |
| close_out oc | ||
|
|
||
| (* Export graph to JSON format (Graphology format for Cytoscape.js viewer) *) | ||
| let graph_to_json (graph : G.t) : Yojson.Basic.t = |
There was a problem hiding this comment.
rename to graph_to_graphology_format
| open Call_graph | ||
|
|
||
| (* Convert path to file:// URI *) | ||
| let fpath_to_uri path = "file://" ^ Fpath.to_string path |
There was a problem hiding this comment.
I don't think there is a need to sanitize this path: it is only used as a label in the exported json
| let opt_name = | ||
| let* ent = opt_ent in | ||
| AST_to_IL.name_of_entity ent | ||
| | Some ent -> |
dimitris-m
left a comment
There was a problem hiding this comment.
lgtm, taking into account what we discussed offline.
once adapted we can merge it.
This MR contains the following updates: | Package | Update | Change | |---|---|---| | [opengrep/opengrep](https://github.com/opengrep/opengrep) | minor | `v1.15.1` → `v1.16.0` | MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot). **Proposed changes to behavior should be submitted there as MRs.** --- ### Release Notes <details> <summary>opengrep/opengrep (opengrep/opengrep)</summary> ### [`v1.16.0`](https://github.com/opengrep/opengrep/releases/tag/v1.16.0): Opengrep 1.16.0 [Compare Source](opengrep/opengrep@v1.15.1...v1.16.0) #### Improvements - Dart: Add typed metavariabless by [@​maciejpirog](https://github.com/maciejpirog) in [#​551](opengrep/opengrep#551) - Dart: Use case of identifier to guess call vs new by [@​maciejpirog](https://github.com/maciejpirog) in [#​555](opengrep/opengrep#555) - Go: Enable goroutines in taint tracking by [@​maciejpirog](https://github.com/maciejpirog) in [#​559](opengrep/opengrep#559) - Add taint propagation via "for" comprehensions by [@​maciejpirog](https://github.com/maciejpirog) in [#​564](opengrep/opengrep#564) #### Bug Fixes - Rust: Missing Rust type alias translation by [@​smith-xyz](https://github.com/smith-xyz) in [#​549](opengrep/opengrep#549) - Fix: Ensure that linux binaries have 8mb stack size (musl) by [@​dimitris-m](https://github.com/dimitris-m) in [#​563](opengrep/opengrep#563) - Fixed a perf regression by removing system calls and improving the reachability graph and the callee lookup by [@​corneliuhoffman](https://github.com/corneliuhoffman) in [#​556](opengrep/opengrep#556) - Fixed intrafile bug introduced by a superfluous fallback by [@​corneliuhoffman](https://github.com/corneliuhoffman) in [#​567](opengrep/opengrep#567) - Ruby: Always translate `or` and `and` to expression by [@​maciejpirog](https://github.com/maciejpirog) in [#​562](opengrep/opengrep#562) - Bash: Allow redirects before command arguments by [@​maciejpirog](https://github.com/maciejpirog) in [#​548](opengrep/opengrep#548) #### Internal Improvements - Add `show dump-intrafile-graph` and `show dump-taint-signatures` commands by [@​corneliuhoffman](https://github.com/corneliuhoffman) in [#​552](opengrep/opengrep#552) - Improve tainting code by [@​maciejpirog](https://github.com/maciejpirog) in [#​546](opengrep/opengrep#546) - Graph refactoring by [@​corneliuhoffman](https://github.com/corneliuhoffman) in [#​553](opengrep/opengrep#553) #### New Contributors - [@​smith-xyz](https://github.com/smith-xyz) made their first contribution in [#​549](opengrep/opengrep#549) **Full Changelog**: <opengrep/opengrep@v1.15.1...v1.16.0> </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this MR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box --- This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0Mi45Ni4yIiwidXBkYXRlZEluVmVyIjoiNDIuOTYuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90IiwiYXV0b21hdGlvbjpib3QtYXV0aG9yZWQiLCJkZXBlbmRlbmN5LXR5cGU6Om1pbm9yIl19-->
Summary
This si a refactoring consolidating PR. The main conceptual change is that in the intrafile graph/signature we have no need for the type of nodes/keys to be a name list but just a name. This cleans up the graph and simplifies the procedures.
libs/graph_liblibrary with shared types for call graph construction abstracting the graphs a bitCall_graph_types.node,Call_graph_types.edge,Call_graph_types.G)Lang_config.ml(HOF patterns, constructor models)Builtin_models.mlandObject_initialization.mlby extracting shared logicAdditional chages by @maciejpirog
src/call_graph. Rationale: it depends on AST and IL, so should not be in libs. Move some helper functions fromtainting/Function_call_graphto thecall_graph/Call_graph(in future, I want to remove the Function_call_graph module). Make Reachable module a module (not a functor). Rename it toGraph_reachability.callee_fn_idfrom the type of edge labels. This is because it can be always recovered form the "src" of the edge.Function_id, which serves as an abstract type for identifying functions globally. It is used as the type of nodes in function call graph and as the type of keys in the signature db.fn_id(the list of names) from Shape_and_sig. Thefn_idtype will be an internal type used when building a call graph from AST. Maybe removed completely when function name resolution is moved to naming.tainting/Function_call_graphand rename it totainting/Graph_from_AST. Ideally, it would be moved tosrc/call_graph, but it depends on things fromsrc/tainting, so for now it stays where it is.Graph_reachabilityandGraph_from_AST. These modules now export only 1 and 3 functions respectively, which should clarify the code.