Skip to content

Fixed a perf regression by removing system calls and improving the reachability graph and the callee lookup#556

Merged
corneliuhoffman merged 3 commits intomainfrom
ch/graph_performance
Feb 2, 2026
Merged

Fixed a perf regression by removing system calls and improving the reachability graph and the callee lookup#556
corneliuhoffman merged 3 commits intomainfrom
ch/graph_performance

Conversation

@corneliuhoffman
Copy link
Contributor

@corneliuhoffman corneliuhoffman commented Jan 29, 2026

This PR introduces 4 changes

  • removes the Unix.realpath that slowed everything down
  • Simplifies the reachability graph to compute in batch using OcamlGraph primitives, faster computation.
  • Improves the lookuop_callee_from_graph by not searching over all edges but only those going int the caller
  • adds a fix for elixir shortLambdas

(* Elixir: &func/n - ShortLambda wrapping a call to the named function.
Structure: OtherExpr("ShortLambda", [Params[&1,...]; S(ExprStmt(Call(func, args)))])
Create a _tmp node to match what AST_to_IL creates for the anonymous wrapper. *)
| G.OtherExpr (("ShortLambda", shortlambda_tok), [G.Params _; G.S { G.s = G.ExprStmt (inner_e, _); _ }]) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In places like AST_to_IL we have a way to specify the language, for example:

G.OtherExpr (("ShortLambda", _), _) when env.lang =*= Lang.Elixir

We also have short lambdas in Clojure, are these covered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will write a clojure test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Clojure is not yet supported

Comment on lines 113 to +126
|> List.find_opt (fun edge ->
let label = G.E.label edge in
(* Implicit edges have line 0 - match by callee name if call is to same function *)
Int.equal label.call_site.Pos.line 0)
|> Option.map G.E.src
Pos.equal label.call_site call_pos)
in
match exact_match with
| Some edge ->
Some (G.E.src edge)
| None ->
(* Fallback: check for implicit/HOF edges by matching line 0 *)
incoming_edges
|> List.find_opt (fun edge ->
let label = G.E.label edge in
Int.equal label.call_site.Pos.line 0)
|> Option.map G.E.src
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact one can do find_opt only once right?

And return a type Explicit _ | Fallback _ or whatever.

Not so important for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but then you need to worry about whioch came first I agreee it might be mildly cheaper though I still have to convince myself wether the fallback ever happens

in
(* Combine and deduplicate *)
all_callback_args @ configured_callbacks |> dedup_fn_ids
(* Combine - dedup would need to handle the tmp_opt too, skip for now *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we skip dedup? Because _tmp are identified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bc the dedup is already done by the graph

let tmp_tok = Tok.fake_tok shortlambda_tok "_tmp" in
let tmp_name = IL.{
ident = ("_tmp", tmp_tok);
sid = G.SId.unsafe_default;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But AST_to_IL.fresh_var has a proper sid:

  let i = G.SId.mk () in

which I guess does not matter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the sids are irrelevant here

@dimitris-m
Copy link
Collaborator

Gentle reminder to change PR title to something more informative like:

Fix performance regression in intrafile (system calls)

let from_sources = reachable_vertices_batch graph sources in
let from_sinks = reachable_vertices_batch graph sinks in
(* Fast set intersection *)
let common = VSet.inter from_sources from_sinks in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the situation when the input graph is:

src -> foo <- snk
        |
        v
       bar

My understanding is that it will add bar to the relevant graph, although it is not really necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, this my plan for today, the graph lib has some trick for that, you need to look at SCC's that have no predecesors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we agreed we will keep the filtering for now, In principle things like this will be a problem but our HOF technology is not yet sophisticated enough to catch them anyway now:

def f():
    return source()

def g(x):
    # ruleid: test-tuple-unpacking-taint
    sink(x)
    return x

def h(x):
    return (f(), lambda y: g(y))

def k(x):
    a, b = h(x)
    z, w = b(a)

@corneliuhoffman corneliuhoffman changed the title bug fix and improvements Fixed a perf regression by removing system calls and improving the reachability graph and the callee lookup Feb 2, 2026
@corneliuhoffman corneliuhoffman merged commit eb76031 into main Feb 2, 2026
11 of 12 checks passed
@corneliuhoffman corneliuhoffman deleted the ch/graph_performance branch February 2, 2026 14:34
@maciejpirog maciejpirog mentioned this pull request Feb 4, 2026
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Feb 15, 2026
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [opengrep/opengrep](https://github.com/opengrep/opengrep) | minor | `v1.15.1` → `v1.16.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>opengrep/opengrep (opengrep/opengrep)</summary>

### [`v1.16.0`](https://github.com/opengrep/opengrep/releases/tag/v1.16.0): Opengrep 1.16.0

[Compare Source](opengrep/opengrep@v1.15.1...v1.16.0)

#### Improvements

- Dart: Add typed metavariabless by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;551](opengrep/opengrep#551)
- Dart: Use case of identifier to guess call vs new by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;555](opengrep/opengrep#555)
- Go: Enable goroutines in taint tracking by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;559](opengrep/opengrep#559)
- Add taint propagation via "for" comprehensions by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;564](opengrep/opengrep#564)

#### Bug Fixes

- Rust: Missing Rust type alias translation by [@&#8203;smith-xyz](https://github.com/smith-xyz) in [#&#8203;549](opengrep/opengrep#549)
- Fix: Ensure that linux binaries have 8mb stack size (musl) by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;563](opengrep/opengrep#563)
- Fixed a perf regression by removing system calls and improving the reachability graph and the callee lookup by [@&#8203;corneliuhoffman](https://github.com/corneliuhoffman) in [#&#8203;556](opengrep/opengrep#556)
- Fixed intrafile bug introduced by a superfluous fallback by [@&#8203;corneliuhoffman](https://github.com/corneliuhoffman) in [#&#8203;567](opengrep/opengrep#567)
- Ruby: Always translate `or` and `and` to expression by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;562](opengrep/opengrep#562)
- Bash: Allow redirects before command arguments by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;548](opengrep/opengrep#548)

#### Internal Improvements

- Add `show dump-intrafile-graph` and `show dump-taint-signatures` commands by [@&#8203;corneliuhoffman](https://github.com/corneliuhoffman) in [#&#8203;552](opengrep/opengrep#552)
- Improve tainting code by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;546](opengrep/opengrep#546)
- Graph refactoring by [@&#8203;corneliuhoffman](https://github.com/corneliuhoffman) in [#&#8203;553](opengrep/opengrep#553)

#### New Contributors

- [@&#8203;smith-xyz](https://github.com/smith-xyz) made their first contribution in [#&#8203;549](opengrep/opengrep#549)

**Full Changelog**: <opengrep/opengrep@v1.15.1...v1.16.0>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0Mi45Ni4yIiwidXBkYXRlZEluVmVyIjoiNDIuOTYuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90IiwiYXV0b21hdGlvbjpib3QtYXV0aG9yZWQiLCJkZXBlbmRlbmN5LXR5cGU6Om1pbm9yIl19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants