HOFs by corneliuhoffman · Pull Request #469 · opengrep/opengrep

corneliuhoffman · 2025-11-20T17:55:00Z

Higher-Order Function (HOF) Taint Analysis - PR Documentation

Overview

This PR introduces comprehensive support for tracking taint through higher-order functions (HOFs) in opengrep's cross-function taint analysis.
Problem: Before this PR, taint analysis could not track data flow through HOF calls like:

let userInput = getSource();
let results = [userInput].map(x => x);  // Taint was lost here
sink(results[0]);  // Not detected as a vulnerability

Solution: We now model HOF behavior with ToSinkInCall effects that represent "call this callback with tainted data", combined with a call graph that tracks both direct calls and HOF callbacks for proper analysis ordering.

Architecture Changes

1. Path-Based Function Identification (`fn_id`)

File: src/tainting/Shape_and_sig.ml:695-710

Before:

type fn_id = { class_name : IL.name option; name : IL.name option }

After:

type fn_id = IL.name option list

Why: The old structure couldn't represent nested functions. The new path-based representation supports arbitrary nesting:

[None; Some "foo"] - Top-level function foo
[Some "MyClass"; Some "method"] - Method in a class
[None; Some "outer"; Some "inner"] - Nested function inner inside outer
[Some "Class"; Some "method"; Some "_tmp"] - Lambda inside a class method

Helper functions added:

let make_fn_id_with_class class_name method_name = [Some class_name; Some method_name]
let make_fn_id_no_class name = [None; Some name]
let show_fn_id fn_id = ... (* e.g., "MyClass::method::nested" *)
let get_fn_name fn_id = ... (* Extract last element *)
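
A minimal sketch of how these helpers might behave. It uses plain strings in place of `IL.name` so that it is self-contained, and the bodies of `show_fn_id` and `get_fn_name` are elided in the PR, so the implementations here are illustrative guesses that only aim to match the examples given above.

```ocaml
(* Simplified stand-in: the real fn_id is IL.name option list. *)
type fn_id = string option list

let make_fn_id_with_class (class_name : string) (method_name : string) : fn_id =
  [ Some class_name; Some method_name ]

let make_fn_id_no_class (name : string) : fn_id = [ None; Some name ]

(* Join the named path components with "::", skipping None placeholders. *)
let show_fn_id (fn_id : fn_id) : string =
  fn_id |> List.filter_map (fun component -> component) |> String.concat "::"

(* The function's own name is the last component of the path. *)
let get_fn_name (fn_id : fn_id) : string option =
  match List.rev fn_id with
  | [] -> None
  | last :: _ -> last
```

With this encoding a nested function keeps its whole ancestry in the identifier, so two lambdas named `_tmp` in different methods do not collide.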

2. Builtin Signature Database

New File: src/tainting/Builtin_models.ml (471 lines)

Standard library HOFs (like Array.map, filter, Python's map()) don't have user-defined signatures. This module provides built-in models.

Key concept: ToSinkInCall effect represents "call a callback with tainted data":

Effect.ToSinkInCall {
  callee = <callback expression>;
  arg = { name = "callback"; index = 0 };  (* Which param is the callback *)
  args_taints = [tainted_data];            (* What data to pass *)
}

Three signature patterns:

MethodHOF - arr.map(callback): Passes this (the array) to the callback

(* Signature models: callback receives BThis (the receiver) *)
let this_taint = Taint.{ orig = Var { base = BThis; offset = [] }; tokens = [] }

FunctionHOF - map(callback, data): Passes one parameter to another

(* Signature models: callback receives data from another parameter *)
let data_param_taint = Taint.{ orig = Var { base = BArg data_arg; offset = [] }; ... }

ReturningFunctionHOF - Ruby's arr.map returns a function that takes a block

(* Returns Shape.Fun with nested signature for deferred callback *)
let return_effect = Effect.ToReturn {
  data_taints = this_taint_set;
  data_shape = Shape.Fun returned_fun_sig;  (* The function shape *)
  ...
}

Supported languages and functions (get_hof_configs):

Language	Method HOFs	Function HOFs
JavaScript/TypeScript	`map`, `flatMap`, `filter`, `forEach`, `find`, `findIndex`, `some`, `every`, `reduce`, `reduceRight`	-
Python	-	`map`, `filter`
Ruby	-	`map`, `each`, `select`, `filter`, `flat_map`, `collect`, `find`, `detect` (returning-function pattern)
Java	`map`, `filter`, `forEach`, `flatMap`	-
Kotlin	`map`, `filter`, `forEach`, `flatMap`, `find`, `any`, `all` (arity 0 and 1)	-
Swift	`map`, `filter`, `forEach`, `flatMap`, `compactMap`, `first`, `contains`	-
Scala	`map`, `filter`, `foreach`, `flatMap`, `find`, `exists`, `forall`	-
C#	`Select`, `Where`, `ForEach`, `SelectMany`, `First`, `Any`, `All`	-
Rust	`map`, `for_each`, `filter`, `flat_map`, `find`, `any`, `all`	-
PHP	-	`array_map`, `array_filter`, `array_walk`
Julia	-	`map`, `foreach`, `filter`
C++	-	`for_each` (arity 3), `transform` (arity 4)
Elixir	-	`Enum.map`, `Enum.each`, `Enum.filter`, `Enum.flat_map`, `Enum.find`
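
One way to picture the table above is as a per-language association list from function name to HOF pattern. The sketch below is hypothetical: the types `hof_kind` and `lang`, the function `hof_configs`, and the argument indices are assumptions for illustration and do not reproduce the real `get_hof_configs`.

```ocaml
(* Hypothetical model of the three signature patterns. *)
type hof_kind =
  | Method_hof (* arr.map(cb): the callback receives the receiver *)
  | Function_hof of { callback_index : int; data_index : int }
  | Returning_function_hof (* the call returns a function that takes the block *)

type lang = Javascript | Python | Ruby

(* A small excerpt of the table, not the full list. *)
let hof_configs (lang : lang) : (string * hof_kind) list =
  match lang with
  | Javascript ->
      [ ("map", Method_hof); ("filter", Method_hof); ("forEach", Method_hof) ]
  | Python ->
      (* Python's map(f, xs) and filter(f, xs) take the callback first. *)
      [
        ("map", Function_hof { callback_index = 0; data_index = 1 });
        ("filter", Function_hof { callback_index = 0; data_index = 1 });
      ]
  | Ruby -> [ ("map", Returning_function_hof); ("each", Returning_function_hof) ]

let lookup_hof (lang : lang) (name : string) : hof_kind option =
  List.assoc_opt name (hof_configs lang)
```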

3. Labeled Call Graph with HOF Edges

File: src/tainting/Function_call_graph.ml (major rewrite, +869 lines)

Before: Simple graph FuncGraph = Persistent.Digraph.Concrete(FuncVertex)

After: Labeled imperative graph with edge metadata:

type call_edge = {
  callee_fn_id : fn_id;   (* Full path of callee *)
  call_site : Tok.t;      (* Token at the call site *)
}

module FuncGraph = Graph.Imperative.Digraph.ConcreteLabeled(FuncVertex)(...)

Why labeled edges:

Multiple calls to the same function from different sites need to be distinguished
The call site token allows signature instantiation to find the correct callee
HOF callbacks need different edges than regular calls

Key functions:

identify_callee - Resolves a call expression to its fn_id:
- Checks nested functions in the same scope first
- Then class methods
- Then top-level functions
- Handles this.method(), obj.method(), qualified names
extract_calls - Extracts all (fn_id, call_site_tok) pairs from a function body
extract_hof_callbacks - Extracts HOF callback edges:
- Uses Builtin_models.get_hof_configs to know which methods are HOFs
- Detects user-defined HOFs via detect_user_hof
- Creates special HOF edges with fake tokens like <hof_callback:foo>

detect_user_hof - Checks if a function calls any of its parameters:

let detect_user_hof (fdef : G.function_definition) : (string * int) list
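
The idea behind `detect_user_hof` can be sketched on a toy AST. The `expr` and `fundef` types below are assumptions standing in for `G.function_definition`, and the sketch ignores shadowing and aliasing, which a real implementation would have to consider.

```ocaml
(* Toy expression language, standing in for the generic AST. *)
type expr =
  | Var of string
  | Lit of int
  | Call of expr * expr list

type fundef = { params : string list; body : expr list }

(* Collect every name that appears in callee position. *)
let rec called_names (e : expr) : string list =
  match e with
  | Var _ | Lit _ -> []
  | Call (callee, args) ->
      let from_callee =
        match callee with
        | Var name -> [ name ]
        | other -> called_names other
      in
      from_callee @ List.concat (List.map called_names args)

(* Return (parameter name, parameter index) for each parameter that is called. *)
let detect_user_hof (fdef : fundef) : (string * int) list =
  let called = List.concat (List.map called_names fdef.body) in
  fdef.params
  |> List.mapi (fun index param -> (param, index))
  |> List.filter (fun (param, _) -> List.mem param called)
```

For a function such as `apply(f, x) = f(x)` this yields `[("f", 0)]`, which is the information needed to add a HOF edge from a call of `apply` to whatever is passed as its first argument.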

4. Graph Reachability Optimization

New File: src/tainting/Graph_functor.ml (132 lines)

For large codebases, analyzing every function is expensive. This module computes the minimal subgraph needed for a specific rule.

Key function: nearest_common_descendant_subgraph

(* Subgraph containing all paths from sources to sinks *)
val nearest_common_descendant_subgraph : graph -> vertex -> vertex -> graph

Algorithm:

Compute forward-reachable sets from source functions (R1) and sink functions (R2)
Intersect to get functions reachable from both (R1 ∩ R2)
Find minimal SCCs in intersection (handles cycles from mutual recursion)
Reverse BFS from minima to include all ancestors
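
The first two steps can be sketched with a plain adjacency list. This is an illustration only: the `graph` type and function names are assumptions, and the minimal-SCC and reverse-BFS steps are left out.

```ocaml
module S = Set.Make (String)

(* Adjacency list: each vertex maps to its direct successors. *)
type graph = (string * string list) list

let successors (g : graph) (v : string) : string list =
  match List.assoc_opt v g with Some vs -> vs | None -> []

(* Forward-reachable set from a start vertex (the start itself included). *)
let reachable (g : graph) (start : string) : S.t =
  let rec go visited = function
    | [] -> visited
    | v :: rest ->
        if S.mem v visited then go visited rest
        else go (S.add v visited) (successors g v @ rest)
  in
  go S.empty [ start ]

(* R1 ∩ R2: vertices reachable from both starting points. *)
let common_descendants (g : graph) (a : string) (b : string) : S.t =
  S.inter (reachable g a) (reachable g b)
```

The visited set makes the traversal terminate on cyclic graphs, which matters because mutual recursion produces cycles in the call graph.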

Usage in Match_tainting_mode.ml:

let source_functions = find_functions_containing_ranges ast source_ranges in
let sink_functions = find_functions_containing_ranges ast sink_ranges in
let relevant_graph = compute_relevant_subgraph call_graph source_functions sink_functions in

5. Parent Path Visitor

File: src/analyzing/Visit_function_defs.ml (+167 lines)

New visitor: visitor_with_parent_path tracks the full path to each function during AST traversal.

class visitor_with_parent_path = object
  val parent_path : IL.name option list ref = ref []
  (* Tracks: [class; outer_func; inner_func; ...] *)
end

val fold_with_parent_path :
  ('acc -> G.entity option -> IL.name option list -> G.function_definition -> 'acc)
  -> 'acc -> G.program -> 'acc

Why: The old fold_with_class_context only tracked class names. For nested functions, we need the full path to build correct fn_ids.
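
A simplified sketch of the traversal, assuming a toy `def` type instead of the generic AST and a callback that only receives the accumulator and the path (the real `fold_with_parent_path` also passes the entity and the function definition):

```ocaml
(* Toy definitions: a function or class, each with nested definitions. *)
type def =
  | Func of string * def list
  | Class of string * def list

(* Call f on the full path of every function, outermost first. *)
let fold_with_parent_path (f : 'acc -> string option list -> 'acc) (acc : 'acc)
    (program : def list) : 'acc =
  let rec go path acc def =
    match def with
    | Class (name, members) ->
        (* A class replaces the leading None placeholder with its name. *)
        List.fold_left (go [ Some name ]) acc members
    | Func (name, nested) ->
        let path' = path @ [ Some name ] in
        let acc = f acc path' in
        List.fold_left (go path') acc nested
  in
  List.fold_left (go [ None ]) acc program
```

Collecting the paths of a program with a nested function and a class method yields exactly the shapes listed in section 1, such as `[None; Some "outer"; Some "inner"]`.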

6. ToSinkInCall Effect

File: src/tainting/Sig_inst.ml:38-48, 876-1107

New effect type added to signature instantiation:

type call_effect =
  | ToReturn of Effect.taints_to_return
  | ToSink of Effect.taints_to_sink
  | ToLval of Taint.taints * IL.name * Taint.offset list
  | ToSinkInCall of {       (* NEW *)
      callee : IL.exp;
      arg : Taint.arg;
      args_taints : Effect.args_taints;
    }

Instantiation logic (simplified):

When encountering ToSinkInCall effect during signature instantiation:
Look up the callback's signature (from lval_env shape, or via lookup_sig)
Instantiate the callback with the provided args_taints
If no signature found, preserve the effect for later resolution
Add depth limiting to prevent infinite recursion on recursive HOFs
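
The steps above can be sketched on a toy effect type. Everything here is a hypothetical simplification: taints are reduced to a `Param` placeholder or a named source, callees are strings, and `max_depth` is an arbitrary constant rather than the limit the PR uses.

```ocaml
type taint =
  | Param (* placeholder for the callback's own parameter *)
  | Src of string (* a concrete taint source *)

type call_effect =
  | To_sink of taint
  | To_sink_in_call of { callee : string; arg : taint }

let max_depth = 3 (* hypothetical limit *)

(* Replace the callback's parameter placeholder with the actual taint. *)
let substitute (actual : taint) (eff : call_effect) : call_effect =
  match eff with
  | To_sink Param -> To_sink actual
  | To_sink_in_call { callee; arg = Param } -> To_sink_in_call { callee; arg = actual }
  | other -> other

let rec instantiate (lookup_sig : string -> call_effect list option) (depth : int)
    (eff : call_effect) : call_effect list =
  match eff with
  | To_sink _ -> [ eff ]
  | To_sink_in_call { callee; arg } ->
      if depth >= max_depth then [ eff ] (* stop on recursive HOFs *)
      else (
        match lookup_sig callee with
        | None -> [ eff ] (* no signature yet: keep the effect for later *)
        | Some callback_sig ->
            callback_sig
            |> List.map (fun e -> instantiate lookup_sig (depth + 1) (substitute arg e))
            |> List.concat)
```

A callback whose signature calls itself again is cut off after `max_depth` rounds and the unresolved effect is returned unchanged, so instantiation always terminates.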

Key change: instantiate_function_signature now takes optional lookup_sig and depth parameters:

let rec instantiate_function_signature lval_env taint_sig
    ~callee ~args args_taints
    ?(lookup_sig : (IL.exp -> int -> Signature.t option) option)
    ?(depth : int = 0) () : call_effects option

7. Signature Lookup via Call Graph

File: src/tainting/Dataflow_tainting.ml:667-730

New function queries the call graph to resolve function calls:

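let lookup_callee_from_graph graph caller_fn_id call_tok : fn_id option =
  (* Find the edge whose label carries the matching call_site token.
     all_edges: the labeled edges considered for caller_fn_id (construction elided);
     an edge of a ConcreteLabeled graph is a (src, label, dst) triple. *)
  all_edges
  |> List.find_opt (fun (_, label, _) ->
      Int.equal (Tok.compare label.call_site call_tok) 0)
  |> Option.map (fun (_, label, _) -> label.callee_fn_id)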

Lookup order:

Check call graph for edge with matching call site token
If found, use the pre-computed callee_fn_id from the edge label
Fall back to direct name lookup
Fall back to builtin signature database
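
The fallback chain can be sketched as follows. This is a self-contained approximation: call-site tokens are modeled as integers, signatures are reduced to a tag saying where they came from, and `resolve`, `user_fns` and `builtins` are hypothetical names.

```ocaml
type fn_id = string option list (* simplified: strings instead of IL.name *)
type call_edge = { callee_fn_id : fn_id; call_site : int (* stands in for Tok.t *) }

type resolution =
  | From_graph of fn_id
  | From_name of string
  | From_builtin of string

(* Steps 1-2: find the edge recorded for this call site, if any. *)
let lookup_callee_from_edges (edges : call_edge list) (call_tok : int) : fn_id option =
  edges
  |> List.find_opt (fun edge -> Int.equal edge.call_site call_tok)
  |> Option.map (fun edge -> edge.callee_fn_id)

(* Steps 3-4: fall back to the plain name, then to the builtin models. *)
let resolve ~(edges : call_edge list) ~(user_fns : string list)
    ~(builtins : string list) (call_tok : int) (name : string) : resolution option =
  match lookup_callee_from_edges edges call_tok with
  | Some fn_id -> Some (From_graph fn_id)
  | None ->
      if List.mem name user_fns then Some (From_name name)
      else if List.mem name builtins then Some (From_builtin name)
      else None
```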

8. Ruby Class Method Fix

File: languages/ruby/generic/ruby_to_generic.ml:794-815

Bug: Ruby class methods (def self.method_name) were being parsed with empty entity names, causing all class methods to collapse into a single node.

Before:

| SingletonM e ->
    let e = expr e in
    let ent = G.basic_entity ("", fake t "") in  (* Empty name! *)
    ...

After:

| SingletonM e ->
    let method_id =
      match e with
      | DotAccess (_, _, mn) -> (
          match method_name mn with
          | Left id -> id
          | Right _ -> ("", fake t ""))
      | ScopedId (Scope (_, _, SM mn)) -> ...
      | _ -> ("", fake t "")
    in
    let e = expr e in
    let ent = G.basic_entity method_id in  (* Correct name! *)
    ...

Impact: The call graph now shows Service.default_integration and Service.closest_group_integration as separate nodes, instead of collapsing both into a single node named "Service." (class name followed by an empty method name).

9. Parameter Shape for HOF Detection

File: src/tainting/Taint_signature_extractor.ml:123-214

Parameters now get Shape.Arg shape instead of just taints:

let add_param_to_env il_lval taint_set taint_arg env =
  let param_shape = Shape.Arg taint_arg in
  Taint_lval_env.add_lval_shape il_lval taint_set param_shape env

Why: When a parameter is used as a callback in a HOF call, we need to know it's an "argument" to create proper ToSinkInCall effects. The Arg shape carries the parameter's index.

10. Match_tainting_mode Integration

File: src/engine/Match_tainting_mode.ml (+343 lines of changes)

Major refactoring to use the new infrastructure:

Shared call graph: Computed once per file, shared across rules

let shared_call_graph_opt =
  if taint_intrafile then
    Some (Function_call_graph.build_call_graph ~lang ~object_mappings ast)
  else None

Relevant subgraph optimization:

let source_functions = find_functions_containing_ranges ast source_ranges in
let sink_functions = find_functions_containing_ranges ast sink_ranges in
let relevant_graph = compute_relevant_subgraph call_graph source_functions sink_functions in

Topological ordering for analysis:

let analysis_order = Topo.fold (fun fn acc -> fn :: acc) relevant_graph [] |> List.rev
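
For intuition, a callees-before-callers order can be obtained with a depth-first post-order walk. The sketch below is not the `Topo.fold` call used in the PR; it is a standalone illustration on an adjacency list, and `callees_first` is a hypothetical name.

```ocaml
(* Adjacency list: each function maps to the functions it calls. *)
type graph = (string * string list) list

(* Depth-first post-order: a function is emitted after all of its callees. *)
let callees_first (g : graph) : string list =
  let visited : (string, unit) Hashtbl.t = Hashtbl.create 16 in
  let order = ref [] in
  let rec visit v =
    if not (Hashtbl.mem visited v) then begin
      Hashtbl.add visited v ();
      List.iter visit (match List.assoc_opt v g with Some vs -> vs | None -> []);
      order := v :: !order
    end
  in
  List.iter (fun (v, _) -> visit v) g;
  List.rev !order
```

Analyzing functions in this order means a callee's signature already exists when its caller is processed. On a cyclic graph the visited table still guarantees termination, but the order inside a cycle is arbitrary.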

Kotlin trailing lambda support: Functions with lambda as last parameter get signatures extracted at arity and arity-1:

if Lang.equal lang Lang.Kotlin && arity >= 2 then
  let last_param_is_lambda = ... in
  if last_param_is_lambda then
    extract_signature_with_file_context ~arity:(arity - 1) ...

Data Flow Example

Consider this TypeScript code:

function source() { return tainted; }
function sink(x) { /* dangerous */ }
function process(items) {
  return items.map(x => transform(x));
}
function transform(x) { return x; }

let data = [source()];
let result = process(data);
sink(result[0]);

Analysis flow:

Build call graph:

<toplevel> --calls--> source (at source() call)
<toplevel> --calls--> process (at process(data) call)
<toplevel> --calls--> sink (at sink(...) call)
process --calls--> transform (via map HOF callback edge)

Compute relevant subgraph: Functions between source and sink
Topological order: transform, process, <toplevel> (leaves first)
Extract signatures:
- transform: ToReturn { data_taints = {Arg(x, 0)} } (returns its input)
- process:
  - map builtin has ToSinkInCall { callee = callback, args_taints = [BThis] }
  - When lambda x => transform(x) is analyzed, we look up transform's signature
  - Result: ToReturn { data_taints = {Arg(items, 0)} } (returns transformed items)
Instantiate at call sites:
- process(data) where data is tainted from source()
- Substitute Arg(items, 0) → taint from data
- Result flows to sink(result[0]) → FINDING!
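
The substitution step in "Instantiate at call sites" can be sketched like this. The `taint` type and `instantiate_return` are hypothetical simplifications of the real signature instantiation, which also tracks shapes and offsets.

```ocaml
type taint =
  | Arg of int (* "whatever taint the i-th argument carries" *)
  | Source of string (* a concrete taint source *)

(* Replace each Arg i in a ToReturn signature by the taints of the i-th actual argument. *)
let instantiate_return (sig_taints : taint list) (actuals : taint list list) : taint list =
  sig_taints
  |> List.map (function
       | Arg i -> (
           match List.nth_opt actuals i with
           | Some taints -> taints
           | None -> [])
       | Source _ as t -> [ t ])
  |> List.concat
```

For `process(data)`, the signature `[Arg 0]` combined with an argument tainted by `source()` yields that source taint on the return value, which is what then reaches `sink(result[0])`.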

Test Coverage

New test files (17 languages):

test_hof_comprehensive_*.{js,ts,py,rb,java,kt,scala,swift,cs,rs,lua,php,go,cpp,c,julia,ex}

Each test covers:

Basic HOF taint propagation (map, filter, forEach)
Chained HOFs (arr.map(...).filter(...))
Reduce with accumulator
Nested callbacks
User-defined HOFs (functions that call their parameters)
Cross-function taint through HOFs

Configuration

No new configuration options. HOF support is automatically enabled when taint_intrafile is true.

Performance Considerations

Call graph is shared: Computed once per file, reused across all rules
Subgraph optimization: Only functions on paths between sources and sinks are analyzed
Depth limiting: Recursive HOF instantiation is limited by taint_MAX_VISITS_PER_NODE
Builtin signatures are pre-computed: Language-specific HOF models are created once

Benchmark Script

A benchmark script is provided at benchmark_taint.sh for comparing OpenGrep vs Semgrep taint analysis performance:

# Usage
./benchmark_taint.sh [--no-semgrep] <yaml_file_or_dir> <target_file_or_dir> [num_runs] [sha1] [sha2] ...

# Examples
./benchmark_taint.sh rules/sqli.yaml target_repo/ 3
./benchmark_taint.sh --no-semgrep rules/ target_repo/ 5
./benchmark_taint.sh rules/ target_repo/ 3 main abc123  # Compare current vs main vs abc123

Options:

--no-semgrep: Skip Semgrep benchmarks (useful when Semgrep is not installed or for quick OpenGrep-only tests)
num_runs: Number of runs to average (default: 3)
sha1, sha2, ...: Git commits/branches to compare (will checkout, compile, and benchmark each)

Output: Summary table with findings count and timing (avg ± std) for each configuration

dimitris-m · 2025-12-18T14:38:08Z

src/engine/Match_tainting_mode.ml

+          List.iteri (fun i fn_id ->
+            let name = match Shape_and_sig.get_fn_name fn_id with
+              | Some n -> fst n.IL.ident
+              | None -> "<no-name>"
+            in
+            Log.debug (fun m -> m "SUBGRAPH: source_function[%d] = %s" i name))
+            source_functions;
+          List.iteri (fun i fn_id ->
+            let name = match Shape_and_sig.get_fn_name fn_id with
+              | Some n -> fst n.IL.ident
+              | None -> "<no-name>"
+            in
+            Log.debug (fun m -> m "SUBGRAPH: sink_function[%d] = %s" i name))
+            sink_functions;


For performance reasons, it's best to only make these calculations inside the fun m -> ... since they are not needed outside of the debug context.
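
To illustrate the suggestion, here is a minimal sketch with a stand-in logger. The `debug` function, the `debug_enabled` flag and the counter are assumptions made to keep the example self-contained; they are not opengrep's `Log` module.

```ocaml
(* Stand-in logger: the message closure only runs when debugging is enabled. *)
let debug_enabled = ref false

let debug (msg : unit -> string) : unit =
  if !debug_enabled then prerr_endline (msg ())

(* Counts how often the name computation actually runs. *)
let name_computations = ref 0

let name_of (fn_id : string option list) : string =
  incr name_computations;
  match List.rev fn_id with
  | Some n :: _ -> n
  | _ -> "<no-name>"

let log_source_functions (source_functions : string option list list) : unit =
  List.iteri
    (fun i fn_id ->
      (* name_of is called inside the closure, so it is skipped when disabled. *)
      debug (fun () ->
          Printf.sprintf "SUBGRAPH: source_function[%d] = %s" i (name_of fn_id)))
    source_functions
```

With logging disabled, `log_source_functions` never calls `name_of`, whereas binding `name` before the logging call would compute it on every iteration.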

dimitris-m · 2025-12-18T14:49:38Z

src/engine/Match_tainting_mode.ml

+                  | G.Param { G.ptype = Some ptype; _ } :: _ -> (
+                      match ptype.G.t with
+                      | G.TyFun _ -> true
+                      | _ -> false)
+                  | _ -> false


Why not just:

| G.Param { G.ptype = Some {t = G.TyFun _; _}; _ } :: _ -> true | _ -> false

dimitris-m · 2025-12-18T14:54:48Z

src/engine/Match_tainting_mode.ml

+             * also extract signature with arity-1 to handle trailing lambda syntax:
+             * f(a, b) vs f(a) { b } *)
+            let updated_db =
+              if Lang.equal lang Lang.Kotlin && arity >= 2 then


Why not && arity >= 1 ?

Is it not the case that f(a) where a is a function is also f { a } ?

dimitris-m · 2025-12-18T15:14:17Z

src/matching/Generic_vs_generic.ml

+  | G.FuncDef a1, B.FieldDefColon { B.vinit = Some { e = B.Lambda b1; _ }; _ }
+  | B.FieldDefColon { B.vinit = Some { e = B.Lambda b1; _ }; _ }, G.FuncDef a1 ->  


I think this is wrong: we are inverting the order of the arguments. Left is the pattern (G) and right is the source code (B).
But we should not assume that the two are interchangeable, i.e. m_expr a b is not necessarily the same as m_expr b a.

We need 2 separate cases or maybe (if possible) rename the variables on the rhs of | ... to have a1 pattern and b1 code in both cases.

dimitris-m · 2025-12-18T15:14:36Z

src/matching/Generic_vs_generic.ml

+  | G.FuncDef a1, B.VarDef { B.vinit = Some { e = B.Lambda b1; _ }; _ }
+  | B.VarDef { B.vinit = Some { e = B.Lambda b1; _ }; _ }, G.FuncDef a1  ->


Same comment as below.

dimitris-m · 2025-12-18T15:58:39Z

src/matching/Generic_vs_generic.ml

+  (* iso: FuncDef pattern can match VarDef with Lambda (arrow function) *)
+  | G.FuncDef a1, B.VarDef { B.vinit = Some { e = B.Lambda b1; _ }; _ }
+  | B.VarDef { B.vinit = Some { e = B.Lambda a1; _ }; _ }, G.FuncDef b1  ->
+      if_config
+        (fun x -> x.arrow_is_function)
+        ~then_:(m_function_definition a1 b1)
+        ~else_:(fail ())
+  (* iso: FuncDef pattern can match FieldDefColon with Lambda (arrow function in object literal) *)
+  | G.FuncDef a1, B.FieldDefColon { B.vinit = Some { e = B.Lambda b1; _ }; _ }
+  | B.FieldDefColon { B.vinit = Some { e = B.Lambda a1; _ }; _ }, G.FuncDef b1 ->  
+      if_config
+        (fun x -> x.arrow_is_function)
+        ~then_:(m_function_definition a1 b1)
+        ~else_:(fail ())


I am starting to doubt we need any of that.

What is the reason to have these identifications of different constructs? Won't it return weird results in search mode? Won't it enlarge the set of taint results in perhaps unintentional ways (if one has pattern-inside for example)?

Leaving a TODO for Part II: ensure this is what we want to do.

Hopefully this only happens when we have already matched the entity and are comparing two DefStmt definitions.

dimitris-m · 2025-12-20T00:26:25Z

languages/elixir/generic/Elixir_to_generic.ml

+        | _ -> ());
+        super#visit_expr env expr


I think we can do:

| _ -> super#visit_expr env expr);

dimitris-m · 2025-12-20T02:39:29Z

languages/elixir/generic/Elixir_to_generic.ml

+    Structure: OtherExpr("ShortLambda", [Params [...]; S body_stmt])
+    This allows Naming_AST to create proper scope for the params. *)
+let convert_short_lambda (tok : Tok.t) (body_expr : G.expr) : G.expr =
+  let max_placeholder = find_max_placeholder body_expr in


When you match a pattern such as &(foo(... &3 ...)) how can you depend on the max placeholder?
It will only match short lambdas with exactly 3 parameters.
How about &(foo(...))? No parameters at all, and no match in target code.

I think that if we are to be able to match using such patterns, we need to treat patterns differently.

What I did in my current lang is to add an ellipsis after the max index (extra ellipsis parameter) only when parsing a pattern, so you can simply match any number of parameters with such patterns.

For the empty parameters case (no &n appears at all in the pattern), one should still add that ellipsis and it will match short lambdas of any parameter count. Better than no matches at all.

For Elixir the mechanics are there in languages/elixir/generic/Elixir_to_generic.ml: type env = Program | Pattern.
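
A sketch of the proposed parameter construction, assuming parameters are represented as plain strings and `short_lambda_params` is a hypothetical helper (the real conversion builds `G.Param` values):

```ocaml
type env = Program | Pattern

(* Parameters &1 .. &max; in pattern mode an extra ellipsis parameter is
   appended so the pattern matches short lambdas with more parameters. *)
let short_lambda_params (env : env) (max_placeholder : int) : string list =
  let named = List.init max_placeholder (fun i -> Printf.sprintf "&%d" (i + 1)) in
  match env with
  | Program -> named
  | Pattern -> named @ [ "..." ]
```

Under this scheme a pattern with no placeholder at all gets just the ellipsis parameter, so it can match a short lambda of any parameter count.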

I recommend adding some tests for this under tests/patterns/elixir/.

Also, tainting should work for a rule with:

- pattern-inside: |
    &($_(... $P ...))
- focus-metavariable: $P

and a sink like sink(...), for example:

&(sink &3)

should match I think.

TODO for Part II: changes as discussed offline.

dimitris-m · 2025-12-20T19:07:52Z

src/tainting/Object_initialization.ml

+            | G.TyN name ->
+                (* For anonymous classes, accept the interface name even if not in class_names *)
+                Some name


In that case we could just remove the `when` guard from the case above.

dimitris-m · 2025-12-22T15:24:17Z

perf/opengrep-scripts/bench.py

@maciejpirog we have agreed to simplify this, leaving here as todo.

dimitris-m · 2025-12-22T15:37:05Z

languages/elixir/generic/Elixir_to_generic.ml

+let find_max_placeholder (e : G.expr) : int =
+  let max_found = ref 0 in
+  let visitor =
+    object
+      inherit [_] AST_generic.iter_no_id_info as super
+
+      method! visit_expr env expr =
+        (match expr.G.e with
+        | G.OtherExpr
+            (("PlaceHolder", _), [ G.E { e = G.L (G.Int (Some n, _)); _ } ]) ->
+            max_found := max !max_found (Int64.to_int n)
+        | _ -> super#visit_expr env expr)
+    end
+  in
+  visitor#visit_expr () e;
+  !max_found


TODO for Part II:

make class;

place max_found in env so the instance can be reused.

dimitris-m · 2025-12-22T15:37:32Z

languages/elixir/generic/Elixir_to_generic.ml

+    object
+      inherit [_] AST_generic.map as super
+
+      method! visit_expr env expr =
+        match expr.G.e with
+        | G.OtherExpr
+            (("PlaceHolder", _), [ G.E { e = G.L (G.Int (Some n, tk)); _ } ]) ->
+            let param_name = Printf.sprintf "&%Ld" n in
+            let param_id = (param_name, tk) in
+            G.N (G.Id (param_id, G.empty_id_info ())) |> G.e
+        | _ -> super#visit_expr env expr
+    end


TODO for Part II: make class and reuse instance.

dimitris-m · 2025-12-22T15:44:17Z

src/analyzing/Visit_function_defs.ml

+    val current_class : G.name option ref = ref None
+    val parent_path : IL.name option list ref = ref []
+
+    method! visit_definition f ((ent, def_kind) as def) =
+      match def_kind with


TODO for Part II:

make current_class and parent_path part of env (together with f)

make stateless and reuse 1 instance of the class.

dimitris-m · 2025-12-22T15:57:08Z

src/engine/Match_tainting_mode.ml

+            match shared_call_graph with
+            | Some (graph, _shared_mappings) -> graph
+            | None ->
+                (* Compute call graph as before *)
+                Function_call_graph.build_call_graph ~lang
+                  ~object_mappings:all_object_mappings ast
+          in


Can this happen? Where is it stored after it's calculated?

TODO for Part II: Check if this can really happen, at least debug log if it does.

dimitris-m · 2025-12-22T16:08:28Z

src/tainting/Dataflow_tainting.ml

+              (* Fallback: try qualified function name (Module.function for Elixir, etc.) *)
+              let qualified_name =
                {
-                  ident = (class_str, snd method_name.ident);
+                  ident = (fst obj.ident ^ "." ^ fst method_name.ident, snd method_name.ident);


TODO for Part II: Can we use IdQualified here? We might have to extend things a bit.

dimitris-m · 2025-12-22T16:09:53Z

src/tainting/Dataflow_tainting.ml

+          | IL.Unnamed ({ e = Fetch lval; _ } as lambda_exp) ->
+              (* Single Fetch argument - check if it's a lambda by looking at its shape *)
+              (match Lval_env.find_lval env.lval_env lval with
+              | Some (S.Cell (_, shape)) ->
+                  (match shape with
+                  | S.Fun _fun_sig ->
+                      (* It's a function/lambda! *)
+                      Some (e, lambda_exp)
+                  | _ -> None)
+              | None -> None)
+          | _ -> None)
+      | _ -> None)


TODO for Part II: Simplify this with a more refined pattern (else None)

dimitris-m · 2025-12-22T16:21:16Z

src/tainting/Dataflow_tainting.ml

  let timeout =
-    if taint_inst.options.taint_intrafile then base_timeout *. 10.0
+    if taint_inst.options.taint_intrafile then base_timeout *. 20.0


TODO for Part II: timeout is not used any more (see Dataflow_core.ml). Remove.

dimitris-m · 2025-12-22T16:22:50Z

src/tainting/Dataflow_tainting.ml

+                   (* Create assumptions for lambda parameters *)
+                   (* Create assumptions for lambda parameters using Fold_IL_params *)


TODO for Part II: Unify comments.

dimitris-m · 2025-12-22T16:47:19Z

src/tainting/Function_call_graph.ml

TODO: Check for Part II

dimitris-m · 2025-12-22T17:22:15Z

src/tainting/Taint_signature_extractor.ml

+               (* Give the parameter an Arg shape so it can be used in HOF *)
+               let new_env = add_param_to_env il_lval taint_set taint_arg env in


TODO for Part II:
See below case of IL.PatternParm.
More cases needed probably.

dimitris-m

LGTM, let's merge this.

We need a follow-up: now that there is no taint_fixpoint_timeout, we have to remove the rule option and all usages of the parameter.

There are several TODOs for Part II and we should do these asap.

This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [opengrep/opengrep](https://github.com/opengrep/opengrep) | minor | `v1.13.2` → `v1.14.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot). **Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>opengrep/opengrep (opengrep/opengrep)</summary>

### [`v1.14.1`](https://github.com/opengrep/opengrep/releases/tag/v1.14.1): Opengrep 1.14.1

[Compare Source](opengrep/opengrep@v1.14.0...v1.14.1)

#### Improvements

- Clojure translation part II by [@dimitris-m](https://github.com/dimitris-m) in [#517](opengrep/opengrep#517)
- C#: Allow implicit variables in properties to be taint sources by [@maciejpirog](https://github.com/maciejpirog) in [#516](opengrep/opengrep#516)
- Add core flags `dump_rule` and `dump_patterns_of_rule` as options in the show command by [@maciejpirog](https://github.com/maciejpirog) in [#519](opengrep/opengrep#519)

#### Bug fixes

- Fix: pass signature databaseb to lambda analysis, handle method mutation tainting by [@corneliuhoffman](https://github.com/corneliuhoffman) in [#520](opengrep/opengrep#520)

#### Tech debt

- Fix CHANGELOG.md, OPENGREP.md, remove unused files by [@dimitris-m](https://github.com/dimitris-m) in [#523](opengrep/opengrep#523)

**Full Changelog**: <opengrep/opengrep@v1.14.0...v1.14.1>

### [`v1.14.0`](https://github.com/opengrep/opengrep/releases/tag/v1.14.0): Opengrep 1.14.0

[Compare Source](opengrep/opengrep@v1.13.2...v1.14.0)

#### Improvements

- Support for higher-order functions in intrafile taint analysis by [@corneliuhoffman](https://github.com/corneliuhoffman) in [#469](opengrep/opengrep#469) and [#513](opengrep/opengrep#513)
- Clojure: Improved support for Clojure (incl. tainting) by [@dimitris-m](https://github.com/dimitris-m) in [#501](opengrep/opengrep#501)
- Dart: Improved support for Dart by [@maciejpirog](https://github.com/maciejpirog) in [#508](opengrep/opengrep#508)
- C#: Better handing of extension methods and extension blocks by [@maciejpirog](https://github.com/maciejpirog) in [#514](opengrep/opengrep#514)

#### Fixes

- Bump cygwin install action by [@dimitris-m](https://github.com/dimitris-m) in [#503](opengrep/opengrep#503) and [#509](opengrep/opengrep#509)

**Full Changelog**: <opengrep/opengrep@v1.13.2...v1.14.0>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

corneliuhoffman requested review from dimitris-m, maciejpirog and willem-delbare as code owners November 20, 2025 17:55

dimitris-m removed the request for review from willem-delbare November 20, 2025 17:56

dimitris-m reviewed Nov 20, 2025

languages/elixir/generic/Elixir_to_generic.ml (outdated, resolved)

dimitris-m added the taint label Nov 20, 2025

corneliuhoffman force-pushed the HOFS branch 3 times, most recently from 99a9592 to 9d1318b Compare November 27, 2025 12:52

corneliuhoffman changed the title ~~WIP HOFs~~ HOFs Nov 27, 2025

corneliuhoffman force-pushed the HOFS branch 4 times, most recently from 5992bc2 to 1ebf5ab Compare November 27, 2025 16:48

dimitris-m added the intrafile label Nov 27, 2025

corneliuhoffman force-pushed the HOFS branch 2 times, most recently from f911e04 to 6a7d912 Compare November 28, 2025 13:22

dimitris-m mentioned this pull request Nov 28, 2025

intrafile: add a no-flag run on timeout #462

Closed

corneliuhoffman force-pushed the HOFS branch 4 times, most recently from 6ae650c to 0cd3d1c Compare December 4, 2025 17:26

dimitris-m reviewed Dec 4, 2025

src/core_scan/Core_scan.ml (resolved)

dimitris-m reviewed Dec 4, 2025

src/matching/Generic_vs_generic.ml (resolved)

dimitris-m reviewed Dec 4, 2025

src/matching/Generic_vs_generic.ml (resolved)

corneliuhoffman force-pushed the HOFS branch from 0cd3d1c to 1165c52 Compare December 4, 2025 17:45

dimitris-m reviewed Dec 4, 2025

src/tainting/Dataflow_tainting.ml (outdated, resolved)

dimitris-m reviewed Dec 5, 2025

src/analyzing/AST_to_IL.ml (outdated, resolved)

dimitris-m reviewed Dec 5, 2025

src/analyzing/AST_to_IL.ml (outdated, resolved)

dimitris-m reviewed Dec 5, 2025

benchmark_taint.sh (outdated, resolved)

dimitris-m reviewed Dec 18, 2025

dimitris-m reviewed Dec 20, 2025

dimitris-m reviewed Dec 22, 2025

dimitris-m reviewed Dec 22, 2025

complete HOFs

94bcb0e

dimitris-m reviewed Dec 22, 2025

Corneliu Hoffman added 3 commits December 22, 2025 15:45

removed fixpoint timout _ some other improvements

573e335

- removed retries - removed the need of fixpoint_TIMEOUT by delegating to the fixpoint graph - improved the graph by adding HOFs to it as well as top level stuff

graph approach

2e7b987

fixed the ToReturn bug also added the graph map

45767bb

corneliuhoffman force-pushed the HOFS branch from 034d431 to 45767bb Compare December 22, 2025 15:47

dimitris-m reviewed Dec 22, 2025

dimitris-m reviewed Dec 22, 2025

dimitris-m approved these changes Dec 22, 2025

corneliuhoffman merged commit 1292b07 into main Dec 22, 2025
6 checks passed

corneliuhoffman deleted the HOFS branch December 22, 2025 17:25

maciejpirog mentioned this pull request Dec 31, 2025

Release v1.14.0 #515

Merged
