
HOFs #469

Merged: corneliuhoffman merged 4 commits into main from HOFS on Dec 22, 2025

@corneliuhoffman (Contributor) commented Nov 20, 2025

Higher-Order Function (HOF) Taint Analysis - PR Documentation

Overview

This PR introduces comprehensive support for tracking taint through higher-order functions (HOFs) in opengrep's cross-function taint analysis.
Problem: Before this PR, taint analysis could not track data flow through HOF calls like:

let userInput = getSource();
let results = [userInput].map(x => x);  // Taint was lost here
sink(results[0]);  // Not detected as a vulnerability

Solution: We now model HOF behavior with ToSinkInCall effects that represent "call this callback with tainted data", combined with a call graph that tracks both direct calls and HOF callbacks for proper analysis ordering.


Architecture Changes

1. Path-Based Function Identification (fn_id)

File: src/tainting/Shape_and_sig.ml:695-710

Before:

type fn_id = { class_name : IL.name option; name : IL.name option }

After:

type fn_id = IL.name option list

Why: The old structure couldn't represent nested functions. The new path-based representation supports arbitrary nesting:

  • [None; Some "foo"] - Top-level function foo
  • [Some "MyClass"; Some "method"] - Method in a class
  • [None; Some "outer"; Some "inner"] - Nested function inner inside outer
  • [Some "Class"; Some "method"; Some "_tmp"] - Lambda inside a class method

Helper functions added:

let make_fn_id_with_class class_name method_name = [Some class_name; Some method_name]
let make_fn_id_no_class name = [None; Some name]
let show_fn_id fn_id = ... (* e.g., "MyClass::method::nested" *)
let get_fn_name fn_id = ... (* Extract last element *)
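
To make the path representation concrete, here is a minimal, self-contained sketch of how such helpers could behave. It is not the PR's implementation: the real fn_id holds IL.name option values, while this sketch substitutes plain strings, and skipping None components when printing is an assumption chosen to reproduce the "MyClass::method::nested" format mentioned above.

```ocaml
(* Hypothetical stand-in: the real fn_id is IL.name option list; plain
   strings are used here so the sketch is self-contained. *)
type fn_id = string option list

let make_fn_id_with_class class_name method_name : fn_id =
  [ Some class_name; Some method_name ]

let make_fn_id_no_class name : fn_id = [ None; Some name ]

(* Join the named components with "::". Skipping None entries is an
   assumption of this sketch. *)
let show_fn_id (fn_id : fn_id) : string =
  fn_id |> List.filter_map Fun.id |> String.concat "::"

(* The function's own name is the last element of the path. *)
let get_fn_name (fn_id : fn_id) : string option =
  match List.rev fn_id with
  | [] -> None
  | last :: _ -> last

let () =
  let nested = make_fn_id_with_class "MyClass" "method" @ [ Some "nested" ] in
  print_endline (show_fn_id nested) (* prints MyClass::method::nested *)
```

Because the path is an ordinary list, appending one more component is enough to name a lambda or nested function inside an existing method.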

2. Builtin Signature Database

New File: src/tainting/Builtin_models.ml (471 lines)

Standard library HOFs (like Array.map, filter, Python's map()) don't have user-defined signatures. This module provides built-in models.

Key concept: ToSinkInCall effect represents "call a callback with tainted data":

Effect.ToSinkInCall {
  callee = <callback expression>;
  arg = { name = "callback"; index = 0 };  (* Which param is the callback *)
  args_taints = [tainted_data];            (* What data to pass *)
}

Three signature patterns:

  1. MethodHOF - arr.map(callback): Passes this (the array) to the callback

    (* Signature models: callback receives BThis (the receiver) *)
    let this_taint = Taint.{ orig = Var { base = BThis; offset = [] }; tokens = [] }
  2. FunctionHOF - map(callback, data): Passes one parameter to another

    (* Signature models: callback receives data from another parameter *)
    let data_param_taint = Taint.{ orig = Var { base = BArg data_arg; offset = [] }; ... }
  3. ReturningFunctionHOF - Ruby's arr.map returns a function that takes a block

    (* Returns Shape.Fun with nested signature for deferred callback *)
    let return_effect = Effect.ToReturn {
      data_taints = this_taint_set;
      data_shape = Shape.Fun returned_fun_sig;  (* The function shape *)
      ...
    }

Supported languages and functions (get_hof_configs):

| Language | Method HOFs | Function HOFs |
|---|---|---|
| JavaScript/TypeScript | map, flatMap, filter, forEach, find, findIndex, some, every, reduce, reduceRight | - |
| Python | - | map, filter |
| Ruby | - | map, each, select, filter, flat_map, collect, find, detect (returning-function pattern) |
| Java | map, filter, forEach, flatMap | - |
| Kotlin | map, filter, forEach, flatMap, find, any, all (arity 0 and 1) | - |
| Swift | map, filter, forEach, flatMap, compactMap, first, contains | - |
| Scala | map, filter, foreach, flatMap, find, exists, forall | - |
| C# | Select, Where, ForEach, SelectMany, First, Any, All | - |
| Rust | map, for_each, filter, flat_map, find, any, all | - |
| PHP | - | array_map, array_filter, array_walk |
| Julia | - | map, foreach, filter |
| C++ | - | for_each (arity 3), transform (arity 4) |
| Elixir | - | Enum.map, Enum.each, Enum.filter, Enum.flat_map, Enum.find |

3. Labeled Call Graph with HOF Edges

File: src/tainting/Function_call_graph.ml (major rewrite, +869 lines)

Before: Simple graph FuncGraph = Persistent.Digraph.Concrete(FuncVertex)

After: Labeled imperative graph with edge metadata:

type call_edge = {
  callee_fn_id : fn_id;   (* Full path of callee *)
  call_site : Tok.t;      (* Token at the call site *)
}

module FuncGraph = Graph.Imperative.Digraph.ConcreteLabeled(FuncVertex)(...)

Why labeled edges:

  • Multiple calls to the same function from different sites need to be distinguished
  • The call site token allows signature instantiation to find the correct callee
  • HOF callbacks need different edges than regular calls

Key functions:

  1. identify_callee - Resolves a call expression to its fn_id:

    • Checks nested functions in the same scope first
    • Then class methods
    • Then top-level functions
    • Handles this.method(), obj.method(), qualified names
  2. extract_calls - Extracts all (fn_id, call_site_tok) pairs from a function body

  3. extract_hof_callbacks - Extracts HOF callback edges:

    • Uses Builtin_models.get_hof_configs to know which methods are HOFs
    • Detects user-defined HOFs via detect_user_hof
    • Creates special HOF edges with fake tokens like <hof_callback:foo>
  4. detect_user_hof - Checks if a function calls any of its parameters:

    let detect_user_hof (fdef : G.function_definition) : (string * int) list
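
The idea behind detect_user_hof can be sketched on a toy expression type. The types expr and fdef below are hypothetical stand-ins for the generic AST; the real function works on G.function_definition. This simplified version ignores shadowing.

```ocaml
(* Toy expression language; the real analysis walks G.function_definition. *)
type expr =
  | Var of string
  | Call of expr * expr list
  | Seq of expr list

type fdef = { params : string list; body : expr }

(* Collect every name that appears in callee position. *)
let rec called_names (e : expr) : string list =
  match e with
  | Var _ -> []
  | Call (Var f, args) -> f :: List.concat_map called_names args
  | Call (callee, args) -> List.concat_map called_names (callee :: args)
  | Seq es -> List.concat_map called_names es

(* Return (param_name, index) for each parameter that the body calls. *)
let detect_user_hof (fdef : fdef) : (string * int) list =
  let called = called_names fdef.body in
  fdef.params
  |> List.mapi (fun i p -> (p, i))
  |> List.filter (fun (p, _) -> List.mem p called)

(* apply(f, x) = f(x): parameter f at index 0 is used as a callback. *)
let apply_def = { params = [ "f"; "x" ]; body = Call (Var "f", [ Var "x" ]) }
```

For apply_def the sketch reports [("f", 0)], which is the information needed to add a HOF callback edge for whatever function is passed as the first argument.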

4. Graph Reachability Optimization

New File: src/tainting/Graph_functor.ml (132 lines)

For large codebases, analyzing every function is expensive. This module computes the minimal subgraph needed for a specific rule.

Key function: nearest_common_descendant_subgraph

(* Subgraph containing all paths from sources to sinks *)
val nearest_common_descendant_subgraph : graph -> vertex -> vertex -> graph

Algorithm:

  1. Compute forward-reachable sets from source functions (R1) and sink functions (R2)
  2. Intersect to get functions reachable from both (R1 ∩ R2)
  3. Find minimal SCCs in intersection (handles cycles from mutual recursion)
  4. Reverse BFS from minima to include all ancestors
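
The four steps can be sketched over a small adjacency-list graph. This is a simplified illustration, not the code in Graph_functor.ml: it assumes an acyclic graph (so step 3 reduces to finding vertices with no predecessor inside the intersection, where the PR uses SCCs), and the names graph, reachable and relevant_vertices are hypothetical.

```ocaml
module S = Set.Make (String)

(* Graph as an adjacency list: vertex -> successors. *)
type graph = (string * string list) list

let succs (g : graph) v = Option.value (List.assoc_opt v g) ~default:[]

let preds (g : graph) v =
  List.filter_map (fun (u, vs) -> if List.mem v vs then Some u else None) g

(* Generic worklist traversal following [next] from [start]. *)
let reachable next start =
  let rec go seen = function
    | [] -> seen
    | v :: rest when S.mem v seen -> go seen rest
    | v :: rest -> go (S.add v seen) (next v @ rest)
  in
  go S.empty [ start ]

let relevant_vertices (g : graph) source sink : S.t =
  (* Steps 1-2: vertices forward-reachable from both endpoints. *)
  let common = S.inter (reachable (succs g) source) (reachable (succs g) sink) in
  (* Step 3 (simplified): minima have no predecessor inside [common].
     Assumes no cycles; the PR uses SCCs to cope with mutual recursion. *)
  let minima =
    S.filter
      (fun v -> not (List.exists (fun p -> S.mem p common) (preds g v)))
      common
  in
  (* Step 4: walk edges backwards from each minimum to collect ancestors. *)
  S.fold (fun m acc -> S.union acc (reachable (preds g) m)) minima S.empty
```

On a graph where source_fn and sink_fn both lead to main, and helper leads to source_fn, the result contains those four vertices and leaves out any unrelated component.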

Usage in Match_tainting_mode.ml:

let source_functions = find_functions_containing_ranges ast source_ranges in
let sink_functions = find_functions_containing_ranges ast sink_ranges in
let relevant_graph = compute_relevant_subgraph call_graph source_functions sink_functions in

5. Parent Path Visitor

File: src/analyzing/Visit_function_defs.ml (+167 lines)

New visitor: visitor_with_parent_path tracks the full path to each function during AST traversal.

class visitor_with_parent_path = object
  val parent_path : IL.name option list ref = ref []
  (* Tracks: [class; outer_func; inner_func; ...] *)
end

val fold_with_parent_path :
  ('acc -> G.entity option -> IL.name option list -> G.function_definition -> 'acc)
  -> 'acc -> G.program -> 'acc

Why: The old fold_with_class_context only tracked class names. For nested functions, we need the full path to build correct fn_ids.
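
A minimal sketch of the parent-path idea on a toy AST follows. The def type and this simplified fold_with_parent_path signature are hypothetical; the real visitor works on G.program and also passes the entity and the function definition to the callback.

```ocaml
(* Toy AST: classes and functions may contain nested definitions. *)
type def =
  | Class of string * def list
  | Func of string * def list

(* Fold over every function, passing the path of enclosing definitions.
   A top-level function gets None in the class slot, matching the
   [None; Some "foo"] layout of fn_id. *)
let fold_with_parent_path f acc (program : def list) =
  let rec go path acc = function
    | Class (name, members) ->
        List.fold_left (go (path @ [ Some name ])) acc members
    | Func (name, nested) ->
        let acc = f acc path name in
        List.fold_left (go (path @ [ Some name ])) acc nested
  in
  List.fold_left
    (fun acc d ->
      match d with
      | Class _ -> go [] acc d
      | Func _ -> go [ None ] acc d)
    acc program

(* Collect the full path of every function, in source order. *)
let all_fn_ids program =
  fold_with_parent_path
    (fun acc path name -> (path @ [ Some name ]) :: acc)
    [] program
  |> List.rev
```

With this shape, a function inner nested in outer yields [None; Some "outer"; Some "inner"], which a class-only context could not express.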


6. ToSinkInCall Effect

File: src/tainting/Sig_inst.ml:38-48, 876-1107

New effect type added to signature instantiation:

type call_effect =
  | ToReturn of Effect.taints_to_return
  | ToSink of Effect.taints_to_sink
  | ToLval of Taint.taints * IL.name * Taint.offset list
  | ToSinkInCall of {       (* NEW *)
      callee : IL.exp;
      arg : Taint.arg;
      args_taints : Effect.args_taints;
    }

Instantiation logic (simplified):

  When a ToSinkInCall effect is encountered during signature instantiation:

  1. Look up the callback's signature (from the lval_env shape, or via lookup_sig)
  2. Instantiate the callback with the provided args_taints
  3. If no signature is found, preserve the effect for later resolution
  4. Limit the recursion depth to prevent infinite recursion on recursive HOFs
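
The instantiation steps above can be illustrated with a heavily simplified model. Everything here is an assumption made for the sketch: taints are strings, a signature is a function from argument taints to effects, callbacks are identified by name, and max_depth is a hypothetical constant rather than the limit the PR actually uses.

```ocaml
type taint = string

type effect_ =
  | ToSink of taint list (* taint reaches a sink *)
  | ToSinkInCall of { callee : string; args_taints : taint list }

(* Simplified signature: the effects a function has, given its argument taints. *)
type signature = taint list -> effect_ list

let max_depth = 3 (* hypothetical bound for this sketch *)

let rec instantiate ~(lookup_sig : string -> signature option) ~depth
    (eff : effect_) : effect_ list =
  match eff with
  | ToSink _ -> [ eff ]
  (* Depth limit reached: stop unfolding and keep the effect as is. *)
  | ToSinkInCall _ when depth >= max_depth -> [ eff ]
  | ToSinkInCall { callee; args_taints } -> (
      match lookup_sig callee with
      (* No signature yet: preserve the effect for later resolution. *)
      | None -> [ eff ]
      (* Instantiate the callback with the taints passed to it, then
         resolve any nested callback effects one level deeper. *)
      | Some sig_ ->
          sig_ args_taints
          |> List.concat_map (instantiate ~lookup_sig ~depth:(depth + 1)))
```

A callback whose signature sends its argument to a sink resolves to a ToSink effect, while a callback that keeps calling itself stops unfolding once the depth bound is hit and the effect is returned unresolved.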

Key change: instantiate_function_signature now takes optional lookup_sig and depth parameters:

let rec instantiate_function_signature lval_env taint_sig
    ~callee ~args args_taints
    ?(lookup_sig : (IL.exp -> int -> Signature.t option) option)
    ?(depth : int = 0) () : call_effects option

7. Signature Lookup via Call Graph

File: src/tainting/Dataflow_tainting.ml:667-730

New function queries the call graph to resolve function calls:

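let lookup_callee_from_graph graph caller_fn_id call_tok : fn_id option =
  (* Find the outgoing edge of caller_fn_id with a matching call_site token.
     Simplified: all_edges stands for those outgoing edges in graph. *)
  all_edges
  |> List.find_opt (fun edge ->
      let label = FuncGraph.E.label edge in
      Int.equal (Tok.compare label.call_site call_tok) 0)
  |> Option.map (fun edge -> (FuncGraph.E.label edge).callee_fn_id)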

Lookup order:

  1. Check call graph for edge with matching call site token
  2. If found, use the pre-computed callee_fn_id from the edge label
  3. Fall back to direct name lookup
  4. Fall back to builtin signature database
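
The lookup order amounts to a chain of fallbacks where the first successful lookup wins. A minimal sketch, assuming each stage is a function returning an option; the names first_some, resolve_signature, from_graph, by_name and builtin are hypothetical and not taken from the PR.

```ocaml
(* First Some wins; later lookups are only evaluated when needed. *)
let first_some (lookups : (unit -> 'a option) list) : 'a option =
  List.find_map (fun lookup -> lookup ()) lookups

(* Try the call graph, then direct name lookup, then the builtin database. *)
let resolve_signature ~from_graph ~by_name ~builtin call =
  first_some
    [
      (fun () -> from_graph call);
      (fun () -> by_name call);
      (fun () -> builtin call);
    ]
```

Wrapping each stage in a thunk keeps the more expensive or less precise lookups from running once an earlier stage has already produced a result.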

8. Ruby Class Method Fix

File: languages/ruby/generic/ruby_to_generic.ml:794-815

Bug: Ruby class methods (def self.method_name) were being parsed with empty entity names, causing all class methods to collapse into a single node.

Before:

| SingletonM e ->
    let e = expr e in
    let ent = G.basic_entity ("", fake t "") in  (* Empty name! *)
    ...

After:

| SingletonM e ->
    let method_id =
      match e with
      | DotAccess (_, _, mn) -> (
          match method_name mn with
          | Left id -> id
          | Right _ -> ("", fake t ""))
      | ScopedId (Scope (_, _, SM mn)) -> ...
      | _ -> ("", fake t "")
    in
    let e = expr e in
    let ent = G.basic_entity method_id in  (* Correct name! *)
    ...

Impact: The call graph now correctly shows Service.default_integration and Service.closest_group_integration as separate nodes instead of both collapsing into a single Service. node with an empty method name.


9. Parameter Shape for HOF Detection

File: src/tainting/Taint_signature_extractor.ml:123-214

Parameters now get Shape.Arg shape instead of just taints:

let add_param_to_env il_lval taint_set taint_arg env =
  let param_shape = Shape.Arg taint_arg in
  Taint_lval_env.add_lval_shape il_lval taint_set param_shape env

Why: When a parameter is used as a callback in a HOF call, we need to know it's an "argument" to create proper ToSinkInCall effects. The Arg shape carries the parameter's index.


10. Match_tainting_mode Integration

File: src/engine/Match_tainting_mode.ml (+343 lines of changes)

Major refactoring to use the new infrastructure:

  1. Shared call graph: Computed once per file, shared across rules

    let shared_call_graph_opt =
      if taint_intrafile then
        Some (Function_call_graph.build_call_graph ~lang ~object_mappings ast)
      else None
  2. Relevant subgraph optimization:

    let source_functions = find_functions_containing_ranges ast source_ranges in
    let sink_functions = find_functions_containing_ranges ast sink_ranges in
    let relevant_graph = compute_relevant_subgraph call_graph source_functions sink_functions in
  3. Topological ordering for analysis:

    let analysis_order = Topo.fold (fun fn acc -> fn :: acc) relevant_graph [] |> List.rev
  4. Kotlin trailing lambda support: Functions whose last parameter is a lambda get signatures extracted at both arity and arity - 1:

    if Lang.equal lang Lang.Kotlin && arity >= 2 then
      let last_param_is_lambda = ... in
      if last_param_is_lambda then
        extract_signature_with_file_context ~arity:(arity - 1) ...
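
The arity rule for Kotlin trailing lambdas can be stated as a small pure function. This is a sketch only: the param record and signature_arities are hypothetical names, and the arity >= 2 threshold simply mirrors the snippet above.

```ocaml
type param = { name : string; is_lambda : bool }

(* Arities at which a signature should be registered for a function. *)
let signature_arities ~(is_kotlin : bool) (params : param list) : int list =
  let arity = List.length params in
  let last_is_lambda =
    match List.rev params with
    | last :: _ -> last.is_lambda
    | [] -> false
  in
  (* f(a, b) can also be written f(a) { b }, which looks like arity - 1. *)
  if is_kotlin && arity >= 2 && last_is_lambda then [ arity; arity - 1 ]
  else [ arity ]
```

Registering the signature at both arities lets a call written with a trailing lambda resolve to the same function as the fully parenthesized form.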

Data Flow Example

Consider this TypeScript code:

function source() { return tainted; }
function sink(x) { /* dangerous */ }
function process(items) {
  return items.map(x => transform(x));
}
function transform(x) { return x; }

let data = [source()];
let result = process(data);
sink(result[0]);

Analysis flow:

  1. Build call graph:

    <toplevel> --calls--> source (at source() call)
    <toplevel> --calls--> process (at process(data) call)
    <toplevel> --calls--> sink (at sink(...) call)
    process --calls--> transform (via map HOF callback edge)
    
  2. Compute relevant subgraph: Functions between source and sink

  3. Topological order: transform, process, <toplevel> (leaves first)

  4. Extract signatures:

    • transform: ToReturn { data_taints = {Arg(x, 0)} } (returns its input)
    • process:
      • map builtin has ToSinkInCall { callee = callback, args_taints = [BThis] }
      • When lambda x => transform(x) is analyzed, we look up transform's signature
      • Result: ToReturn { data_taints = {Arg(items, 0)} } (returns transformed items)
  5. Instantiate at call sites:

    • process(data) where data is tainted from source()
    • Substitute Arg(items, 0) → taint from data
    • Result flows to sink(result[0]) → FINDING!
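
The "leaves first" ordering used in this walkthrough can be reproduced with a post-order traversal of the example's call graph. This is a sketch under assumptions: the graph is a plain adjacency list with caller-to-callee edges, it is acyclic, and callees_first is a hypothetical helper rather than the Topo module used in the PR.

```ocaml
(* Call graph of the example, as caller -> callees. *)
let call_graph =
  [
    ("<toplevel>", [ "source"; "process"; "sink" ]);
    ("process", [ "transform" ]);
    ("source", []);
    ("sink", []);
    ("transform", []);
  ]

(* Post-order DFS: a function is emitted only after all of its callees. *)
let callees_first graph roots =
  let rec visit (seen, order) fn =
    if List.mem fn seen then (seen, order)
    else
      let callees = Option.value (List.assoc_opt fn graph) ~default:[] in
      let seen, order = List.fold_left visit (fn :: seen, order) callees in
      (seen, fn :: order)
  in
  let _, order = List.fold_left visit ([], []) roots in
  List.rev order

let () =
  callees_first call_graph [ "<toplevel>" ] |> String.concat ", " |> print_endline
(* prints source, transform, process, sink, <toplevel> *)
```

In this order transform is analyzed before process and process before the top level, so each signature is available by the time a caller needs it.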

Test Coverage

New test files (17 languages):

  • test_hof_comprehensive_*.{js,ts,py,rb,java,kt,scala,swift,cs,rs,lua,php,go,cpp,c,julia,ex}

Each test covers:

  1. Basic HOF taint propagation (map, filter, forEach)
  2. Chained HOFs (arr.map(...).filter(...))
  3. Reduce with accumulator
  4. Nested callbacks
  5. User-defined HOFs (functions that call their parameters)
  6. Cross-function taint through HOFs

Configuration

No new configuration options. HOF support is automatically enabled when taint_intrafile is true.


Performance Considerations

  1. Call graph is shared: Computed once per file, reused across all rules
  2. Subgraph optimization: Only functions on paths between sources and sinks are analyzed
  3. Depth limiting: Recursive HOF instantiation is limited by taint_MAX_VISITS_PER_NODE
  4. Builtin signatures are pre-computed: Language-specific HOF models are created once

Benchmark Script

A benchmark script is provided at benchmark_taint.sh for comparing OpenGrep vs Semgrep taint analysis performance:

# Usage
./benchmark_taint.sh [--no-semgrep] <yaml_file_or_dir> <target_file_or_dir> [num_runs] [sha1] [sha2] ...

# Examples
./benchmark_taint.sh rules/sqli.yaml target_repo/ 3
./benchmark_taint.sh --no-semgrep rules/ target_repo/ 5
./benchmark_taint.sh rules/ target_repo/ 3 main abc123  # Compare current vs main vs abc123

Options:

  • --no-semgrep: Skip Semgrep benchmarks (useful when Semgrep is not installed or for quick OpenGrep-only tests)
  • num_runs: Number of runs to average (default: 3)
  • sha1, sha2, ...: Git commits/branches to compare (will checkout, compile, and benchmark each)

Output: Summary table with findings count and timing (avg ± std) for each configuration

@corneliuhoffman force-pushed the HOFS branch 3 times, most recently from 99a9592 to 9d1318b, on November 27, 2025 12:52
@corneliuhoffman changed the title from "WIP HOFs" to "HOFs" on Nov 27, 2025
@corneliuhoffman force-pushed the HOFS branch 4 times, most recently from 5992bc2 to 1ebf5ab, on November 27, 2025 16:48
@corneliuhoffman force-pushed the HOFS branch 2 times, most recently from f911e04 to 6a7d912, on November 28, 2025 13:22
@corneliuhoffman force-pushed the HOFS branch 4 times, most recently from 6ae650c to 0cd3d1c, on December 4, 2025 17:26
Comment on lines +521 to +534
List.iteri (fun i fn_id ->
    let name = match Shape_and_sig.get_fn_name fn_id with
      | Some n -> fst n.IL.ident
      | None -> "<no-name>"
    in
    Log.debug (fun m -> m "SUBGRAPH: source_function[%d] = %s" i name))
  source_functions;
List.iteri (fun i fn_id ->
    let name = match Shape_and_sig.get_fn_name fn_id with
      | Some n -> fst n.IL.ident
      | None -> "<no-name>"
    in
    Log.debug (fun m -> m "SUBGRAPH: sink_function[%d] = %s" i name))
  sink_functions;
Collaborator

For performance reasons, it's best to only make these calculations inside the fun m -> ... since they are not needed outside of the debug context.

Comment on lines +584 to +588
| G.Param { G.ptype = Some ptype; _ } :: _ -> (
    match ptype.G.t with
    | G.TyFun _ -> true
    | _ -> false)
| _ -> false
Collaborator

Why not just:

    | G.Param { G.ptype = Some {t = G.TyFun _; _}; _ } :: _ -> true
    | _ -> false

* also extract signature with arity-1 to handle trailing lambda syntax:
* f(a, b) vs f(a) { b } *)
let updated_db =
if Lang.equal lang Lang.Kotlin && arity >= 2 then
Collaborator

Why not && arity >= 1 ?

Is it not the case that f(a) where a is a function is also f { a } ?

Comment on lines +3100 to +3101
| G.FuncDef a1, B.FieldDefColon { B.vinit = Some { e = B.Lambda b1; _ }; _ }
| B.FieldDefColon { B.vinit = Some { e = B.Lambda b1; _ }; _ }, G.FuncDef a1 ->
Collaborator

@dimitris-m Dec 18, 2025

I think this is wrong: we are inverting the order of arguments. Left is the pattern (G) and right is the source code (B).
From the cases we should not assume that they are interchangeable, i.e. m_expr a b is not necessarily the same as m_expr b a.

We need 2 separate cases or maybe (if possible) rename the variables on the rhs of | ... to have a1 pattern and b1 code in both cases.

Comment on lines +3093 to +3094
| G.FuncDef a1, B.VarDef { B.vinit = Some { e = B.Lambda b1; _ }; _ }
| B.VarDef { B.vinit = Some { e = B.Lambda b1; _ }; _ }, G.FuncDef a1 ->
Collaborator

Same comment as below.

Comment on lines +3092 to +3105
(* iso: FuncDef pattern can match VarDef with Lambda (arrow function) *)
| G.FuncDef a1, B.VarDef { B.vinit = Some { e = B.Lambda b1; _ }; _ }
| B.VarDef { B.vinit = Some { e = B.Lambda a1; _ }; _ }, G.FuncDef b1 ->
    if_config
      (fun x -> x.arrow_is_function)
      ~then_:(m_function_definition a1 b1)
      ~else_:(fail ())
(* iso: FuncDef pattern can match FieldDefColon with Lambda (arrow function in object literal) *)
| G.FuncDef a1, B.FieldDefColon { B.vinit = Some { e = B.Lambda b1; _ }; _ }
| B.FieldDefColon { B.vinit = Some { e = B.Lambda a1; _ }; _ }, G.FuncDef b1 ->
    if_config
      (fun x -> x.arrow_is_function)
      ~then_:(m_function_definition a1 b1)
      ~else_:(fail ())
Collaborator

I am starting to doubt we need any of that.

What is the reason to have these identifications of different constructs? Won't it return weird results in search mode? Won't it enlarge the set of taint results in perhaps unintentional ways (if one has pattern-inside for example)?

Collaborator

Leaving a TODO for Part II: ensure this is what we want to do.

Hopefully this only happens when we have already matched the entity and we should be comparing two DefStmt definition.

Comment on lines +213 to +214
| _ -> ());
super#visit_expr env expr
Collaborator

I think we can do:

| _ -> super#visit_expr env expr);

Structure: OtherExpr("ShortLambda", [Params [...]; S body_stmt])
This allows Naming_AST to create proper scope for the params. *)
let convert_short_lambda (tok : Tok.t) (body_expr : G.expr) : G.expr =
let max_placeholder = find_max_placeholder body_expr in
Collaborator

@dimitris-m Dec 20, 2025

When you match a pattern such as &(foo(... &3 ...)) how can you depend on the max placeholder?
It will only match short lambdas with exactly 3 parameters.
How about &(foo(...))? No parameters at all, and no match in target code.

I think that if we are to be able to match using such patterns, we need to treat patterns differently.

What I did in my current lang is to add an ellipsis after the max index (extra ellipsis parameter) only when parsing a pattern, so you can simply match any number of parameters with such patterns.

For the empty parameters case (no &n appears at all in the pattern), one should still add that ellipsis and it will match short lambdas of any parameter count. Better than no matches at all.

For Elixir the mechanics are there in languages/elixir/generic/Elixir_to_generic.ml: type env = Program | Pattern.

I recommend to add some tests about this material under tests/patterns/elixir/.

Also, tainting should work for a rule with:

- pattern-inside: |
       &($_(... $P ...))
- focus-metavariable: $P

and a sink like sink(...), for example:

&(sink &3)

should match I think.

Collaborator

TODO for Part II: changes as discussed offline.

Comment on lines +374 to +376
| G.TyN name ->
(* For anonymous classes, accept the interface name even if not in class_names *)
Some name
Collaborator

In that case we could just remove the when from the case above.

Collaborator

@maciejpirog we have agreed to simplify this, leaving here as todo.

Comment on lines +202 to +217
let find_max_placeholder (e : G.expr) : int =
  let max_found = ref 0 in
  let visitor =
    object
      inherit [_] AST_generic.iter_no_id_info as super

      method! visit_expr env expr =
        (match expr.G.e with
        | G.OtherExpr
            (("PlaceHolder", _), [ G.E { e = G.L (G.Int (Some n, _)); _ } ]) ->
            max_found := max !max_found (Int64.to_int n)
        | _ -> super#visit_expr env expr)
    end
  in
  visitor#visit_expr () e;
  !max_found
Collaborator

TODO for Part II:

  • make class;
  • place max_found in env so the instance can be reused.

Comment on lines +223 to +234
object
  inherit [_] AST_generic.map as super

  method! visit_expr env expr =
    match expr.G.e with
    | G.OtherExpr
        (("PlaceHolder", _), [ G.E { e = G.L (G.Int (Some n, tk)); _ } ]) ->
        let param_name = Printf.sprintf "&%Ld" n in
        let param_id = (param_name, tk) in
        G.N (G.Id (param_id, G.empty_id_info ())) |> G.e
    | _ -> super#visit_expr env expr
end
Collaborator

TODO for Part II: make class and reuse instance.

Comment on lines +152 to +156
val current_class : G.name option ref = ref None
val parent_path : IL.name option list ref = ref []

method! visit_definition f ((ent, def_kind) as def) =
match def_kind with
Collaborator

TODO for Part II:

  • make current_class and parent_path part of env (together with f)
  • make stateless and reuse 1 instance of the class.

Corneliu Hoffman added 3 commits December 22, 2025 15:45
- removed retries
- removed the need of fixpoint_TIMEOUT by delegating to the fixpoint graph
- improved the graph by adding HOFs to it as well as top level stuff
Comment on lines +498 to +504
  match shared_call_graph with
  | Some (graph, _shared_mappings) -> graph
  | None ->
      (* Compute call graph as before *)
      Function_call_graph.build_call_graph ~lang
        ~object_mappings:all_object_mappings ast
in
Collaborator

Can this happen? Where is it stored after it's calculated?

Collaborator

TODO for Part II: Check if this can really happen, at least debug log if it does.

Comment on lines +877 to +880
  (* Fallback: try qualified function name (Module.function for Elixir, etc.) *)
  let qualified_name =
    {
-     ident = (class_str, snd method_name.ident);
+     ident = (fst obj.ident ^ "." ^ fst method_name.ident, snd method_name.ident);
Collaborator

TODO for Part II: Can we use IdQualified here? We might have to extend things a bit.

Comment on lines +1908 to +1919
| IL.Unnamed ({ e = Fetch lval; _ } as lambda_exp) ->
    (* Single Fetch argument - check if it's a lambda by looking at its shape *)
    (match Lval_env.find_lval env.lval_env lval with
     | Some (S.Cell (_, shape)) ->
         (match shape with
          | S.Fun _fun_sig ->
              (* It's a function/lambda! *)
              Some (e, lambda_exp)
          | _ -> None)
     | None -> None)
| _ -> None)
| _ -> None)
Collaborator

TODO for Part II: Simplify this with a more refined pattern (else None)

Comment on lines 2873 to +2874
  let timeout =
-   if taint_inst.options.taint_intrafile then base_timeout *. 10.0
+   if taint_inst.options.taint_intrafile then base_timeout *. 20.0
Collaborator

TODO for Part II: timeout is not used any more (see Dataflow_core.ml). Remove.

Comment on lines +2946 to +2947
(* Create assumptions for lambda parameters *)
(* Create assumptions for lambda parameters using Fold_IL_params *)
Collaborator

TODO for Part II: Unify comments.

Collaborator

TODO: Check for Part II

Comment on lines +169 to +170
(* Give the parameter an Arg shape so it can be used in HOF *)
let new_env = add_param_to_env il_lval taint_set taint_arg env in
Collaborator

TODO for Part II:
See below case of IL.PatternParm.
More cases needed probably.

Collaborator

@dimitris-m left a comment

LGTM, let's merge this.

We have to make a follow-up: now that there is no taint_fixpoint_timeout, we need to remove the rule option and all usage of the parameter.

There are several TODO for part II and we should do these asap.

@corneliuhoffman merged commit 1292b07 into main on Dec 22, 2025
6 checks passed
@corneliuhoffman deleted the HOFS branch on December 22, 2025 17:25
@maciejpirog maciejpirog mentioned this pull request Dec 31, 2025
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jan 9, 2026
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [opengrep/opengrep](https://github.com/opengrep/opengrep) | minor | `v1.13.2` → `v1.14.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>opengrep/opengrep (opengrep/opengrep)</summary>

### [`v1.14.1`](https://github.com/opengrep/opengrep/releases/tag/v1.14.1): Opengrep 1.14.1

[Compare Source](opengrep/opengrep@v1.14.0...v1.14.1)

#### Improvements

- Clojure translation part II by [@dimitris-m](https://github.com/dimitris-m) in [#517](opengrep/opengrep#517)
- C#: Allow implicit variables in properties to be taint sources by [@maciejpirog](https://github.com/maciejpirog) in [#516](opengrep/opengrep#516)
- Add core flags `dump_rule` and `dump_patterns_of_rule` as options in the show command by [@maciejpirog](https://github.com/maciejpirog) in [#519](opengrep/opengrep#519)

#### Bug fixes

- Fix: pass signature database to lambda analysis, handle method mutation tainting by [@corneliuhoffman](https://github.com/corneliuhoffman) in [#520](opengrep/opengrep#520)

#### Tech debt

- Fix CHANGELOG.md, OPENGREP.md, remove unused files by [@dimitris-m](https://github.com/dimitris-m) in [#523](opengrep/opengrep#523)

**Full Changelog**: <opengrep/opengrep@v1.14.0...v1.14.1>

### [`v1.14.0`](https://github.com/opengrep/opengrep/releases/tag/v1.14.0): Opengrep 1.14.0

[Compare Source](opengrep/opengrep@v1.13.2...v1.14.0)

#### Improvements

- Support for higher-order functions in intrafile taint analysis by [@corneliuhoffman](https://github.com/corneliuhoffman) in [#469](opengrep/opengrep#469) and [#513](opengrep/opengrep#513)
- Clojure: Improved support for Clojure (incl. tainting) by [@dimitris-m](https://github.com/dimitris-m) in [#501](opengrep/opengrep#501)
- Dart: Improved support for Dart by [@maciejpirog](https://github.com/maciejpirog) in [#508](opengrep/opengrep#508)
- C#: Better handling of extension methods and extension blocks by [@maciejpirog](https://github.com/maciejpirog) in [#514](opengrep/opengrep#514)

#### Fixes

- Bump cygwin install action by [@dimitris-m](https://github.com/dimitris-m) in [#503](opengrep/opengrep#503) and [#509](opengrep/opengrep#509)

**Full Changelog**: <opengrep/opengrep@v1.13.2...v1.14.0>

</details>
