Skip to content

Improve handling of patterns and AST_to_IL translation#483

Merged
dimitris-m merged 9 commits intomainfrom
dm/fix-tainting-encodings-patterns
Dec 9, 2025
Merged

Improve handling of patterns and AST_to_IL translation#483
dimitris-m merged 9 commits intomainfrom
dm/fix-tainting-encodings-patterns

Conversation

@dimitris-m
Copy link
Collaborator

@dimitris-m dimitris-m commented Dec 8, 2025

Fixes an number of issues, and improves matching and tainting:

  • In switch / match statements containing a list pattern of the shape [..., $P, ...] matching did not work as expected, specifically ... was ignored and it would only match lists of length 3. The same applies for tuples.

  • Patterns in switches and lambdas with destructuring (example: Elixir) did not introduce proper variables, so e.g. the Elixir lambda fn x -> sink(x) which is internally translated into a switch with cases did not report taint. This is now fixed in the IL translation. Almost all kinds of patterns are now translated and taint propagates through nested destructuring patterns. See 0850936.

  • Fixed a bug that caused stuff like Elixir lambdas and OCaml functions to not match at all for some rules. See e520986.

  • Improved matching so that e.g. we can create a search rule that matches the identity function or more generally where e.g. a parameter is returned. See cf44d63.

  • Changed the translation of Elixir lambdas so it uses better supported features, resulting in more fine-grained control of taint in guard conditions (previous translation was failing to distinguish that the last lambda in our test is actually ok, now this is captured).

See new and adapted files under tests/

These parameters are introduced in encodings (Elixir, OCaml) and they
are not expected to be found in the source files.

However, we were looking for them as if they were normal identifiers,
and the rules containing them were never applicable... hence no
findings.

From now on, it's important to only introduce such special identifiers
that begin with the above prefix, so that they are properly excluded.
- Makes `[... $P ...]` work as expected, previously patterns needed to
match exactly in length and ellipsis did not work as one would assume.
The same applies to tuples.
- A lot of AST_to_IL changes:
  - Ensures that destructuring values in Switch creates new variables
  like LetPattern, since a lot of encodings won't work for tainting
  without this modification (Elixir, OCaml, and more which are
  upcoming...).
  - Expands the patterns that can be encoded into IL so that tainting
  works in more cases (PatAs, PatWhen, and more).
  - Ensures that the custom pattern used in Elixir is processed in a
  better way, so that a lot of cases can now work (multiple
  destructuring branches in `fn` now work in tainting rules).
- Ensures that `PatAs(pat, x)` introduces as new variable `x` as
expected, intuitively.
This is in semgrep-rules, and we don't touch the submodule as a general
principle.
@dimitris-m dimitris-m removed the request for review from willem-delbare December 8, 2025 22:11
@dimitris-m dimitris-m added bug Something isn't working enhancement New feature or request taint labels Dec 8, 2025
- uses PatWhen;
- has better support for focus-metavariable that can capture taints
  in guard conditions.
This is no longer needed (see previous commit).
* ignored, since they are not expected to be found in the source file. *)
let implicit_param = "!!_implicit_param!"
let implicit_param_id tk = (implicit_param, tk)
let is_implicit_param s = String.starts_with ~prefix:"!!_implicit_param!" s
Copy link
Contributor

@maciejpirog maciejpirog Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use an ID flag hidden for this? The comment in https://github.com/opengrep/opengrep/blob/main/libs/ast_generic/IdFlags.mli#L10 suggests this is exactly its purpose. The way it is done now, we have two mechanisms for the same thing, or am I confusing the use-cases for the hidden flag?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the TODO in Pattern.ml... maybe we can do this on a next PR?

* this kind of pattern. But I wonder if it would be pointless
* to bother with it. *)
(env, lval, pat_stmts @ guard_stmts)
(* XXX: This is for Elixir lambdas, and it's incomplete...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤣

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my previous attempt :)

@maciejpirog
Copy link
Contributor

Do we know if this is related to #452 ?

@dimitris-m dimitris-m merged commit 902030e into main Dec 9, 2025
6 checks passed
@dimitris-m dimitris-m deleted the dm/fix-tainting-encodings-patterns branch December 9, 2025 14:15
This was referenced Dec 9, 2025
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Dec 18, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [opengrep/opengrep](https://github.com/opengrep/opengrep) | minor | `v1.12.1` -> `v1.13.2` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>opengrep/opengrep (opengrep/opengrep)</summary>

### [`v1.13.2`](https://github.com/opengrep/opengrep/releases/tag/v1.13.2): Opengrep 1.13.2

[Compare Source](opengrep/opengrep@v1.13.1...v1.13.2)

#### Improvements

- C#: Add matching on function argument modifiers (ref, in, scoped, etc.) by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;494](opengrep/opengrep#494)
- C#: Support extension blocks by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;496](opengrep/opengrep#496)

#### Release process

- Validate tag input on release by [@&#8203;lae](https://github.com/lae) in [#&#8203;493](opengrep/opengrep#493)

**Full Changelog**: <opengrep/opengrep@v1.13.1...v1.13.2>

### [`v1.13.1`](https://github.com/opengrep/opengrep/releases/tag/v1.13.1): Opengrep 1.13.1

[Compare Source](opengrep/opengrep@v1.12.1...v1.13.1)

#### Improvements

- Improve handling of patterns and `AST_to_IL` translation by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;483](opengrep/opengrep#483)
- Improve rust tainting by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;485](opengrep/opengrep#485)
- Dump generic AST to HTML by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;484](opengrep/opengrep#484)
- Modernise C# by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;487](opengrep/opengrep#487)

#### Bug fixes

- Fix for kotlin double-annotation bug by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;480](opengrep/opengrep#480)
- Fix PCRE2 test making OSX build fail by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;486](opengrep/opengrep#486)
- Fix: in `LetPattern(pat, e)`, `e` should be visited first by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;488](opengrep/opengrep#488)

#### CI fixes

- Force python 3.13 for osx binary workflow by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;490](opengrep/opengrep#490)

##### Notes

- Version 1.13.0 ([#&#8203;489](opengrep/opengrep#489)) intentionally skipped due to CI errors, fixed in [#&#8203;490](opengrep/opengrep#490).

**Full Changelog**: <opengrep/opengrep@v1.12.1...v1.13.1>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0Mi41Ny4xIiwidXBkYXRlZEluVmVyIjoiNDIuNTcuMSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90Il19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request taint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants