Clojure translation part II#517

Merged
dimitris-m merged 12 commits into main from dm/clojure-part-ii
Jan 6, 2026

dimitris-m commented on Jan 2, 2026

Changes

  • Removes synthetic variables from the intermediate variables, since the tokens used are real tokens with different content in the target file. This improves dataflow traces in text / json / sarif.

  • Improves printing of matches which in some cases failed with macroexpanded code.

  • Encodes the special ops as is done in other languages.

  • Adds loop and recur (recur can be used with loop or with functions, in tail position). Note: I did not add a test for defn + recur because it made me realise we have other related issues that need to be fixed, and these are orthogonal to the translation.

  • Improves patterns:

    • (...) matches a block and not a Call(..., []);
    • ..., when function parameters are expected, becomes [...];
    • ..., when a single name is expected, becomes $_.
  • Fixes "1.14.0 - Rule parse error in rule clojure.lang.security.documentbuilderfactory-xxe.documentbuilderfactory-xxe" (#518).

  • Removes the IL macroexpansion mechanics; this approach ended up being too problematic compared to macroexpansion at AST generic translation time.

  • Adds comments and TODOs: some will be dealt with in this PR.

  • There will be Part III.
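
As a reminder of what "tail position" means for loop and recur, here is a small OCaml analogue. In Clojure, `(loop [i n acc 0] (if (<= i 0) acc (recur (dec i) (+ acc i))))` rebinds `i` and `acc` and jumps back to the loop head. The sketch below expresses the same computation with a local tail-recursive function. The names `sum_to` and `go` are made up for illustration, and this is not meant to describe how the translation represents these forms internally.

```ocaml
(* Sum the integers 1..n with an explicit accumulator. The inner call to
   [go] is the last thing the function does, i.e. it is in tail position,
   which is the only place where Clojure allows recur. *)
let sum_to (n : int) : int =
  let rec go i acc =
    if i <= 0 then acc
    else go (i - 1) (acc + i) (* plays the role of (recur (dec i) (+ acc i)) *)
  in
  go n 0
```

Calling `sum_to 10` evaluates to `55`. A call such as `1 + go (i - 1) acc` would not be in tail position, and the corresponding recur would be rejected by the Clojure compiler.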

dimitris-m removed the request for review from willem-delbare on January 2, 2026 17:04
dimitris-m added the taint and lang (Add or improve language support) labels on Jan 2, 2026
In macroexpansion for `cond->` and `cond->>`, we introduce fake
variables in synthetic let bindings, and we don't want these to pollute
the `intermediate_variables` produced in outputs.

This is because these fake variables reuse real tokens that have no
relevance in terms of content, but are the closest and most reasonable
choices for such variables. Showing them is bound to confuse tooling
downstream.
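
A minimal sketch of the idea, assuming a simplified record for intermediate variables with an `is_fake` flag set during macroexpansion. The record type, the flag, and `drop_fake_variables` are hypothetical and only illustrate the filtering step; the actual data structures differ.

```ocaml
(* Hypothetical, simplified model of an intermediate variable in a trace. *)
type intermediate_var = {
  name : string;  (* identifier shown in text / json / sarif output *)
  line : int;  (* line of the token the variable was built from *)
  is_fake : bool;  (* true when introduced by macroexpansion *)
}

(* Keep only the variables that actually exist in the user's source. *)
let drop_fake_variables (vars : intermediate_var list) : intermediate_var list =
  List.filter (fun v -> not v.is_fake) vars
```

With a trace containing `request`, a fake `G__1111`, and `query`, only `request` and `query` would be reported.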

Without this edit, adding `--dataflow-traces` will sometimes fail to
print traces where a location spans more than one line. This happens with
macroexpanded Clojure code.

This fixes a small issue where intermediate variable tokens derived
from qualified x/y in associative destructuring included the slash, and
were printed in json and sarif as /y.

Note that `($F ...)` will no longer match such expressions;
but `(:K ...)` will, and similarly for `(::kwd exp)` and `(::$K ...)`.
If we are to do this, we need to cover more cases as detailed in the
comment.
dimitris-m force-pushed the dm/clojure-part-ii branch 6 times, most recently from 3721920 to 2968876, on January 6, 2026 00:36
dimitris-m changed the title from "WIP: Clojure translation part ii" to "Clojure translation part II" on Jan 6, 2026
dimitris-m force-pushed the dm/clojure-part-ii branch 2 times, most recently from 6e9a2aa to a65c5b2, on January 6, 2026 02:09
  let is_macroexpandable (todo_kind : string) =
    match todo_kind with
-   | "->" | "->>" | "cond->" | "cond->>" | "as->" | "ShortLambda"
+   | "as->" | "ShortLambda"
Contributor

Wait, so the `->`-like things are no longer macroexpandable?

Collaborator Author

They are, but at translation to generic.
There were 2 methods: during IL translation and during to-generic.

Now only to-generic is kept. That means we shuffle CST stuff, not generic AST.

Collaborator Author

Up until this commit I had both methods there, hoping to resurrect the IL method, but I decided to abandon it.
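
For readers unfamiliar with what expanding the threading macros at the syntax level involves, here is a minimal sketch on a toy s-expression type. The `sexp` type and the functions `thread_first`, `thread_last`, and `to_string` are assumptions made for this example and do not correspond to the actual CST or translation code. It follows the usual Clojure semantics: `->` inserts the threaded value as the first argument of each form, `->>` as the last, and a bare symbol `g` is treated as `(g)`.

```ocaml
(* Toy s-expression type; the real CST is much richer. *)
type sexp =
  | Atom of string
  | Seq of sexp list

(* (-> x (f a b) g) becomes (g (f x a b)) *)
let thread_first (init : sexp) (forms : sexp list) : sexp =
  List.fold_left
    (fun acc form ->
      match form with
      | Seq (head :: args) -> Seq (head :: acc :: args)
      | Seq [] -> acc (* degenerate form; left unchanged in this sketch *)
      | Atom _ -> Seq [ form; acc ])
    init forms

(* (->> x (f a b) g) becomes (g (f a b x)) *)
let thread_last (init : sexp) (forms : sexp list) : sexp =
  List.fold_left
    (fun acc form ->
      match form with
      | Seq (head :: args) -> Seq ((head :: args) @ [ acc ])
      | Seq [] -> acc (* degenerate form; left unchanged in this sketch *)
      | Atom _ -> Seq [ form; acc ])
    init forms

(* Print an s-expression back in Lisp syntax. *)
let rec to_string (e : sexp) : string =
  match e with
  | Atom s -> s
  | Seq items -> "(" ^ String.concat " " (List.map to_string items) ^ ")"
```

For example, `to_string (thread_first (Atom "x") [ Seq [ Atom "f"; Atom "a" ]; Atom "g" ])` evaluates to `"(g (f x a))"`. Because the rewrite only rearranges existing nodes, no new variables are needed for `->` and `->>`, unlike `cond->` and `cond->>`.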


let qualified_name_regex_str = "^\\(.+\\)/\\(.+\\)$"

let fake_variable_ident = "G__1111"
Contributor

Oh come on, call it G_666

Contributor

corneliuhoffman left a comment

I really liked the details in the recur/loop stuff ... this is getting better and better, soon it will be a flagship language for us

dimitris-m force-pushed the dm/clojure-part-ii branch 2 times, most recently from c1319bf to c48b272, on January 6, 2026 13:11
We now relax the parsing of Clojure functions, to ensure that
semgrep-rules don't fail to load because of parsing errors.

The change is moderate:

- Translate `...` to `[...]` when expecting function arguments.
  This enables the pattern `(defn $F ... ...)` which otherwise fails
  to parse.
- Convert `...` to `$_` in contexts where a single name is expected,
  for example in function name position.
- Interpret `( ... )` as a block, not as the function call `Call(..., [])`.
  Similarly for `( ... e1 e2)` etc.
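
The first two rules above can be sketched as a context-dependent rewrite of a bare `...` token. This is an illustration only: the `context` type and the functions `relax_ellipsis` and `relax_defn_pattern` are hypothetical and work on strings, whereas the actual change operates on the parsed pattern.

```ocaml
(* Hypothetical contexts in which a pattern token can appear. *)
type context =
  | Function_params (* a parameter vector is expected here *)
  | Single_name (* exactly one name is expected here *)
  | Other

(* Rewrite a bare "..." according to its context; leave other tokens alone. *)
let relax_ellipsis (ctx : context) (token : string) : string =
  match (ctx, token) with
  | Function_params, "..." -> "[...]"
  | Single_name, "..." -> "$_"
  | _, _ -> token

(* Apply the rewrite to the three positions of a simple defn pattern. *)
let relax_defn_pattern (name : string) (params : string) (body : string) : string =
  Printf.sprintf "(defn %s %s %s)"
    (relax_ellipsis Single_name name)
    (relax_ellipsis Function_params params)
    (relax_ellipsis Other body)
```

Under these assumptions, `relax_defn_pattern "$F" "..." "..."` evaluates to `"(defn $F [...] ...)"`, and `relax_defn_pattern "..." "[x]" "..."` evaluates to `"(defn $_ [x] ...)"`.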
Decided to abandon this avenue. Too many issues.
dimitris-m merged commit 5dd8f3e into main on Jan 6, 2026
3 checks passed
dimitris-m deleted the dm/clojure-part-ii branch on January 6, 2026 13:24
dimitris-m mentioned this pull request on Jan 6, 2026
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jan 9, 2026
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [opengrep/opengrep](https://github.com/opengrep/opengrep) | minor | `v1.13.2` → `v1.14.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>opengrep/opengrep (opengrep/opengrep)</summary>

### [`v1.14.1`](https://github.com/opengrep/opengrep/releases/tag/v1.14.1): Opengrep 1.14.1

[Compare Source](opengrep/opengrep@v1.14.0...v1.14.1)

#### Improvements

- Clojure translation part II by [@dimitris-m](https://github.com/dimitris-m) in [#517](opengrep/opengrep#517)
- C#: Allow implicit variables in properties to be taint sources by [@maciejpirog](https://github.com/maciejpirog) in [#516](opengrep/opengrep#516)
- Add core flags `dump_rule` and `dump_patterns_of_rule` as options in the show command by [@maciejpirog](https://github.com/maciejpirog) in [#519](opengrep/opengrep#519)

#### Bug fixes

- Fix: pass signature database to lambda analysis, handle method mutation tainting by [@corneliuhoffman](https://github.com/corneliuhoffman) in [#520](opengrep/opengrep#520)

#### Tech debt

- Fix CHANGELOG.md, OPENGREP.md, remove unused files by [@dimitris-m](https://github.com/dimitris-m) in [#523](opengrep/opengrep#523)

**Full Changelog**: <opengrep/opengrep@v1.14.0...v1.14.1>

### [`v1.14.0`](https://github.com/opengrep/opengrep/releases/tag/v1.14.0): Opengrep 1.14.0

[Compare Source](opengrep/opengrep@v1.13.2...v1.14.0)

#### Improvements

- Support for higher-order functions in intrafile taint analysis by [@corneliuhoffman](https://github.com/corneliuhoffman) in [#469](opengrep/opengrep#469) and [#513](opengrep/opengrep#513)
- Clojure: Improved support for Clojure (incl. tainting) by [@dimitris-m](https://github.com/dimitris-m) in [#501](opengrep/opengrep#501)
- Dart: Improved support for Dart by [@maciejpirog](https://github.com/maciejpirog) in [#508](opengrep/opengrep#508)
- C#: Better handling of extension methods and extension blocks by [@maciejpirog](https://github.com/maciejpirog) in [#514](opengrep/opengrep#514)

#### Fixes

- Bump cygwin install action by [@dimitris-m](https://github.com/dimitris-m) in [#503](opengrep/opengrep#503) and [#509](opengrep/opengrep#509)

**Full Changelog**: <opengrep/opengrep@v1.13.2...v1.14.0>

</details>

Labels

lang (Add or improve language support), taint

Development

Successfully merging this pull request may close these issues.

1.14.0 - Rule parse error in rule clojure.lang.security.documentbuilderfactory-xxe.documentbuilderfactory-xxe

3 participants