Skip to content

Clojure translation that supports tainting#501

Merged
dimitris-m merged 33 commits intomainfrom
dm/unraw-clojure
Dec 31, 2025
Merged

Clojure translation that supports tainting#501
dimitris-m merged 33 commits intomainfrom
dm/unraw-clojure

Conversation

@dimitris-m
Copy link
Collaborator

@dimitris-m dimitris-m commented Dec 23, 2025

Pre-requisites

Description

  • Each commit is self-contained.
  • We support (reasonably well) a large part of clojure syntax including common macros like ->, ->>, as-> etc.
  • There will be part 2 soon.

@dimitris-m dimitris-m removed the request for review from willem-delbare December 23, 2025 11:32
@dimitris-m dimitris-m force-pushed the dm/unraw-clojure branch 7 times, most recently from d630851 to 289b1c5 Compare December 26, 2025 16:19
@dimitris-m dimitris-m force-pushed the dm/unraw-clojure branch 2 times, most recently from ec3d9af to ae94211 Compare December 30, 2025 21:46
* is not a valid binding form. We could simply expand once here
* for such cases... but it's simply not worth the effort.
*)
let insert_threaded insert_pos v e =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh man can we rename this e ? {e with e =...} messes with my head

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes 😅

Macroexpansion_error ("Threading binding forms is not implemented.",
G.E e)
| G.Lambda _ ->
(* Actually ((->> 4 (fn [x])) 1) => 4 but is there a sane person
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not unheard of if you chain a lot of functions that are thread-first but at one point one of them is thread_last

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed his works now with the eager approach to macroexpansion for threaded forms.

match List.rev tests_and_exprs with
| G.E last_expr :: G.E last_test :: rest_rev ->
(last_expr, last_test, List.rev rest_rev)
| _ -> assert false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some Log of why it failed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point, here we could in fact have 1 expression or test (user error) I think, need to look into this.

But: this is not used now, in favor of direct macroexpansion (see map_cond_thread_first_last_form in Parse_clojure_tree_sitter.ml).

G.OtherExpr (("ExprBlock", G.fake "cond-threaded-block"), body_exprs)
|> G.e
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit hard to read and has several similar assert false. If I underand this correctly you do the following:
separate the (last_test, last_expr) pair, then split the begining into pairs (test, cond), then traverse that list creating "lets" and "ifs" as needed and at the end you make a final if.

Is it not more natural to split the entire list into a list (test, expr) like you do (with a log if it is odd length) and then recursively run on it and if it is long you create the let and if from the head and add to the result of recursion and if it is only one element you close? I think it will be shorter and easier to read perhaps? This is of course my nitpick, not necessary now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is...

It's not used now, there's an equally unreadable version in the main Parse_ file!

| [] -> [if_case]
| [ G.E else_ ] ->
G.case_of_pat_and_expr
(G.PatLiteral (G.Null (G.fake "nil")), else_) :: if_case :: []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why the else is tested first?

Copy link
Collaborator Author

@dimitris-m dimitris-m Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because with improvements in tainting / constant propagation etc, the Null can be caught, while for the other case there is no test (except if we added a more explicit "not nil").

| _ ->
G.PatId (param_id, G.empty_id_info ()))
in
(* What is the number of parameters in a pattern like #($_ ... %2 ...)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit worried about the #($_ ... % ...) catching any number of params? Is that the case?

Copy link
Collaborator Author

@dimitris-m dimitris-m Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>= 1 params since % is %1

let id_after = (after, t_after) in
GH.name_of_ids ?name_top (ids_before @ [ id_after ])
else
let should_add_colon =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit of repetition here

(* ------------------------------------------------------------------------- *)

let exprstmt e = s (ExprStmt (e, sc))
let exprstmt e =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry why is this here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is something I thought I could optimise, but it did not work, and decided to document the situation...

let arg_list_unwrapped =
List_.map (function
| G.Arg expr -> expr
| _ -> failwith "Expected G.Arg")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this happen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not happen.

(G.List, Tok.unsafe_fake_bracket arg_list_unwrapped)
|> G.e) ]
in
call_generic env ~void tok eorig e (Tok.unsafe_fake_bracket arg_container)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so all functio calls have one argument which is a list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.... this is because of the encoding where we have 1 implicit parameter which we pattern match on with Switch.

Copy link
Contributor

@corneliuhoffman corneliuhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments

method! visit_resolved_name _env (_, sid) =
ok <- ok && ff sid;
if not ok then raise Exit
method! visit_resolved_name ff (_, sid) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)))))

if not (ff sid) then raise Exit
end

let no_cycles_in_sym_prop_visitor_instance = new no_cycles_in_sym_prop_visitor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason this is outside the function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! Why create an object each time if we can share one?

super#visit_pattern store pat
end

let pat_id_visitor_instance = new pat_id_visitor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as before

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as before

Be aware: this commit will not stay for long as is!
The `some->` translation is not accurate but better than nothing for
now.
We now convert a pattern:

```
(-> e e1 ... e2)
```

not to:

`(e2 (... (e1 e)))`

but to:

`(e2 (<... (e1 e) ...>))`

which is closer to the intention.

That is, the macroexpansion of the `->` requires this ellipsis to become
deep ellipsis.
We can now match arbitrary sequences of test & expr pairs in cond-> /
cond->>.
This allows to catch more intrafile / hof tainting cases.
This allows to capture more tainting; see test.
This is a small improvement: we don't always have to use ... ... to
match a sequence of test-expr pairs.
@dimitris-m dimitris-m changed the title WIP: Clojure translation that supports tainting Clojure translation that supports tainting Dec 31, 2025
@dimitris-m dimitris-m force-pushed the dm/unraw-clojure branch 2 times, most recently from 755299c to 9a45ba5 Compare December 31, 2025 12:30
This is because of keyboxd issues that suddenly appeared, will be fixed
later.
@dimitris-m dimitris-m merged commit 1ad74d5 into main Dec 31, 2025
1 check passed
@dimitris-m dimitris-m deleted the dm/unraw-clojure branch December 31, 2025 12:34
@maciejpirog maciejpirog mentioned this pull request Dec 31, 2025
@dimitris-m dimitris-m added taint lang Add or improve language support labels Dec 31, 2025
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jan 9, 2026
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [opengrep/opengrep](https://github.com/opengrep/opengrep) | minor | `v1.13.2` → `v1.14.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>opengrep/opengrep (opengrep/opengrep)</summary>

### [`v1.14.1`](https://github.com/opengrep/opengrep/releases/tag/v1.14.1): Opengrep 1.14.1

[Compare Source](opengrep/opengrep@v1.14.0...v1.14.1)

#### Improvements

- Clojure translation part II by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;517](opengrep/opengrep#517)
- C#: Allow implicit variables in properties to be taint sources by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;516](opengrep/opengrep#516)
- Add core flags `dump_rule` and `dump_patterns_of_rule` as options in the show command by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;519](opengrep/opengrep#519)

#### Bug fixes

- Fix: pass signature databaseb to lambda analysis, handle method mutation tainting by [@&#8203;corneliuhoffman](https://github.com/corneliuhoffman) in [#&#8203;520](opengrep/opengrep#520)

#### Tech debt

- Fix CHANGELOG.md, OPENGREP.md, remove unused files by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;523](opengrep/opengrep#523)

**Full Changelog**: <opengrep/opengrep@v1.14.0...v1.14.1>

### [`v1.14.0`](https://github.com/opengrep/opengrep/releases/tag/v1.14.0): Opengrep 1.14.0

[Compare Source](opengrep/opengrep@v1.13.2...v1.14.0)

#### Improvements

- Support for higher-order functions in intrafile taint analysis by [@&#8203;corneliuhoffman](https://github.com/corneliuhoffman) in [#&#8203;469](opengrep/opengrep#469) and [#&#8203;513](opengrep/opengrep#513)
- Clojure: Improved support for Clojure (incl. tainting) by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;501](opengrep/opengrep#501)
- Dart: Improved support for Dart by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;508](opengrep/opengrep#508)
- C#: Better handing of extension methods and extension blocks by [@&#8203;maciejpirog](https://github.com/maciejpirog) in [#&#8203;514](opengrep/opengrep#514)

#### Fixes

- Bump cygwin install action by [@&#8203;dimitris-m](https://github.com/dimitris-m) in [#&#8203;503](opengrep/opengrep#503) and [#&#8203;509](opengrep/opengrep#509)

**Full Changelog**: <opengrep/opengrep@v1.13.2...v1.14.0>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0Mi42OS4yIiwidXBkYXRlZEluVmVyIjoiNDIuNjkuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90IiwiYXV0b21hdGlvbjpib3QtYXV0aG9yZWQiLCJkZXBlbmRlbmN5LXR5cGU6Om1pbm9yIl19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lang Add or improve language support taint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants