Add `setjmp` Analysis by bottbenj · Pull Request #764 · goblint/analyzer

bottbenj · 2022-06-21T06:28:57Z

Adds an analysis to handle the setjmp and longjmp functions.

The setjmp analysis collects the local state of the other analyses via the Longjmp and VolatileLocals queries and returns it, when required, via the Setjmp event. Because globals are handled path-insensitively by the analyses, there is no need to touch them.
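To make the semantics the analysis has to model concrete, here is a small C sketch (a hypothetical program, not one of the PR's tests): after a longjmp, `setjmp` returns a second time with the value passed to `longjmp`, while a global and a volatile local keep the values they had at the time of the jump. The names `observe_longjmp` and `jump_back` are illustrative only.

```c
#include <setjmp.h>

static jmp_buf env;          /* buffer filled by setjmp, consumed by longjmp */
static int global_counter;   /* syntactic global: keeps its latest value */

/* Hypothetical helper standing in for a deeper call that long-jumps. */
static _Noreturn void jump_back(void) {
    longjmp(env, 7);
}

/* Reports what is observable after the jump: the value setjmp returned
   the second time, the global, and a volatile local. */
void observe_longjmp(int *second_return, int *global_after, int *volatile_after) {
    volatile int local = 0;  /* volatile automatic: well-defined after longjmp */

    switch (setjmp(env)) {
    case 0:
        /* First return: change the state, then jump back to setjmp. */
        global_counter = 1;
        local = 2;
        jump_back();
    case 7:
        /* Second return: setjmp now yields the value passed to longjmp. */
        *second_return = 7;
        break;
    default:
        *second_return = -1; /* not reached in this sketch */
        break;
    }
    *global_after = global_counter;
    *volatile_after = local;
}
```

Only the global and the volatile local are read after the jump; a non-volatile local modified between `setjmp` and `longjmp` would have an indeterminate value, which is why the PR distinguishes volatile locals.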

The setjmp analysis is able to collect arbitrary data from all analyses by using a heterogeneous map, but this also means it is not able to affect any specific state and can only perform domain operations on the whole map. This map domain is based on the hmap library, but the library is included directly because it is largely abandoned and various functions had to be added to implement a domain based on it.

This is currently only used by base, and I am not exactly sure which other analyses also benefit from support for it.

michael-schwarz · 2022-06-21T06:52:01Z

Thank you for the PR, it is nice to see Goblint getting this feature, I think it sets us apart from a lot of the competition.

src/domains/mapDomainHeterogeneous.ml

+
+  let leq a b =
+    let result = leq_with_fct { equal = (fun (type a) k a b -> let module R = (val (fst (M.Key.info k) : a mt)) in R.leq a b) } a b in
+    if Messages.tracing && not (M.is_empty a && M.is_empty b) then Messages.tracel "hetmap" "%s\n%a\n%a\n%b\n" "leq" pretty a pretty b result;


src/domains/mapDomainHeterogeneous.ml

+
+  let logable str f a b =
+    let result = f a b in
+    if Messages.tracing && not (M.is_empty a && M.is_empty b) then Messages.tracel "hetmap" "%s\n%a\n%a\n%a\n" str pretty a pretty b pretty result;


sim642

Thanks for supporting this weird C feature. It allows us to tick another soundiness box: http://soundiness.org/.

sim642 · 2022-06-21T07:15:50Z

src/analyses/base.ml

+  let setjmp_return_token = HM.token (module VD) (VD.zero_init_value (TInt (IInt, []))) "base: return"
+  let setjmp_globals_token = HM.token (module D) (D.bot ()) "base: globals"
+  let setjmp_volatiles_token = HM.token (module D) (D.bot ()) "base: volatiles"


HM.token has hidden internal state, which makes it incompatible with our incremental analysis: lookups in the hmap are based on those internally assigned IDs, which must match for a lookup to find anything at all.
I suppose it could coincidentally work for now, since these are the only tokens and they are created in a fixed order, but even so a TODO comment here pointing out the hidden global state would be useful, in case we ever run into issues with it.
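The hazard can be modeled with a small C sketch. This is hypothetical code that mimics the idea described in the comment, not the hmap implementation: keys get their identity from a hidden global counter, so a token's ID depends on the order in which tokens are created in a run. If a later (incremental) run creates the tokens in a different order, IDs remembered from the previous run refer to different keys. `make_token` and `reset_tokens` are illustrative names.

```c
typedef struct {
    int id;            /* hidden, order-dependent identity used for lookups */
    const char *name;  /* human-readable label, not used for lookups */
} token;

static int next_id = 0;   /* hidden global state shared by all tokens */

/* Creates a token whose ID is whatever the counter happens to be. */
token make_token(const char *name) {
    token t = { next_id++, name };
    return t;
}

/* Simulates starting a fresh run of the analyzer. */
void reset_tokens(void) {
    next_id = 0;
}
```

Creating "base: globals" before "base: return" in a second run swaps their IDs, so a map entry stored under the first run's "base: globals" ID would be found under the wrong token.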

sim642 · 2022-06-21T07:19:36Z

src/analyses/libraryFunctions.ml

+    ("longjmp", special [__ "env" []; __ "value" [r]] @@ fun env value -> Longjmp { env; value } );
+    ("_longjmp", special [__ "env" []; __ "value" [r]] @@ fun env value -> Longjmp { env; value } );


The value/status argument of longjmp has type int, which is not a pointer. Therefore the r specification for it is spurious: function call arguments are always read, but the specification is about accesses through pointer arguments.
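A small C sketch of the standard semantics of the status argument, assuming a hosted C11 compiler: `status` is an ordinary int passed by value, so `longjmp` never reads memory through it, and `longjmp(env, 0)` makes `setjmp` return 1, so a return value of 0 only ever comes from the direct call. The function name `second_return_is_one` is hypothetical.

```c
#include <setjmp.h>

static jmp_buf buf;

/* Returns 1 if setjmp yields 1 on its second return after
   longjmp(buf, status), and 0 if it yields some other value. */
int second_return_is_one(int status) {
    volatile int jumped = 0;      /* volatile, so the update survives the jump */

    if (setjmp(buf) == 1) {
        return 1;                 /* second return, and setjmp yielded 1 */
    }
    if (jumped) {
        return 0;                 /* second return with a value other than 1 */
    }
    jumped = 1;
    longjmp(buf, status);         /* status is a plain int, passed by value */
}
```

Both `second_return_is_one(0)` and `second_return_is_one(1)` return 1, while `second_return_is_one(5)` returns 0.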

sim642 · 2022-06-21T07:23:30Z

src/analyses/setjmpAnalysis.ml

+
+  let name () = "setjmp"
+
+  module D = Queries.LS (* if the local method has performed a setjmp *)


What does this mean? "if the local method has performed a setjmp" describes the local state as a boolean, but it's actually a set of lvalues.

What is the reason for keeping the set of jumpbuffers in the local state?

sim642 · 2022-06-21T07:27:29Z

src/analyses/setjmpAnalysis.ml

+        (** explicit join with the old value is required or 0 can be lost from the possible setjmp return values. *)
+        let getg = (ctx.global var) in
+        let g = (HM.join data getg) in
+        if M.tracing then M.tracel "setjmp" "Longjmp: %a\n+ %a\n-> %a\n" HM.pretty data HM.pretty getg HM.pretty g;
+        ctx.sideg var g)


Side effects are by definition always joined into the existing state of the variable within the solver, so this seems unnecessary. If that isn't happening as intended, maybe there's some problem in the implementation of HM as a lattice.
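The argument can be illustrated with a toy interval lattice in C. This is a hypothetical model of a side effect, not Goblint's solver or HM: if `sideg` always joins the contribution into the current value, explicitly joining the old value beforehand cannot change the result, because join is associative, commutative and idempotent. The names `interval`, `join`, `sideg` and `same` are illustrative.

```c
typedef struct { int lo, hi; } interval;   /* lo > hi encodes bottom */

static int is_bot(interval a) { return a.lo > a.hi; }

/* Least upper bound of two intervals. */
static interval join(interval a, interval b) {
    if (is_bot(a)) return b;
    if (is_bot(b)) return a;
    interval r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Hypothetical model of a side effect: the solver itself joins every
   contribution into the current value of the global. */
static void sideg(interval *global, interval contribution) {
    *global = join(*global, contribution);
}

static int same(interval a, interval b) {
    return a.lo == b.lo && a.hi == b.hi;
}
```

Starting from a global holding [0,0] (for example, setjmp's direct return value), `sideg(&g, data)` and `sideg(&g, join(data, g))` with `data = [1,1]` both leave [0,1], so the 0 is not lost either way.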

sim642 · 2022-06-21T07:31:01Z

src/analyses/setjmpAnalysis.ml

+      (* join the state from all the setjump buffers, env may refer to (generally just one global) *)
+      let vars = ctx.ask (Queries.MayPointTo env) in
+      if M.tracing then M.traceli "setjmp" "Setjmp vars: [%a]\n" D.pretty vars;
+      let data = List.fold HM.join (HM.top ()) (List.map ctx.global (D.elements vars |> List.map fst)) in


This fold uses HM.join, and the identity element for join is HM.bot, so something is off here. Joining with top (as happens here) should always yield top, which would make this entire fold useless.
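A toy interval lattice in C shows why the starting element matters. This is a sketch; `join_all`, `BOT` and `TOP` are illustrative names, not Goblint code: folding join from bottom computes the least upper bound of the inputs, whereas folding from top returns top regardless of the inputs.

```c
#include <limits.h>

typedef struct { int lo, hi; } interval;   /* lo > hi encodes bottom */

static const interval BOT = { 1, 0 };              /* empty interval */
static const interval TOP = { INT_MIN, INT_MAX };  /* all ints */

static int is_bot(interval a) { return a.lo > a.hi; }

/* Least upper bound of two intervals. */
static interval join(interval a, interval b) {
    if (is_bot(a)) return b;
    if (is_bot(b)) return a;
    interval r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Folds join over xs[0..n-1], starting from init. */
static interval join_all(interval init, const interval *xs, int n) {
    interval acc = init;
    for (int i = 0; i < n; i++) {
        acc = join(acc, xs[i]);
    }
    return acc;
}
```

For inputs [0,0] and [3,5], `join_all(BOT, ...)` yields [0,5], while `join_all(TOP, ...)` yields top.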

sim642 · 2022-06-21T07:43:49Z

tests/regression/57-setjmp/01-setjmp-return.c

@@ -0,0 +1,30 @@
+// PARAM: --enable ana.int.interval --enable ana.int.enums --set "ana.activated[+]" setjmp --set solvers.td3.side_widen never


Many of the added tests disable side-effect widening completely. Is that really necessary for them to pass and for this feature to work? Not widening side effects poses a big problem for the termination of the solver.
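For intuition, here is a hypothetical C program of the kind such tests exercise (not taken from the PR): each longjmp re-enters the setjmp point with a larger global counter. An interval analysis that never widens the side-effected global would presumably have to step through [0,0], [0,1], [0,2], ... one iteration at a time, and with an unknown `limit` it would not terminate. `count_jumps` is an illustrative name.

```c
#include <setjmp.h>

static jmp_buf env;
static int count;   /* global, so updates made before longjmp stay visible */

/* Loops via longjmp until the global counter reaches limit. */
int count_jumps(int limit) {
    count = 0;
    (void)setjmp(env);       /* every longjmp resumes execution here */
    if (count < limit) {
        count++;
        longjmp(env, 1);
    }
    return count;
}
```

Concretely, `count_jumps(10)` returns 10; it is the analysis, not the program, that needs widening to converge.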

sim642 · 2022-06-21T07:47:51Z

tests/regression/57-setjmp/05-counting-return-one-method.c

@@ -0,0 +1,17 @@
+// PARAM: --enable ana.int.interval --enable ana.int.enums --set "ana.activated[+]" setjmp --enable exp.earlyglobs


What is this earlyglobs for?

sim642 · 2022-06-21T07:52:00Z

tests/regression/57-setjmp/13-counting-local.c

@@ -0,0 +1,25 @@
+// PARAM: --enable ana.int.interval --enable ana.int.enums --set "ana.activated[+]" setjmp --set solvers.td3.side_widen never --disable exp.volatiles_are_top


There is already another test named "counting-local" (57/04), so it would be good to rename this test to reflect how it differs from that one.

sim642 · 2022-06-21T07:52:58Z

tests/regression/57-setjmp/14-counting-return-one-method.c

@@ -0,0 +1,17 @@
+// PARAM: --enable ana.int.interval --enable ana.int.enums --set "ana.activated[+]" setjmp --set solvers.td3.side_widen never


Same here, there's another test with the exact same name.

sim642 · 2022-06-21T08:04:35Z

src/analyses/base.ml

+    | Q.Longjmp exp ->
+      HM.singleton setjmp_return_token (eval_rv (ask_of_ctx ctx) ctx.global ctx.local exp)
+      |> HM.add setjmp_globals_token { ctx.local with cpa = CPA.filter (fun k v -> Basetype.Variables.is_global k) ctx.local.cpa }
+    | Q.VolatileLocals ->
+      HM.singleton setjmp_volatiles_token { ctx.local with cpa = CPA.filter (fun k v -> (not (Basetype.Variables.is_global k)) && Ciltools.is_volatile_tp k.vtype) ctx.local.cpa }


Here Basetype.Variables.is_global is used, which is a purely syntactic check. However, the local state of base analysis can also contain other things which are not syntactically global, but are effectively handled as globals, e.g. alloc variables and escaped variables. Shouldn't some of them also be passed around via these jumps?
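A C sketch of the alloc-variable case, assuming a hosted environment with `malloc`: the pointer is a local, but the object it points to lives on the heap, so a store made after `setjmp` is still visible after `longjmp`. The C standard only makes non-volatile automatic objects of the function containing `setjmp` indeterminate when they are modified in between. A purely syntactic global check would not select such a cell. The function name is hypothetical.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf env;

/* Returns the heap cell's value as observed after the longjmp,
   or -1 if the allocation fails. */
int heap_value_after_longjmp(void) {
    int *cell = malloc(sizeof *cell);  /* the pointer itself is not modified after setjmp */
    if (cell == NULL) {
        return -1;
    }
    *cell = 1;
    if (setjmp(env) == 0) {
        *cell = 2;           /* heap object modified between setjmp and longjmp */
        longjmp(env, 1);
    }
    int result = *cell;      /* well-defined: the heap store is not lost */
    free(cell);
    return result;
}
```

The function returns 2, so an analysis that drops the heap cell's state at the jump would lose a value the program can observe.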

sim642 · 2022-06-21T08:23:05Z

This is currently only used by base, and I am not exactly sure which other analyses also benefit from support for it.

Is it just a question of benefit or is it rather a question of soundness? For example, doesn't the apron analysis also need to handle this and do this extra join to be sound?

Probably a harder question, but what happens with locksets across long jumps (if the combination of threads and long jumps is defined at all)? If locksets persist over long jumps, then the mutex analysis would also need such an extra join for soundness.

The setjmp analysis is able to collect arbitrary data from all analyses by using a heterogeneous map

The use of hmap is certainly an interesting approach to allow your analysis to manipulate states of other analyses. Likely not worth redoing at this point, but another technique that's used in Goblint is to use a Spec lifting functor instead of an analysis. Such a functor would similarly have access to the underlying analysis domains and operate on them as whole. Such a lifter would then apply to the analysis as a whole as well, instead of per-analysis, although some extra operation might still be needed on the analyses to choose the correct data to join across the long jump.

jerhard · 2022-06-24T14:52:21Z

Is it just a question of benefit or is it rather a question of soundness?

This is indeed a soundness question. With the implementation as an analysis, one would have to add some default join action to all the analyses. Maybe this could be reduced to a one-liner in each analysis by including a module/functor with a default implementation? Otherwise, one would want a solution that does not require changing the existing analyses. One possible solution would be to change the implementation to be a functor taking a Spec, like @sim642 suggests.

You can find an example implementation for such a functor e.g. with PathSensitive2.

jerhard · 2022-06-24T14:33:45Z

src/analyses/setjmpAnalysis.ml

+    | _ -> ctx.local
+
+  let enter ctx lval fn args =
+    collect_volatiles ctx;


Why do the volatiles have to be collected on every enter?

sim642 · 2022-06-24T15:14:41Z

One possible solution would be to change the implementation to be a functor taking a Spec, like @sim642 suggests.

You can find an example implementation for such a functor e.g. with PathSensitive2.

That requires rethinking how the data is passed, though, because events are not available to those functors. So it might require adding one or two additional transfer functions to Spec.
On the other hand, it allows using an Either for V and G to split globals into inner spec globals and long jump data globals that don't require hmap.

michael-schwarz · 2022-08-10T11:59:30Z

Thank you for your PR; it has been very helpful in exploring the design space and has highlighted the intricacies of such an analysis. We have decided against merging this for now and will instead try to pursue a more generic approach.

bottbenj added 5 commits June 20, 2022 16:10

add heterogeneous map domain

15fa5a4

add setjmp/longjmp library functions

11aacbe

implement setjmp analysis

7a38228

use setjmp analysis in base

2dc2129

Add sejmp tests

fb7954e

sim642 self-requested a review June 21, 2022 06:36

sim642 added feature student-job labels Jun 21, 2022

michael-schwarz self-requested a review June 21, 2022 06:51

jerhard self-requested a review June 21, 2022 06:52

github-advanced-security bot found potential problems Jun 21, 2022

sim642 reviewed Jun 21, 2022

jerhard reviewed Jun 24, 2022

michael-schwarz closed this Aug 10, 2022

michael-schwarz mentioned this pull request Nov 4, 2022

Soundly handle setjmp and longjmp #887

Closed

michael-schwarz added a commit that referenced this pull request Jan 22, 2023

Add tests from #764

4aeb47e

Co-authored-by: Benjamin Bott<bottbenj@users.noreply.github.com>

5 participants