Skip to content

Add setjmp Analysis#764

Closed
bottbenj wants to merge 5 commits intogoblint:masterfrom
bottbenj:nonlocal-jumps
Closed

Add setjmp Analysis#764
bottbenj wants to merge 5 commits intogoblint:masterfrom
bottbenj:nonlocal-jumps

Conversation

@bottbenj
Copy link
Copy Markdown

Adds an analysis to handle the setjmp and longjmp functions.

The setjmp analysis collects local state of the other analyses with the Longjmp and VolatileLocals query and then returns it when required with the Setjmp event. Due to the path insensitive nature of analysis globals it is not necessary to touch them.

The setjump analysis is able to collect arbitrary data from all analyses by using a heterogeneous map, but this also means it is not able to affect any specific state and can only perform domain operations on the whole map. This map domain is based on the hmap library, but included directly as it is pretty abandoned and it needed to add various functions to be able to implement a domain based on it.

This is currently only used by base, and i am not exactly sure which other analyses also benefit from support for it.

@sim642 sim642 self-requested a review June 21, 2022 06:36
@michael-schwarz michael-schwarz self-requested a review June 21, 2022 06:51
@michael-schwarz
Copy link
Copy Markdown
Member

Thank you for the PR, it is nice to see Goblint getting this feature, I think it sets us apart from a lot of the competition.

@jerhard jerhard self-requested a review June 21, 2022 06:52

let leq a b =
let result = leq_with_fct { equal = (fun (type a) k a b -> let module R = (val (fst (M.Key.info k) : a mt)) in R.leq a b) } a b in
if Messages.tracing && not (M.is_empty a && M.is_empty b) then Messages.tracel "hetmap" "%s\n%a\n%a\n%b\n" "leq" pretty a pretty b result;

Check warning

Code scanning / Semgrep

trace functions should only be called if tracing is enabled at compile time

trace functions should only be called if tracing is enabled at compile time

let logable str f a b =
let result = f a b in
if Messages.tracing && not (M.is_empty a && M.is_empty b) then Messages.tracel "hetmap" "%s\n%a\n%a\n%a\n" str pretty a pretty b pretty result;

Check warning

Code scanning / Semgrep

trace functions should only be called if tracing is enabled at compile time

trace functions should only be called if tracing is enabled at compile time
Copy link
Copy Markdown
Member

@sim642 sim642 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the support of this weird C feature. Allows us to tick another soundiness box: http://soundiness.org/.

Comment on lines +79 to +81
let setjmp_return_token = HM.token (module VD) (VD.zero_init_value (TInt (IInt, []))) "base: return"
let setjmp_globals_token = HM.token (module D) (D.bot ()) "base: globals"
let setjmp_volatiles_token = HM.token (module D) (D.bot ()) "base: volatiles"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HM.token has hidden internal state, which makes it incompatible with our incremental analysis since lookups from the hmap are based on those internally given IDs, which must match in order for the lookups to find anything at all.
I suppose it could coincidentally work for now since these are the only tokens created in a fixed order, but even in that case a TODO comment would be useful here pointing out the hidden global state, in case we ever encounter issues with it.

Comment on lines +22 to +23
("longjmp", special [__ "env" []; __ "value" [r]] @@ fun env value -> Longjmp { env; value } );
("_longjmp", special [__ "env" []; __ "value" [r]] @@ fun env value -> Longjmp { env; value } );
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value/status argument of longjmp has type int, which is not a pointer. Therefore the r specification for them is spurious: function call arguments are always read, but the specification is about accesses through pointer arguments.


let name () = "setjmp"

module D = Queries.LS (* if the local method has performed a setjmp *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? "if the local method has performed a setjmp" describes the local state as a boolean, but it's actually a set of lvalues.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for keeping the set of jumpbuffers in the local state?

Comment on lines +45 to +49
(** explicit join with the old value is required or 0 can be lost from the possible setjmp return values. *)
let getg = (ctx.global var) in
let g = (HM.join data getg) in
if M.tracing then M.tracel "setjmp" "Longjmp: %a\n+ %a\n-> %a\n" HM.pretty data HM.pretty getg HM.pretty g;
ctx.sideg var g)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side effects are by definition always joined into the existing state of the variable within the solver, so this seems unnecessary. If that isn't happening as intended, maybe there's some problem in the implementation of HM as a lattice.

(* join the state from all the setjump buffers, env may refer to (generally just one global) *)
let vars = ctx.ask (Queries.MayPointTo env) in
if M.tracing then M.traceli "setjmp" "Setjmp vars: [%a]\n" D.pretty vars;
let data = List.fold HM.join (HM.top ()) (List.map ctx.global (D.elements vars |> List.map fst)) in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fold is a HM.join and the identity element for join is HM.bot, so there's something off here. Joining with top (as it happens here) always should yield top, which would make this entire fold useless.

@@ -0,0 +1,30 @@
// PARAM: --enable ana.int.interval --enable ana.int.enums --set "ana.activated[+]" setjmp --set solvers.td3.side_widen never
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of the added tests disable side effect widening completely, is it really necessary for them to pass and this feature to work? Not widening side effects poses a big problem for termination of the solver.

@@ -0,0 +1,17 @@
// PARAM: --enable ana.int.interval --enable ana.int.enums --set "ana.activated[+]" setjmp --enable exp.earlyglobs
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this earlyglobs for?

@@ -0,0 +1,25 @@
// PARAM: --enable ana.int.interval --enable ana.int.enums --set "ana.activated[+]" setjmp --set solvers.td3.side_widen never --disable exp.volatiles_are_top
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another test named "counting-local" as 57/04, so it would be good to change the name of this test based on how it differs from the other one.

@@ -0,0 +1,17 @@
// PARAM: --enable ana.int.interval --enable ana.int.enums --set "ana.activated[+]" setjmp --set solvers.td3.side_widen never
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, there's another test with the exact same name.

Comment on lines +1188 to +1192
| Q.Longjmp exp ->
HM.singleton setjmp_return_token (eval_rv (ask_of_ctx ctx) ctx.global ctx.local exp)
|> HM.add setjmp_globals_token { ctx.local with cpa = CPA.filter (fun k v -> Basetype.Variables.is_global k) ctx.local.cpa }
| Q.VolatileLocals ->
HM.singleton setjmp_volatiles_token { ctx.local with cpa = CPA.filter (fun k v -> (not (Basetype.Variables.is_global k)) && Ciltools.is_volatile_tp k.vtype) ctx.local.cpa }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here Basetype.Variables.is_global is used, which is a purely syntactic check. However, the local state of base analysis can also contain other things which are not syntactically global, but are effectively handled as globals, e.g. alloc variables and escaped variables. Shouldn't some of them also be passed around via these jumps?

@sim642
Copy link
Copy Markdown
Member

sim642 commented Jun 21, 2022

This is currently only used by base, and i am not exactly sure which other analyses also benefit from support for it.

Is it just a question of benefit or is it rather a question of soundness? For example, doesn't the apron analysis also need to handle this and do this extra join to be sound?

Probably a harder question, but what happens with locksets across long jumps (if the combination of threads and long jumps is defined at all)? Because if locksets persist over long jumps, then the mutex analysis would also need such extra join for soundness.

The setjump analysis is able to collect arbitrary data from all analyses by using a heterogeneous map

The use of hmap is certainly an interesting approach to allow your analysis to manipulate states of other analyses. Likely not worth redoing at this point, but another technique that's used in Goblint is to use a Spec lifting functor instead of an analysis. Such a functor would similarly have access to the underlying analysis domains and operate on them as whole. Such a lifter would then apply to the analysis as a whole as well, instead of per-analysis, although some extra operation might still be needed on the analyses to choose the correct data to join across the long jump.

@jerhard
Copy link
Copy Markdown
Member

jerhard commented Jun 24, 2022

Is it just a question of benefit or is it rather a question of soundness?

This is indeed a soundness question. With the implementation as an analysis, one would have to add some default join action to all the analyses. Maybe this could be reduced to a one-line for all the analyses by including a module/functor with some default implementation? Otherwise, one would like to come up with a solution where changing the existing analyses would not be necessary. One possible solution would be to change the implementation to be a functor taking a Spec, like @sim642 suggests.

You can find an example implementation for such a functor e.g. with PathSensitive2.

| _ -> ctx.local

let enter ctx lval fn args =
collect_volatiles ctx;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do the volatiles have to be collected on every enter?

@sim642
Copy link
Copy Markdown
Member

sim642 commented Jun 24, 2022

One possible solution would be to change the implementation to be a functor taking a Spec, like @sim642 suggests.

You can find an example implementation for such a functor e.g. with PathSensitive2.

That requires rethinking how the data is passed though because events are not available for those functors. So it might require adding one or two additional transfer functions to Spec.
On the other hand, it allows using a Either for V and G to split globals into inner spec globals and long jump data globals that don't require hmap.

@michael-schwarz
Copy link
Copy Markdown
Member

Thank you for your PR, it is has been very helpful to help us explore the design space and highlighted the intricacies of such an analysis. We have decided against merging this for now and will instead try to pursue a more generic approach.

michael-schwarz added a commit that referenced this pull request Jan 22, 2023
Co-authored-by: Benjamin Bott<bottbenj@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants