Skip to content

Fix "escaping" via recursion#310

Merged
sim642 merged 26 commits intomasterfrom
fixes/issue-252
Sep 6, 2021
Merged

Fix "escaping" via recursion#310
sim642 merged 26 commits intomasterfrom
fixes/issue-252

Conversation

@michael-schwarz
Copy link
Copy Markdown
Member

The issue we discussed in #252 is that pointers to the locals of some invocation of the same function further down the callstack may be passed to a function, which may in turn modify this value.

int rec(int i,int* ptr) {
int top;
int x = 17;
if(i == 0) {
rec(5,&x);
// Recursive call may have modified x
assert(x == 17); //UNKNOWN!
// If we analyse this with int contexts, there is no other x that is reachable, so this
// update is strong
x = 17;
assert(x == 17);
} else {
x = 31;
// ptr points to the outer x, it is unaffected by this assignment
// and should be 17
assert(*ptr == 31); //UNKNOWN!
if(top) {
ptr = &x;
}
// ptr may now point to both the inner and the outer x
*ptr = 12;
assert(*ptr == 12); //UNKNOWN!
assert(x == 12); //UNKNOWN!
// Another copy of x is reachable, so we are conservative and do a weak update
x = 31;
assert(x == 31); // UNKNOWN
}
return 0;
}
int main() {
int t;
rec(0,&t);
return 0;
}

In #252, @jerhard and I proposed a very elaborate solution, that would introduce a new variable x_outer for such a local variable when it is passed to the function where it is defined. Then, all references to the original x in the whole CPA would be patched to point to x_outer. This would then be undone in the combine.
If the situation where an x_outer is already reachable for a function call to its defining function, x_outer would be marked as multiple, and weak updates would be performed to it.

This PR takes a somewhat more simple approach: Enhance CPA with a set of variables that must be weakly updated, because an outer instance of this variable may also be reachable. This way, one avoids introducing x_outer and patching address sets, as the variable x now abstracts all instances of this variable in the whole call stack insofar as they might be reachable from the current invocation.

It works as follows:

  • In [[enter]]# for a fundec:

    • add those locals of fundec that are reachable via an argument to the set of values to weakly update in the callee

    analyzer/src/analyses/base.ml

    Lines 1872 to 1875 in 5e05144

    let reachable_other_copies = List.filter (fun v -> try (match Cilfacade.find_scope_fundec v with | Some fdec -> fdec.svar.vid = fundec.svar.vid |_ -> false) with _ -> false) reachable in
    let (nv, w) = CPA.add_list_fun reachable (fun v -> CPA.find v st.cpa) new_cpa in
    let w' = (CPA.WeakUpdates.of_list reachable_other_copies) in
    let new_cpa = (nv,CPA.WeakUpdates.join w w') in

  • When updating the value for a variable:

    • Check if it needs to be weakly updated, if yes, join new and previous value

      analyzer/src/analyses/base.ml

      Lines 1070 to 1079 in 5e05144

      let update_offset old_value =
      let new_value = VD.update_offset a old_value offs value lval_raw ((Var x), cil_offset) t in
      let (v,w) = st.cpa in
      if CPA.WeakUpdates.mem x w then
      VD.join old_value new_value
      else if invariant then
      (* without this, invariant for ambiguous pointer might worsen precision for each individual address to their join *)
      VD.meet old_value new_value
      else
      new_value
  • In [[return]]#

    • Instead of removing all locals from the CPA, only remove those that do not represent multiple concrete variables (i.e. that do not need to be weakly updated)

    let locals = List.filter (fun v -> not (CPA.WeakUpdates.mem v w)) (fundec.sformals @ fundec.slocals) in

  • In [[combine]]#

    • For those locals that are still in the return value of the callee: Update the local copy to their value
    • Proceed with the set of variables to be weakly updated from the caller

This is a bit less precise than the other solution that was proposed in #252, as it assumes that any writes to the local may also be writes via pointers to the outer (reachable) values.

// Another copy of x is reachable, so we are conservative and do a weak update
x = 31;
assert(x == 31); // UNKNOWN

On the other hand, I think it is simpler as it does not require complicated patching of the address sets.

The code obviously still needs cleanup, but I think we can already discuss if we want that solution, before investing time to make it pretty.
What do you think @sim642 @vogler @vesalvojdani @jerhard ?

@michael-schwarz
Copy link
Copy Markdown
Member Author

michael-schwarz commented Aug 6, 2021

It seems like the issue here is that this data structure sometimes also escapes to globals via the CPA for some of the more elaborate privatization schemes. So I'll reimplement it a level higher in the base domain.

@michael-schwarz michael-schwarz marked this pull request as ready for review August 8, 2021 13:35
| MayBeLess: exp * exp -> MayBool.t t (* may exp1 < exp2 ? *)
| HeapVar: VI.t t
| IsHeapVar: varinfo -> MayBool.t t (* TODO: is may or must? *)
| MayBeMultiple: varinfo -> MayBool.t t
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this query quite confusing. I see the only use for this right now is to allow vareq to ask base about its WeakUpdates set.

The issue of a single varinfo representing multiple concrete variables is highly specific to base (and I guess vareq) because they pass reachable variables into called functions, allowing for this situation to arise. However, this isn't fully generic in the sense that every analysis should be considering MayBeMultiple because naively one might also expect octApron to consider this information to make weak updates. But it doesn't have to (AFAIK) because it never passes local state into called functions in the same way as base does, so the confusion doesn't arise. Also, it somehow assumes that vareq passes reachable locals into called functions in exactly the same way as base, but is that really the case? So it seems to me like this notion is specific to each analysis wrt how much it passes down.

Moreover, if it were fully generic, I'd expect it to be possible to fix #316 using the same query by having mallocWrapper claim all of the varinfos that it has returned to represent multiple concrete values (until we implement more precise uniqueness for mallocs). But again, this query wouldn't really work for that: mallocWrapper may answer the query for a malloced varinfo with true, but that wouldn't be in base's WeakUpdates set, so base answers with false and the meet of MayBool yields false. So for that kind of general use, it should be MustBool instead for a negated query like MustBeUnique, so if it's either in base's WeakUpdates or created by mallocWrapper, it wouldn't be unique any more.

But that doesn't really solve the first problem, that what is considered to be non-unique by one analysis isn't necessary non-unique for another.

Copy link
Copy Markdown
Member Author

@michael-schwarz michael-schwarz Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is highly specific to base (and I guess vareq) because they pass reachable variables into called functions, allowing for this situation to arise

I don't really think it is. As soon as the base analysis treats something as Multiple all other analyses must also do so or not answer EvalInt queries. This is because the points-to information that is used from base does not distinguish between inner and outer instances

because naively one might also expect octApron to consider this information to make weak updates

EDIT: In principle it would have to, but it does not track variables that have their address taken, escaping this conundrum.

If it ever were enhanced to also track such variables:

// PARAM: --sets ana.activated[+] "var_eq" --sets ana.activated[+] "octApron"
int rec(int i,int* ptr) {
    int top;
    int x = 17;

    if(i == 0) {
        rec(5,&x);
        // Recursive call may have modified x
        assert(x == 17); //UNKNOWN!
    } else {
        x = 31;

        // ptr points to the outer x, it is unaffected by this assignment
        // and should be 17
        assert(*ptr == 31); //UNKNOWN!
    }
    return 0;
}

int main() {
    int t;
    rec(0,&t);

    return 0;
}

The question is just whether to fix all analyzes in this PR or make separate ones. The issue is that the octApron has x = 31 after the assignment, the address domain in base has ptr -> {&x} and asks for the value of x via the query system, to which octApron answers [31,31].

Copy link
Copy Markdown
Member Author

@michael-schwarz michael-schwarz Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after recompiling 🙃, it does seem to work with octApron, as it does not have x as a tracked local variable, because it does not track variables who have their address taken.
So the reason one does not need to take it into account in OctApron is precisely that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else must be wrong with octApron in this case because x has its address taken in rec(5,&x), so octApron shouldn't be tracking x at all.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your comment overlapped with mine, I have since edited it. It works precisely because it does not track variables that have their address taken.

Copy link
Copy Markdown
Member Author

@michael-schwarz michael-schwarz Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should probably be a comment in octApron then, explaining that due to vaddrof it doesn't need to consider it at function calls etc.

I'll add a comment

Still, I think it would make sense to flip the query to make it possible for mallocWrapper to also indicate the alloc variables as multiple/non-unique.

Sure, I'll flip it!

Actually, would it even be possible to pull this WeakUpdates stuff into a separate analysis instead of baking it into base, if other analyses should be considering it regardless of what base is doing? It

That is very complicated, it requires the reachable may-point-to sets (from the params) at function call. And if we query the base analysis for that, it is not too different from just embedding it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this query again and how it would behave if mallocWrapper also answered it such that any varinfo that it created is not unique, but somehow it still doesn't make sense the way I thought it would.

MayBeMultiple (before)

Variable Base MallocWrapper Meet (and, MayBool)
Escaped true (in WeakUpdates) true (not created by mallocWrapper, top of MayBool) true (:heavy_check_mark:)
Malloced false (not in WeakUpdates) true (created by mallocWrapper) false (:x:)

MustBeUnique (now)

Variable Base MallocWrapper Meet (or, MustBool)
Escaped false (in WeakUpdates negated) false (not created by mallocWrapper, top of MustBool) false (:heavy_check_mark:)
Malloced true (not in WeakUpdates negated) false (created by mallocWrapper) true (:x:)

MustBeMultiple (should be?)

Variable Base MallocWrapper Meet (or, MustBool)
Escaped true (in WeakUpdates) false (not created by mallocWrapper, top of MustBool) true (:heavy_check_mark:)
Malloced false (not in WeakUpdates) true (created by mallocWrapper) true (:heavy_check_mark:)

It seems to me now like the query should be MustBeMultiple because otherwise one of the varinfo multiplicity/uniqueness analyses needs to somehow subsume the other: base's WeakUpdates should also automagically contain all malloced variables?

Copy link
Copy Markdown
Member Author

@michael-schwarz michael-schwarz Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

MustBeMultiple is also somehow not the right thing though. If something is in this WeakUpdates set, this just means that an outer copy may be reachable not that it must be.
So the name MustBeMultiple would be very confusing, since we need to weakly update all things that may be multiple, not just those that must be.

Regardless of this, I agree that MustBool is the correct lattice. I'll see how to make this change while not making it confusing. (Maybe some comment?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe calling it IsMultiple is the right thing to do? It doesn't demand anything about must-multiplicity, but rather just asks each analysis whether it has made the varinfo multiple. And then MustBool is just the right lattice to correctly combine the answers of different analyses.

It muddies the water though because at some point we wanted all the queries to clearly be one or the other by name.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the solution from the last comment now!

@michael-schwarz michael-schwarz changed the title Base for Discussions: Fix "escaping" via recursion Fix "escaping" via recursion Aug 20, 2021
@sim642 sim642 self-requested a review August 31, 2021 14:09
@sim642
Copy link
Copy Markdown
Member

sim642 commented Sep 1, 2021

Funny how GitHub claimed that there's a conflict with octApron.apron.ml that is so complicated it cannot even show the conflict in the web interface. On this branch you added a comment, on master the file was renamed to apronAnalysis.apron.ml, which I guess is the conflict. But somehow locally merging works automatically and your comment is properly merged into apronAnalysis.apron.ml just fine.

@michael-schwarz
Copy link
Copy Markdown
Member Author

michael-schwarz commented Sep 6, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants